Longitudinal Speaker Clustering and Verification Corpus with Code-switching Frisian-Dutch Speech

Authors: Yilmaz, E.
Dijkstra, J.E.
Van de Velde, Hans
Kampstra, F.
Algra, J.
van den Heuvel, H.
van Leeuwen, D.
Document type: conferenceObject
Publication date: 2017
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27026576
Data source: BASE; original catalog
Link(s): https://pure.knaw.nl/portal/en/publications/a5039563-bc60-4d72-98b2-689a561369c1

In this paper, we present a new longitudinal and bilingual broadcast database designed for speaker clustering and text-independent speaker verification research. The broadcast data is extracted from the archives of Omrop Fryslân, the regional broadcaster of the province of Fryslân in the north of the Netherlands. Two speaker verification tasks are provided in a standard enrollment-test setting with language-consistent trials. The first task contains target trials from all available speakers who appear in at least two different programs, while the second task contains target trials from a subgroup of speakers who appear in programs recorded in multiple years. The second task is designed to investigate the effects of ageing on the accuracy of speaker verification systems. The database also contains unlabeled spoken segments from different radio programs for speaker clustering research, and we provide the output of an existing speaker diarization system for baseline verification experiments. Finally, we present baseline speaker verification results using the Kaldi GMM-UBM and DNN-UBM speaker verification systems. This database will extend the recently presented open-source Frisian data collection and is publicly available for research purposes.
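The trial design of the two verification tasks can be illustrated with a small sketch. This is a hypothetical illustration, not the corpus's actual trial-generation procedure: the `Segment` record, its fields, the language codes, and the choice to pair segments directly (instead of using separate enrollment and test lists) are all assumptions. It shows the distinguishing constraints: every trial is language-consistent, target trials pair one speaker's segments from different programs, and the ageing task additionally requires those programs to come from different years.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Segment:
    # Hypothetical segment record; field names are assumptions for illustration.
    seg_id: str
    speaker: str
    program: str
    year: int
    language: str  # assumed codes, e.g. "fy" (Frisian) or "nl" (Dutch)


def build_trials(segments, require_different_years=False):
    """Return (enroll_id, test_id, is_target) tuples for language-consistent trials.

    Simplified sketch: segments are paired directly rather than drawn from
    separate enrollment and test sets.
    """
    trials = []
    for a, b in combinations(segments, 2):
        if a.language != b.language:
            continue  # keep only language-consistent trials
        if a.speaker == b.speaker:
            if a.program == b.program:
                continue  # target trials must span two different programs
            if require_different_years and a.year == b.year:
                continue  # ageing task: target trials must span different years
            trials.append((a.seg_id, b.seg_id, True))
        else:
            # Non-target trial: different speakers, same language (unrestricted here).
            trials.append((a.seg_id, b.seg_id, False))
    return trials
```

With `require_different_years=False` the function approximates the first task; setting it to `True` keeps only cross-year target trials, in the spirit of the ageing task. Non-target trials are deliberately left unrestricted, which is a further simplification over any real evaluation list.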