ASRLUX: AUTOMATIC SPEECH RECOGNITION FOR THE LOW-RESOURCE LANGUAGE LUXEMBOURGISH

Authors: Gilles, Peter
Hillah, Léopold Edem Ayité
Hosseini Kivanani, Nina
Document type: conference paper
Publication date: 2023
Publisher: Guarant International
Keywords: Luxembourgish / automatic speech recognition (ASR) / low-resource language / Engineering / computing & technology / Computer science
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27522459
Data source: BASE; original catalogue
Link(s): https://orbilu.uni.lu/handle/10993/55819

Peer reviewed. We have developed an automatic speech recognition (ASR) system tailored to Luxembourgish, a low-resource language that poses distinct challenges for conventional ASR approaches due to the limited availability of training data and its inherently multilingual nature. Employing transfer learning, we fine-tuned an array of models derived from pre-trained wav2vec 2.0 and Whisper checkpoints. These checkpoints were trained on extensive corpora covering many languages and several hundred thousand hours of audio, using unsupervised and weakly supervised methodologies, respectively. The corpora include linguistically related languages such as German, Dutch, and French, which expedites the cross-lingual training process for Luxembourgish-specific models. Fine-tuning was performed on 67 hours of annotated Luxembourgish speech data sourced from a diverse range of speakers. The best word error rates (WER) achieved with the wav2vec 2.0 and Whisper models were 9.5 and 12.1, respectively. These remarkably low WERs substantiate the efficacy of transfer learning for ASR in low-resource languages.
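
As an illustration only (not part of the published record), the sketch below shows how a fine-tuned checkpoint of this kind could be evaluated: a hypothetical wav2vec 2.0 model fine-tuned for Luxembourgish is loaded through the Hugging Face transformers pipeline and scored with the jiwer implementation of WER. The model identifier, audio file, and reference transcript are placeholders, not the authors' actual resources.

```python
# Illustrative sketch only: evaluate a (hypothetical) fine-tuned Luxembourgish
# ASR checkpoint and compute its word error rate (WER).
from transformers import pipeline  # pip install transformers
import jiwer                       # pip install jiwer

# Placeholder identifier for a wav2vec 2.0 checkpoint fine-tuned on
# Luxembourgish speech; substitute the actual published model.
MODEL_ID = "your-org/wav2vec2-luxembourgish"

asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

# Transcribe a 16 kHz mono recording (placeholder path) and compare the
# hypothesis against its reference transcript.
hypothesis = asr("example_luxembourgish.wav")["text"].lower()
reference = "moien wéi geet et iech haut".lower()  # illustrative reference only

print(f"WER: {jiwer.wer(reference, hypothesis):.3f}")
```

Aggregating this measure over a held-out test set yields corpus-level WER figures of the kind reported above.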