Improving Luxembourgish Speech Recognition with Cross-Lingual Speech Representations
Author: | |
---|---|
Document type: | contributionToPeriodical |
Publication date: | 2023 |
Publisher: | Institute of Electrical and Electronics Engineers Inc. |
Keywords: | language modelling / Luxembourgish / multilingual speech recognition / under-resourced language / wav2vec 2.0 XLSR-53 |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-29106962 |
Data source: | BASE; original catalogue |
Powered by: | BASE |
Link(s): | https://hdl.handle.net/11370/1c12763a-5a68-4811-ae82-a3808714d457 |
Luxembourgish is a West Germanic language spoken by roughly 390,000 people, mainly in Luxembourg. It is one of Europe's under-described and under-resourced languages and has not been extensively investigated in the context of speech recognition. We explore self-supervised multilingual learning of Luxembourgish speech representations for the downstream task of speech recognition. We show that learning cross-lingual representations is essential for low-resource languages such as Luxembourgish. Learning cross-lingual representations and rescoring the output transcriptions with a language model, using only 4 hours of labelled speech, achieves a word error rate of 15.1%; this improves on our transfer learning baseline by 33.1% relative and 7.5 percentage points absolute. Increasing the amount of labelled speech to 14 hours yields a significant performance gain, resulting in a 9.3% word error rate. Models and datasets are available at https://huggingface.co/lemswasabi
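The abstract reports results as word error rate (WER) and as relative and absolute improvements over a baseline. The following is a minimal sketch of how such figures are commonly computed, not the authors' evaluation code. The function names and the example sentences are hypothetical. The baseline WER of 22.6% is not stated in the abstract; it is inferred here from 15.1% plus the 7.5-point absolute gain.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def wer_reductions(baseline_wer: float, improved_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in points, relative reduction as a fraction)."""
    absolute = baseline_wer - improved_wer
    relative = absolute / baseline_wer
    return absolute, relative


# Hypothetical sentence pair: one substituted word out of four.
example_wer = word_error_rate("the cat sat down", "the cat sat town")

# Assumed baseline: 15.1 + 7.5 = 22.6 (inferred, not quoted from the abstract).
absolute, relative = wer_reductions(22.6, 15.1)
print(f"example WER: {example_wer:.2f}")
print(f"absolute reduction: {absolute:.1f} points, relative: {relative:.1%}")
```

With these rounded inputs the relative reduction comes out at roughly one third, close to the reported 33.1%; any small difference presumably stems from rounding in the published figures.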