Machine learning for modeling Dutch pronunciation variation

Authors: Hoste, Veronique
Gillis, Steven
Daelemans, Walter
Document type: conference
Publication date: 2000
Publisher/Editor: Utrecht Institute of Linguistics OTS
Keywords: Languages and Literatures
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29449325
Data source: BASE; original catalog
Link(s): https://biblio.ugent.be/publication/597848

This paper describes the use of rule induction techniques for the automatic extraction of phonemic knowledge and rules from pairs of pronunciation lexicons. This extracted knowledge allows the adaptation of speech processing systems to regional variants of a language. As a case study, we apply the approach to Northern Dutch and Flemish (the variant of Dutch spoken in Flanders, a part of Belgium), based on Celex and Fonilex, pronunciation lexicons for Northern Dutch and Flemish, respectively. In our study, we compare two rule induction techniques, Transformation-Based Error-Driven Learning (TBEDL) (Brill, 1995) and C5.0 (Quinlan, 1993), and evaluate the extracted knowledge quantitatively (accuracy) and qualitatively (linguistic relevance of the rules). We conclude that, whereas classification-based rule induction with C5.0 is more accurate, the transformation rules learned with TBEDL can be more easily interpreted.
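
To illustrate the general idea behind transformation-based error-driven learning, the following is a minimal sketch, not the setup used in the paper. It assumes that each pair of transcriptions is already aligned symbol by symbol, and it uses a single hypothetical rule template ("rewrite symbol A as B when the preceding symbol is C"). The toy symbol strings are invented for illustration and are not taken from Celex or Fonilex.

```python
def apply_rule(rule, seq):
    """Rewrite `old` as `new` wherever the preceding symbol is `prev`.

    "#" stands for the word boundary; context is read from the input
    sequence, so all matches are rewritten simultaneously.
    """
    old, new, prev = rule
    out = list(seq)
    for i, sym in enumerate(seq):
        left = seq[i - 1] if i > 0 else "#"
        if sym == old and left == prev:
            out[i] = new
    return out


def count_errors(current, targets):
    """Number of positions where the current guess differs from the target."""
    return sum(c != t for cur, tgt in zip(current, targets)
               for c, t in zip(cur, tgt))


def learn_rules(sources, targets, max_rules=10):
    """Greedily pick the rule with the largest net error reduction."""
    current = [list(s) for s in sources]  # initial state: source transcription
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining error (hypothetical template).
        candidates = set()
        for cur, tgt in zip(current, targets):
            for i, (c, t) in enumerate(zip(cur, tgt)):
                if c != t:
                    candidates.add((c, t, cur[i - 1] if i > 0 else "#"))
        base = count_errors(current, targets)
        best_rule, best_gain = None, 0
        for rule in sorted(candidates):
            rewritten = [apply_rule(rule, cur) for cur in current]
            gain = base - count_errors(rewritten, targets)
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:  # no rule improves accuracy any further
            break
        rules.append(best_rule)
        current = [apply_rule(best_rule, cur) for cur in current]
    return rules


# Invented toy data: "x" becomes "G" only at the start of a word.
sources = ["xat", "xot", "tax"]
targets = ["Gat", "Got", "tax"]
rules = learn_rules(sources, targets)
```

Each learned rule is an ordered, human-readable rewrite, which is the kind of property the abstract refers to when it calls the transformation rules easier to interpret than a classifier's output.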