Domain bias in distinguishing Flemish and Dutch subtitles

Author: van Halteren, Hans
Document type: Article
Publication date: 2019
Series/Journal: Natural Language Engineering; volume 26, issue 5, pages 493-510; ISSN 1351-3249, 1469-8110
Publisher: Cambridge University Press (CUP)
Keywords: Artificial Intelligence / Linguistics and Language / Language and Linguistics / Software
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27080655
Data source: BASE; original catalog
Link(s): http://dx.doi.org/10.1017/s1351324919000445

Abstract: This paper describes experiments in which I tried to distinguish between Flemish and Netherlandic Dutch subtitles, as originally proposed in the VarDial 2018 Dutch–Flemish Subtitle task. However, rather than using all data as a monolithic block, I divided them into two non-overlapping domains and then investigated how the relation between training and test domains influences the recognition quality. I show that the best estimate of the level of recognizability of the language varieties is derived when training on one domain and testing on another. Apart from the quantitative results, I also present a qualitative analysis, by investigating in detail the most distinguishing features in the various scenarios. Here too, it is with the out-of-domain recognition that some genuine differences between Flemish and Netherlandic Dutch can be found.
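The evaluation design described in the abstract (train on one domain, test on the same or the other domain) can be illustrated with a minimal Python sketch. This is not the author's method: the record does not describe the paper's features or classifier, so the token-overlap classifier, the function names (`train`, `predict`, `accuracy`, `domain_grid`), the domain names `d1` and `d2`, and the toy samples are all assumptions made for illustration. The tokens are placeholders, not real Dutch words, and the printed scores say nothing about the paper's results.

```python
from collections import Counter
from itertools import product


def train(samples):
    """Count tokens per label; samples are (text, label) pairs."""
    counts = {}
    for text, label in samples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict(model, text):
    """Return the label whose training token counts overlap most with the text."""
    tokens = text.lower().split()
    # sorted() makes tie-breaking deterministic (first label alphabetically wins).
    return max(sorted(model), key=lambda label: sum(model[label][t] for t in tokens))


def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the gold label."""
    hits = sum(predict(model, text) == label for text, label in samples)
    return hits / len(samples)


def domain_grid(data):
    """Score every (training domain, test domain) pair.

    data maps a domain name to {"train": [...], "test": [...]}.
    """
    return {
        (tr, te): accuracy(train(data[tr]["train"]), data[te]["test"])
        for tr, te in product(data, repeat=2)
    }


# Hypothetical toy data: "fl1"/"nl1" stand in for variety markers,
# the other tokens stand in for domain-specific (topic) vocabulary.
data = {
    "d1": {
        "train": [("fl1 sport sport", "flemish"), ("nl1 news news", "netherlandic")],
        "test": [("fl1 sport", "flemish"), ("nl1 news", "netherlandic")],
    },
    "d2": {
        "train": [("fl1 cook cook", "flemish"), ("nl1 trip trip", "netherlandic")],
        "test": [("fl1 cook", "flemish"), ("nl1 trip", "netherlandic")],
    },
}

grid = domain_grid(data)
for (tr, te), score in sorted(grid.items()):
    kind = "in-domain" if tr == te else "out-of-domain"
    print(f"train={tr} test={te} ({kind}): accuracy={score:.2f}")
```

In this sketch, the in-domain cells can be won on topic vocabulary alone, whereas the out-of-domain cells can only be won on tokens shared across domains, which is the intuition behind treating the cross-domain score as the better estimate of how recognizable the varieties are.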