Domain bias in distinguishing Flemish and Dutch subtitles
Author: | |
---|---|
Document type: | Article |
Publication date: | 2019 |
Series/Journal: | Natural Language Engineering; volume 26, issue 5, pages 493-510; ISSN 1351-3249, 1469-8110 |
Publisher: | Cambridge University Press (CUP) |
Keywords: | Artificial Intelligence / Linguistics and Language / Language and Linguistics / Software |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-27080655 |
Data source: | BASE; original catalog |
Link(s) : | http://dx.doi.org/10.1017/s1351324919000445 |
Abstract: This paper describes experiments in which I tried to distinguish between Flemish and Netherlandic Dutch subtitles, as originally proposed in the VarDial 2018 Dutch–Flemish Subtitle task. However, rather than using all data as a monolithic block, I divided them into two non-overlapping domains and then investigated how the relation between training and test domains influences the recognition quality. I show that the best estimate of the level of recognizability of the language varieties is derived when training on one domain and testing on another. Apart from the quantitative results, I also present a qualitative analysis, by investigating in detail the most distinguishing features in the various scenarios. Here too, it is with the out-of-domain recognition that some genuine differences between Flemish and Netherlandic Dutch can be found.
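
The evaluation protocol described in the abstract (train on one domain, test on either the same or the other domain) might be sketched as follows. This is only an illustration under stated assumptions, not the paper's method: the word-unigram naive Bayes classifier is a stand-in, the function names (`train_nb`, `predict`, `evaluate_by_domain`) are hypothetical, and the toy data use invented placeholder tokens rather than real subtitle text.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples):
    """Fit a word-unigram naive Bayes model on (text, label) pairs.

    Stand-in classifier for illustration; the paper's own features and
    learner are not specified in this record.
    """
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the most probable label, using add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, samples):
    return sum(predict(model, text) == label for text, label in samples) / len(samples)


def evaluate_by_domain(data):
    """Train on each domain's training part, test on every domain's test part.

    Returns {(train_domain, test_domain): accuracy}. Pairs with equal names
    are in-domain scores; the others are out-of-domain scores.
    """
    results = {}
    for train_dom, train_split in data.items():
        model = train_nb(train_split["train"])
        for test_dom, test_split in data.items():
            results[(train_dom, test_dom)] = accuracy(model, test_split["test"])
    return results


# Invented placeholder data: "vf"/"vn" mimic variety markers,
# "ta*"/"tb*" mimic domain-specific topic words.
toy_data = {
    "domain_a": {
        "train": [("ta1 vf", "flemish"), ("ta1 vf", "flemish"),
                  ("ta2 vn", "netherlandic"), ("ta2 vn", "netherlandic")],
        "test": [("ta1 vf", "flemish"), ("ta2 vn", "netherlandic")],
    },
    "domain_b": {
        "train": [("tb1 vf", "flemish"), ("tb1 vf", "flemish"),
                  ("tb2 vn", "netherlandic"), ("tb2 vn", "netherlandic")],
        "test": [("tb1 vf", "flemish"), ("tb2 vn", "netherlandic")],
    },
}

results = evaluate_by_domain(toy_data)
for (train_dom, test_dom), acc in sorted(results.items()):
    kind = "in-domain" if train_dom == test_dom else "out-of-domain"
    print(f"train={train_dom} test={test_dom} ({kind}): accuracy={acc:.2f}")
```

Keeping the two domains disjoint in this way means that topic words learned from the training domain are absent from the other domain's test set, so an out-of-domain score can only rest on features that carry over between domains.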