Word level language identification in online multilingual communication
Multilingual speakers switch between languages in both online and spoken communication. Analyses of large-scale multilingual data require automatic language identification at the word level. In our experiments with multilingual online discussions, we first tag the language of individual words using language models and dictionaries. We then incorporate context to improve performance. We achieve an accuracy of 98%. Besides word-level accuracy, we use two new metrics to evaluate this task.
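The two-step idea in the abstract (tag each word with language models and dictionaries, then bring in context) can be illustrated with a minimal Python sketch. It is not the authors' implementation: this record does not state the model type or how context is used, so the character trigram scorer with add-one smoothing, the dictionary bonus, the neighbour averaging, and the toy Dutch/Turkish word lists are all assumptions chosen for illustration. The language pair is taken from the record's keywords.

```python
import math
from collections import Counter


def char_ngrams(word, n=3):
    """Character n-grams of a word, padded with boundary markers."""
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


class CharNgramModel:
    """Add-one smoothed character n-gram scorer (an assumed model choice)."""

    def __init__(self, words, n=3):
        self.n = n
        self.counts = Counter(g for w in words for g in char_ngrams(w, n))
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen n-grams

    def log_prob(self, word):
        """Average log-probability per n-gram, so word length does not dominate."""
        grams = char_ngrams(word, self.n)
        denom = self.total + self.vocab
        return sum(math.log((self.counts[g] + 1) / denom) for g in grams) / len(grams)


def word_scores(word, models, dictionaries, bonus=2.0):
    """Step 1: per-language score from the model plus a dictionary bonus (hypothetical weight)."""
    w = word.lower()
    return {
        lang: model.log_prob(w) + (bonus if w in dictionaries[lang] else 0.0)
        for lang, model in models.items()
    }


def tag_tokens(tokens, models, dictionaries, context_weight=0.5):
    """Step 2: add the averaged scores of the adjacent words, then pick the best language."""
    per_word = [word_scores(t, models, dictionaries) for t in tokens]
    tags = []
    for i, own in enumerate(per_word):
        neighbours = per_word[max(0, i - 1):i] + per_word[i + 1:i + 2]
        combined = {}
        for lang in models:
            ctx = sum(s[lang] for s in neighbours) / len(neighbours) if neighbours else 0.0
            combined[lang] = own[lang] + context_weight * ctx
        tags.append(max(combined, key=combined.get))
    return tags


# Toy word lists, invented for this sketch; real systems need far larger resources.
TOY_WORDS = {
    "nl": ["ik", "ben", "naar", "huis", "gegaan", "vandaag"],
    "tr": ["ben", "eve", "gittim", "bugün", "çok", "güzel"],
}
MODELS = {lang: CharNgramModel(words) for lang, words in TOY_WORDS.items()}
DICTIONARIES = {lang: set(words) for lang, words in TOY_WORDS.items()}

if __name__ == "__main__":
    for sentence in (["ik", "ben", "naar", "huis"], ["ben", "eve", "gittim"]):
        print(list(zip(sentence, tag_tokens(sentence, MODELS, DICTIONARIES))))
```

The word "ben" occurs in both toy lists, so its own scores are nearly tied; in this sketch it is the neighbouring words that decide its label, which is the kind of ambiguity context is meant to resolve.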
Author: | |
---|---|
Document type: | conference |
Publication date: | 2013 |
Publisher/Editor: | Association for Computational Linguistics (ACL) |
Keywords: | Languages and Literatures / multilingual / Turkish / Dutch / code-switching / automatic language identification / social media / Lt3 |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-29033499 |
Data source: | BASE; original catalog |
Link(s): | https://biblio.ugent.be/publication/8694791 |