The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation

Authors: Tjong Kim Sang, Erik
Bollmann, Marcel
Boschker, Remko
Casacuberta, Francisco
Dietz, Feike
Dipper, Stefanie
Domingo, Miguel
van der Goot, Rob
van Koppen, Marjo
Ljubešić, Nikola
Östling, Robert
Petran, Florian
Pettersson, Eva
Scherrer, Yves
Schraagen, Marijn
Sevens, Leen
Tiedemann, Jörg
Vanallemeersch, Tom
Zervanou, Kalliopi
Document type: journal article
Publication date: 2017
Publisher: Stockholms universitet
Avdelningen för datorlingvistik
Keywords: historical text / text normalization / neural networks / machine translation / Dutch language / Language Technology (Computational Linguistics) / Språkteknologi (språkvetenskaplig databehandling)
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29435832
Data source: BASE; original catalog
Link(s): http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-148207

The CLIN27 shared task evaluates whether translating historical text to modern text improves the output quality of contemporary natural language processing tools applied to that text. We focus on improving part-of-speech tagging of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% compared with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).
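
The abstract reports results as an error reduction against a baseline. This record does not define the measure, so the following is only a minimal sketch under the assumption that it means the relative reduction in tagging error rate. The function name and the accuracy values are hypothetical and chosen purely to illustrate the arithmetic; they are not the shared task's reported scores.

```python
def relative_error_reduction(baseline_accuracy: float, system_accuracy: float) -> float:
    """Return the fraction of the baseline's tagging errors removed by the system.

    Assumption: "error reduction" is (baseline_error - system_error) / baseline_error,
    where error = 1 - accuracy. The source record does not spell this out.
    """
    baseline_error = 1.0 - baseline_accuracy
    system_error = 1.0 - system_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (baseline_error - system_error) / baseline_error


# Hypothetical accuracies for illustration only (not results from the paper):
# tagging unmodified historical text vs. tagging its modernized translation.
reduction = relative_error_reduction(baseline_accuracy=0.60, system_accuracy=0.80)
print(f"{reduction:.0%}")  # prints "50%"
```

Under this reading, a 51% error reduction would mean the best system removes roughly half of the errors the tagger makes on unmodified text, while human translation removes 57% of them.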