Soft features for statistical machine translation of spoken and signed languages

Author: Stein, Daniel
Document type: doctoralThesis
Publication date: 2012
Publisher: Publikationsserver der RWTH Aachen University
Keywords: info:eu-repo/classification/ddc/004 / Maschinelle Übersetzung / Nederlandse Gebarentaal / Deutsche Gebärdensprache / Automatische Sprachanalyse / Informatik / statistical machine translation / German sign language / automatic language analysis
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29123056
Data source: BASE; original catalog
Link(s): https://publications.rwth-aachen.de/record/64579

The goal of statistical machine translation is to translate unseen sentences from a source language into a target language. To this end, rules are derived automatically from bilingual data collections. Following a probabilistic approach, many alternative sentences are generated and then evaluated by several feature functions. The alternative with the highest probability is selected as the translation. This dissertation evaluates the influence of several, mostly linguistically motivated, feature functions on translation quality. These functions never rule out an alternative entirely, which preserves the variability of the translation process. Several language pairs, such as Chinese-English and German-French, are analyzed. The dissertation also deals with sign languages as a special case of statistical machine translation. Because of their distinct modality, sign languages introduce several challenges into the overall architecture. Existing data collections are evaluated, and the RWTH-Phoenix corpus and the Corpus NGT are introduced. Because these corpora are relatively small, an adaptation of conventional approaches is useful. For example, with the use of cross-validation, which is uncommon in machine translation, a significant improvement in translation quality can be achieved. With morpho-syntactic pre- and post-processing, translation fluency improves and compound words can be handled more smoothly. We compare two translation paradigms and also employ system combination for the overall architecture.
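
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the log-linear formulation that is common in statistical machine translation; the abstract itself does not spell out the model. All feature names, weights, and candidate sentences below are hypothetical and are not taken from the dissertation. The sketch shows why such features are "soft": a penalty lowers a candidate's score, but every candidate keeps a non-zero probability.

```python
import math

def score(features, weights):
    """Weighted sum of feature values (the log-linear model score)."""
    return sum(weights[name] * value for name, value in features.items())

def best_translation(candidates, weights):
    """Return the candidate with the highest model score."""
    return max(candidates, key=lambda c: score(c["features"], weights))

def posteriors(candidates, weights):
    """Normalize the scores into probabilities with a numerically stable softmax."""
    scores = [score(c["features"], weights) for c in candidates]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates: "tm" and "lm" stand in for translation-model and
# language-model log-probabilities; "syntax_penalty" is an illustrative soft feature.
candidates = [
    {"text": "the house is small",
     "features": {"tm": -1.2, "lm": -2.0, "syntax_penalty": 0.0}},
    {"text": "small is the house",
     "features": {"tm": -1.0, "lm": -3.5, "syntax_penalty": 1.0}},
]

# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"tm": 1.0, "lm": 0.5, "syntax_penalty": -0.3}

best = best_translation(candidates, weights)
probs = posteriors(candidates, weights)
```

Here the penalized candidate is ranked lower but is not discarded, so a different weighting could still select it.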