Soft features for statistical machine translation of spoken and signed languages
Author: | |
---|---|
Document type: | doctoralThesis |
Publication date: | 2012 |
Publisher: | Publikationsserver der RWTH Aachen University |
Keywords: | info:eu-repo/classification/ddc/004 / Maschinelle Übersetzung / Nederlandse Gebarentaal / Deutsche Gebärdensprache / Automatische Sprachanalyse / Informatik / statistical machine translation / German sign language / automatic language analysis |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-29123056 |
Data source: | BASE; original catalog |
Powered by: | BASE |
Link(s): | https://publications.rwth-aachen.de/record/64579 |
The goal of statistical machine translation is to transfer unseen sentences from a source language into a target language. For this purpose, rules are derived automatically from bilingual data collections. Following a probabilistic principle, many alternative sentences are generated and then evaluated by several feature functions. The alternative with the highest probability is selected as the actual translation. This dissertation evaluates the influence of several, mostly linguistically motivated, feature functions on the translation quality of statistical machine translation. These functions never rule out an alternative outright, which preserves the variability of the translation process. Several language pairs, such as Chinese-English and German-French, are analyzed. This dissertation also deals with sign languages as a special case of statistical machine translation. Because of their distinct modality, sign languages introduce several challenges into the overall architecture. Existing data collections are evaluated, and the RWTH-Phoenix corpus and the Corpus NGT are introduced. Because these corpora are relatively small, an adaptation of conventional approaches is useful. For example, by using cross-validation, which is uncommon in machine translation, a significant improvement in translation quality can be achieved. With morpho-syntactic pre- and post-processing, translation fluency improves and compound words are handled more smoothly. We also compare two translation paradigms and employ system combination in the overall architecture.
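
The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a weighted-sum (log-linear style) combination of feature scores, which the abstract does not spell out. The feature functions `word_penalty` and `soft_punctuation`, the weights, and the example hypotheses are all hypothetical and are not taken from the dissertation. The point is only that a soft feature lowers the score of an alternative without removing it from the candidate list.

```python
from typing import Callable, List, Sequence

# A feature function maps a candidate translation to a real-valued score.
Feature = Callable[[str], float]


def model_score(hyp: str, features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Weighted sum of feature scores (assumed log-linear style combination)."""
    return sum(w * f(hyp) for f, w in zip(features, weights))


def rank(hyps: List[str], features: Sequence[Feature], weights: Sequence[float]) -> List[str]:
    """Sort all alternatives by score, best first; nothing is filtered out."""
    return sorted(hyps, key=lambda h: model_score(h, features, weights), reverse=True)


# Hypothetical toy features, for illustration only.
def word_penalty(hyp: str) -> float:
    # Longer candidates receive a lower score.
    return -float(len(hyp.split()))


def soft_punctuation(hyp: str) -> float:
    # Soft feature: penalize a missing sentence-final period, never discard.
    return 0.0 if hyp.endswith(".") else -1.0


hypotheses = [
    "the house is small .",
    "the house small",
    "small the house is it .",
]
features = [word_penalty, soft_punctuation]
weights = [0.1, 1.0]  # hypothetical; real systems tune these on held-out data

ranked = rank(hypotheses, features, weights)
best = ranked[0]
print(best)
```

In this sketch the candidate without a final period is penalized by `soft_punctuation` but still appears in `ranked`, so a different weight setting could bring it back to the top.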