Dutch compound splitting for bilingual terminology extraction

Compounds pose a problem for applications that rely on precise word alignments, such as bilingual terminology extraction. We therefore developed a state-of-the-art hybrid compound splitter for Dutch that makes use of corpus frequency information and linguistic knowledge. Domain-adaptation techniques are used to combine large out-of-domain and dynamically compiled in-domain frequency lists. We perform an extensive intrinsic evaluation on a Gold Standard set of 50,000 Dutch compounds and a set of 5,000 Dutch compounds belonging to the automotive domain. We also propose a novel methodology for wor...

Authors: Macken, Lieve; Tezcan, Arda
Document type: Book chapter
Publication date: 2018
Publisher: John Benjamins
Keywords: Languages and Literatures / word alignment / translation / Compound splitting / Dutch / multi-word units / bilingual terminology extraction / LT3
Language: English
Permalink: https://search.fid-benelux.de/Record/base-26675492
Data source: BASE; original catalog
Link(s): https://biblio.ugent.be/publication/7126122