Dutch compound splitting for bilingual terminology extraction
Compounds pose a problem for applications that rely on precise word alignments, such as bilingual terminology extraction. We therefore developed a state-of-the-art hybrid compound splitter for Dutch that makes use of corpus frequency information and linguistic knowledge. Domain-adaptation techniques are used to combine large out-of-domain and dynamically compiled in-domain frequency lists. We perform an extensive intrinsic evaluation on a Gold Standard set of 50,000 Dutch compounds and a set of 5,000 Dutch compounds belonging to the automotive domain. We also propose a novel methodology for wor...
Author: | |
---|---|
Document type: | bookChapter |
Publication date: | 2018 |
Publisher/Ed.: | John Benjamins |
Keywords: | Languages and Literatures / word alignment / translation / Compound splitting / Dutch / multi-word units / bilingual terminology extraction / LT3 |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-26675492 |
Data source: | BASE; original catalog |
Link(s): | https://biblio.ugent.be/publication/7126122 |