Identifying cognates in English-Dutch and French-Dutch by means of orthographic information and cross-lingual word embeddings
This paper investigates the validity of combining more traditional orthographic information with cross-lingual word embeddings to identify cognate pairs in English-Dutch and French-Dutch. In a first step, lists of potential cognate pairs in English-Dutch and French-Dutch are manually labelled. The resulting gold standard is used to train and evaluate a multi-layer perceptron that can distinguish cognates from non-cognates. Fifteen orthographic features capture string similarities between source and target words, while the cosine similarity between their word embeddings represents the semantic...
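As a rough illustration of the kind of input the abstract describes, the sketch below builds a small feature vector for one word pair: two string-similarity scores plus the cosine similarity of the two word vectors. It is a minimal sketch under stated assumptions, not the authors' implementation. The two orthographic measures (normalized edit similarity and a bigram Dice coefficient) are common choices picked here for illustration and are not confirmed to be among the paper's fifteen features; the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_similarity(a: str, b: str) -> float:
    """1 minus edit distance divided by the longer word's length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    total = len(bigrams_a) + len(bigrams_b)
    return 0.0 if total == 0 else 2 * len(bigrams_a & bigrams_b) / total


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def pair_features(source, target, source_vec, target_vec):
    """Hypothetical feature vector: two orthographic scores plus one semantic score."""
    return [
        normalized_edit_similarity(source, target),
        bigram_dice(source, target),
        cosine(source_vec, target_vec),
    ]


# Toy example: English "night" vs. Dutch "nacht" with made-up 2-d embeddings.
features = pair_features("night", "nacht", [1.0, 0.0], [1.0, 0.0])
```

A vector of this shape, one per labelled word pair, is the sort of input a multi-layer perceptron classifier could be trained on; the classifier itself is left out to keep the sketch free of external dependencies.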
Author(s): | |
---|---|
Document type: | conference |
Publication date: | 2020 |
Publisher: | European Language Resources Association (ELRA) |
Keywords: | Languages and Literatures / LT3 / cognate detection / multi-layer perceptron / orthographic similarity / cross-lingual word embeddings |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-29033471 |
Data source: | BASE; original catalog |
Link(s) : | https://biblio.ugent.be/publication/8662200 |