Identifying cognates in English-Dutch and French-Dutch by means of orthographic information and cross-lingual word embeddings

This paper investigates the validity of combining more traditional orthographic information with cross-lingual word embeddings to identify cognate pairs in English-Dutch and French-Dutch. In a first step, lists of potential cognate pairs in English-Dutch and French-Dutch are manually labelled. The resulting gold standard is used to train and evaluate a multi-layer perceptron that can distinguish cognates from non-cognates. Fifteen orthographic features capture string similarities between source and target words, while the cosine similarity between their word embeddings represents the semantic...
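
As a rough illustration of the two kinds of signal the abstract describes, the following Python sketch computes one orthographic feature and one semantic feature for a word pair. It is a minimal sketch under stated assumptions, not the authors' implementation: this record does not list the fifteen orthographic features, so normalized edit-distance similarity is used here only as a plausible stand-in, the embedding vectors are toy values, and all function names are hypothetical. The multi-layer perceptron that would consume such features is not shown.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_similarity(source: str, target: str) -> float:
    """Assumed stand-in feature: 1 minus edit distance over the longer length."""
    longest = max(len(source), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(source, target) / longest


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def feature_vector(source: str, target: str,
                   source_emb: list[float], target_emb: list[float]) -> list[float]:
    """Combine the orthographic and semantic signals into one feature vector."""
    return [orthographic_similarity(source, target),
            cosine_similarity(source_emb, target_emb)]


# Toy example: English "night" vs. Dutch "nacht" with made-up 2-d embeddings.
features = feature_vector("night", "nacht", [1.0, 0.0], [1.0, 0.0])
```

In a full pipeline, one such vector per candidate pair, together with the manually assigned cognate label, would form a training example for the classifier.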

Authors: Lefever, Els; Labat, Sofie; Singh, Pranaydeep
Document type: conference
Publication date: 2020
Publisher/Editor: European Language Resources Association (ELRA)
Keywords: Languages and Literatures / LT3 / cognate detection / multi-layer perceptron / orthographic similarity / cross-lingual word embeddings
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29033471
Data source: BASE; original catalogue
Link(s): https://biblio.ugent.be/publication/8662200