SimLex-999 for Dutch

Word embeddings revolutionised natural language processing by effectively representing words as dense vectors. Although many datasets exist to evaluate English embeddings, few cater to Dutch. We developed a Dutch variant of the SimLex-999 word similarity dataset by gathering similarity judgements from 235 native Dutch speakers. Subsequently, we evaluated two popular Dutch language models, Bertje and RobBERT, finding that Bertje showed superior alignment with human semantic similarity judgements compared to RobBERT. This study provides the first intrinsic Dutch word embedding evaluation dataset,...
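
The abstract does not say how alignment with human judgements was measured. A common approach for SimLex-style benchmarks is to compute the Spearman rank correlation between human ratings and the cosine similarity of each word pair's vectors, and the minimal sketch below assumes that approach. The word pairs, ratings, and vectors are invented toy values, not data from the study, and the function names are hypothetical. Extracting word vectors from contextual models such as Bertje or RobBERT is a separate step that is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs, embeddings):
    """Correlate human ratings with model cosine similarities."""
    human = [rating for _, _, rating in pairs]
    model = [cosine(embeddings[w1], embeddings[w2]) for w1, w2, _ in pairs]
    return spearman(human, model)


# Toy data (invented for illustration only).
toy_pairs = [
    ("slim", "intelligent", 9.0),
    ("hard", "moeilijk", 7.0),
    ("oud", "nieuw", 1.5),
]
toy_embeddings = {
    "slim": [1.0, 0.1],
    "intelligent": [1.0, 0.2],
    "hard": [0.5, 1.0],
    "moeilijk": [1.0, 0.5],
    "oud": [1.0, 0.0],
    "nieuw": [0.0, 1.0],
}

rho = evaluate(toy_pairs, toy_embeddings)
```

Under this assumed setup, a model whose similarity ranking matches the human ranking more closely yields a higher correlation, which is one way a comparison between two models could be made.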

Authors: Brans, L.; Bloem, J.
Document type: contributionToPeriodical
Publication date: 2024
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29031206
Data source: BASE; original catalogue
Link(s): https://dare.uva.nl/personal/pure/en/publications/simlex999-for-dutch(e97273d0-b887-467f-8b5d-5476f6a285bf).html