BLiMP-NL: A corpus of Dutch minimal pairs and acceptability judgements for language model evaluation

Authors: Suijkerbuijk, Michelle
Prins, Zoë
de Heer Kloots, Marianne
Zuidema, Jelle
Frank, Stefan L.
Document type: posted-content
Publication date: 2024
Publisher: Center for Open Science
Language: unknown
Permalink: https://search.fid-benelux.de/Record/base-28643583
Data source: BASE; original catalogue
Link(s): http://dx.doi.org/10.31234/osf.io/mhjbx

We present a corpus of 8400 Dutch sentence pairs, intended for the grammatical evaluation of language models. Each pair consists of a grammatical sentence and a minimally different ungrammatical sentence. The corpus covers 84 paradigms of 100 pairs each, classified into 22 syntactic phenomena. For each paradigm, nine sentences are rated for acceptability by at least 30 participants each, and for the same nine sentences per-word reading times are recorded through self-paced reading. Within each paradigm, ten of the sentence pairs were created by hand, while the remaining ninety were created semi-automatically with the help of ChatGPT. Here, we report on the construction of the dataset, the measured acceptability ratings and reading times, and the extent to which a variety of language models can predict both the ground-truth grammaticality and the human acceptability ratings.
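
As a rough illustration of how a minimal-pair corpus of this kind is commonly used, the sketch below counts how often a scorer prefers the grammatical sentence of a pair. The abstract does not specify the scoring procedure, so the comparison rule (higher score wins) is an assumption. The toy bigram scorer, the names `minimal_pair_accuracy`, `toy_score` and `EXAMPLE_PAIRS`, and the example sentences are all hypothetical stand-ins, not items or methods from the corpus.

```python
from typing import Callable, Iterable, Tuple

# A minimal pair: (grammatical sentence, minimally different ungrammatical one).
MinimalPair = Tuple[str, str]


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    score: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence gets the higher score.

    Assumption: `score` returns something like a sentence log-probability,
    so a higher value means the model finds the sentence more likely.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one minimal pair")
    correct = sum(score(good) > score(bad) for good, bad in pairs)
    return correct / len(pairs)


# Hypothetical stand-in for a language model: a tiny table of "seen" bigrams.
# A real evaluation would replace this with model log-probabilities.
SEEN_BIGRAMS = {
    ("de", "kat"), ("kat", "slaapt"),
    ("de", "katten"), ("katten", "slapen"),
}
UNSEEN_PENALTY = -5.0


def toy_score(sentence: str) -> float:
    """Sum a fixed penalty for every bigram absent from the toy table."""
    words = sentence.lower().split()
    bigrams = zip(words, words[1:])
    return sum(0.0 if bg in SEEN_BIGRAMS else UNSEEN_PENALTY for bg in bigrams)


# Illustrative subject-verb agreement pairs (not taken from the corpus).
EXAMPLE_PAIRS = [
    ("de kat slaapt", "de kat slapen"),
    ("de katten slapen", "de katten slaapt"),
]

if __name__ == "__main__":
    print(minimal_pair_accuracy(EXAMPLE_PAIRS, toy_score))
```

Accuracy computed this way reflects only the binary grammaticality contrast. Comparing model scores with the graded human acceptability ratings would need a separate measure, such as a correlation.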