Probing for Dutch Relative Pronoun Choice

Author: Bouma, Gosse
Document type: Article
Publication date: 2021
Series/Periodical: Bouma, G. 2021, 'Probing for Dutch Relative Pronoun Choice', Computational Linguistics in the Netherlands Journal, vol. 11, pp. 59–70. <https://www.clinjournal.org/clinj/article/view/121>
Keywords: PROBING ATTACHMENT LOSS / language models / Dutch
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27058989
Data source: BASE; original catalog
Link(s) : https://hdl.handle.net/11370/85466c81-9208-4032-b6bd-cc41a5177d3f

We propose a linguistically motivated version of the relative pronoun probing task for Dutch, in which a model has to predict whether a masked token is die or dat. We collect realistic data for the task using a parsed corpus and probe the performance of four context-sensitive BERT-based neural language models. Whereas the original task, which simply masked all occurrences of the words die and dat, was relatively easy, the linguistically motivated task turns out to be much harder. Models differ considerably in their performance, but a monolingual model trained on a heterogeneous corpus appears to be most robust.
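
To make the task setup concrete, the following is a minimal sketch of the original, naive variant described in the abstract, where every occurrence of die or dat is masked. It is not the author's code: the names `mask_all` and `accuracy`, the `[MASK]` placeholder, and whitespace tokenization are assumptions made for illustration. The `predict` argument stands in for any masked language model that returns "die" or "dat". The linguistically motivated variant would instead select relative pronouns using a parsed corpus, which is not reproduced here.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"          # assumed placeholder; real models define their own mask token
TARGETS = ("die", "dat")  # the two candidate words in the probing task


def mask_all(sentence: str) -> List[Tuple[str, str]]:
    """Naive variant: build one (masked sentence, gold word) pair
    for every whitespace-separated token that is 'die' or 'dat'."""
    tokens = sentence.split()
    examples = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in TARGETS:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            examples.append((" ".join(masked), word))
    return examples


def accuracy(examples: List[Tuple[str, str]],
             predict: Callable[[str], str]) -> float:
    """Fraction of masked sentences for which predict() returns the gold word.
    `predict` is a hypothetical stand-in for a masked language model."""
    if not examples:
        return 0.0
    correct = sum(predict(masked) == gold for masked, gold in examples)
    return correct / len(examples)


if __name__ == "__main__":
    data = mask_all("De man die ik zag") + mask_all("Het boek dat ik las")
    # A trivial baseline that always answers "die".
    print(accuracy(data, lambda masked: "die"))
```

Because this variant masks every die and dat regardless of its grammatical function, it also includes demonstrative and complementizer uses, which is one reason a task restricted to relative pronouns can behave differently.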