Natiolectal variation in Dutch morphosyntax: A large-scale, data-driven perspective

Authors: De Troij, Robbert
Grondelaers, Stef
Speelman, Dirk
Document type: Article
Publication date: 2023
Series/Periodical: De Troij, R., Grondelaers, S. & Speelman, D. 2023, 'Natiolectal variation in Dutch morphosyntax: A large-scale, data-driven perspective', Journal of Germanic Linguistics, vol. 35, no. 1, pp. 1-68. https://doi.org/10.1017/S1470542722000071
Keywords: computational linguistics / corpus linguistics / Dutch / grammatical variation / natiolectal variation / parallel corpus
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28586889
Data source: BASE; original catalog
Link(s): https://pure.knaw.nl/portal/en/publications/7b10a209-af93-4537-8ddd-9dede858cc51

In this article, we report a large-scale corpus study aimed at tackling the (controversial) question of to what extent the European national varieties of Dutch, that is, Belgian and Netherlandic Dutch, exhibit morphosyntactic differences. Instead of relying on a manual selection of cases of morphosyntactic variation, we first marshal large bilingual parallel corpora and machine translation software to identify, semiautomatically and in an extensively data-driven fashion, loci of variation from various "corners" of Dutch grammar. We then gauge the distribution of constructional alternatives in a nationally as well as stylistically stratified corpus for a representative selection of twenty alternation patterns. We find that natiolectal variation in the grammar of Dutch is far more prevalent than often assumed, especially in less edited text types, and that it shows up in inflection phenomena, lexically conditioned syntactic variation, and pure word order permutations. Another key finding is that many synchronic probabilistic asymmetries reflect a diachronic difference between the two varieties: Netherlandic Dutch often tends to be ahead in cases of ongoing grammatical change, with Belgian Dutch holding on somewhat longer to obsolescent features of the grammar.
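To make "gauging the distribution of constructional alternatives in a nationally and stylistically stratified corpus" concrete, the sketch below tabulates, for one alternation, the share of one variant per variety and per register, and computes a Pearson chi-square statistic for a single 2x2 variety-by-variant table. It is a minimal illustration under stated assumptions, not the authors' analysis: this record does not describe their statistical modelling. The names `Cell`, `natiolectal_gaps` and `chi_square_2x2`, the labels "BE"/"NL", and the register names are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical record: counts of two competing constructions
# (variant "a" vs. variant "b") in one cell of a stratified corpus.
@dataclass(frozen=True)
class Cell:
    variety: str   # assumed labels: "BE" (Belgian) or "NL" (Netherlandic Dutch)
    register: str  # assumed labels for text types, e.g. "edited", "less edited"
    count_a: int
    count_b: int

    @property
    def share_a(self) -> float:
        """Proportion of variant a among all tokens of the alternation (NaN if empty)."""
        total = self.count_a + self.count_b
        return self.count_a / total if total else float("nan")


def natiolectal_gaps(cells: list[Cell]) -> dict[str, float]:
    """For each register attested in both varieties, return share_a(NL) - share_a(BE).

    A positive value means variant a is relatively more frequent
    in Netherlandic Dutch in that register.
    """
    shares = {(c.variety, c.register): c.share_a for c in cells}
    registers = sorted({c.register for c in cells})
    return {
        r: shares[("NL", r)] - shares[("BE", r)]
        for r in registers
        if ("NL", r) in shares and ("BE", r) in shares
    }


def chi_square_2x2(a1: int, b1: int, a2: int, b2: int) -> float:
    """Pearson chi-square (no continuity correction) for the table
    [[a1, b1], [a2, b2]], rows = varieties, columns = variants.
    Returns NaN if any row or column total is zero."""
    n = a1 + b1 + a2 + b2
    den = (a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2)
    if den == 0:
        return float("nan")
    return n * (a1 * b2 - b1 * a2) ** 2 / den


if __name__ == "__main__":
    # Invented toy counts for illustration only; not data from the study.
    toy = [
        Cell("NL", "edited", 80, 20),
        Cell("BE", "edited", 60, 40),
        Cell("NL", "less edited", 90, 10),
        Cell("BE", "less edited", 50, 50),
    ]
    print(natiolectal_gaps(toy))
    print(chi_square_2x2(80, 20, 60, 40))
```

With these invented counts the NL-BE gap is about 0.2 in the "edited" register and about 0.4 in the "less edited" one, mirroring in miniature the pattern the abstract reports (larger natiolectal differences in less edited text). A real study of this kind would typically model such counts with regression rather than isolated 2x2 tests.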