Creating a Universal Dependencies Treebank of Spoken Frisian-Dutch Code-switched Data

This paper explores the difficulties of annotating transcribed spoken Dutch-Frisian code-switched utterances in Universal Dependencies. We make use of data from the FAME! corpus, which consists of transcriptions and audio data. Besides the usual annotation difficulties, this dataset is especially challenging because Frisian is a low-resource language, the data is informal, and it contains code-switching and non-standard sentence segmentation. As a starting point, two annotators annotated 150 random utterances in three stages of 50 utterances each. After each stage, disagreements were discussed and resolved. An in...

Authors:	Braggaar, Anouck; van der Goot, Rob
Document type:	conferenceObject
Publication date:	2021
Keywords:	annotating transcribed speech / Dutch-Frisian codeswitching / Universal Dependencies / low-resource languages / informal data challenges
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-28585217
Data source:	BASE; original catalog
Link(s):	https://pure.itu.dk/portal/da/publications/61dcfef2-7d02-4ef0-b490-550df7b09123