Dutch Historical Spelling Normalization for Parsing and Coreference Resolution

Non-canonical language can be handled in an NLP pipeline using normalization of the input (e.g., MoNoise; van der Goot & van Noord, CLINjournal 2017) or domain adaptation of the pipeline (e.g., Hupkes & Bod, LREC 2016); we focus on the former. MoNoise shows that normalization is effective for social media language. We consider a different domain: Dutch literature from Project Gutenberg. We work with 9 fragments that make up the OpenBoek corpus (van den Berg et al., CLIN 2021). The fragments consist of 10,000+ tokens from texts first published 1860-1920, both translated and originally D...

Authors: Postma, Priscilla
Donker, Rina
Stam, Ruth
Roorda, Athalia
van Cranenburgh, Andreas
van Noord, Gertjan
Document type: conferenceObject
Publication date: 2022
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27057986
Data source: BASE; original catalog
Link(s): http://hdl.handle.net/11370/0b2d486f-fcf7-4fa3-b7ce-c8cbb2103d16