Dutch Historical Spelling Normalization for Parsing and Coreference Resolution
Non-canonical language can be handled in an NLP pipeline either by normalizing the input (e.g., MoNoise; van der Goot & van Noord, CLIN Journal 2017) or by adapting the pipeline to the domain (e.g., Hupkes & Bod, LREC 2016); we focus on the former. MoNoise shows that normalization is effective for social media language. We consider a different domain: Dutch literature from Project Gutenberg. We work with 9 fragments that make up the OpenBoek corpus (van den Berg et al., CLIN 2021). The fragments consist of 10,000+ tokens from texts first published 1860-1920, both translated and originally D...
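To make "normalization of the input" concrete, here is a minimal Python sketch. It is not the method of this paper (the abstract above is truncated and does not describe one), and it is far simpler than MoNoise, which generates normalization candidates and ranks them with a trained classifier. The sketch assumes a hypothetical hand-written lexicon of older Dutch spellings (`LEXICON`) plus one naive rewrite rule (`FINAL_SCH`), and maps each word to a modern spelling before the text would be passed to a parser. All names and entries are illustrative assumptions.

```python
import re

# Hypothetical lexicon: older Dutch spellings mapped to modern forms.
# Illustrative entries only; not data from the paper or the OpenBoek corpus.
LEXICON = {
    "zoo": "zo",
    "groote": "grote",
    "heeren": "heren",
    "hooren": "horen",
    "eenige": "enige",
    "visch": "vis",
    "menschen": "mensen",
}

# Naive rule (an assumption): word-final "sch" -> "s" (mensch -> mens,
# Hollandsch -> Hollands), except after "i", so adjectives such as "logisch"
# keep their modern spelling. It also mangles names and loanwords
# (e.g. "Bosch" -> "Bos", "kitsch" -> "kits").
FINAL_SCH = re.compile(r"(?<=[^iI])sch$")


def normalize_token(token: str) -> str:
    """Return a modern spelling for one word; unmatched words pass through."""
    modern = LEXICON.get(token.lower())
    if modern is not None:
        # Keep an initial capital, e.g. sentence-initial "Zoo" -> "Zo".
        return modern.capitalize() if token[0].isupper() else modern
    return FINAL_SCH.sub("s", token)


def normalize(text: str) -> str:
    """Normalize every word in `text`, leaving spacing and punctuation intact."""
    return re.sub(r"\w+", lambda m: normalize_token(m.group(0)), text)


if __name__ == "__main__":
    print(normalize("Zoo zag de mensch een groote visch."))
    # -> Zo zag de mens een grote vis.
    print(normalize("Dat is logisch."))
    # -> Dat is logisch.
```

The lexicon handles irregular forms exactly but only covers listed words, while the rule generalizes but over-applies; that trade-off is one reason learned normalizers such as MoNoise are used instead of hand-written rules.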
Author(s): | |
---|---|
Document type: | conferenceObject |
Publication date: | 2022 |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-27057986 |
Data source: | BASE; original catalog |
Link(s): | http://hdl.handle.net/11370/0b2d486f-fcf7-4fa3-b7ce-c8cbb2103d16 |