Dutch Historical Spelling Normalization for Parsing and Coreference Resolution

Non-canonical language can be handled in an NLP pipeline using normalization of the input (e.g., MoNoise; van der Goot & van Noord, CLINjournal 2017) or domain adaptation of the pipeline (e.g., Hupkes & Bod, LREC 2016); we focus on the former. MoNoise shows that normalization is effective for social media language. We consider a different domain: Dutch literature from Project Gutenberg. We work with 9 fragments that make up the OpenBoek corpus (van den Berg et al., CLIN 2021). The fragments consist of 10,000+ tokens from texts first published 1860-1920, both translated and originally D...

Authors:	Postma, Priscilla; Donker, Rina; Stam, Ruth; Roorda, Athalia; van Cranenburgh, Andreas; van Noord, Gertjan
Document type:	conferenceObject
Publication date:	2022
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-27057986
Data source:	BASE; original catalog
Link(s) :	http://hdl.handle.net/11370/0b2d486f-fcf7-4fa3-b7ce-c8cbb2103d16
