TwNC: a Multifaceted Dutch News Corpus

This contribution describes the Twente News Corpus (TwNC), a multifaceted corpus for Dutch that is being deployed in a number of NLP research projects, including tracks within the Dutch national research programme MultimediaN, the NWO programme CATCH, and the Dutch-Flemish programme STEVIN. Development of the corpus started in 1998 within the predecessor project DRUID, and the corpus currently has a size of 530M words. The text part has been built from four different sources: Dutch national newspapers, television subtitles, teleprompter (auto-cue) files, and both manually and automatically ge...

Authors: Ordelman, Roeland
Jong, Franciska de
Hessen, Arjan van
Hondorp, Hendri
Document type: article / Letter to editor
Publication date: 2007
Publisher/Ed.: ELRA
Language: unknown
Permalink: https://search.fid-benelux.de/Record/base-27066351
Data source: BASE; original catalogue
Link(s): http://purl.utwente.nl/publications/68090