An efficient memory-based morphosyntactic tagger and parser for Dutch

We describe TADPOLE, a modular memory-based morphosyntactic tagger and dependency parser for Dutch. Though primarily aimed at being accurate, the design of the system is also driven by optimizing speed and memory usage, using a trie-based approximation of k-nearest neighbor classification as the basis of each module. We perform an evaluation of its three main modules: a part-of-speech tagger, a morphological analyzer, and a dependency parser, trained on manually annotated material available for Dutch – the parser is additionally trained on automatically parsed data. A global analysis of the sy... Mehr ...

Verfasser: Bosch, Antal van den
Busser, Bertjan
Canisius, Sander
Daelemans, Walter
Dokumenttyp: Part of book or chapter of book
Erscheinungsdatum: 2007
Schlagwörter: Taalwetenschap
Sprache: Englisch
Permalink: https://search.fid-benelux.de/Record/base-27068104
Datenquelle: BASE; Originalkatalog
Powered By: BASE
Link(s) : https://dspace.library.uu.nl/handle/1874/296756

We describe TADPOLE, a modular memory-based morphosyntactic tagger and dependency parser for Dutch. Though primarily aimed at being accurate, the design of the system is also driven by optimizing speed and memory usage, using a trie-based approximation of k-nearest neighbor classification as the basis of each module. We perform an evaluation of its three main modules: a part-of-speech tagger, a morphological analyzer, and a dependency parser, trained on manually annotated material available for Dutch – the parser is additionally trained on automatically parsed data. A global analysis of the system shows that it is able to process text in linear time close to an estimated 2,500 words per second, while maintaining sufficient accuracy.