Dutch Dependency Parser Performance Across Domains

Authors: Plank, Barbara
Noord, Gertjan van
Document type: Part of book or chapter of book
Publication date: 2010
Keywords: Taalwetenschap (Linguistics)
Language: English
Permalink: https://search.fid-benelux.de/Record/base-26680278
Data source: BASE; original catalog
Link(s) : https://dspace.library.uu.nl/handle/1874/297155

In the past decade several natural language parsing systems have emerged that use different methods and formalisms, for instance systems that employ a hand-crafted grammar with a statistical disambiguation component versus purely statistical data-driven systems. What they have in common is a lack of portability to new domains: their performance might decrease substantially as the distance between test and training domain increases. Yet to what degree do they suffer from this problem, i.e. which kind of parsing system is more affected by domain shifts? To address this question, we evaluate the performance variation of two kinds of dependency parsing systems for Dutch (grammar-driven versus data-driven) across several domains. We examine (1) how parser performance correlates with simple statistical properties of the text and (2) how sensitive a given system is to the text domain. This will give us an estimate of which kind of system is more affected by domain shifts, and thus more in need of domain adaptation techniques. To this end, we extend the statistical measures used by Zhang and Wang (2009a) for English and propose a new simple measure to quantify domain sensitivity.