Bag & Tag'em - A new Dutch stemmer

We propose a novel stemming algorithm for Dutch that is both robust and accurate compared to state-of-the-art solutions, while addressing several problems that current stemmers face in the Dutch language. The main issue is that most current stemmers cannot handle third-person singular verb forms, or many irregular words and conjugations, unless a (nearly) brute-force approach is used. Our algorithm, Bag & Tag'em (BT), combines a new tagging module with a stemmer that uses tag-specific sets of rigid rules. The tagging module is developed and evaluated using three algorithms: Multi...
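The general idea of pairing a part-of-speech tag with tag-specific suffix rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the BT algorithm itself. The tag names (`VERB`, `NOUN`, ...), the rule lists, `MIN_STEM_LENGTH`, and the functions `stem` and `stem_tagged` are all hypothetical. The input is assumed to have been tagged already by some PoS tagger. The sketch shows why a tag helps: a verb such as "werkt" loses its third-person "-t", while a noun such as "kat" is left alone.

```python
from typing import Dict, List, Tuple

# Hypothetical tag-specific rule sets: each tag maps to an ordered list of
# (suffix, replacement) pairs. These toy rules are illustrative only.
RULES: Dict[str, List[Tuple[str, str]]] = {
    "VERB": [("en", ""), ("t", "")],  # werken -> werk, werkt -> werk
    "NOUN": [("en", ""), ("s", "")],  # boeken -> boek, tafels -> tafel
}

MIN_STEM_LENGTH = 3  # hypothetical guard against over-stemming short words


def stem(word: str, tag: str) -> str:
    """Apply the first rule for this tag that leaves a long enough stem."""
    for suffix, replacement in RULES.get(tag, []):
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    # Tags without rules, or no applicable rule: return the word unchanged.
    return word


def stem_tagged(tokens: List[Tuple[str, str]]) -> List[str]:
    """Stem a list of (word, tag) pairs, e.g. as produced by a PoS tagger."""
    return [stem(word, tag) for word, tag in tokens]


if __name__ == "__main__":
    sentence = [("hij", "PRON"), ("werkt", "VERB"), ("met", "ADP"),
                ("boeken", "NOUN"), ("en", "CCONJ"), ("de", "DET"),
                ("kat", "NOUN")]
    print(stem_tagged(sentence))  # ['hij', 'werk', 'met', 'boek', 'en', 'de', 'kat']
```

Real Dutch morphology needs considerably more than plain suffix stripping, for example vowel doubling (lopen/loop) and consonant alternations (v/f, z/s). Irregular forms would still need explicit handling on top of rules like these.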

Authors: Jonker, Anne
de Ruijt, Corné
de Gruijl, Jornt R.
Document type: contributionToPeriodical
Publication date: 2020
Publisher/Editor: European Language Resources Association (ELRA)
Keywords: Dutch / PoS tagging / Stemming
Language: English
Permalink: https://search.fid-benelux.de/Record/base-26686561
Data source: BASE; original catalog
Link(s): https://research.vu.nl/en/publications/341e40c7-466a-473c-8436-96cd71ae7eb3