Bag & Tag'em - A new Dutch stemmer
We propose a novel stemming algorithm that is robust and accurate compared to state-of-the-art solutions and that addresses several problems current stemmers face in the Dutch language. The main issue is that most current stemmers cannot handle 3rd person singular verb forms, or many irregular words and conjugations, unless a (nearly) brute-force approach is used. Our algorithm, Bag & Tag'em (BT), combines a new tagging module with a stemmer that uses tag-specific sets of rigid rules. The tagging module is developed and evaluated using three algorithms: Multi...
Author(s): | |
---|---|
Document type: | contributionToPeriodical |
Publication date: | 2020 |
Publisher/Editor: | European Language Resources Association (ELRA) |
Keywords: | Dutch / PoS tagging / Stemming |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-26686561 |
Data source: | BASE; original catalog |
Link(s) : | https://research.vu.nl/en/publications/341e40c7-466a-473c-8436-96cd71ae7eb3 |