DALC: the Dutch Abusive Language Corpus

Authors: Caselli, Tommaso
Schelhaas, Arjan
Weultjes, Marieke
Leistra, Folkert
van der Veen, Hylke
Timmerman, Gerben
Nissim, Malvina
Document type: contributionToPeriodical
Publication date: 2021
Publisher/Editor: Association for Computational Linguistics (ACL)
Keywords: language models / hate speech / offensive language
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27059415
Data source: BASE; original catalog
Link(s): http://hdl.handle.net/11370/bfbfefcd-5bd5-4292-8345-6e9a190e0bd9

As socially unacceptable language becomes pervasive on social media platforms, the need for automatic content moderation becomes more pressing. This contribution introduces the Dutch Abusive Language Corpus (DALC v1.0), a new dataset of tweets manually annotated for abusive language. The resource addresses a gap in language resources for Dutch and adopts a multi-layer annotation scheme that models the explicitness and the target of abusive messages. Baseline experiments have been conducted on all annotation layers, achieving a macro F1 score of 0.748 for binary classification of the explicitness layer and 0.489 for target classification.