An Ensemble Approach for Dutch Cross-Domain Hate Speech Detection

Authors:	Markov, Ilia; Gevers, Ine; Daelemans, Walter
Document type:	contributionToPeriodical
Publication date:	2022
Publisher:	Springer Science and Business Media Deutschland GmbH
Keywords:	Hate speech / Dutch / Cross-domain / Ensemble / SDG 16 - Peace, Justice and Strong Institutions
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-26687160
Data source:	BASE; original catalog
Link(s):	https://research.vu.nl/en/publications/bdb80697-2249-4289-8466-c8522f440f97

Over the past years, the amount of online hate speech has grown steadily. Among the many approaches to automatically detecting hateful content online, ensemble learning is considered one of the best strategies, as shown by several studies on English and other languages. In this paper, we evaluate state-of-the-art approaches for Dutch hate speech detection under both in-domain and cross-domain conditions, and introduce a new ensemble approach with additional features for detecting hateful content in Dutch social media. The ensemble consists of a gradient boosting classifier that incorporates state-of-the-art transformer-based pre-trained language models for Dutch (i.e., BERTje and RobBERT), a robust SVM approach, and additional input information such as the number of emotion-conveying and hateful words, the number of personal pronouns, and the length of the message. The ensemble significantly outperforms all the individual models in both the in-domain and cross-domain settings. We perform an in-depth error analysis focusing on explicit and implicit hate speech instances, providing insights into open challenges in Dutch hate speech detection and directions for future research.
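
The ensemble input described in the abstract can be illustrated with a minimal sketch. It only shows how per-message scores from the base models might be combined with the count-based features into one vector for a downstream gradient boosting classifier; the classifier itself is not included. The lexicons, the function names (`tokenize`, `count_features`, `meta_features`), the score keys, and the choice to measure length in tokens are all assumptions made for illustration and are not taken from the paper.

```python
import re
from typing import Dict, List, Set

# Toy placeholder lexicons (assumptions for illustration only,
# not the resources used by the authors).
HATEFUL_WORDS: Set[str] = {"idioot", "tuig"}
EMOTION_WORDS: Set[str] = {"boos", "bang", "haat", "blij"}
PERSONAL_PRONOUNS: Set[str] = {
    "ik", "jij", "je", "u", "hij", "zij", "ze", "wij", "we", "jullie",
}

COUNT_KEYS = ("n_emotion", "n_hateful", "n_pronouns", "length")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def count_features(text: str) -> Dict[str, int]:
    """Count lexicon hits and message length (in tokens, an assumption)."""
    tokens = tokenize(text)
    return {
        "n_emotion": sum(t in EMOTION_WORDS for t in tokens),
        "n_hateful": sum(t in HATEFUL_WORDS for t in tokens),
        "n_pronouns": sum(t in PERSONAL_PRONOUNS for t in tokens),
        "length": len(tokens),
    }


def meta_features(text: str, base_scores: Dict[str, float]) -> List[float]:
    """Concatenate base-model scores (sorted by name) with the count features.

    `base_scores` is a hypothetical mapping from a model name to that model's
    predicted probability of the hateful class for this message.
    """
    counts = count_features(text)
    model_part = [base_scores[name] for name in sorted(base_scores)]
    count_part = [float(counts[key]) for key in COUNT_KEYS]
    return model_part + count_part


if __name__ == "__main__":
    # Made-up scores standing in for BERTje, RobBERT, and SVM outputs.
    scores = {"bertje": 0.9, "robbert": 0.8, "svm": 0.7}
    print(meta_features("Zij zijn tuig en ik haat ze", scores))
```

Vectors of this shape, one per message, could then be passed to any gradient boosting implementation as training input; sorting the score keys keeps the column order stable across messages.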