An Ensemble Approach for Dutch Cross-Domain Hate Speech Detection
Author(s): | |
---|---|
Document type: | contributionToPeriodical |
Publication date: | 2022 |
Publisher: | Springer Science and Business Media Deutschland GmbH |
Keywords: | Hate speech / Dutch / Cross-domain / Ensemble / SDG 16 - Peace, Justice and Strong Institutions |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-26687160 |
Data source: | BASE; original catalog |
Link(s) : | https://research.vu.nl/en/publications/bdb80697-2249-4289-8466-c8522f440f97 |
Over the past years, the amount of online hate speech has grown steadily. Among the many approaches to automatically detecting hateful content online, ensemble learning is considered one of the best strategies, as shown by several studies on English and other languages. In this paper, we evaluate state-of-the-art approaches to Dutch hate speech detection under both in-domain and cross-domain conditions, and introduce a new ensemble approach with additional features for detecting hateful content in Dutch social media. The ensemble consists of a gradient boosting classifier that incorporates state-of-the-art transformer-based pre-trained language models for Dutch (i.e., BERTje and RobBERT), a robust SVM approach, and additional input information: the number of emotion-conveying words, the number of hateful words, the number of personal pronouns, and the length of the message. The ensemble significantly outperforms all the individual models in both the in-domain and cross-domain settings. We perform an in-depth error analysis focusing on explicit and implicit hate speech instances, which provides insights into open challenges in Dutch hate speech detection and directions for future research.
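To make the ensemble's input more concrete, the following is a minimal sketch of how the scores of the individual models and the additional features named in the abstract could be combined into a single feature vector for a gradient boosting classifier. It is an illustration under stated assumptions, not the authors' implementation: the word lists, the function names (`tokenize`, `handcrafted_features`, `ensemble_input`), the score keys, and the choice to measure message length in tokens are all hypothetical, and the base model scores are assumed to have been computed elsewhere.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicons for illustration only; the word lists actually
# used in the paper are not given in this record.
HATEFUL_WORDS = {"idioot", "tuig"}
EMOTION_WORDS = {"boos", "bang", "blij"}
# A small, incomplete set of Dutch personal pronouns (assumption).
PERSONAL_PRONOUNS = {"ik", "jij", "je", "hij", "zij", "ze", "wij", "we", "jullie", "u"}


def tokenize(text: str) -> List[str]:
    """Lowercase the message and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def handcrafted_features(text: str) -> List[float]:
    """Return [emotion words, hateful words, personal pronouns, length].

    Length is measured in tokens here, which is an assumption; the abstract
    only says "the length of the message".
    """
    tokens = tokenize(text)
    return [
        float(sum(t in EMOTION_WORDS for t in tokens)),
        float(sum(t in HATEFUL_WORDS for t in tokens)),
        float(sum(t in PERSONAL_PRONOUNS for t in tokens)),
        float(len(tokens)),
    ]


def ensemble_input(text: str, base_scores: Dict[str, float]) -> List[float]:
    """Concatenate base model scores with the handcrafted features.

    base_scores maps hypothetical keys ("bertje", "robbert", "svm") to each
    model's hate speech score for this message.
    """
    order = ["bertje", "robbert", "svm"]
    return [base_scores[name] for name in order] + handcrafted_features(text)


if __name__ == "__main__":
    message = "Jullie zijn tuig en ik ben boos"
    scores = {"bertje": 0.9, "robbert": 0.8, "svm": 0.7}  # made-up values
    print(ensemble_input(message, scores))
```

Vectors built this way, one per message, could then be passed together with the gold labels to any gradient boosting implementation to train the meta-classifier.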