A supervised machine learning method to classify Dutch-language news items ...

Author: Vermeer, Susan
Document type: dataset
Publication date: 2018
Publisher: figshare
Keywords: 80308 Programming Languages / FOS: Computer and information sciences
Language: unknown
Permalink: https://search.fid-benelux.de/Record/base-28983348
Data source: BASE; original catalog
Link(s): https://dx.doi.org/10.6084/m9.figshare.7314896

Please contact s.a.m.vermeer@uva.nl for questions or further information.

Background

Based on a supervised machine learning method, we developed a classifier in Python (version 3.5.2) that returns the news topic of a Dutch-language news item as a string. To train the classifier, we collected more than 1 million news items over 8 months in 2017/18 from approximately 150 different Dutch-language news websites, as well as from search engines and social media. This tool can be used to map Dutch-language news items into four news categories: (1) Politics, which covers internal politics, international politics, and military and defense; (2) Business, which covers economy, education, and health, welfare and social services; (3) Entertainment, which covers sports, culture, fashion and human interest; and (4) Other, which covers science and technology, environment, communication, weather, and religion and beliefs.

Performance

We used three different pre-processing steps, resulting in three different .pkl ...
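
The four-category scheme from the Background description can be written as a small lookup table. This is only an illustrative sketch: the names `CATEGORY_SUBTOPICS`, `SUBTOPIC_TO_CATEGORY` and `category_for` are hypothetical, the exact label strings the published classifier returns are not stated in this record, and treating "health, welfare and social services" as a single subtopic is one reading of the description.

```python
# Illustrative sketch only: category and subtopic names follow the record's
# description; the exact strings returned by the classifier are an assumption.
CATEGORY_SUBTOPICS = {
    "Politics": [
        "internal politics",
        "international politics",
        "military and defense",
    ],
    "Business": [
        "economy",
        "education",
        "health, welfare and social services",  # read here as one subtopic
    ],
    "Entertainment": ["sports", "culture", "fashion", "human interest"],
    "Other": [
        "science and technology",
        "environment",
        "communication",
        "weather",
        "religion and beliefs",
    ],
}

# Invert the table so each subtopic points to its top-level category.
SUBTOPIC_TO_CATEGORY = {
    subtopic: category
    for category, subtopics in CATEGORY_SUBTOPICS.items()
    for subtopic in subtopics
}


def category_for(subtopic: str) -> str:
    """Return the top-level news category for a subtopic name (hypothetical helper)."""
    key = subtopic.strip().lower()
    if key not in SUBTOPIC_TO_CATEGORY:
        raise KeyError("unknown subtopic: " + repr(subtopic))
    return SUBTOPIC_TO_CATEGORY[key]
```

For example, `category_for("Sports")` looks up the lower-cased name and returns the category string `"Entertainment"`; an unrecognized subtopic raises `KeyError` rather than silently falling back to "Other".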