A supervised machine learning method to classify Dutch-language news items ...
Author: | |
---|---|
Document type: | dataset |
Publication date: | 2018 |
Publisher: | figshare |
Keywords: | 80308 Programming Languages / FOS: Computer and information sciences |
Language: | unknown |
Permalink: | https://search.fid-benelux.de/Record/base-28983348 |
Data source: | BASE; original catalog |
Powered by: | BASE |
Link(s): | https://dx.doi.org/10.6084/m9.figshare.7314896 |
Please contact s.a.m.vermeer@uva.nl for questions or further information.

Background

Based on a supervised machine learning method, we developed a classifier in Python (version 3.5.2) that returns the news topic of a Dutch-language news item as a string. To train the classifier, we collected more than 1 million news items over 8 months in 2017/18 from approximately 150 different Dutch-language news websites, as well as from search engines and social media. This tool can be used to map Dutch-language news items into four news categories: (1) Politics, which covers items about internal politics, international politics, and military and defense; (2) Business, which includes economy, education, and health, welfare and social services; (3) Entertainment, which covers sports, culture, fashion and human interest; and (4) Other, which includes science and technology, environment, communication, weather, and religion and beliefs.

Performance

We used three different pre-processing steps, resulting in three different .pkl ...
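
The grouping of subtopics into the four news categories described above can be written out as a small lookup table. The following is only an illustrative sketch in plain Python and is not part of the published classifier: the names `TOPIC_GROUPS`, `SUBTOPIC_TO_CATEGORY` and `to_category` are hypothetical, and the way the subtopic list is split (for example, treating "health, welfare and social services" as a single subtopic) is our reading of the description.

```python
# Hypothetical lookup table: category -> subtopics, as read from the description.
TOPIC_GROUPS = {
    "Politics": [
        "internal politics",
        "international politics",
        "military and defense",
    ],
    "Business": [
        "economy",
        "education",
        "health, welfare and social services",  # assumed to be one subtopic
    ],
    "Entertainment": ["sports", "culture", "fashion", "human interest"],
    "Other": [
        "science and technology",
        "environment",
        "communication",
        "weather",
        "religion and beliefs",
    ],
}

# Invert the table so each subtopic points at its category.
SUBTOPIC_TO_CATEGORY = {
    subtopic: category
    for category, subtopics in TOPIC_GROUPS.items()
    for subtopic in subtopics
}


def to_category(subtopic: str) -> str:
    """Return the news category (as a string) for a subtopic label.

    Raises KeyError if the label is not one of the listed subtopics.
    """
    return SUBTOPIC_TO_CATEGORY[subtopic.strip().lower()]
```

For example, `to_category("Sports")` returns `"Entertainment"`. The classifier itself assigns a category to the text of a news item; this table only documents which subtopics each category covers.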