Psycholinguistic dataset on language use in 1145 novels published in English and Dutch

Author(s):	Severi Luoto; Andreas van Cranenburgh
Document type:	Article
Publication date:	2021
Series/Periodical:	Data in Brief, Vol 34, Pp 106655- (2021)
Publisher/Editor:	Elsevier
Keywords:	Stylometry / Literature / LIWC / Psycholinguistics / Corpus linguistics / Digital humanities / Computer applications to medicine. Medical informatics / R858-859.7 / Science (General) / Q1-390
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-28990464
Data source:	BASE; original catalogue
Link(s):	https://doi.org/10.1016/j.dib.2020.106655

This dataset includes psycholinguistic data on 694 English-language and 451 Dutch-language novels, acquired through computerised analysis of digitised novels published mainly between 1800 and 2018. The English-language novels total 66.9 million words and the Dutch-language novels 49.6 million words, offering large, representative samples for both languages. The data provided in this article include 93 linguistic and psycholinguistic outcome variables for the English-language novels, obtained with Linguistic Inquiry and Word Count (LIWC) version 2015, and 68 linguistic and psycholinguistic outcome variables for the Dutch-language novels, obtained with LIWC version 2001. The dataset also includes unigram and bigram word frequencies for each novel. The metadata for each novel include year of publication and the author's nationality, sex, age at publication, and sexual orientation (the last only in the English-language dataset), so researchers can study the data along these parameters. These data can help researchers illuminate how word use reflects psychological processes across more than two centuries of literary art in English and in contemporary Dutch novels.
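As a rough illustration of the two kinds of per-novel measures described above, the sketch below counts unigrams and bigrams in a text and computes LIWC-style category scores, i.e. the percentage of all words that match a category's word list, where a dictionary entry ending in `*` matches any word with that prefix. This is not the authors' pipeline. The regex tokenizer, the function names, and the tiny `TOY_DICTIONARY` are hypothetical stand-ins: the real LIWC 2015 and LIWC 2001 dictionaries are proprietary and far larger.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase word tokens, keeping internal apostrophes.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text: str) -> list[str]:
    """Split a text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def ngram_counts(tokens: list[str]) -> tuple[Counter, Counter]:
    """Return unigram and bigram frequency counts for one text."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent token pairs
    return unigrams, bigrams


def matches(token: str, entry: str) -> bool:
    """LIWC-style matching: an entry ending in '*' matches any word with that prefix."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(tokens: list[str], dictionary: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of all tokens that fall into each category."""
    total = len(tokens)
    scores = {}
    for category, entries in dictionary.items():
        hits = sum(1 for t in tokens if any(matches(t, e) for e in entries))
        scores[category] = 100.0 * hits / total if total else 0.0
    return scores


# Toy category dictionary (hypothetical, NOT the real LIWC lexicon).
TOY_DICTIONARY = {
    "posemo": ["happ*", "love*", "good"],
    "i": ["i", "me", "my", "mine"],
}

if __name__ == "__main__":
    text = "I love my dog. My dog is happy, and I am happy too."
    tokens = tokenize(text)
    unigrams, bigrams = ngram_counts(tokens)
    print(unigrams.most_common(3))
    print(bigrams[("my", "dog")])
    print(category_percentages(tokens, TOY_DICTIONARY))
```

Applied to a full novel, the same idea yields one frequency table and one row of category percentages per book; note that the English and Dutch data rely on different LIWC versions, so their variables are not directly interchangeable.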