Multilingual comparable corpora of parliamentary debates ParlaMint 4.0

Authors:	Erjavec, Tomaž; Kopp, Matyáš; Ogrodniczuk, Maciej; Osenova, Petya; Agirrezabal, Manex; Agnoloni, Tommaso; Aires, José; Albini, Monica; Alkorta, Jon; Antiba-Cartazo, Iván; Arrieta, Ekain; Barcala, Mario; Bardanca, Daniel; Barkarson, Starkaður; Bartolini, Roberto; Battistoni, Roberto; Bel, Nuria; Bonet Ramos, Maria del Mar; Calzada Pérez, María; Cardoso, Aida; Çöltekin, Çağrı; Coole, Matthew; Darģis, Roberts; de Libano, Ruben; Depoorter, Griet; Diwersy, Sascha; Dodé, Réka; Fernandez, Kike; Fernández Rei, Elisa; Frontini, Francesca; Garcia, Marcos; García Díaz, Noelia; García Louzao, Pedro; Gavriilidou, Maria; Gkoumas, Dimitris; Grigorov, Ilko; Grigorova, Vladislava; Haltrup Hansen, Dorte; Iruskieta, Mikel; Jarlbrink, Johan; Jelencsik-Mátyus, Kinga; Jongejan, Bart; Kahusk, Neeme; Kirnbauer, Martin; Kryvenko, Anna; Ligeti-Nagy, Noémi; Ljubešić, Nikola; Luxardo, Giancarlo; Magariños, Carmen; Magnusson, Måns; Marchetti, Carlo; Marx, Maarten; Meden, Katja; Mendes, Amália; Mochtak, Michal; Mölder, Martin; Montemagni, Simonetta; Navarretta, Costanza; Nitoń, Bartłomiej; Norén, Fredrik Mohammadi; Nwadukwe, Amanda; Ojsteršek, Mihael; Pančur, Andrej; Papavassiliou, Vassilis; Pereira, Rui; Pérez Lago, María; Piperidis, Stelios; Pirker, Hannes; Pisani, Marilina; Pol, Henk van der; Prokopidis, Prokopis; Quochi, Valeria; Rayson, Paul; Regueira, Xosé Luís; Rudolf, Michał; Ruisi, Manuela; Rupnik, Peter; Schopper, Daniel; Simov, Kiril; Sinikallio, Laura; Skubic, Jure; Tungland, Lars Magne; Tuominen, Jouni; van Heusden, Ruben; Varga, Zsófia; Vázquez Abuín, Marta; Venturi, Giulia; Vidal Miguéns, Adrián; Vider, Kadri; Vivel Couso, Ainhoa; Vladu, Adina Ioana; Wissik, Tanja; Yrjänäinen, Väinö; Zevallos, Rodolfo; Fišer, Darja
Document type:	corpus
Publication date:	2023
Publisher:	CLARIN ERIC
Keywords:	parliamentary debates / COVID-19 / TEI / Parla-CLARIN / Czech Parliament / Icelandic Parliament / Belgian Parliament / Danish Parliament / Dutch Parliament / Turkish Parliament / Italian Parliament / Hungarian Parliament / Latvian Parliament / Bulgarian Parliament / Croatian Parliament / Polish Parliament / Slovenian Parliament / French Parliament / Austrian Parliament / Bosnian Parliament / Catalonian Parliament / Galician Parliament / Greek Parliament / Norwegian Parliament / Serbian Parliament / Swedish Parliament / Ukrainian Parliament / Finnish Parliament / Spanish Parliament / Estonian Parliament / Basque Parliament / Portuguese Parliament / UK Parliament
Language:	Bulgarian / Croatian / Polish / Slovenian / Czech / Icelandic / French / Dutch / Danish / Spanish / Turkish / English / Italian / Hungarian / Latvian / Bosnian / Catalan / German / Greek / Estonian / Portuguese / Serbian / Swedish / Ukrainian / Norwegian / Galician / Russian / Finnish / Basque
Permalink:	https://search.fid-benelux.de/Record/base-26509047
Data source:	BASE; original catalogue
Link(s):	http://hdl.handle.net/11356/1859

ParlaMint 4.0 is a set of comparable corpora containing transcriptions of parliamentary debates from 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora comprise between 9 and 126 million words, and the complete set contains over 1.1 billion words. The transcriptions are divided by day, with information on the term, session and meeting, and contain speeches marked with the speaker and their role (e.g. chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. The corpora have extensive metadata, most importantly on the speakers (name, gender, MP and minister status, party affiliation) and on the political parties and parliamentary groups (name, coalition/opposition status, Wikipedia-sourced left-to-right political orientation, and CHES variables, https://www.chesdata.eu/). Some corpora have further metadata, e.g. the year of birth of the speakers, links to their Wikipedia articles, their membership in various committees, etc. The transcriptions are also marked with the subcorpus they belong to ("reference", until 2020-01-30; "covid", from 2020-01-31; and "war", from 2022-02-24). The corpora follow the Parla-CLARIN TEI recommendation (https://clarin-eric.github.io/parla-clarin/), and more specifically are encoded against the compatible but much stricter ParlaMint encoding guidelines (https://clarin-eric.github.io/ParlaMint/) and schemas (included in the distribution). This entry contains the ParlaMint TEI-encoded corpora and their derived plain-text versions, along with TSV metadata of the speeches. Also included is the 4.0 release of the sample data and scripts available in the GitHub repository of the ParlaMint project at https://github.com/clarin-eric/ParlaMint.
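The subcorpus date boundaries given in the description can be illustrated with a minimal Python sketch. It is not part of the ParlaMint distribution: the function name `subcorpora_for` is hypothetical, and because the description states no end date for "covid", the sketch assumes that a sitting on or after 2022-02-24 carries both the "covid" and "war" labels. The actual labelling in the corpus files may differ.

```python
from datetime import date

# Start dates taken from the corpus description.
COVID_START = date(2020, 1, 31)
WAR_START = date(2022, 2, 24)


def subcorpora_for(day: date) -> list[str]:
    """Return the subcorpus labels whose stated date range covers `day`.

    Hypothetical helper for illustration only.
    """
    if day < COVID_START:
        return ["reference"]  # "reference" runs until 2020-01-30
    labels = ["covid"]  # the description gives no end date for "covid"
    if day >= WAR_START:
        # Assumption: "war" is added alongside "covid" rather than replacing it.
        labels.append("war")
    return labels


if __name__ == "__main__":
    for iso_day in ("2019-05-01", "2020-01-31", "2022-02-24"):
        print(iso_day, subcorpora_for(date.fromisoformat(iso_day)))
```

Returning a list rather than a single label keeps the overlap assumption explicit; if the corpus turns out to assign exactly one subcorpus per sitting, only the final branch needs to change.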
A linguistically marked-up version of the ParlaMint 4.0 corpus, also linked with concordancers, is available at http://hdl.handle.net/11356/1860. Another ...