Evaluating the performance and usability of a Tesseract-based OCR workflow on French-Dutch bilingual historical sources ...

Authors: Van den broeck, Alec
Dejaeghere, Tess
Foket, Lise
Ducatteeuw, Vincent
Landuyt, Julie
Birkholz, Julie
Verbruggen, Christophe
Lamsens, Frederic
Chambers, Sally
Document type: Scholarly article
Publication date: 2022
Publisher: Zenodo
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28981467
Data source: BASE; original catalog
Link(s) : https://dx.doi.org/10.5281/zenodo.6602981

The study of texts using a qualitative approach remains the dominant modus operandi in humanities research (D. Nguyen et al., 2020). While most humanities researchers emphasize the critical examination of texts, digital research methodologies are gradually being adopted as complementary options (Levenberg et al., 2018). These computational practices allow researchers to process, aggregate and analyze large quantities of text. Analytical techniques can help humanities scholars uncover principles and patterns that were previously hidden or identify salient sources for further qualitative research (Bod, 2013; Aiello & Simeone, 2019). However, to support these and more advanced use cases such as Natural Language Processing (NLP), sources must be digitized and transformed into a machine-readable format through Optical Character Recognition (OCR) (Lopresti, 2009). Although OCR software is frequently used to convert analogue sources into digital texts, off-the-shelf OCR tools are usually less ...