ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch

Authors: Lefranc, Lith
Van Damme, Ilja
Clérice, Thibault
Kestemont, Mike
Document type: Article
Publication date: 2024
Publisher: Ubiquity Press
Keywords: handwritten text recognition / layout analysis / line segmentation / historic Dutch
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28994733
Data source: BASE; original catalog
Link(s): https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/225

We release ARletta, a series of open-source models for the automated transcription of historic Dutch-language handwritten sources, a resource that the scholarly community has lacked until now. All models presented were trained on publicly available data using the open-source kraken engine. Our endeavor focuses on the digitization of a large-scale collection of local police reports (1876–1945). Additionally, we include a supermodel trained on the union of other Dutch-language datasets (extending back to the 17th century), which we hope will be useful as a foundational model for future projects. Our results demonstrate performance that is competitive with proprietary software solutions.