ARletta. Open-Source Handwritten Text Recognition Models for Historic Dutch
Author(s): | |
---|---|
Document type: | Article |
Publication date: | 2024 |
Series/Periodical: | Journal of Open Humanities Data, Vol 10, Pp 43-43 (2024) |
Publisher: | Ubiquity Press |
Keywords: | handwritten text recognition / layout analysis / line segmentation / historic dutch / History of scholarship and learning. The humanities / AZ20-999 / Language and Literature / P |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-28987186 |
Data source: | BASE; original catalog |
Link(s): | https://doi.org/10.5334/johd.225 |
We release ARletta, a series of open-source models for the automated transcription of historic Dutch-language handwritten sources, which has remained a desideratum in the scholarly community until now. All models presented were trained on publicly available data using the open-source kraken engine. Our endeavor focuses on the digitization of a large-scale collection of local police reports (1876–1945). Additionally, we include a supermodel trained on the union of other Dutch-language datasets (extending back to the 17th century) which we hope will be useful as a foundational model for future projects. Our results demonstrate performance that is competitive with proprietary software solutions.