Text Recognition Model for Yiddish in 'Vaybertaytsh' Typeface, Based on Community Regulations

Authors: Ronny Reshef; Mirjam Gutschow
Document type: Article
Publication date: 2024
Series/Journal: Journal of Open Humanities Data, Vol 10, Pp 35-35 (2024)
Publisher: Ubiquity Press
Keywords: yiddish printing / vaybertaytsh / transkribus / printing history / western yiddish / amsterdam / netherlands / History of scholarship and learning. The humanities / AZ20-999 / Language and Literature / P
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29171517
Data source: BASE; original catalog
Link(s): https://doi.org/10.5334/johd.194

We present a public PyLaia text recognition model, accompanied by a baseline model for the layout of community regulations in Yiddish and a dataset of Yiddish texts printed in the Vaybertaytsh typeface. The model was built using legal documents, namely regulations written by the Ashkenazi Jewish community in Amsterdam during the 18th century. A dedicated model is needed because Vaybertaytsh differs substantially from other Yiddish and Hebrew typefaces, and existing text recognition models for Yiddish are dedicated either to handwritten texts or to substantially different typefaces. The paper gives a short description of the dataset, its unique characteristics, and how it can be used further. It then explains the process of training the text recognition model, specifying the challenges encountered and the strategies used to cope with them. The model is publicly accessible via Transkribus, and the complete dataset used to train it is available via Figshare. The models and dataset offer valuable contributions to the digital humanities, specifically for research in linguistics, Jewish history, and related fields.