Scans and transcriptions of the VOC and the Haarlem notarial deeds archives
Author: | |
---|---|
Document type: | other |
Publication date: | 2019 |
Publisher/Ed.: | Zenodo |
Keywords: | Transcriptions / Verenigde Oost-Indische Compagnie / Notarial deeds / Nationaal Archief / Noord-Hollands Archief / Transkribus |
Language: | unknown |
Permalink: | https://search.fid-benelux.de/Record/base-29506947 |
Data source: | BASE; original catalogue |
Powered by: | BASE |
Link(s): | https://doi.org/10.5281/zenodo.3517777 |
The National Archives of the Netherlands and the Noord-Hollands Archief started a collaboration with the Transkribus HTR (Handwritten Text Recognition) platform to semi-automatically transcribe 2 million pages of old Dutch texts. The archives comprise 17th and 18th century material from the Dutch East India Company (VOC) and 19th century notarial deeds from the city of Haarlem. To train the HTR software, transcriptions first had to be made by hand. These datasets contain scans (.jpg images) with the transcriptions in ALTO XML format (word level).

The first set contains scans and transcriptions from the Verenigde Oost-Indische Compagnie (VOC) archive. Its inventory can be found here: http://www.gahetna.nl/archievenoverzicht/pdf/NL-HaNA_1.04.02.ead.pdf

Inventory numbers: The transcripts are samples of the following inventory numbers: 7528-8827
Country/place: Dutch East Indies (modern-day Indonesia) / Batavia (modern-day Jakarta)
Language: Dutch
Number of transcriptions: 2273 (split)

---

The second set contains scans and transcriptions from the notarial deeds of Haarlem. Its inventories can be found here:
https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1972&milang=nl&miview=inv2
https://noord-hollandsarchief.nl/bronnen/archieven?mivast=236&mizig=210&miadt=236&micode=1617&milang=nl&miview=inv2

Inventory numbers: The transcripts are samples of the following inventory numbers: 1617_1600 until 1617_1805 and 1972_10 until 1972_99
Country/place: The Netherlands / Haarlem
Language: Dutch and sometimes French
Number of transcriptions: 1203 (spread)

---

More data will follow when ready.
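Since the transcriptions are delivered as word-level ALTO XML, a reader may want to turn a file back into plain text lines. The following is a minimal Python sketch, not part of the dataset. It assumes each word is a `String` element with a `CONTENT` attribute inside a `TextLine` element, as in the general ALTO schema. The sample document, its words, and the function names are invented for illustration, and the namespace is ignored because the ALTO version used in these files is not stated here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}String' becomes 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the transcription as a list of strings, one per TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        # Skip comments and processing instructions, whose tag is not a string.
        if not isinstance(element.tag, str):
            continue
        if local_name(element.tag) != "TextLine":
            continue
        # Each direct String child of a TextLine carries one word in CONTENT.
        words = [
            child.get("CONTENT", "")
            for child in element
            if isinstance(child.tag, str) and local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


# Hypothetical two-line sample; the words and coordinates are made up.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Compareerde" HPOS="10" VPOS="10" WIDTH="80" HEIGHT="20"/>
      <SP/>
      <String CONTENT="voor" HPOS="95" VPOS="10" WIDTH="30" HEIGHT="20"/>
    </TextLine>
    <TextLine>
      <String CONTENT="mij" HPOS="10" VPOS="40" WIDTH="25" HEIGHT="20"/>
      <SP/>
      <String CONTENT="notaris" HPOS="40" VPOS="40" WIDTH="50" HEIGHT="20"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    for line in alto_lines(SAMPLE):
        print(line)
```

For a file on disk, the same function can be fed the file's contents, for example `alto_lines(open("page.xml", encoding="utf-8").read())`, where `page.xml` is a placeholder name. Matching on local element names rather than a fixed namespace keeps the sketch usable across ALTO versions, at the cost of not validating the document against any schema.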