Reflections on Encoding Languages in Historical Data: Working With the Multilingual Dimension of the Dutch East India Company Archives

Author: Pepping, K. W.
Document type: Article
Publication date: 2024
Publisher: Ubiquity Press
Keywords: Dutch East India Company (VOC) / skos / htr / reference data / language tagging / code-switching
Language: English
Permalink: https://search.fid-benelux.de/Record/base-27025792
Data source: BASE; original catalogue
Link: https://account.openhumanitiesdata.metajnl.com/index.php/up-j-johd/article/view/176

This article investigates the challenges of encoding languages in historical data through the example of a reference dataset: a thesaurus in SKOS format of commodities traded by the Dutch East India Company (VOC). The VOC archives, from which this thesaurus draws much of its data, are far from purely Dutch. The company’s multilingual workforce and its interactions across Asia produced records influenced by a multitude of languages and full of loanwords and citations. This is further complicated by the VOC’s role in colonising regions and suppressing local languages, with the result that some languages potentially survive only in these ‘Dutch’ archives. Working with a large corpus like the VOC archives therefore raises challenges regarding historical language evolution, vocabulary borrowing, extinct languages, technical standards that are not geared towards historical context, and political sensitivities around identity-bound language. The article demonstrates how the GLOBALISE project navigates these issues by prioritising transparency, flexibility, and iterative refinement. It argues that, as long as researchers are aware of the challenges, language complexities are not a roadblock but an opportunity for further research and critical engagement with the past, encouraging broader discussions and creative solutions for encoding historical multilingualism and language development.