An Approach to Geotag a Web Sized Corpus of Documents with Addresses in Randstad, Netherlands

This paper describes a cluster computing workflow for geotagging a web-sized corpus of documents (3.6 × 10^9 documents, 260 TiB of data) and how semantic similarities between documents geotagged to the same address could be used to verify these tags.

Author: Czech, Alexander
Document type: conferenceObject
Publication date: 2018
Publisher/Editor: ETH Zurich
Keywords: Geotagging / Data Science / Data Mining / Natural Language Processing
Language: English
Permalink: https://search.fid-benelux.de/Record/base-29174225
Data source: BASE; original catalog
Link(s): https://hdl.handle.net/20.500.11850/225615