An Approach to Geotag a Web Sized Corpus of Documents with Addresses in Randstad, Netherlands
This paper describes a cluster computing workflow for geotagging a web-sized corpus of documents (3.6 × 10^9 documents, 260 TiB of data) and shows how semantic similarities between documents geotagged to the same address could be used to verify these tags.
Author: | |
---|---|
Document type: | conferenceObject |
Publication date: | 2018 |
Publisher/Ed.: | ETH Zurich |
Keywords: | Geotagging / Data Science / Data Mining / Natural Language Processing |
Language: | English |
Permalink: | https://search.fid-benelux.de/Record/base-29174225 |
Data source: | BASE; original catalog |
Link(s): | https://hdl.handle.net/20.500.11850/225615 |