The Netherlands Biodiversity Data Services and the R package nbaR: Automated workflows for biodiversity data analysis

Authors:	Hannes Hettling, Maarten Schermer, Rutger Vos, Daphne Duin
Document type:	speech
Publication date:	2018
Keywords:	article / ddc:004 / ddc:570 / ddc:580 / ddc:590 / ddc:600 / ddc:630 / biodiversity data / automated access / API / automated analysis workflows / R package
Language:	English
Permalink:	https://search.fid-benelux.de/Record/base-29168008
Data source:	BASE; original catalogue
Link(s):	https://doi.org/10.22032/dbt.37814

The value of the data held in natural history collections for research in biodiversity, ecology and evolution cannot be overstated. Naturalis Biodiversity Center in the Netherlands, home to one of the largest natural history collections in the world, launched a large-scale digitisation project that resulted in the registration of more than 38 million specimen objects, many of them annotated with descriptive metadata, such as geographic coordinates or multimedia content. Other resources hosted at Naturalis include species occurrence records and comprehensive taxonomic checklists, such as the Catalogue of Life. As our institution strongly believes in the Open Science paradigm, we seek to make our data available to the global biodiversity research community and thereby to enhance data analysis workflows such as (i) the modelling of present, past and future species distributions using specimen occurrence data, (ii) time calibration of (molecular) phylogenies using dated specimen occurrences, (iii) taxonomic name resolution and (iv) image data mining. To this end, we developed the Netherlands Biodiversity Data Services [1], which provide centralized access to biodiversity data via state-of-the-art, open-access interfaces and a mechanism for assigning persistent identifiers to all records. Data are retrieved from heterogeneous sources and harmonized into a document store that complies with international data standards such as ABCD (Access to Biological Collection Data [2]). Built on the Elasticsearch engine, our infrastructure offers complex query options, near real-time queries, and the scalability to accommodate anticipated data growth. With a focus on availability and accessibility, the services were designed as a versatile, low-level REST API so that our data can be used in a broad variety of applications and services. For programmatic access to our data services, we developed client libraries for several programming languages.
Here we present the R package ‘nbaR’ [3], a client especially targeted to an audience of biodiversity ...