LifeWatch observatory data: phytoplankton annotated trainingset by FlowCam imaging in the Belgian Part of the North Sea

Authors: Decrop, Wout
Lagaisse, Rune
Jonas, Mortelmans
Muyle, Julie
Amadei Martínez, Luz
Deneudt, Klaas
Document type: other
Publication date: 2024
Publisher: Zenodo
Keywords: Simon Stevin / Imagine / phytoplankton / Biodiversity / LifeWatch / training-data / Belgium / Belgian Continental Shelf (BCS) / EurOBIS calculated BBOX / EGI / Bacillariophyceae / Ciliophora / Dictyochophyceae / Dinophyceae / Prymnesiophyceae / Biological monitoring / Marine/Coastal / ML / CNN
Language: English
Permalink: https://search.fid-benelux.de/Record/base-28883246
Data source: BASE; original catalogue
Link(s) : https://doi.org/10.5281/zenodo.10554845

Training dataset

The images were collected in the framework of the Belgian LifeWatch Research Infrastructure. During multidisciplinary campaigns, a number of fixed stations in the Belgian Part of the North Sea (BPNS) are visited on a monthly (onshore stations) or seasonal (offshore stations) basis. Samples are taken using an Apstein net with a 55 µm mesh size and fixed in Lugol's iodine solution. In the lab, the samples are processed using a VS-4 FlowCAM model at 4X magnification, targeting a particle size range of 55-300 µm. The images are identified with a convolutional neural network (CNN), followed by a manual validation step. Since May 2017, this dataset has provided micro- and phytoplankton observations for the BPNS, mainly covering diatoms, dinoflagellates and ciliates.

This dataset comprises a training data split of 337,613 images distributed across 95 classes, with each class containing between 100 and 10,000 images. To facilitate model training, the data are organized into a standard split, with 80% allocated for training, 10% for validation and 10% for testing. This dataset structure ensures a balanced representation and supports scientific rigor in subsequent analyses.

Technical details

Data preprocessing

Raw FlowCam output data is fully processed using in-house data pipelines; the VisualSpreadsheet software is used only for data acquisition during the lab run of the sample. Raw images and binary images are never saved during the FlowCam run; only the image collages saved at the end of the run are used. Single images are cut from these collages using each image's coordinates, width and height, which are read from the .lst file with in-house Python code. The background of the images is not removed. These images are then predicted and annotated in-house at VLIZ.
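
The cutting step described above can be illustrated with a minimal sketch. This is not the in-house VLIZ code, which is not part of this record. It assumes a simplified, pipe-delimited table with a single header row; the column names (`collage_file`, `image_x`, `image_y`, `image_w`, `image_h`), the file name in the sample and the function name `crop_boxes` are hypothetical, and a real .lst file may be laid out differently.

```python
import csv
import io


def crop_boxes(lst_text, delimiter="|"):
    """Return one (collage_file, (left, upper, right, lower)) pair per particle.

    ASSUMPTION: lst_text is a simplified table with one header row and the
    hypothetical columns collage_file, image_x, image_y, image_w, image_h.
    """
    reader = csv.DictReader(io.StringIO(lst_text), delimiter=delimiter)
    boxes = []
    for row in reader:
        left = int(row["image_x"])
        upper = int(row["image_y"])
        # The right and lower edges follow from the stored width and height.
        right = left + int(row["image_w"])
        lower = upper + int(row["image_h"])
        boxes.append((row["collage_file"], (left, upper, right, lower)))
    return boxes


# Hypothetical sample standing in for the particle table of a .lst file.
SAMPLE_LST = (
    "collage_file|image_x|image_y|image_w|image_h\n"
    "collage_000001.tif|0|0|64|48\n"
    "collage_000001.tif|64|0|120|90\n"
)

for collage, box in crop_boxes(SAMPLE_LST):
    print(collage, box)
```

Each box is expressed as (left, upper, right, lower), which is the form expected by Pillow's `Image.crop`, so a single image could be cut with `Image.open(collage).crop(box)` if Pillow is used for the image handling.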

Data splitting

The dataset is split into 80% for training, 10% for validation and 10% for testing (prediction).

Classes, labels and annotations

The ...