explosion/spaCy: v2.2.0: Norwegian & Lithuanian models, better Dutch NER, smaller install, faster matching & more

Authors: Matthew Honnibal
Ines Montani
Sofie Van Landeghem
Henning Peters
Maxim Samsonov
Jim Geovedi
Jim Regan
György Orosz
adrianeboyd
Paul O'Leary McCann
Søren Lind Kristiansen
Duygu Altinok
Roman
Grégory Howard
Wannaphong Phatthiyaphaibun
Sam Bozek
Explosion Bot
Björn Böing
Mark Amery
Leif Uwe Vogelsang
Pradeep Kumar Tippa
jeannefukumaru
GregDubbin
Vadim Mazaev
Ramanan Balakrishnan
Jens Dahl Møllerhøj
wbwseeker
Magnus Burton
Avadh Patel
Document type: other
Publication date: 2019
Publisher: Zenodo
Language: unknown
Permalink: https://search.fid-benelux.de/Record/base-29049846
Data source: BASE; original catalog
Link(s): https://doi.org/10.5281/zenodo.3470035

⚠️ This version of spaCy requires downloading new models. You can use the <code>spacy validate</code> command to find out which models need updating, and print update instructions. If you've been training your own models, you'll need to retrain them with the new version.

✨ New features and improvements

NEW: Pretrained core models for Norwegian (MIT) and Lithuanian (CC BY-SA).
NEW: Better pre-trained Dutch NER using custom labelled UD corpus instead of WikiNER.
NEW: Make spaCy roughly 5-10× smaller on disk (depending on your platform) by compressing and moving lookups to a separate package.
NEW: <code>EntityLinker</code> and <code>KnowledgeBase</code> API to train and access entity linking models, plus scripts to train your own Wikidata models.
NEW: 10× faster <code>PhraseMatcher</code> and improved phrase matching algorithm.
NEW: <code>DocBin</code> class to efficiently serialize collections of <code>Doc</code> objects.
NEW: Train text classification models on the command line with <code>spacy train</code> and get <code>textcat</code> results via the <code>Scorer</code>.
NEW: <code>debug-data</code> command to validate your training and development data, get useful stats, and find problems like invalid entity annotations, cyclic dependencies, low data labels and more.
NEW: Efficient <code>Lookups</code> class using Bloom filters that allows storing, accessing and serializing large dictionaries via <code>vocab.lookups</code>.
Data augmentation in <code>spacy train</code> via the <code>--orth-variant-level</code> flag, which defines the percentage of occurrences of some tokens subject to replacement during training.
Add <code>nlp.pipe_labels</code> (labels assigned by pipeline components) and include <code>"labels"</code> in <code>nlp.meta</code>.
Support <code>spacy_displacy_colors</code> entry ...
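
As an illustration of the <code>DocBin</code> class mentioned above, the following is a minimal sketch of serializing a small collection of <code>Doc</code> objects and restoring them. It assumes spaCy v2.2 or later is installed; the blank English pipeline and the sample sentences are illustrative choices, not taken from the release notes.

```python
import spacy
from spacy.tokens import DocBin

# A blank English pipeline only tokenizes, so no model download is needed.
nlp = spacy.blank("en")
texts = [
    "spaCy v2.2 adds a DocBin class.",
    "It serializes many Doc objects at once.",
]

# Collect the processed docs in one DocBin and serialize to a single bytes blob.
doc_bin = DocBin()
for doc in nlp.pipe(texts):
    doc_bin.add(doc)
data = doc_bin.to_bytes()

# Later: restore the docs from the bytes, reusing the same vocab.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
restored_texts = [doc.text for doc in restored]
print(restored_texts)
```

Storing many documents in one container like this avoids pickling each <code>Doc</code> separately; which token attributes are kept can be controlled through the <code>attrs</code> argument of <code>DocBin</code>.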
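
The idea behind the <code>--orth-variant-level</code> flag, as described above, can be sketched in plain Python. This is not spaCy's implementation: the variant table, the function name <code>augment</code> and the per-token sampling are all assumptions made for illustration, following only the description that a given share of occurrences of certain tokens is replaced during training.

```python
import random

# Hypothetical variant table: tokens that may be swapped for an equivalent spelling.
ORTH_VARIANTS = {
    "“": ['"'],
    "”": ['"'],
    "…": ["..."],
}


def augment(tokens, level, rng):
    """Replace each token that has variants with probability `level` (0.0 to 1.0)."""
    out = []
    for tok in tokens:
        variants = ORTH_VARIANTS.get(tok)
        if variants and rng.random() < level:
            out.append(rng.choice(variants))
        else:
            out.append(tok)
    return out


tokens = ["“", "Wait", "…", "”"]
unchanged = augment(tokens, 0.0, random.Random(0))     # level 0: nothing is replaced
all_replaced = augment(tokens, 1.0, random.Random(0))  # level 1: every eligible token is replaced
print(unchanged)
print(all_replaced)
```

Exposing the model to such spelling variants during training is meant to make it less sensitive to typographic differences between the training corpus and real-world text.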