explosion/spaCy: v2.2.0: Norwegian & Lithuanian models, better Dutch NER, smaller install, faster matching & more

Authors:	Matthew Honnibal; Ines Montani; Sofie Van Landeghem; Henning Peters; Maxim Samsonov; Jim Geovedi; Jim Regan; György Orosz; adrianeboyd; Paul O'Leary McCann; Søren Lind Kristiansen; Duygu Altinok; Roman; Grégory Howard; Wannaphong Phatthiyaphaibun; Sam Bozek; Explosion Bot; Björn Böing; Mark Amery; Leif Uwe Vogelsang; Pradeep Kumar Tippa; jeannefukumaru; GregDubbin; Vadim Mazaev; Ramanan Balakrishnan; Jens Dahl Møllerhøj; wbwseeker; Magnus Burton; Avadh Patel
Document type:	other
Publication date:	2019
Language:	unknown
Permalink:	https://search.fid-benelux.de/Record/base-27078179
Data source:	BASE; original catalog
Link(s):	https://zenodo.org/record/3470035

⚠️ This version of spaCy requires downloading new models. You can use the spacy validate command to find out which models need updating, and print update instructions. If you've been training your own models, you'll need to retrain them with the new version.

✨ New features and improvements

- NEW: Pretrained core models for Norwegian (MIT) and Lithuanian (CC BY-SA).
- NEW: Better pre-trained Dutch NER using custom labelled UD corpus instead of WikiNER.
- NEW: Make spaCy roughly 5-10× smaller on disk (depending on your platform) by compressing and moving lookups to a separate package.
- NEW: EntityLinker and KnowledgeBase API to train and access entity linking models, plus scripts to train your own Wikidata models.
- NEW: 10× faster PhraseMatcher and improved phrase matching algorithm.
- NEW: DocBin class to efficiently serialize collections of Doc objects.
- NEW: Train text classification models on the command line with spacy train and get textcat results via the Scorer.
- NEW: debug-data command to validate your training and development data, get useful stats, and find problems like invalid entity annotations, cyclic dependencies, low data labels and more.
- NEW: Efficient Lookups class using Bloom filters that allows storing, accessing and serializing large dictionaries via vocab.lookups.
- Data augmentation in spacy train via the --orth-variant-level flag, which defines the percentage of occurrences of some tokens subject to replacement during training.
- Add nlp.pipe_labels (labels assigned by pipeline components) and include "labels" in nlp.meta.
- Support spacy_displacy_colors entry point to allow packages to add entity colors to displacy.
- Allow template config option in displacy to customize entity HTML template.
- Improve match pattern validation and handling of unsupported attributes.
- Add lookup lemmatization data for Croatian and Serbian.
- Update and improve language data for Chinese, Croatian, Thai, Romanian, Hindi and English.
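
The DocBin class mentioned above can be tried out roughly as follows. This is a minimal sketch, assuming DocBin is imported from spacy.tokens and exposes add, to_bytes, from_bytes and get_docs as in the spaCy API documentation. The blank English pipeline and the sample texts are illustrative choices, not part of the release notes.

```python
import spacy
from spacy.tokens import DocBin

# A blank pipeline (tokenizer only) is enough here; no model download needed.
nlp = spacy.blank("en")
texts = [
    "spaCy v2.2 adds a DocBin class.",
    "It packs many Doc objects into one byte string.",
]

# Collect the docs in one container. ORTH (the token text) is always stored;
# further attributes such as "LEMMA" or "ENT_TYPE" could be listed in attrs.
doc_bin = DocBin(attrs=["ORTH"])
for doc in nlp.pipe(texts):
    doc_bin.add(doc)

# Serialize the whole collection at once.
data = doc_bin.to_bytes()

# Restore the docs; a vocab is needed to map the stored hashes back to strings.
restored = list(DocBin().from_bytes(data).get_docs(nlp.vocab))
restored_texts = [doc.text for doc in restored]
print(restored_texts)
```

The restored Doc objects are rebuilt against the vocab passed to get_docs, so the same vocab (or one containing the same strings) should be used when loading.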
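
To illustrate the idea behind a Bloom-filter-backed lookup table, here is a toy sketch in plain Python. It is not spaCy's implementation: the class names ToyBloomFilter and ToyTable, the bit-array size and the SHA-256 based hashing are all hypothetical choices made for this example. It only shows the general technique, where a compact bit array answers "definitely absent" or "possibly present" before the dictionary itself is consulted.

```python
import hashlib


class ToyBloomFilter:
    """Approximate set: may give false positives, never false negatives."""

    def __init__(self, size_bits=4096, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


class ToyTable:
    """Dictionary guarded by a Bloom filter (hypothetical, for illustration)."""

    def __init__(self, data=None):
        self.data = {}
        self.bloom = ToyBloomFilter()
        for key, value in (data or {}).items():
            self.set(key, value)

    def set(self, key, value):
        self.data[key] = value
        self.bloom.add(key)

    def get(self, key, default=None):
        if key not in self.bloom:
            # Definitely absent: skip the dictionary lookup entirely.
            return default
        # Present, or a false positive that the dictionary resolves.
        return self.data.get(key, default)


table = ToyTable({"dogs": "dog", "mice": "mouse"})
lemma_known = table.get("dogs")
lemma_unknown = table.get("cats", "cats")
print(lemma_known, lemma_unknown)
```

Because a false positive only costs one extra dictionary lookup, the filter never changes the result of get; it only makes the common "key is missing" case cheaper.
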
🔴 Bug fixes

- Fix issue #3258: Reduce package size on disk by moving and compressing large dictionaries.
...