explosion/spaCy: v2.2.0: Norwegian & Lithuanian models, better Dutch NER, smaller install, faster matching & more
Author: | |
---|---|
Document type: | other |
Publication date: | 2019 |
Publisher: | Zenodo |
Language: | unknown |
Permalink: | https://search.fid-benelux.de/Record/base-29049846 |
Data source: | BASE; original catalog |
Powered by: | BASE |
Link(s): | https://doi.org/10.5281/zenodo.3470035 |
⚠️ This version of spaCy requires downloading new models. You can use the <code>spacy validate</code> command to find out which models need updating, and print update instructions. If you've been training your own models, you'll need to retrain them with the new version.

✨ New features and improvements

- NEW: Pretrained core models for Norwegian (MIT) and Lithuanian (CC BY-SA).
- NEW: Better pre-trained Dutch NER using custom labelled UD corpus instead of WikiNER.
- NEW: Make spaCy roughly 5-10× smaller on disk (depending on your platform) by compressing and moving lookups to a separate package.
- NEW: <code>EntityLinker</code> and <code>KnowledgeBase</code> API to train and access entity linking models, plus scripts to train your own Wikidata models.
- NEW: 10× faster <code>PhraseMatcher</code> and improved phrase matching algorithm.
- NEW: <code>DocBin</code> class to efficiently serialize collections of <code>Doc</code> objects.
- NEW: Train text classification models on the command line with <code>spacy train</code> and get <code>textcat</code> results via the <code>Scorer</code>.
- NEW: <code>debug-data</code> command to validate your training and development data, get useful stats, and find problems like invalid entity annotations, cyclic dependencies, low data labels and more.
- NEW: Efficient <code>Lookups</code> class using Bloom filters that allows storing, accessing and serializing large dictionaries via <code>vocab.lookups</code>.
- Data augmentation in <code>spacy train</code> via the <code>--orth-variant-level</code> flag, which defines the percentage of occurrences of some tokens subject to replacement during training.
- Add <code>nlp.pipe_labels</code> (labels assigned by pipeline components) and include <code>"labels"</code> in <code>nlp.meta</code>.
- Support <code>spacy_displacy_colors</code> entry ...
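
The data augmentation that the <code>--orth-variant-level</code> flag is described as controlling can be pictured with a small, library-free sketch. This is only one possible reading of "the percentage of occurrences of some tokens subject to replacement", not spaCy's own implementation: the variant groups, the <code>augment</code> function and the per-token sampling are all assumptions made for illustration, and spaCy may, for example, make the decision per training example and not per token.

```python
import random

# Hypothetical variant groups (not spaCy's actual data): each group lists
# spellings that are treated as interchangeable for augmentation purposes.
VARIANT_GROUPS = [
    ["...", "…"],
    ["'", "’"],
]


def augment(tokens, level, rng):
    """Return a copy of `tokens` in which each token that belongs to a
    variant group is, with probability `level`, replaced by a spelling
    drawn at random from that group (possibly the same spelling again)."""
    augmented = []
    for token in tokens:
        group = next((g for g in VARIANT_GROUPS if token in g), None)
        if group is not None and rng.random() < level:
            augmented.append(rng.choice(group))
        else:
            # Tokens without known variants are always kept unchanged.
            augmented.append(token)
    return augmented


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs behave the same way
    sentence = ["Wait", "...", "it", "'", "s", "fine"]
    print(augment(sentence, 0.5, rng))
```

With <code>level</code> set to 0.0 the tokens come back unchanged, and with 1.0 every token that has a variant group is resampled, which brackets the behaviour a value in between would produce.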