Releases: explosion/spacy-models
el_core_news_sm-2.1.0a6
Details: https://spacy.io/models/el#el_core_news_sm
File checksum:
4ca49e6fafabff31df82e53df79f545f6bf78b12fa22146e867ce757aeb55ee4
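A downloaded archive can be checked against the published checksum before installing it. The sketch below assumes the checksum is a SHA-256 hex digest (its 64 hex characters are consistent with that). The archive filename is hypothetical; substitute wherever the release file was saved.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected):
    """True if the file's digest matches the published hex string (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()

if __name__ == "__main__":
    # Hypothetical local filename for the el_core_news_sm release archive.
    archive = Path("el_core_news_sm-2.1.0a6.tar.gz")
    expected = "4ca49e6fafabff31df82e53df79f545f6bf78b12fa22146e867ce757aeb55ee4"
    if archive.exists():
        print("OK" if verify(archive, expected) else "MISMATCH")
```

Reading in chunks keeps memory use flat even for the larger `md` archives.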
Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.
Feature | Description |
---|---|
Name | el_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Greek Dependency Treebank, Daras GSOC 2018 |
License | CC BY-NC 4.0 |
Author | Giannis Daras |
Accuracy
Type | Score |
---|---|
ENTS_F | 73.10 |
ENTS_P | 71.49 |
ENTS_R | 74.79 |
LAS | 81.47 |
TAGS_ACC | 94.77 |
TOKEN_ACC | 100.00 |
UAS | 84.97 |
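The published NER figures are consistent with ENTS_F being the balanced F-score, i.e. the harmonic mean of ENTS_P and ENTS_R. A minimal sketch that recomputes it from the table values:

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# ENTS_P and ENTS_R for el_core_news_sm, taken from the accuracy table
print(round(f_score(71.49, 74.79), 2))  # 73.1, matching the reported ENTS_F of 73.10
```

Because it is a harmonic mean, the F-score is pulled toward the lower of the two inputs, so a model cannot score well by trading away recall for precision (or vice versa).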
Installation
pip install spacy-nightly
spacy download el_core_news_sm
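The `spaCy | >=2.1.0a4` requirement names a pre-release (alpha 4 of 2.1.0), so a stable 2.0.x install does not satisfy it, which is presumably why installation goes through `spacy-nightly`. The sketch below shows the PEP 440 ordering involved for simple version strings only; `parse_version` and `satisfies_minimum` are hypothetical helpers, and real tooling should use `packaging.version.Version` instead.

```python
import re

# Matches simple versions such as "2.0.18", "2.1.0a4" or "2.1.0rc1".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")
# Pre-release phases sort alpha < beta < release candidate < final release.
_PHASE_ORDER = {"a": 0, "b": 1, "rc": 2, None: 3}

def parse_version(text):
    """Turn a simple version string into a tuple that sorts in PEP 440 order."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    major, minor, patch, phase, number = match.groups()
    return (int(major), int(minor), int(patch), _PHASE_ORDER[phase], int(number or 0))

def satisfies_minimum(installed, minimum):
    """True if `installed` >= `minimum` under the ordering above."""
    return parse_version(installed) >= parse_version(minimum)

for candidate in ["2.0.18", "2.1.0a4", "2.1.0a6", "2.1.0"]:
    print(candidate, satisfies_minimum(candidate, "2.1.0a4"))
```

Note that the final 2.1.0 release sorts after every 2.1.0 pre-release, so it also satisfies the requirement.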
el_core_news_md-2.1.0a6
Details: https://spacy.io/models/el#el_core_news_md
File checksum:
bbbc474cc51dec46018abf06f6b8b61f3f35756d389fee8053bb533e1ec610ee
Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.
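The pruning step described above (keep the most frequent rows, remap every other word to its nearest retained vector) can be sketched with NumPy. This is an illustration under stated assumptions, not the script used to build the model: it assumes the word list is sorted by descending frequency, uses cosine similarity as the "nearest neighbour" measure, and `prune_vectors` is a hypothetical helper. spaCy exposes a comparable operation as `Vocab.prune_vectors`.

```python
import numpy as np

def prune_vectors(words, vectors, n_keep):
    """Keep the first n_keep rows and remap every other word to its nearest kept row.

    Assumes `words` is sorted by descending frequency, `vectors[i]` belongs to
    `words[i]`, and no row is all zeros (cosine similarity would be undefined).
    Returns the pruned table and a dict mapping every word to a row index.
    """
    kept = vectors[:n_keep]
    # Unit-normalise rows so a dot product equals cosine similarity.
    kept_unit = kept / np.linalg.norm(kept, axis=1, keepdims=True)
    key_to_row = {word: i for i, word in enumerate(words[:n_keep])}
    dropped = vectors[n_keep:]
    if len(dropped):
        dropped_unit = dropped / np.linalg.norm(dropped, axis=1, keepdims=True)
        # For each dropped word, pick the kept row with the highest cosine similarity.
        nearest = (dropped_unit @ kept_unit.T).argmax(axis=1)
        for word, row in zip(words[n_keep:], nearest):
            key_to_row[word] = int(row)
    return kept, key_to_row

words = ["the", "cat", "feline", "article"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.1, 0.9], [0.9, 0.2]])
table, key_to_row = prune_vectors(words, vectors, n_keep=2)
print(table.shape, key_to_row)
# (2, 2) {'the': 0, 'cat': 1, 'feline': 1, 'article': 0}
```

Every original word keeps a key, but many keys now share a row, which is how the table below can list far more keys than unique vectors.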
Feature | Description |
---|---|
Name | el_core_news_md |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 126 MB |
Pipeline | tagger, parser, ner |
Vectors | 1999938 keys, 20000 unique vectors (300 dimensions) |
Sources | Common Crawl, Greek Dependency Treebank, Daras GSOC 2018 |
License | CC BY-NC 4.0 |
Author | Giannis Daras |
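A rough calculation shows what sharing 20,000 rows among 1,999,938 keys saves. The sketch assumes each value is a 32-bit float and ignores the key-to-row index and any compression in the package, so it is an order-of-magnitude estimate, not the actual on-disk breakdown of the 126 MB model.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats

def vector_table_bytes(rows, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of a dense rows x dims vector table, ignoring the key-to-row index."""
    return rows * dims * bytes_per_value

pruned = vector_table_bytes(20_000, 300)       # unique rows listed for el_core_news_md
unpruned = vector_table_bytes(1_999_938, 300)  # one row per key, if nothing were shared
print(f"pruned:   {pruned / 1e6:.1f} MB")    # pruned:   24.0 MB
print(f"unpruned: {unpruned / 1e6:.1f} MB")  # unpruned: 2399.9 MB
```

Under these assumptions, pruning shrinks the vector table by a factor of roughly 100.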
Accuracy
Type | Score |
---|---|
ENTS_F | 81.00 |
ENTS_P | 80.50 |
ENTS_R | 81.51 |
LAS | 85.20 |
TAGS_ACC | 96.59 |
TOKEN_ACC | 100.00 |
UAS | 88.36 |
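UAS (unlabeled attachment score) is the share of tokens assigned the correct syntactic head; LAS (labeled attachment score) additionally requires the correct dependency label, so LAS can never exceed UAS. A minimal sketch of the textbook definitions, assuming gold and predicted analyses share the same tokenization (spaCy's own scorer may handle tokenization mismatches and punctuation differently). `attachment_scores` is a hypothetical helper.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    `gold` and `predicted` are equal-length lists of (head_index, dep_label)
    pairs, one per token, over the same tokenization.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))  # head only
    both_ok = sum(g == p for g, p in zip(gold, predicted))        # head and label
    n = len(gold)
    return 100 * head_ok / n, 100 * both_ok / n

gold = [(1, "nsubj"), (1, "ROOT"), (1, "obj"), (1, "punct")]
predicted = [(1, "nsubj"), (1, "ROOT"), (1, "iobj"), (2, "punct")]
uas, las = attachment_scores(gold, predicted)
print(uas, las)  # 75.0 50.0
```

Here the third token has the right head but the wrong label, so it counts toward UAS only; the fourth token has the wrong head and counts toward neither.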
Installation
pip install spacy-nightly
spacy download el_core_news_md
de_core_news_sm-2.1.0a6
Details: https://spacy.io/models/de#de_core_news_sm
File checksum:
60c70639a46b0888154815ebb932bbfe3366134be41b959b62047698bd654f45
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_sm |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.34 |
ENTS_P | 84.21 |
ENTS_R | 82.48 |
LAS | 89.55 |
TAGS_ACC | 97.20 |
TOKEN_ACC | 99.48 |
UAS | 91.62 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
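Entity precision, recall and F-score are computed against whatever reference spans are supplied, so with silver-standard references they measure agreement with those annotations, not with human judgments. A minimal exact-match sketch, assuming spans are `(start, end, label)` tuples; `entity_prf` is a hypothetical helper, and spaCy's scorer may differ in details.

```python
def entity_prf(gold_spans, predicted_spans):
    """Exact-match precision, recall and F-score (as percentages) for entity spans.

    A prediction counts as correct only if an identical (start, end, label)
    span exists in the reference annotations.
    """
    gold, pred = set(gold_spans), set(predicted_spans)
    correct = len(gold & pred)
    precision = 100 * correct / len(pred) if pred else 0.0
    recall = 100 * correct / len(gold) if gold else 0.0
    total = precision + recall
    f = 2 * precision * recall / total if total else 0.0
    return precision, recall, f

gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
predicted = [(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "ORG"), (10, 11, "MISC")]
p, r, f = entity_prf(gold, predicted)
print(round(p, 2), round(r, 2), round(f, 2))  # 50.0 66.67 57.14
```

The span at tokens 5 to 6 has the right boundaries but the wrong label, so under exact matching it counts as both a false positive and a false negative.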
Installation
pip install spacy-nightly
spacy download de_core_news_sm
de_core_news_md-2.1.0a6
Details: https://spacy.io/models/de#de_core_news_md
File checksum:
4338d30cbf5f8c2c25d05e2d830d5d2783024c16f436fabcb397b61fa3a40a92
German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | de_core_news_md |
Version | 2.1.0a6 |
spaCy | >=2.1.0a4 |
Model size | 210 MB |
Pipeline | tagger, parser, ner |
Vectors | 276087 keys, 20000 unique vectors (300 dimensions) |
Sources | TIGER Corpus, Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 83.90 |
ENTS_P | 84.73 |
ENTS_R | 83.08 |
LAS | 90.33 |
TAGS_ACC | 97.46 |
TOKEN_ACC | 99.48 |
UAS | 92.19 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download de_core_news_md
xx_ent_wiki_sm-2.1.0a5
Details: https://spacy.io/models/xx#xx_ent_wiki_sm
File checksum:
9772447233890c869e833d188060de256492faf4402a6112200bc66a49738995
Multilingual CNN trained on the Wikipedia corpus of Nothman et al. (2010). Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.
Feature | Description |
---|---|
Name | xx_ent_wiki_sm |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 3 MB |
Pipeline | ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Wikipedia |
License | MIT |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 82.08 |
ENTS_P | 82.59 |
ENTS_R | 81.58 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text.
Installation
pip install spacy-nightly
spacy download xx_ent_wiki_sm
pt_core_news_sm-2.1.0a5
Details: https://spacy.io/models/pt#pt_core_news_sm
File checksum:
d9a81075d944646d739de34855e4ae17fd4f7c63cfe665d324ebcbb3e1a274e7
Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | pt_core_news_sm |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 12 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 87.84 |
ENTS_P | 87.94 |
ENTS_R | 87.73 |
LAS | 85.98 |
TAGS_ACC | 78.46 |
TOKEN_ACC | 100.00 |
UAS | 89.32 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download pt_core_news_sm
nl_core_news_sm-2.1.0a5
Details: https://spacy.io/models/nl#nl_core_news_sm
File checksum:
8c783276048ebd53362a4598f03667ab2a9a1c762842685837f721283f7ff16b
Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | nl_core_news_sm |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-SA 4.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 85.36 |
ENTS_P | 84.84 |
ENTS_R | 85.89 |
LAS | 77.42 |
TAGS_ACC | 90.93 |
TOKEN_ACC | 100.00 |
UAS | 83.69 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download nl_core_news_sm
it_core_news_sm-2.1.0a5
Details: https://spacy.io/models/it#it_core_news_sm
File checksum:
be0adce67fa6aa2752d26285a4a91cbf7e7e6e8753992b9710aa5978669584be
Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | it_core_news_sm |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 10 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Universal Dependencies, Wikipedia |
License | CC BY-NC-SA 3.0 |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 84.77 |
ENTS_P | 85.05 |
ENTS_R | 84.49 |
LAS | 86.95 |
TAGS_ACC | 95.72 |
TOKEN_ACC | 100.00 |
UAS | 90.82 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download it_core_news_sm
fr_core_news_sm-2.1.0a5
Details: https://spacy.io/models/fr#fr_core_news_sm
File checksum:
6b78d375f4706d527f6a750f941e5fa96616fc1f8e81cce9e43abb4d406bb403
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_sm |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 14 MB |
Pipeline | tagger, parser, ner |
Vectors | 0 keys, 0 unique vectors (0 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 80.95 |
ENTS_P | 81.09 |
ENTS_R | 80.81 |
LAS | 84.44 |
TAGS_ACC | 94.40 |
TOKEN_ACC | 100.00 |
UAS | 87.33 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download fr_core_news_sm
fr_core_news_md-2.1.0a5
Details: https://spacy.io/models/fr#fr_core_news_md
File checksum:
31a1297e8f031d529f441a518e71593681ce102010bb3237d2b6abb5afe235b8
French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.
Feature | Description |
---|---|
Name | fr_core_news_md |
Version | 2.1.0a5 |
spaCy | >=2.1.0a4 |
Model size | 82 MB |
Pipeline | tagger, parser, ner |
Vectors | 579447 keys, 20000 unique vectors (300 dimensions) |
Sources | Sequoia Corpus (UD), Wikipedia |
License | LGPL |
Author | Explosion AI |
Accuracy
Type | Score |
---|---|
ENTS_F | 82.17 |
ENTS_P | 82.34 |
ENTS_R | 82.00 |
LAS | 86.09 |
TAGS_ACC | 94.90 |
TOKEN_ACC | 100.00 |
UAS | 88.81 |
Because the model is trained on Wikipedia, it may perform inconsistently on other genres, such as social media text. The NER accuracy is measured against the "silver standard" (automatically derived) annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.
Installation
pip install spacy-nightly
spacy download fr_core_news_md