Releases: explosion/spacy-models

el_core_news_sm-2.1.0a6

21 Jan 17:50
9d44cac
Pre-release

Details: https://spacy.io/models/el#el_core_news_sm

File checksum: 4ca49e6fafabff31df82e53df79f545f6bf78b12fa22146e867ce757aeb55ee4
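
The checksum is 64 hexadecimal characters, which is consistent with a SHA-256 digest. The release notes do not name the algorithm, so treat that as an assumption. A minimal sketch for checking a downloaded archive against the published value (the function names `sha256_of_file` and `matches_checksum` are hypothetical helpers, not part of spaCy):

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare the file digest with a published checksum, ignoring case
    and surrounding whitespace in the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling `matches_checksum(path_to_archive, "4ca49e6f...")` with the full value from the line above would return `True` only if the file is byte-for-byte identical to the one the checksum was computed from, assuming SHA-256 is indeed the algorithm used.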

Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.

Feature Description
Name el_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Greek Dependency Treebank, Daras GSOC 2018
License CC BY-NC 4.0
Author Giannis Daras

Accuracy

Type Score
ENTS_F  73.10
ENTS_P  71.49
ENTS_R  74.79
LAS  81.47
TAGS_ACC  94.77
TOKEN_ACC  100.00
UAS  84.97
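
The ENTS_F value is consistent with the usual F1 definition, the harmonic mean of ENTS_P and ENTS_R. A minimal sketch that reproduces it from the two values in the table (the function name `f_score` is hypothetical):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# ENTS_P and ENTS_R for el_core_news_sm, taken from the table above.
print(round(f_score(71.49, 74.79), 2))  # 73.1
```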

Installation

pip install spacy-nightly
spacy download el_core_news_sm

el_core_news_md-2.1.0a6

21 Jan 17:51
9d44cac
Pre-release

Details: https://spacy.io/models/el#el_core_news_md

File checksum: bbbc474cc51dec46018abf06f6b8b61f3f35756d389fee8053bb533e1ec610ee

Greek pipeline with word vectors, POS tags, dependencies and named entities. Word vectors use Facebook's FastText Common Crawl vectors, pruned to a vocabulary of 20,000 items. Words outside the 20,000 most frequent were mapped to the nearest neighbouring vector among the 20,000 rows retained. Syntax (dependencies and POS tags) was trained on the Universal Dependencies conversion of the Greek Dependency Treebank (v2.2). Named entity annotations were created by Giannis Daras with Prodigy, following the OntoNotes 5 annotation schema.
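
The pruning step described above explains how a vector table can have far more keys than unique vectors: every word keeps a key, but words outside the retained set point at the row of their nearest retained neighbour. The following is a toy sketch of that idea in plain Python, not spaCy's implementation. It assumes cosine similarity as the nearness measure and a vocabulary already ordered by frequency; the function names and the tiny two-dimensional vectors are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_vectors(vectors, keep):
    """vectors: dict word -> vector, ordered from most to least frequent.

    Returns (rows, key_to_row): the retained vector rows, and a map from
    every word to the index of the row it should use.
    """
    words = list(vectors)
    kept = words[:keep]
    rows = [vectors[w] for w in kept]
    key_to_row = {w: i for i, w in enumerate(kept)}
    for w in words[keep:]:
        # Remap each dropped word to its most similar retained row.
        sims = [cosine(vectors[w], row) for row in rows]
        key_to_row[w] = max(range(len(rows)), key=sims.__getitem__)
    return rows, key_to_row


toy = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "kitten": [0.1, 0.9],  # closest to "cat"
    "a": [0.9, 0.2],       # closest to "the"
}
rows, key_to_row = prune_vectors(toy, keep=2)
print(len(key_to_row), "keys,", len(rows), "unique vectors")
print(key_to_row)
```

In the toy data, four keys share two rows: "kitten" reuses the row for "cat" and "a" reuses the row for "the".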

Feature Description
Name el_core_news_md
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 126 MB
Pipeline  tagger, parser, ner
Vectors 1999938 keys, 20000 unique vectors (300 dimensions)
Sources Common Crawl, Greek Dependency Treebank, Daras GSOC 2018
License CC BY-NC 4.0
Author Giannis Daras

Accuracy

Type Score
ENTS_F  81.00
ENTS_P  80.50
ENTS_R  81.51
LAS  85.20
TAGS_ACC  96.59
TOKEN_ACC  100.00
UAS  88.36
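
UAS (unlabelled attachment score) and LAS (labelled attachment score) are the standard dependency-parsing metrics: the share of tokens whose predicted head is correct, and the share whose head and dependency label are both correct. A minimal sketch under those definitions follows. The function name and the toy sentence are hypothetical, and details of the actual evaluation (for example, how punctuation is treated) are not stated in these notes.

```python
def attachment_scores(gold, predicted):
    """gold, predicted: lists of (head_index, dependency_label), one per token.

    Returns (UAS, LAS) as percentages.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0, 0.0
    # UAS counts correct heads; LAS additionally requires the correct label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    both_hits = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * both_hits / total


gold = [(1, "det"), (2, "nsubj"), (2, "ROOT"), (2, "obj")]
pred = [(1, "det"), (2, "obj"), (2, "ROOT"), (1, "obj")]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

LAS can never exceed UAS, which matches every table in these notes.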

Installation

pip install spacy-nightly
spacy download el_core_news_md

de_core_news_sm-2.1.0a6

21 Jan 17:51
9d44cac
Pre-release

Details: https://spacy.io/models/de#de_core_news_sm

File checksum: 60c70639a46b0888154815ebb932bbfe3366134be41b959b62047698bd654f45

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_sm
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.34
ENTS_P  84.21
ENTS_R  82.48
LAS  89.55
TAGS_ACC  97.20
TOKEN_ACC  99.48
UAS  91.62

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download de_core_news_sm

de_core_news_md-2.1.0a6

21 Jan 17:51
9d44cac
Pre-release

Details: https://spacy.io/models/de#de_core_news_md

File checksum: 4338d30cbf5f8c2c25d05e2d830d5d2783024c16f436fabcb397b61fa3a40a92

German multi-task CNN trained on the TIGER and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name de_core_news_md
Version 2.1.0a6
spaCy >=2.1.0a4
Model size 210 MB
Pipeline  tagger, parser, ner
Vectors 276087 keys, 20000 unique vectors (300 dimensions)
Sources TIGER Corpus, Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  83.90
ENTS_P  84.73
ENTS_R  83.08
LAS  90.33
TAGS_ACC  97.46
TOKEN_ACC  99.48
UAS  92.19

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download de_core_news_md

xx_ent_wiki_sm-2.1.0a5

18 Dec 13:33
0582c15
Pre-release

Details: https://spacy.io/models/xx#xx_ent_wiki_sm

File checksum: 9772447233890c869e833d188060de256492faf4402a6112200bc66a49738995

Multi-lingual CNN trained on the Nothman et al. (2010) Wikipedia corpus. Assigns named entities. Supports identification of PER, LOC, ORG and MISC entities for English, German, Spanish, French, Italian, Portuguese and Russian.

Feature Description
Name xx_ent_wiki_sm
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 3 MB
Pipeline  ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Wikipedia
License MIT
Author Explosion AI

Accuracy

Type Score
ENTS_F  82.08
ENTS_P  82.59
ENTS_R  81.58
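
For a pipeline that only assigns named entities, precision, recall and F-score are typically computed over entity spans rather than tokens. A minimal sketch, assuming exact-match scoring (a predicted entity counts only if its start, end and label all match a gold entity); the function name and the toy spans are hypothetical, and these notes do not spell out the evaluation procedure:

```python
def entity_prf(gold_spans, predicted_spans):
    """Each span is a (start, end, label) tuple. Returns precision, recall
    and F1 as percentages, counting only exact span-and-label matches."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return 100 * precision, 100 * recall, 100 * f1


gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG")]
pred = [(0, 2, "PER"), (5, 6, "ORG")]  # second span has the wrong label
p, r, f = entity_prf(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 50.0 33.33 40.0
```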

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text.

Installation

pip install spacy-nightly
spacy download xx_ent_wiki_sm

pt_core_news_sm-2.1.0a5

18 Dec 13:34
0582c15
Pre-release

Details: https://spacy.io/models/pt#pt_core_news_sm

File checksum: d9a81075d944646d739de34855e4ae17fd4f7c63cfe665d324ebcbb3e1a274e7

Portuguese multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name pt_core_news_sm
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 12 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  87.84
ENTS_P  87.94
ENTS_R  87.73
LAS  85.98
TAGS_ACC  78.46
TOKEN_ACC  100.00
UAS  89.32

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download pt_core_news_sm

nl_core_news_sm-2.1.0a5

18 Dec 13:34
0582c15
Pre-release

Details: https://spacy.io/models/nl#nl_core_news_sm

File checksum: 8c783276048ebd53362a4598f03667ab2a9a1c762842685837f721283f7ff16b

Dutch multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name nl_core_news_sm
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-SA 4.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  85.36
ENTS_P  84.84
ENTS_R  85.89
LAS  77.42
TAGS_ACC  90.93
TOKEN_ACC  100.00
UAS  83.69

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download nl_core_news_sm

it_core_news_sm-2.1.0a5

18 Dec 13:35
0582c15
Pre-release

Details: https://spacy.io/models/it#it_core_news_sm

File checksum: be0adce67fa6aa2752d26285a4a91cbf7e7e6e8753992b9710aa5978669584be

Italian multi-task CNN trained on the Universal Dependencies and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name it_core_news_sm
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 10 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Universal Dependencies, Wikipedia
License CC BY-NC-SA 3.0
Author Explosion AI

Accuracy

Type Score
ENTS_F  84.77
ENTS_P  85.05
ENTS_R  84.49
LAS  86.95
TAGS_ACC  95.72
TOKEN_ACC  100.00
UAS  90.82

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download it_core_news_sm

fr_core_news_sm-2.1.0a5

18 Dec 13:35
0582c15
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_sm

File checksum: 6b78d375f4706d527f6a750f941e5fa96616fc1f8e81cce9e43abb4d406bb403

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_sm
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 14 MB
Pipeline  tagger, parser, ner
Vectors 0 keys, 0 unique vectors (0 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  80.95
ENTS_P  81.09
ENTS_R  80.81
LAS  84.44
TAGS_ACC  94.40
TOKEN_ACC  100.00
UAS  87.33

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_sm

fr_core_news_md-2.1.0a5

18 Dec 13:36
0582c15
Pre-release

Details: https://spacy.io/models/fr#fr_core_news_md

File checksum: 31a1297e8f031d529f441a518e71593681ce102010bb3237d2b6abb5afe235b8

French multi-task CNN trained on the French Sequoia (Universal Dependencies) and WikiNER corpora. Assigns context-specific token vectors, POS tags, dependency parse and named entities. Supports identification of PER, LOC, ORG and MISC entities.

Feature Description
Name fr_core_news_md
Version 2.1.0a5
spaCy >=2.1.0a4
Model size 82 MB
Pipeline  tagger, parser, ner
Vectors 579447 keys, 20000 unique vectors (300 dimensions)
Sources Sequoia Corpus (UD), Wikipedia
License LGPL
Author Explosion AI

Accuracy

Type Score
ENTS_F  82.17
ENTS_P  82.34
ENTS_R  82.00
LAS  86.09
TAGS_ACC  94.90
TOKEN_ACC  100.00
UAS  88.81

Because the model is trained on Wikipedia, it may perform inconsistently on many genres, such as social media text. The NER accuracy refers to the "silver standard" annotations in the WikiNER corpus. Accuracy measured against these annotations tends to be higher than accuracy measured against correct human annotations.

Installation

pip install spacy-nightly
spacy download fr_core_news_md