---
tags:
  - spacy
  - floret
  - token-classification
language:
  - bg
license: mit
---

Bulgarian word vectors for a Bulgarian spaCy model.

The floret vectors were trained on the OSCAR 21.09 corpus and Bulgarian Wikipedia pages with the following hyperparameters: `floret cbow -dim 300 -mode floret -bucket 200000 -minn 4 -maxn 5 -minCount 20 -neg 10 -hashCount 2 -lr 0.05 -thread 8`
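
To make the subword settings concrete, here is a minimal sketch of what `-minn 4 -maxn 5` means for a single word, together with a rough size estimate for a `-bucket 200000 -dim 300` table. It assumes the default `<` and `>` word-boundary markers and 4-byte floats; the helper name `padded_ngrams` is hypothetical and not part of floret or spaCy. The hashing step (`-hashCount 2`) that maps each n-gram to rows of the table is not reproduced here.

```python
# Values taken from the training command above.
MINN, MAXN = 4, 5            # -minn 4 -maxn 5
BUCKET, DIM = 200_000, 300   # -bucket 200000 -dim 300


def padded_ngrams(word, minn=MINN, maxn=MAXN, bow="<", eow=">"):
    """Return the padded word followed by its character n-grams.

    Hypothetical helper for illustration only. The "<" and ">" boundary
    markers are assumed defaults, not taken from the training command.
    """
    padded = f"{bow}{word}{eow}"
    ngrams = [padded]
    for n in range(minn, maxn + 1):
        for start in range(len(padded) - n + 1):
            ngrams.append(padded[start:start + n])
    return ngrams


example_ngrams = padded_ngrams("котка")
print(example_ngrams)

# Rough size of the vector table, assuming 4-byte (float32) values.
table_bytes = BUCKET * DIM * 4
print(f"{table_bytes / 1e6:.0f} MB")
```

Because every word is represented through its n-grams, words that were never seen during training still get a vector, which is useful for a morphologically rich language such as Bulgarian.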

| Feature | Description |
| --- | --- |
| Name | `bg_floret_vectors_lg` |
| Version | 1.0 |
| Vectors | 200000 keys (300 dimensions) |
| Sources | OSCAR Corpus 21.09 (Julien Abadji, Pedro Ortiz Suarez), Wikipedia (bgwiki-latest-pages-articles from June 11th) |
| License | MIT |
| Author | Ivaylo Sakelariev |
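
As a usage sketch, the snippet below tries to load the vectors through spaCy and inspect them. It assumes that the vectors have been installed as a spaCy pipeline package under the name `bg_floret_vectors_lg` from the table above, which this README does not confirm; the helper `load_vectors` is hypothetical. If spaCy or the package is missing, the helper returns `None` instead of failing.

```python
def load_vectors(name="bg_floret_vectors_lg"):
    """Load the vectors as a spaCy pipeline, or return None if unavailable.

    Assumption: the vectors are installed as a spaCy package called `name`.
    """
    try:
        import spacy
        return spacy.load(name)
    except (ImportError, OSError):
        # ImportError: spaCy is not installed.
        # OSError: spaCy cannot find a package or path with this name.
        return None


nlp = load_vectors()
if nlp is None:
    print("spaCy or the vectors package is not installed")
else:
    # Shape of the vector table (rows, dimensions).
    print(nlp.vocab.vectors.shape)
    # Vector for a single Bulgarian word.
    print(nlp("котка").vector.shape)
```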