---
tags:
- spacy
- floret
- token-classification
language:
- bg
license: mit
---
Bulgarian floret word vectors for use in a Bulgarian spaCy model.

The floret vectors are trained on the OSCAR 21.09 corpus and Bulgarian Wikipedia pages with the following hyperparameters: `floret cbow -dim 300 -mode floret -bucket 200000 -minn 4 -maxn 5 -minCount 20 -neg 10 -hashCount 2 -lr 0.05 -thread 8`
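
To make the `-minn 4 -maxn 5` settings concrete, here is a minimal sketch of the character n-grams that a subword model of this kind would derive from a single word. It assumes the fastText-style convention of wrapping the word in `<` and `>` boundary markers. The helper `char_ngrams` is hypothetical and written only for illustration; it is not part of floret or spaCy, and the hashing of each n-gram into the 200000 buckets (`-bucket`, `-hashCount`) is deliberately left out.

```python
def char_ngrams(word, minn=4, maxn=5):
    """Return character n-grams of `word`, wrapped in < and > boundary markers.

    Hypothetical helper for illustration only; it mirrors the -minn/-maxn
    settings above but does not reproduce floret's hashing step.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(minn, maxn + 1):
        # Slide a window of length n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


# Example word chosen for illustration: "котка" (cat).
ngrams = char_ngrams("котка")
print(ngrams)
```

Because vectors are built from subword units like these rather than from a fixed word list, a word that never appeared in the training data can still receive a vector from the n-grams it shares with known words.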

| Feature | Description |
| --- | --- |
| **Name** | `bg_floret_vectors_lg` |
| **Version** | `1.0` |
| **Vectors** | 200000 keys (300 dimensions) |
| **Sources** | OSCAR Corpus 21.09 (Julien Abadji, Pedro Ortiz Suarez), Wikipedia (bgwiki-latest-pages-articles from June 11th) |
| **License** | `MIT` |
| **Author** | Ivaylo Sakelariev |
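
As a rough guide to the size of the vector table, the figures in the **Vectors** row can be turned into a memory estimate. This is a back-of-the-envelope sketch that assumes each value is stored as a 32-bit float; the actual size on disk or in memory may differ.

```python
rows = 200_000        # keys, from the Vectors row (matches -bucket 200000)
dims = 300            # dimensions, from the Vectors row (matches -dim 300)
bytes_per_value = 4   # assumption: values stored as float32

size_bytes = rows * dims * bytes_per_value
size_mb = size_bytes / 1_000_000
print(f"{size_mb:.0f} MB")  # prints "240 MB"
```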