---
license: mit
---
# Bangla FastText Model
This is a pre-trained FastText model for the Bengali language.

This model is built for the [bnlp](https://github.com/sagorbrur/bnlp) package.

## Datasets
- [Wikipedia dump datasets](https://dumps.wikimedia.org/bnwiki/latest/)

## Training Details
- FastText trained on 20M total words, with vocabulary size = 1,171,011, epochs = 50, and embedding dimension = 300
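
For readers who want a comparable setup directly with the `fasttext` Python package, the following is a minimal sketch rather than the exact training script. Only `dim=300` and `epoch=50` come from the details above; the `skipgram` model type and the helper name `train_fasttext` are assumptions, since this README does not state how the model was configured beyond those values.

```py
# dim and epoch are taken from this README; "skipgram" is an assumption.
TRAIN_PARAMS = {"model": "skipgram", "dim": 300, "epoch": 50}


def train_fasttext(corpus_path, model_path):
    """Train unsupervised FastText embeddings and save them as a .bin file.

    Hypothetical helper for illustration; not part of bnlp.
    """
    # Imported lazily so the module loads even without fasttext installed.
    import fasttext

    model = fasttext.train_unsupervised(input=corpus_path, **TRAIN_PARAMS)
    model.save_model(model_path)
    return model
```

Calling `train_fasttext("raw_text.txt", "saved_model.bin")` on a plain-text corpus would write a binary model to the second path; both file names are placeholders.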

## Evaluation Details
- training loss = 0.318668

## Usage
- `pip install -U bnlp_toolkit`
- `pip install fasttext==0.9.2` 
- Generate Vector Using Pretrained Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
word = "গ্রাম"
model_path = "bengali_fasttext_wiki.bin"  # local path to the pretrained .bin file
word_vector = bft.generate_word_vector(model_path, word)
print(word_vector.shape)  # vector dimensionality (300 for this model)
print(word_vector)
```
      
- Train Bengali FastText Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()
data = "raw_text.txt"  # plain-text training corpus
model_name = "saved_model.bin"  # output path for the trained model
epoch = 50
bft.train(data, model_name, epoch)
```

- Generate Vector File from FastText Binary Model

```py
from bnlp.embedding.fasttext import BengaliFasttext

bft = BengaliFasttext()

model_path = "mymodel.bin"  # trained FastText binary model
out_vector_name = "myvector.txt"  # text file to write the vectors to
bft.bin2vec(model_path, out_vector_name)
```
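
Once a text vector file exists, it can be read back without `fasttext` installed. The sketch below assumes the file follows the common word2vec-style text layout (a `count dim` header line, then one `word v1 v2 ...` line per word); check this against your own file before relying on it. `parse_vectors` and `cosine_similarity` are hypothetical helper names, not part of bnlp.

```py
import math


def parse_vectors(lines):
    """Parse vector-file lines into a {word: [float, ...]} dict.

    Assumes a 'count dim' header followed by 'word v1 v2 ...' lines.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])
    vectors = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

To use it, open the file with `open("myvector.txt", encoding="utf-8")` and pass the file object to `parse_vectors`; the vectors of any two words in the result can then be compared with `cosine_similarity`.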