---
language:
- fa
pipeline_tag: feature-extraction
---
This is a Persian word2vec embedding model trained with the CBOW algorithm on Wikipedia data.
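
For context, this is roughly how a CBOW word2vec model can be trained with gensim. The sketch below is a generic illustration with a toy corpus and placeholder hyperparameters; it is not the actual training pipeline or the settings used for this model.

```python
# Generic CBOW training sketch with gensim (placeholder corpus and hyperparameters,
# not the settings used for this model).
from gensim.models import Word2Vec

# `sentences` should be an iterable of tokenized Persian sentences,
# e.g. extracted from a Wikipedia dump. A toy example is used here.
sentences = [["این", "یک", "جمله", "است"]]

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    min_count=1,      # keep rare words for this toy corpus
    sg=0,             # sg=0 selects CBOW (sg=1 would be skip-gram)
)
vector = model.wv["جمله"]  # embedding vector for a word in the vocabulary
```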

To use this model in Hezar, first install the package:

```bash
pip install hezar
```

Then load the embedding and use it:
```python
from hezar import Embedding

w2v = Embedding.load("hezarai/word2vec-cbow-fa-wikipedia")
# Get embedding vector
vector = w2v("هزار")
# Find the word that doesn't match with the rest
doesnt_match = w2v.doesnt_match(["خانه", "اتاق", "ماشین"])
# Find the top-n most similar words to the given word
most_similar = w2v.most_similar("هزار", top_n=5)
# Find the cosine similarity value between two words
similarity = w2v.similarity("مهندس", "دکتر")
```
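
As a quick sanity check you can print the results. The snippet below is a minimal sketch that assumes the code above has already run; the exact return types (array-like vector, list of word/score entries, float similarity) are assumptions and may differ between Hezar versions.

```python
# Minimal sketch, assuming the snippet above has already run.
# Return types are assumptions and may differ between Hezar versions.
print(len(vector))    # embedding dimensionality (assumes an array-like vector)
print(doesnt_match)   # the odd-one-out word
print(most_similar)   # nearest neighbours of "هزار" with their scores
print(similarity)     # cosine similarity between "مهندس" and "دکتر"
```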