---
language:
- fa
library_name: hezar
tags:
- feature-extraction
- hezar
pipeline_tag: feature-extraction
---

This is a Persian (Farsi) word2vec embedding model trained with the CBOW algorithm on Wikipedia data.
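
CBOW (continuous bag-of-words) learns embeddings by predicting a centre word from the words around it. The toy sketch below shows only the forward pass of that objective in plain Python. The input vectors of the context words are averaged, and every vocabulary word is scored against that average through its output vector, followed by a softmax. The vocabulary, dimension, window, and random weights are all hypothetical. The weights are untrained, so the predicted word is arbitrary. Real word2vec training also replaces the full softmax with negative sampling or hierarchical softmax. The exact training settings used for this model are not given in this card.

```python
import math
import random

# Hypothetical toy setup: tiny vocabulary, 4-dimensional vectors,
# randomly initialised (untrained) input and output embedding matrices.
random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]
DIM = 4
W_in = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in VOCAB]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in VOCAB]


def cbow_forward(context_ids):
    """Return a probability distribution over VOCAB for the centre word.

    The input embeddings of the context words are averaged into a hidden
    vector, each vocabulary word is scored by a dot product with its output
    embedding, and the scores are normalised with a full softmax.
    """
    hidden = [sum(W_in[i][d] for i in context_ids) / len(context_ids) for d in range(DIM)]
    scores = [sum(h * w for h, w in zip(hidden, W_out[j])) for j in range(len(VOCAB))]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# "the cat _ on the mat" with a window of one word on each side:
# the context is ["cat", "on"] and the model guesses the missing centre word.
context = [VOCAB.index(w) for w in ["cat", "on"]]
probs = cbow_forward(context)
best = max(range(len(VOCAB)), key=probs.__getitem__)
print(VOCAB[best], probs)
```

Averaging is what makes CBOW a "bag" of words: the order of the context words does not affect the prediction.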

To use this model with Hezar, install the package and run the code below:

```bash
pip install hezar
```

```python
from hezar.embeddings import Embedding

# Load the pretrained CBOW embeddings
w2v = Embedding.load("hezarai/word2vec-cbow-fa-wikipedia")

# Get the embedding vector of a word ("هزار" = "thousand")
vector = w2v("هزار")

# Find the word that doesn't match the rest ("house", "room", "car")
doesnt_match = w2v.doesnt_match(["خانه", "اتاق", "ماشین"])

# Find the top-n words most similar to the given word
most_similar = w2v.most_similar("هزار", top_n=5)

# Compute the cosine similarity between two words ("engineer", "doctor")
similarity = w2v.similarity("مهندس", "دکتر")
```
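
As a library-independent sketch of the arithmetic behind these queries, here are the three operations in plain Python over a hypothetical toy vocabulary. This is modelled on the common gensim-style approach. It is an assumption about how Hezar computes these queries, not its actual code. Cosine similarity is $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. `most_similar` ranks every other word by that score. `doesnt_match` returns the word whose normalised vector is least similar to the mean of all the normalised vectors. The English keys and the 3-dimensional vectors are invented for illustration and loosely mirror the Persian example words.

```python
import math

# Hypothetical toy embeddings; real word2vec vectors have hundreds of dimensions.
TOY = {
    "house": [0.9, 0.1, 0.0],
    "room": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "engineer": [0.2, 0.3, 0.9],
    "doctor": [0.3, 0.2, 0.8],
}


def unit(v):
    """L2-normalise a vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product of the two L2-normalised vectors."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


def most_similar(word, top_n=5):
    """Rank all other vocabulary words by cosine similarity to `word`."""
    scored = [(other, cosine(TOY[word], vec)) for other, vec in TOY.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


def doesnt_match(words):
    """Return the word least similar to the mean of the normalised vectors."""
    vecs = [unit(TOY[w]) for w in words]
    mean = unit([sum(col) / len(vecs) for col in zip(*vecs)])
    return min(words, key=lambda w: sum(x * y for x, y in zip(unit(TOY[w]), mean)))


print(cosine(TOY["engineer"], TOY["doctor"]))
print(most_similar("house", top_n=2))
print(doesnt_match(["house", "room", "car"]))
```

With these toy vectors, "car" is the odd one out among "house", "room", and "car", and "room" is the nearest neighbour of "house".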