FastText Sr

Trained on a Serbian-language corpus of 9.5 billion words.

The files include models in both Gensim and the original format.

from gensim.models import FastText

# Load the model saved in Gensim format
model = FastText.load("TeslaFT")
examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]
# Print the cosine similarity for each word pair
for e in examples:
    print(model.wv.cosine_similarities(model.wv[e[0]], model.wv[[e[1]]])[0])

0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
from gensim.models.fasttext import load_facebook_model

# Load the model in the original (Facebook fastText) binary format
model = load_facebook_model("TeslaFT.bin")
examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]
# Print the cosine similarity for each word pair
for e in examples:
    print(model.wv.cosine_similarities(model.wv[e[0]], model.wv[[e[1]]])[0])

0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
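For reference, the cosine similarity scores printed above are the dot product of the two word vectors divided by the product of their magnitudes. A minimal pure-Python sketch of that computation (illustrative only, not part of the model's API):

```python
import math

def cosine_similarity(u, v):
    # Dot product of the two vectors
    dot = sum(a * b for a, b in zip(u, v))
    # Euclidean norms (magnitudes) of each vector
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical vectors score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Scores closer to 1.0 indicate more similar word meanings, which is why "draperija"/"zavesa" (both meaning "curtain") scores highest in the examples above.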
Author: Mihailo Škorić
Computation: TESLA project


This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA
