# SerbDict2vec

Serbian dict2vec word embeddings


Trained on a Serbian-language corpus of 350 million words compiled from the Serbian Wikipedia (srWikipedia), SrpKor2013, and part of SrpKor2021.

```python
from gensim.models import KeyedVectors

# Load the vectors
d2v_vectors = KeyedVectors.load("D:/modeli/dict2vec/SerbDict2vec")

# Inspect a word vector (the model uses 100-dimensional vectors)
print(d2v_vectors["klijent"])

# Find the five most similar words
print(d2v_vectors.most_similar("klijent", topn=5))
```

Output:

```
[-3.1600e-01 -3.4110e+00  1.2158e+01  3.7950e+00  6.1200e-01 -3.1000e-01
 -9.7000e-02 -5.0000e-02 -5.2000e-02 -9.4000e-01  3.5600e-01 -6.0400e-01
 -2.3700e-01  1.1600e-01 -4.5500e-01  1.6100e-01  2.2500e-01 -6.4700e-01
  5.4600e-01 -7.8000e-02  3.5500e-01  5.8000e-02 -3.0000e-02  3.3000e-01
 -1.5700e-01 -5.9700e-01  1.5000e-02  1.9600e-01  1.0000e-03  1.5800e-01
  4.3300e-01 -5.0000e-03 -3.0700e-01 -2.6000e-01 -5.2500e-01  7.4000e-02
 -2.7000e-02  1.8800e-01  5.6000e-02 -2.5200e-01  3.0700e-01 -4.3000e-02
  5.9000e-02 -6.6000e-02 -1.0000e-02  1.3900e-01  7.1000e-02 -4.2000e-02
 -3.2000e-02 -1.3100e-01  1.4000e-02 -8.9000e-02 -3.2200e-01 -6.2000e-02
 -1.0500e-01  1.0800e-01  1.6100e-01 -1.3600e-01 -1.5400e-01  4.0000e-02
 -5.1000e-02  1.1000e-02  2.6600e-01  3.0000e-03 -1.3800e-01  2.3400e-01
 -2.9300e-01  1.5500e-01  2.5600e-01  2.7200e-01  1.2600e-01  1.9000e-01
 -7.2000e-02  7.3000e-02  1.1700e-01 -1.1100e-01  5.9000e-02 -2.1100e-01
 -1.8700e-01 -2.0000e-03 -3.6000e-02 -2.0400e-01  3.1300e-01  1.1600e-01
  1.4800e-01  1.3000e-02  2.5200e-01  1.9700e-01 -6.7000e-02  4.5000e-02
  1.3100e-01 -8.0000e-03  5.9000e-02  3.0800e-01 -3.2200e-01 -5.3000e-02
 -1.5500e-01 -2.2100e-01 -7.6000e-02  1.3600e-01]
[('interfejs', 0.9971136450767517), ('mušterija', 0.996911883354187), ('provajder', 0.9968076348304749), ('sugrađanin', 0.9967014789581299), ('komšija', 0.9965119361877441)]
```
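Under the hood, `most_similar` ranks every word in the vocabulary by cosine similarity to the query vector. The ranking step can be sketched in plain Python; the toy 4-dimensional vectors below are invented for illustration only (the real model's vectors are 100-dimensional and come from the trained embeddings):

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy vectors standing in for real embeddings (values are made up)
vectors = {
    "klijent":   [0.9, 0.1, 0.4, 0.2],
    "mušterija": [0.8, 0.2, 0.5, 0.1],
    "interfejs": [0.1, 0.9, 0.0, 0.3],
}

def most_similar(word, topn=2):
    # Score every other word against the query and sort by similarity
    query = vectors[word]
    scores = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]

print(most_similar("klijent"))
```

With these toy vectors, "mušterija" (customer) ranks closest to "klijent" (client), mirroring the kind of neighborhood the real model returns above.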
**Authors:** Ranka Stanković, Jovana Rađenović

**Computation:** TESLA project


## Citation

```bibtex
@inproceedings{stankovic-dict2vec,
  author    = {Stanković, Ranka and Rađenović, Jovana and Škorić, Mihailo and Putniković, Marko},
  title     = {Learning Word Embeddings using Lexical Resources and Corpora},
  booktitle = {15th International Conference on Information Society and Technology, ISIST 2025},
  year      = {2025},
  address   = {Kopaonik, Serbia},
  publisher = {SASA, Belgrade},
  url       = {https://doi.org/10.5281/zenodo.15093900}
}
```


This research was supported by the Science Fund of the Republic of Serbia, grant #7276, Text Embeddings - Serbian Language Applications - TESLA.


## Model tree

**Base model:** te-sla/Word2VecSr (SerbDict2vec is fine-tuned from it)