
KGR10 word2vec Polish word embeddings

Distributional language models for Polish trained on the KGR10 corpus.

Models

The repository contains two models that were selected after evaluation (see the table below). The best-performing model is used as the default (see default_config.json).

| method   | dimension | hs    | mwe  |             |
|----------|-----------|-------|------|-------------|
| cbow     | 300       | false | true | <-- default |
| skipgram | 300       | true  | true |             |
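
The raw vectors can also be used outside of the flair-based toolchain described in the Usage section below. Assuming the downloaded model file is stored in the standard word2vec text or binary format (an assumption; check the actual file you obtain from this repository), it can be loaded directly with gensim. The file path below is only a placeholder:

from gensim.models import KeyedVectors

# Placeholder path: substitute the model file you downloaded from this repository.
# binary=True assumes the word2vec binary format; use binary=False for the text format.
vectors = KeyedVectors.load_word2vec_format("path/to/kgr10-word2vec-model.bin", binary=True)

# Nearest neighbours of a Polish word in the embedding space.
print(vectors.most_similar("kot", topn=5))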

Usage

To use these embedding models, install the clarinpl-embeddings package:

pip install clarinpl-embeddings

Utilising the default model (the easiest way)

Word embedding:

from embeddings.embedding.auto_flair import AutoFlairWordEmbedding
from flair.data import Sentence

sentence = Sentence("Myśl z duszy leci bystro, Nim się w słowach złamie.")

embedding = AutoFlairWordEmbedding.from_hub("clarin-pl/word2vec-kgr10")  # downloads the default model from the Hugging Face Hub
embedding.embed([sentence])  # embeds the sentence in place

for token in sentence:
    print(token)
    print(token.embedding)
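
Each token.embedding is a torch tensor, so a quick sanity check on the vector size is possible; the expected output below is an assumption based on the 300-dimensional models listed in the table above:

print(token.embedding.shape)  # expected: torch.Size([300]) for the default 300-dimensional model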

Document embedding (averaged over words):

from embeddings.embedding.auto_flair import AutoFlairDocumentEmbedding
from flair.data import Sentence

sentence = Sentence("Myśl z duszy leci bystro, Nim się w słowach złamie.")

embedding = AutoFlairDocumentEmbedding.from_hub("clarin-pl/word2vec-kgr10")
embedding.embed([sentence])

print(sentence.embedding)
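
Since the document embedding is described as an average over words, it should match the mean of the individual token vectors, assuming plain mean pooling without an additional trainable projection (an assumption, not something stated by the library):

import torch

# Assumes plain mean pooling; if the document embedding applies an extra
# transformation on top of the averaged word vectors, this check will not hold.
token_vectors = torch.stack([token.embedding for token in sentence])
print(torch.allclose(sentence.embedding, token_vectors.mean(dim=0)))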

Customisable way

Word embedding:

from embeddings.embedding.static.embedding import AutoStaticWordEmbedding
from embeddings.embedding.static.word2vec import KGR10Word2VecConfig
from flair.data import Sentence

config = KGR10Word2VecConfig(method='skipgram', hs=False)
embedding = AutoStaticWordEmbedding.from_config(config)

sentence = Sentence("Myśl z duszy leci bystro, Nim się w słowach złamie.")
embedding.embed([sentence])

for token in sentence:
    print(token)
    print(token.embedding)

Document embedding (averaged over words):

from embeddings.embedding.static.embedding import AutoStaticDocumentEmbedding
from embeddings.embedding.static.word2vec import KGR10Word2VecConfig
from flair.data import Sentence

config = KGR10Word2VecConfig(method='skipgram', hs=False)
embedding = AutoStaticDocumentEmbedding.from_config(config)

sentence = Sentence("Myśl z duszy leci bystro, Nim się w słowach złamie.")
embedding.embed([sentence])

print(sentence.embedding)
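
The same config class can presumably also request the default cbow variant from the table above explicitly; the argument values below are inferred from that table rather than taken from the library documentation:

config = KGR10Word2VecConfig(method='cbow', hs=False)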

Citation

Piasecki, Maciej; Janz, Arkadiusz; Kaszewski, Dominik; Czachor, Gabriela, 2017, Word Embeddings for Polish, CLARIN-PL digital repository, http://hdl.handle.net/11321/442.

or

@misc{11321/442,
  title = {Word Embeddings for Polish},
  author = {Piasecki, Maciej and Janz, Arkadiusz and Kaszewski, Dominik and Czachor, Gabriela},
  url = {http://hdl.handle.net/11321/442},
  note = {{CLARIN}-{PL} digital repository},
  copyright = {{GNU} {GPL3}},
  year = {2017}
}