mteb-pt/average_pt_nilc_word2vec_skip_s1000

This model adapts pre-trained Portuguese Word2Vec word embeddings from NILC (skip-gram, 1000 dimensions) into a sentence-transformers model.

The original pre-trained word embeddings can be found at: http://nilc.icmc.usp.br/nilc/index.php/repositorio-de-word-embeddings-do-nilc.

This model maps sentences and paragraphs to a 1000-dimensional dense vector space by averaging the word vectors of their tokens. It can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers)

Using this model is easy once you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

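from sentence_transformers import SentenceTransformer

# Note: the word vectors were trained on Portuguese text, so Portuguese
# input yields the most meaningful embeddings.
sentences = ["This is an example sentence", "Each sentence is converted"]

# Downloads the model from the Hugging Face Hub on first use
model = SentenceTransformer('mteb-pt/average_pt_nilc_word2vec_skip_s1000')
embeddings = model.encode(sentences)  # NumPy array of shape (2, 1000)
print(embeddings)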
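For semantic search, the vectors returned by `model.encode` are typically compared with cosine similarity. The sketch below ranks a small corpus against a query using plain NumPy; the helper `cosine_rank` and the 3-dimensional stand-in vectors are hypothetical, chosen so the example runs without downloading the model. With the real model you would pass `model.encode("...")` for the query and `model.encode(corpus_sentences)` for the corpus, giving 1000-dimensional vectors.

```python
import numpy as np

def cosine_rank(query_vec, corpus_vecs):
    """Return corpus indices sorted from most to least similar to the query,
    together with the cosine similarity score of every corpus row."""
    # L2-normalise; the small epsilon avoids division by zero for all-zero
    # vectors (e.g. a sentence with no in-vocabulary words).
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    c = corpus_vecs / (np.linalg.norm(corpus_vecs, axis=1, keepdims=True) + 1e-12)
    scores = c @ q              # dot product of unit vectors = cosine similarity
    order = np.argsort(-scores)  # highest score first
    return order, scores

# Hypothetical stand-in vectors (the real model produces 1000 dimensions).
corpus = np.array([[1.0, 0.0, 0.0],
                   [0.7, 0.7, 0.0],
                   [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.1, 0.0])

order, scores = cosine_rank(query, corpus)
print(order)  # [0 1 2]
```

sentence-transformers also ships `util.cos_sim`, which computes the same similarity matrix on tensors; the NumPy version is shown only to make the computation explicit.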

Evaluation Results

For an automated evaluation of this model, see the Portuguese MTEB Leaderboard: mteb-pt/leaderboard

Full Model Architecture

SentenceTransformer(
  (0): WordEmbeddings(
    (emb_layer): Embedding(929607, 1000)
  )
  (1): Pooling({'word_embedding_dimension': 1000, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
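The architecture has two stages: a `WordEmbeddings` lookup table with 929,607 rows of 1000 values each (about 9.3 × 10^8 numbers, roughly 3.7 GB if stored as float32), followed by `Pooling` with `pooling_mode_mean_tokens` enabled. In other words, a sentence embedding is the average of the vectors of its in-vocabulary words. The sketch below reproduces that computation in NumPy under stated assumptions: the toy vocabulary, the 4-dimensional vectors and the `encode` function are hypothetical, the tokenisation is a simplified lower-cased whitespace split, and returning a zero vector when no word is known is this sketch's own choice. It is not the library's implementation.

```python
import numpy as np

# Hypothetical toy vocabulary and 4-dimensional vectors; the real model has
# 929,607 words and 1000-dimensional vectors.
vocab = {"o": 0, "gato": 1, "dorme": 2}
emb = np.array([[0.0, 0.0, 0.0, 1.0],
                [1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]], dtype=np.float32)

def encode(sentence):
    # Simplified tokenisation (assumption): lower-case, split on whitespace,
    # and skip words that are not in the vocabulary.
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    if not ids:
        # No known words: this sketch returns a zero vector.
        return np.zeros(emb.shape[1], dtype=np.float32)
    # WordEmbeddings stage: one vector per token id.
    vectors = emb[ids]
    # Pooling stage (mean tokens): average over the token vectors.
    return vectors.mean(axis=0)

print(encode("O gato dorme"))  # every component is 1/3 except the third, which is 0
```

Because unknown words are skipped, only the vocabulary coverage of the input affects the result; word order does not, since averaging is order-independent.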

Citing & Authors

@inproceedings{hartmann2017portuguese,
    title  = {Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks},
    author = {Hartmann, Nathan S and
              Fonseca, Erick R and
              Shulby, Christopher D and
              Treviso, Marcos V and
              Rodrigues, J{\'{e}}ssica S and
              Alu{\'{\i}}sio, Sandra Maria},
    year = {2017},
    publisher = {SBC},
    booktitle = {Brazilian Symposium in Information and Human Language Technology - STIL},
    url = {https://sol.sbc.org.br/index.php/stil/article/view/4008}
}