mteb-pt/average_pt_nilc_word2vec_skip_s1000

This model adapts pre-trained Portuguese Word2Vec word embeddings to a sentence-transformers model.

The original pre-trained word embeddings can be found at: http://nilc.icmc.usp.br/nilc/index.php/repositorio-de-word-embeddings-do-nilc.

This model maps sentences and paragraphs to a 1000-dimensional dense vector space and can be used for tasks like clustering or semantic search.

Usage (Sentence-Transformers)

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

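from sentence_transformers import SentenceTransformer

# Example input sentences (the underlying word embeddings were trained on Portuguese text)
sentences = ["This is an example sentence", "Each sentence is converted"]

# Load the model by its Hub identifier
model = SentenceTransformer('mteb-pt/average_pt_nilc_word2vec_skip_s1000')

# encode() returns one 1000-dimensional vector per input sentence
embeddings = model.encode(sentences)
print(embeddings)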
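
Since the model is described as suitable for semantic search, the following is a minimal sketch of ranking a small corpus against a query by cosine similarity. The helper names `cosine_similarity_matrix`, `rank_by_similarity`, and `demo`, as well as the Portuguese example sentences, are illustrative assumptions and not part of the model card; the ranking logic only needs NumPy, and the model is loaded only if `demo()` is called.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Return the matrix of cosine similarities between rows of a and rows of b."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Normalize each row to unit length; clip avoids division by zero for all-zero vectors.
    a_unit = a / np.clip(np.linalg.norm(a, axis=1, keepdims=True), 1e-12, None)
    b_unit = b / np.clip(np.linalg.norm(b, axis=1, keepdims=True), 1e-12, None)
    return a_unit @ b_unit.T


def rank_by_similarity(query_embedding, corpus_embeddings):
    """Return (corpus_index, score) pairs sorted from most to least similar."""
    query = np.asarray(query_embedding, dtype=np.float64)[None, :]
    scores = cosine_similarity_matrix(query, corpus_embeddings)[0]
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


def demo():
    """Hypothetical usage with the model from this card (downloads the model)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer('mteb-pt/average_pt_nilc_word2vec_skip_s1000')
    # Illustrative sentences, not taken from the model card.
    corpus = ["O gato dorme no sofá.", "A bolsa de valores caiu hoje."]
    query = "Um felino descansa na sala."
    corpus_embeddings = model.encode(corpus)
    query_embedding = model.encode([query])[0]
    for index, score in rank_by_similarity(query_embedding, corpus_embeddings):
        print(f"{score:.3f}  {corpus[index]}")
```

Calling `demo()` prints the corpus sentences ordered by their similarity to the query. Cosine similarity is one common choice for comparing embeddings; other distance measures may suit a given task better.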

Evaluation Results

For an automated evaluation of this model, see the Portuguese MTEB Leaderboard: mteb-pt/leaderboard

Full Model Architecture

SentenceTransformer(
  (0): WordEmbeddings(
    (emb_layer): Embedding(929607, 1000)
  )
  (1): Pooling({'word_embedding_dimension': 1000, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
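
The architecture above is a word-embedding lookup followed by mean pooling (`pooling_mode_mean_tokens` is the only pooling mode enabled). The sketch below illustrates that idea with NumPy on a toy scale. The vocabulary, the 4-dimensional embedding matrix, and the function names `mean_pool` and `embed_sentence` are hypothetical, and skipping out-of-vocabulary tokens is an assumption made for this example rather than a statement about how the library tokenizes text.

```python
import numpy as np


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1."""
    vectors = np.asarray(token_vectors, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[:, None]
    summed = (vectors * mask).sum(axis=0)
    # Guard against an all-zero mask to avoid dividing by zero.
    count = max(float(mask.sum()), 1e-9)
    return summed / count


def embed_sentence(tokens, vocab, embedding_matrix):
    """Look up each known token and mean-pool the resulting vectors."""
    # Assumption for this sketch: tokens missing from the vocabulary are skipped.
    ids = [vocab[token] for token in tokens if token in vocab]
    if not ids:
        return np.zeros(embedding_matrix.shape[1])
    vectors = embedding_matrix[ids]
    return mean_pool(vectors, np.ones(len(ids)))


# Toy stand-ins for the real (929607, 1000) embedding table.
toy_vocab = {"gato": 0, "dorme": 1}
toy_embeddings = np.array([[1.0, 2.0, 3.0, 4.0],
                           [3.0, 4.0, 5.0, 6.0]])

sentence_vector = embed_sentence(["gato", "dorme", "xyz"], toy_vocab, toy_embeddings)
print(sentence_vector)
```

In the real model the same averaging runs over 1000-dimensional vectors, which is why the output dimension of the sentence embedding equals the word embedding dimension.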

Citing & Authors

@inproceedings{hartmann2017portuguese,
    title  = {Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks},
    author = {Hartmann, Nathan S and
              Fonseca, Erick R and
              Shulby, Christopher D and
              Treviso, Marcos V and
              Rodrigues, J{\'{e}}ssica S and
              Alu{\'{\i}}sio, Sandra Maria},
    year = {2017},
    publisher = {SBC},
    booktitle = {Brazilian Symposium in Information and Human Language Technology - STIL},
    url = {https://sol.sbc.org.br/index.php/stil/article/view/4008}
}