SentiCSE

SentiCSE is a sentiment-aware sentence embedding model based on RoBERTa-base, trained on the MR (Movie Review) dataset and intended for sentiment analysis tasks. This model is suitable for English.
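The usage example below assumes the torch, scipy, and transformers packages are installed.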

import torch
from scipy.spatial.distance import cosine
from transformers import AutoTokenizer, AutoModel


# Load the SentiCSE tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("DILAB-HYU/SentiCSE")
model = AutoModel.from_pretrained("DILAB-HYU/SentiCSE")

# Tokenize input texts
texts = [
    "The food is delicious.",
    "The atmosphere of the restaurant is good.",
    "The food at the restaurant is devoid of flavor.",
    "The restaurant lacks a good ambiance."
]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Get the sentence embeddings (the model's pooler output)
with torch.no_grad():
    embeddings = model(**inputs, return_dict=True).pooler_output

# Calculate cosine similarities
# Cosine similarities are in [-1, 1]. Higher means more similar
cosine_sim_0_1 = 1 - cosine(embeddings[0], embeddings[1])
cosine_sim_0_2 = 1 - cosine(embeddings[0], embeddings[2])
cosine_sim_0_3 = 1 - cosine(embeddings[0], embeddings[3])

print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[1], cosine_sim_0_1))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[2], cosine_sim_0_2))
print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], texts[3], cosine_sim_0_3))

Output:

Cosine similarity between "The food is delicious." and "The atmosphere of the restaurant is good." is: 0.942
Cosine similarity between "The food is delicious." and "The food at the restaurant is devoid of flavor." is: 0.703
Cosine similarity between "The food is delicious." and "The restaurant lacks a good ambiance." is: 0.656
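The per-pair scipy calls above can also be replaced by a single vectorized computation in PyTorch. The following is a minimal sketch reusing the texts and embeddings variables from the snippet above; the reference and similarities variable names are ours for illustration, not part of the model's API.

import torch.nn.functional as F

# Compare the first sentence against all the others in one call.
# F.cosine_similarity broadcasts the (1, hidden_size) reference row
# over the remaining rows of the batch.
reference = embeddings[0].unsqueeze(0)
similarities = F.cosine_similarity(reference, embeddings[1:], dim=1)

for text, sim in zip(texts[1:], similarities):
    print("Cosine similarity between \"%s\" and \"%s\" is: %.3f" % (texts[0], text, sim.item()))

This produces the same three similarity scores as the scipy version, without leaving the GPU if the model and inputs are placed on one.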

BibTeX entry and citation info

Please cite the following paper if you use this model.

@inproceedings{2024SentiCSE,
  title={SentiCSE: A Sentiment-aware Contrastive Sentence Embedding Framework with Sentiment-guided Textual Similarity},
  author={Kim, Jaemin and Na, Yohan and Kim, Kangmin and Lee, Sangrak and Chae, Dong-Kyu},
  booktitle={Proceedings of the 30th International Conference on Computational Linguistics (COLING)},
  year={2024},
}