
Satoken

This is a SetFit model trained on multilingual datasets (listed below) for sentiment classification.

The model has been trained using an efficient few-shot learning technique that involves:

  1. Fine-tuning a Sentence Transformer with contrastive learning.
  2. Training a classification head with features from the fine-tuned Sentence Transformer.
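The two stages above can be sketched as follows. This is an illustrative example, not the actual training script for this model: stage 1 is shown as contrastive pair generation (in real SetFit these pairs are used to fine-tune the Sentence Transformer), and stage 2 fits a logistic-regression head on embeddings, with random vectors standing in for real sentence embeddings so the sketch stays self-contained.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression

texts = ["great product", "love it", "terrible", "awful service"]
labels = [1, 1, 0, 0]

# Stage 1: build contrastive pairs from the few labeled examples.
# Same label -> positive pair (similarity target 1), else negative (0).
pairs = [(a, b, int(labels[a] == labels[b]))
         for a, b in combinations(range(len(texts)), 2)]

# Stage 2: fit a classification head on sentence embeddings.
# Real SetFit uses the fine-tuned Sentence Transformer's embeddings;
# random vectors (shifted by label so classes are separable) stand in here.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(texts), 8)) + np.array(labels)[:, None]
head = LogisticRegression().fit(embeddings, labels)
preds = head.predict(embeddings)
```

In the real pipeline, the contrastive pairs reshape the embedding space so that same-class texts cluster together, which is what lets the simple head work well with so few examples.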

It is used by Germla for its feedback analysis tool, specifically the sentiment analysis feature.

For other, language-specific models, check here

Usage

To use this model for inference, first install the SetFit library:

python -m pip install setfit

You can then run inference as follows:

from setfit import SetFitModel

# Download from Hub and run inference
model = SetFitModel.from_pretrained("germla/satoken")
# Run inference
preds = model(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])

Training Details

Training Data

Training Procedure

We made sure the dataset was balanced. The model was trained on only 35% (50% for Chinese) of the train split of each dataset.
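A balanced 35% subsample can be drawn per class, for example like this. This is a hypothetical sketch of the idea, not the card's actual sampling code; the dataset rows and column names are placeholders.

```python
import random
from collections import defaultdict

random.seed(0)
# Placeholder train split: 200 rows, two balanced classes.
train = [{"text": f"example {i}", "label": i % 2} for i in range(200)]

# Group rows by label, then take 35% of each class,
# so the subsample keeps the original class balance.
by_label = defaultdict(list)
for row in train:
    by_label[row["label"]].append(row)

subset = []
for rows in by_label.values():
    subset.extend(random.sample(rows, k=int(len(rows) * 0.35)))
```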

Preprocessing

  • Basic cleaning (removal of duplicates, links, mentions, hashtags, etc.)
  • Removal of stopwords using nltk
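The preprocessing steps above can be sketched roughly as below. The regexes and the inline stopword list are illustrative; the actual pipeline used nltk's stopword corpus (`nltk.corpus.stopwords`), which is replaced here with a tiny hardcoded set to keep the example self-contained.

```python
import re

# Tiny stand-in for nltk's English stopword list.
STOPWORDS = {"the", "a", "an", "is", "on", "in", "of", "and", "to", "i"}

def clean(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)  # strip links
    text = re.sub(r"[@#]\w+", "", text)       # strip mentions and hashtags
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    return " ".join(tokens)

# Deduplicate after cleaning, preserving order.
docs = ["I loved the movie @marvel #spiderman",
        "i loved the movie http://example.com"]
cleaned = list(dict.fromkeys(clean(d) for d in docs))
```

Cleaning before deduplication means near-duplicates that differ only in links, mentions, or casing collapse into one example.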

Speeds, Sizes, Times

The training procedure took 6 hours on an NVIDIA T4 GPU.

Evaluation

Testing Data, Factors & Metrics

Environmental Impact

  • Hardware Type: NVIDIA T4 GPU
  • Hours used: 6
  • Cloud Provider: Amazon Web Services
  • Compute Region: ap-south-1 (Mumbai)
  • Carbon Emitted: 0.39 kg CO2 eq.
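The reported figure is consistent with the standard power × time × grid-intensity estimate. Note the T4's 70 W TDP and the ~0.92 kg CO2eq/kWh carbon intensity assumed for the ap-south-1 grid are assumptions for this back-of-the-envelope check, not figures from the model card.

```python
gpu_power_kw = 0.070   # NVIDIA T4 TDP: 70 W (assumed full utilization)
hours = 6              # training time reported above
grid_intensity = 0.92  # kg CO2eq per kWh, assumed for ap-south-1

emissions = gpu_power_kw * hours * grid_intensity  # ~0.39 kg CO2eq
```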