language: en
license: mit
tags:
  - sentiment-analysis
  - text-classification
  - encoder
library_name: sentimentizer
task: text-classification

Sentimentizer Encoder Sentiment Model

Description

A Transformer encoder for sentiment classification, built on pre-trained GloVe embeddings. The model applies multi-head self-attention with positional encodings and uses a classification (CLS) token to produce a single sentiment score.
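
To make the positional-encoding and CLS-token ideas concrete, here is a minimal pure-Python sketch. It assumes the sinusoidal encoding from the original Transformer architecture and a hypothetical `CLS_ID` of 1; this card does not state which encoding variant or token id the model actually uses, and the function names are illustrative only.

```python
import math
from typing import List

CLS_ID = 1  # hypothetical id for the classification token (assumption)


def sinusoidal_position(pos: int, dim: int) -> List[float]:
    """Return the sinusoidal encoding vector for one position.

    Even indices hold sin, odd indices hold cos, with wavelengths that
    grow geometrically across the vector. `dim` must be even.
    """
    vec = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec


def prepend_cls(token_ids: List[int]) -> List[int]:
    """Put the CLS token first so the encoder output at index 0 can feed a classifier head."""
    return [CLS_ID] + list(token_ids)


ids = prepend_cls([17, 42, 8])
# One 100-dimensional encoding per position, matching the GloVe embedding size.
encodings = [sinusoidal_position(p, 100) for p in range(len(ids))]
```

In a model of this kind, each position's encoding would typically be added to the token embedding before the attention layers; whether this model adds or concatenates them is not stated here.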

Training Data

Trained on reviews from the Yelp Open Dataset, using pre-trained GloVe Wiki-Gigaword-100 embeddings. Reviews are tokenized with a custom dictionary (20k-token vocabulary, minimum frequency 3) and padded or truncated to 200 tokens.
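
The preprocessing described above can be sketched in plain Python. This is an illustration only: the actual model uses a Gensim dictionary, and the helper names, the `PAD_ID` and `UNK_ID` values, right-side padding, and the reading of "minimum frequency" as a total token count are all assumptions rather than the package's behavior.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAD_ID = 0  # assumed padding id
UNK_ID = 1  # assumed id for out-of-vocabulary tokens
MAX_LEN = 200
MAX_VOCAB = 20_000
MIN_FREQ = 3


def build_vocab(docs: Iterable[List[str]]) -> Dict[str, int]:
    """Keep tokens seen at least MIN_FREQ times, capped at the MAX_VOCAB most frequent."""
    counts = Counter(tok for doc in docs for tok in doc)
    kept = [tok for tok, n in counts.most_common(MAX_VOCAB) if n >= MIN_FREQ]
    # Ids 0 and 1 are reserved for padding and unknown tokens.
    return {tok: idx + 2 for idx, tok in enumerate(kept)}


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to ids, then truncate or right-pad to exactly MAX_LEN."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:MAX_LEN]
    return ids + [PAD_ID] * (MAX_LEN - len(ids))


docs = [["great", "food"], ["great", "service"], ["great", "food", "food"]]
vocab = build_vocab(docs)
encoded = encode(["great", "food", "terrible"], vocab)
```

Here "service" appears only once and is dropped from the vocabulary, while "terrible" was never seen and maps to `UNK_ID`. Every encoded review has length 200 regardless of the input length.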

Usage

from sentimentizer.hf import download_weights
from sentimentizer.config import DriverConfig, weights_path_for

# Download weights + dictionary from Hugging Face Hub
weights_path = weights_path_for("encoder")
download_weights(
    "encoder",
    weights_path,
    dict_path=DriverConfig.files.dictionary_file_path,
)

# Load and run inference
from sentimentizer.models.encoder import get_trained_model
from sentimentizer.tokenizer import get_trained_tokenizer

model = get_trained_model(device="cpu")
tokenizer = get_trained_tokenizer()

token_ids = tokenizer.tokenize_text("amazing food great service")
score = model.predict(token_ids)
# Scores above 0.5 indicate positive sentiment; scores below 0.5, negative.
print(f"Sentiment score: {score.item():.4f}")
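
If a discrete label is needed, the 0.5 threshold used in the usage snippet can be wrapped in a small helper. `label_from_score` is a hypothetical name, not part of sentimentizer, and treating a score of exactly 0.5 as positive is an arbitrary choice that this card does not specify.

```python
def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Map a sentiment score to a label using a fixed threshold."""
    return "positive" if score >= threshold else "negative"


print(label_from_score(0.93))
print(label_from_score(0.12))
```

With the usage snippet, this would be called as `label_from_score(score.item())`.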

Files

encoder_weights.pth — Model state dictionary
yelp.dictionary — Gensim dictionary for tokenization