DistilBERT Fine-tuned for Italian Sentiment Analysis on Tripadvisor Reviews

Model Overview

This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased, optimized for sentiment analysis of Italian-language Tripadvisor reviews. It was trained on a dataset of 15,000 Italian Tripadvisor reviews to classify each review into one of five sentiment classes, from very negative to very positive.

Model Details

  • Base Model: distilbert-base-multilingual-cased
  • Fine-tuning Dataset: 15,000 Italian Tripadvisor reviews
  • Task: Sentiment Analysis (5-class classification)
  • Language: Italian
  • Model Type: Text Classification

Intended Uses & Limitations

Intended Use

This model is designed for:

  • Analyzing sentiment in Italian-language hotel/restaurant/attraction reviews
  • Classifying user feedback on a five-point scale from very negative to very positive
  • Extracting insights from Italian tourism-related text data
  • Educational purposes only

Limitations

  • Performance may degrade on reviews from domains outside tourism/hospitality
  • May not handle regional Italian dialects equally well
  • Trained on Tripadvisor-style reviews - may not generalize perfectly to other review formats
  • Maximum sequence length: 512 tokens
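
The 512-token limit means that longer reviews are cut off when truncation is enabled. One possible workaround, given here only as a minimal sketch, is to split the token ids of a long review into overlapping windows and score each window separately. The helper name split_into_windows, the window size of 510 (leaving room for the two special tokens the tokenizer adds) and the overlap of 50 are illustrative assumptions, not part of this model or its training setup. How to combine the per-window predictions is left open.

```python
def split_into_windows(token_ids, max_len=510, overlap=50):
    """Split a list of token ids into overlapping windows of at most max_len.

    Hypothetical helper: max_len=510 assumes two special tokens are added
    around each window, and overlap=50 is an arbitrary illustrative value.
    """
    if max_len <= overlap:
        raise ValueError("max_len must be larger than overlap")
    if len(token_ids) <= max_len:
        return [list(token_ids)]

    step = max_len - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_len]))
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(token_ids):
            break
    return windows


# Example with placeholder ids standing in for a tokenized long review.
windows = split_into_windows(list(range(1200)))
print([len(w) for w in windows])
```

Each window could then be decoded or wrapped with the special tokens and passed to the model like any other input.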

How to Use

You can use this model directly with the Hugging Face transformers library:

from transformers import pipeline

# Initialize sentiment analysis pipeline
sentiment_analyzer = pipeline(
    "text-classification",
    model="misterkilgore/distilbert-italian-tripadvisor-sentiment",
    tokenizer="misterkilgore/distilbert-italian-tripadvisor-sentiment"
)

# Analyze a sample review
review = "L'hotel aveva una vista magnifica ma il servizio era terribile."
results = sentiment_analyzer(review)
print(results)

For more advanced usage:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "misterkilgore/distilbert-italian-tripadvisor-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
review = "Il ristorante offre una cucina eccellente con ingredienti di prima qualità."
inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True)

# Get predictions
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Interpret results (labels must be listed in the order of the model's output indices)
labels = ["very negative", "negative", "neutral", "positive", "very positive"]
scores = predictions[0].tolist()
for label, score in zip(labels, scores):
    print(f"{label}: {score:.4f}")
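
If only a coarse positive/neutral/negative view is needed, the five class probabilities printed above can be summed into three groups. The following is a minimal sketch: the grouping itself and the helper names collapse_to_three and top_label are assumptions made for illustration, and the scores in the example are made up rather than real model output.

```python
FIVE_LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

# Assumed grouping of the five fine-grained classes into three coarse ones.
COARSE_GROUPS = {
    "negative": ("very negative", "negative"),
    "neutral": ("neutral",),
    "positive": ("positive", "very positive"),
}


def collapse_to_three(scores, labels=FIVE_LABELS):
    """Sum five class probabilities into negative/neutral/positive totals."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    by_label = dict(zip(labels, scores))
    return {
        coarse: sum(by_label[fine] for fine in fine_labels)
        for coarse, fine_labels in COARSE_GROUPS.items()
    }


def top_label(score_by_label):
    """Return the label with the highest score."""
    return max(score_by_label, key=score_by_label.get)


# Made-up probabilities, in the same order as FIVE_LABELS.
example_scores = [0.05, 0.10, 0.15, 0.30, 0.40]
coarse = collapse_to_three(example_scores)
print(top_label(coarse))
```

In the advanced example, the scores list can be passed to collapse_to_three directly.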

Training Details

  • Training Data: 15,000 Italian Tripadvisor reviews (balanced classes)
  • Epochs: 3-5 (optimal performance typically reached within this range)
  • Batch Size: 16 or 32 (depending on GPU memory)
  • Learning Rate: 2e-5 to 5e-5
  • Metrics: Accuracy, F1-score (macro-averaged)

Evaluation Results

Performance on the held-out test set (exact figures will vary between training runs):

Metric        Score
Accuracy      0.73
F1 (macro)    0.70
Precision     0.69
Recall        0.73
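
For reference, accuracy and macro-averaged F1 can be computed from true and predicted class indices as in the sketch below. This is a plain-Python illustration of the metric definitions, not the evaluation script used for the numbers above. The function names and the toy labels are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of the per-class F1 scores."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"accuracy: {accuracy(y_true, y_pred):.4f}")
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Macro averaging gives every class equal weight regardless of how many examples it has.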

License

wtfpl
