DistilBERT Fine-tuned for Italian Sentiment Analysis on Tripadvisor Reviews
Model Overview
This model is a fine-tuned version of distilbert/distilbert-base-multilingual-cased, optimized for sentiment analysis of Italian-language Tripadvisor reviews. It was trained on 15,000 Italian Tripadvisor reviews to classify sentiment into five classes, ranging from very negative to very positive.
Model Details
- Base Model: distilbert-base-multilingual-cased
- Fine-tuning Dataset: 15,000 Italian Tripadvisor reviews
- Task: Sentiment Analysis (5-class classification)
- Language: Italian
- Model Type: Text Classification
Intended Uses & Limitations
Intended Use
This model is designed for:
- Analyzing sentiment in Italian-language hotel/restaurant/attraction reviews
- Classifying user feedback by sentiment, from very negative through neutral to very positive
- Extracting insights from Italian tourism-related text data
- Educational purposes only
Limitations
- Performance may degrade on reviews from domains outside tourism/hospitality
- May not handle regional Italian dialects equally well
- Trained on Tripadvisor-style reviews, so it may not generalize well to other review formats
- Maximum sequence length: 512 tokens
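One common way to work around the 512-token limit is to split a long review into overlapping windows and classify each window separately. The sketch below is only an illustration, not something this model card prescribes: the helper name `chunk_token_ids`, the overlap of 64 tokens, and the assumption that two positions are reserved for the `[CLS]` and `[SEP]` special tokens are all choices made for this example.

```python
from typing import List


def chunk_token_ids(
    token_ids: List[int],
    max_length: int = 512,
    num_special_tokens: int = 2,  # assumed: one [CLS] and one [SEP] per window
    stride: int = 64,             # assumed overlap between consecutive windows
) -> List[List[int]]:
    """Split token ids into overlapping windows that fit the model limit."""
    window = max_length - num_special_tokens
    if window <= 0 or not 0 <= stride < window:
        raise ValueError("stride must satisfy 0 <= stride < max_length - num_special_tokens")

    chunks = []
    start = 0
    while True:
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        # Step forward, keeping `stride` tokens of overlap for context.
        start += window - stride
    return chunks


# Stand-in ids; real ids would come from the tokenizer with special tokens disabled.
fake_ids = list(range(1200))
chunks = chunk_token_ids(fake_ids)
print(len(chunks), [len(c) for c in chunks])
```

How the per-window predictions are combined (for example, by averaging the class probabilities) is a separate design decision that depends on the application.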
How to Use
You can use this model directly with the Hugging Face transformers library:
```python
from transformers import pipeline

# Initialize the sentiment analysis pipeline
sentiment_analyzer = pipeline(
    "text-classification",
    model="misterkilgore/distilbert-italian-tripadvisor-sentiment",
    tokenizer="misterkilgore/distilbert-italian-tripadvisor-sentiment",
)

# Analyze a sample review
review = "L'hotel aveva una vista magnifica ma il servizio era terribile."
results = sentiment_analyzer(review)
print(results)
```
For more advanced usage:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "misterkilgore/distilbert-italian-tripadvisor-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input (truncated to the model's maximum sequence length)
review = "Il ristorante offre una cucina eccellente con ingredienti di prima qualità."
inputs = tokenizer(review, return_tensors="pt", truncation=True, padding=True)

# Get predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Interpret results: one probability per class
labels = ["very negative", "negative", "neutral", "positive", "very positive"]
scores = predictions[0].tolist()
for label, score in zip(labels, scores):
    print(f"{label}: {score:.4f}")
```
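If only a coarse positive/negative/neutral verdict is needed, the five class probabilities can be folded into three. The following is a minimal sketch under stated assumptions: the helper `summarize` is hypothetical, the label order is taken from the snippet above (it is worth checking against `model.config.id2label` before relying on it), and the example scores are made up rather than real model output.

```python
from typing import Dict, List, Tuple

# Label order as listed in the usage example; assumed to match the model's class indices.
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def summarize(scores: List[float]) -> Tuple[str, Dict[str, float]]:
    """Return the top five-class label and coarse three-class probabilities."""
    if len(scores) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} scores, got {len(scores)}")

    # Index of the highest probability gives the fine-grained prediction.
    top_index = max(range(len(scores)), key=scores.__getitem__)

    # Sum neighbouring classes to obtain a three-way distribution.
    coarse = {
        "negative": scores[0] + scores[1],
        "neutral": scores[2],
        "positive": scores[3] + scores[4],
    }
    return LABELS[top_index], coarse


example_scores = [0.05, 0.10, 0.15, 0.45, 0.25]  # illustrative values only
top_label, coarse = summarize(example_scores)
print(top_label)
print(max(coarse, key=coarse.get))
```

In real use, `example_scores` would be replaced by the `scores` list produced in the snippet above.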
Training Details
- Training Data: 15,000 Italian Tripadvisor reviews (balanced classes)
- Epochs: 3-5 (optimal performance typically reached within this range)
- Batch Size: 16 or 32 (depending on GPU memory)
- Learning Rate: 2e-5 to 5e-5
- Metrics: Accuracy, F1-score (macro-averaged)
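The ranges listed above can be enumerated as a small search grid. This is a sketch only: the card does not say that a grid search was run or which exact values were used, so the sample learning rates are an assumption. The dictionary keys mirror the corresponding `TrainingArguments` parameter names in transformers, but nothing here calls the library.

```python
from itertools import product

epochs = [3, 4, 5]
batch_sizes = [16, 32]
learning_rates = [2e-5, 3e-5, 5e-5]  # assumed sample points within the stated range

# One configuration per combination of the three hyperparameters.
grid = [
    {
        "num_train_epochs": n_epochs,
        "per_device_train_batch_size": batch_size,
        "learning_rate": lr,
    }
    for n_epochs, batch_size, lr in product(epochs, batch_sizes, learning_rates)
]

print(len(grid))
print(grid[0])
```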
Evaluation Results
Performance on the held-out test set (exact metrics will vary based on your specific training):
| Metric | Score |
|---|---|
| Accuracy | 0.73 |
| F1 (macro) | 0.70 |
| Precision | 0.69 |
| Recall | 0.73 |
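For readers who want to compute comparable numbers on their own data, here is a dependency-free sketch of accuracy and macro-averaged precision, recall, and F1. It assumes integer class ids from 0 to 4 and assumes that precision and recall are macro-averaged, which the table does not state. The helper `macro_metrics` and the toy labels are illustrative and are not the evaluation code behind the reported scores.

```python
from typing import Dict, List


def macro_metrics(y_true: List[int], y_pred: List[int], num_classes: int = 5) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    precisions, recalls, f1s = [], [], []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Classes with no predictions or no true examples contribute 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / num_classes,
        "recall": sum(recalls) / num_classes,
        "f1": sum(f1s) / num_classes,
    }


# Toy labels: one "very positive" review misclassified as "positive".
y_true = [0, 1, 2, 3, 4, 4]
y_pred = [0, 1, 2, 3, 4, 3]
metrics = macro_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```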
License
wtfpl