distilbert-multilingual-dialogue-act-classifier
Fine-tuned DistilBERT (distilbert-base-multilingual-cased) for 4-class dialogue act classification in English, German, and Russian. Trained on conversational dialogue data, optimized for ASR transcripts.
Labels
| Index | Label | Description |
|---|---|---|
| 0 | commissive | Promises, commitments ("I'll handle it.") |
| 1 | directive | Commands, requests ("Send the report.") |
| 2 | inform | Statements, facts ("The deadline is Friday.") |
| 3 | question | Questions, inquiries ("What is the timeline?") |
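The table above can be expressed as a small lookup in code. This is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical helpers introduced here for illustration, and the mapping is copied from the table rather than read from the model's `config.json`.

```python
# Index-to-label mapping copied from the Labels table above.
ID2LABEL = {0: "commissive", 1: "directive", 2: "inform", 3: "question"}

# Reverse mapping, handy when encoding gold labels for evaluation.
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def decode(indices):
    """Map predicted class indices to label names (hypothetical helper)."""
    return [ID2LABEL[int(i)] for i in indices]


print(decode([3, 1, 2]))  # ['question', 'directive', 'inform']
```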
Evaluation
Per-language performance on held-out test sets:
| Language | Test Set | Accuracy | F1 Macro |
|---|---|---|---|
| English | SILICONE dyda_da | 80.8% | 0.725 |
| English | XDailyDialog | 82.5% | 0.750 |
| German | XDailyDialog | 81.8% | 0.738 |
| Russian | xdailydialog-ru | 81.7% | 0.734 |
Edge-case test suite (disfluent ASR input, conversational phrasing): 77.8% accuracy (35/45 correct)
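Macro F1 sits below accuracy in every row because it averages per-class F1 with equal weight, so rare classes that are often missed pull it down. The sketch below illustrates both metrics in plain Python. The gold and predicted labels are invented toy data, not outputs of this model, and `accuracy` and `macro_f1` are illustrative helpers rather than the project's evaluation scripts.

```python
LABELS = ["commissive", "directive", "inform", "question"]


def accuracy(gold, pred):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): the two rare classes are both misread as "inform".
gold = ["inform"] * 6 + ["question"] * 2 + ["directive", "commissive"]
pred = ["inform"] * 6 + ["question"] * 2 + ["inform", "inform"]

print(f"accuracy = {accuracy(gold, pred):.3f}")
print(f"macro F1 = {macro_f1(gold, pred, LABELS):.3f}")

# The edge-case figure reported above is a plain ratio of correct cases.
print(f"edge cases = {35 / 45:.1%}")
```

In this toy case accuracy is 0.8 while macro F1 is well under 0.5, which is the same direction of gap seen in the table.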
Usage
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
model = AutoModelForSequenceClassification.from_pretrained("WSHAPER/distilbert-multilingual-dialogue-act-classifier")
tokenizer = AutoTokenizer.from_pretrained("WSHAPER/distilbert-multilingual-dialogue-act-classifier")
texts = ["What is the timeline?", "Send the report.", "The meeting went well."]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
preds = torch.argmax(probs, dim=-1)
labels = ["commissive", "directive", "inform", "question"]
for text, pred, prob in zip(texts, preds, probs):
    print(f"{text} → {labels[pred]} ({prob[pred]:.2f})")
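For noisy ASR input it can help to reject low-confidence predictions instead of always taking the argmax. The following is a minimal post-processing sketch in plain Python that accepts logits as nested lists (for example `logits.tolist()` from the snippet above). The 0.5 threshold, the `"uncertain"` fallback label, and the function names are assumptions made for illustration. They are not part of the model, and a suitable threshold would need to be tuned on held-out data.

```python
import math

LABELS = ["commissive", "directive", "inform", "question"]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def decode_with_threshold(logits, min_confidence=0.5, fallback="uncertain"):
    """Return (label, confidence) pairs, using `fallback` below the threshold."""
    results = []
    for row in logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        label = LABELS[best] if probs[best] >= min_confidence else fallback
        results.append((label, probs[best]))
    return results


# Hand-written example logits (assumption, not real model output).
example_logits = [
    [0.1, 0.2, 0.3, 4.0],  # one clearly dominant class
    [1.0, 1.1, 0.9, 1.0],  # nearly uniform, so below the threshold
]
for label, confidence in decode_with_threshold(example_logits):
    print(f"{label} ({confidence:.2f})")
```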
Training Details
- Base model: distilbert-base-multilingual-cased (277M params)
- Training data:
  - XDailyDialog: EN, DE, IT (~249K utterances)
  - WSHAPER/xdailydialog-ru: RU (~82K utterances)
  - Total: ~331K utterances across 4 languages
- Hyperparameters: 5 epochs, batch 32, lr 2e-5, warmup 10%
- Hardware: NVIDIA RTX A3000 12GB, ~1.5 hours
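To give a sense of scale for these hyperparameters, the sketch below works out the approximate number of optimizer steps and the warmup length. It rests on several assumptions that the list above does not confirm: that all ~331K utterances are used for training (in practice part of the data is presumably held out), that there is no gradient accumulation, and that the learning rate decays linearly to zero after warmup. The decay shape is a guess. Only the 10% warmup and the 2e-5 peak come from the source.

```python
import math

NUM_EXAMPLES = 331_000  # approximate total from the list above (assumption: all used for training)
BATCH_SIZE = 32
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_PERCENT = 10

steps_per_epoch = math.ceil(NUM_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = total_steps * WARMUP_PERCENT // 100


def lr_at(step):
    """Learning rate at a given step: linear warmup, then linear decay (assumed)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"warmup steps:    {warmup_steps}")
print(f"lr at midpoint:  {lr_at(total_steps // 2):.2e}")
```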
Rust Inference (candle-transformers)
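This model is compatible with candle-transformers for pure Rust inference. The excerpt below is abbreviated: imports are omitted, and `vb` is a candle `VarBuilder` over the model weights.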
// Loads model.safetensors + tokenizer.json directly
let config = DistilBertConfig::from_file("config.json");
let bert = BertModel::load(vb.pp("distilbert"), &config)?;
let classifier = candle_nn::linear(config.hidden_size, 4, vb.pp("classifier"))?;
Links
- GitHub: WSHAPER/dialogue-act-classifier (training code, evaluation scripts, export tools)
- Russian dataset: WSHAPER/xdailydialog-ru (Russian translation of XDailyDialog)
License
Apache-2.0