---
license: apache-2.0
tags:
- question-answering
- complexity-classification
- distilbert
datasets:
- wesley7137/question_complexity_classification
---

# question-complexity-classifier

🤗 Fine-tuned DistilBERT model for classifying question complexity (Simple vs Complex).

## Model Details

### Model Description

- **Architecture:** DistilBERT base uncased
- **Fine-tuned on:** Question Complexity Classification Dataset
- **Language:** English
- **License:** Apache 2.0
- **Max Sequence Length:** 128 tokens

## Uses

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="grahamaco/question-complexity-classifier",
    tokenizer="grahamaco/question-complexity-classifier",
    truncation=True,
    max_length=128,  # matches the training sequence length
)

result = classifier("Explain quantum computing in simple terms")
# Example output: [{'label': 'COMPLEX', 'score': 0.97}]
```
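
Each prediction returned by the pipeline is a dict with a `label` and a `score`. A small helper can turn that into a routing decision, deferring to a human when the classifier is unsure. The helper name, routing strings, and 0.8 threshold below are illustrative choices, not part of the model:

```python
def route_question(prediction, threshold=0.8):
    """Turn one classifier prediction into a routing decision.

    `prediction` is a dict like {"label": "COMPLEX", "score": 0.97}.
    The 0.8 confidence threshold and routing strings are illustrative.
    """
    if prediction["score"] < threshold:
        return "needs-review"  # low confidence: defer to a human
    return "complex-pipeline" if prediction["label"] == "COMPLEX" else "simple-pipeline"

print(route_question({"label": "COMPLEX", "score": 0.97}))  # complex-pipeline
print(route_question({"label": "SIMPLE", "score": 0.55}))   # needs-review
```

In practice the threshold should be tuned on held-out validation data rather than fixed a priori.
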

## Training Details

- **Epochs:** 5
- **Batch Size:** 32 (global)
- **Learning Rate:** 2e-5
- **Train/Val/Test Split:** 80/10/10 (stratified)
- **Early Stopping:** Patience of 2 epochs
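
The exact splitting code used for training is not published; the sketch below is an illustrative reimplementation of an 80/10/10 stratified split in plain Python, shuffling within each label group so that class ratios are preserved in every partition:

```python
import random
from collections import defaultdict

def stratified_split(examples, seed=42):
    """Cut a list of (text, label) pairs into 80/10/10 splits per label.

    Illustrative sketch only; shuffles within each label group so the
    train/val/test partitions keep the same class ratios.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)
    train, val, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train, n_val = int(len(group) * 0.8), int(len(group) * 0.1)
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

# Toy dataset: 50 SIMPLE and 50 COMPLEX questions
data = [(f"q{i}", "SIMPLE" if i % 2 else "COMPLEX") for i in range(100)]
train, val, test = stratified_split(data)
print(len(train), len(val), len(test))  # 80 10 10
```
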

## Evaluation Results

| Metric   | Value |
|----------|-------|
| Accuracy | 0.92  |
| F1 Score | 0.91  |
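
The card does not state how the F1 score is averaged; a common choice for a two-class problem is binary F1 on the positive (`COMPLEX`) class. The helper below shows how both metrics are computed from scratch (`binary_metrics` is an illustrative function, not the evaluation code used for this model):

```python
def binary_metrics(y_true, y_pred, positive="COMPLEX"):
    """Accuracy and binary F1 for the positive class (illustrative helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

# Toy example: 1 true positive, 1 false positive, 1 false negative
acc, f1 = binary_metrics(
    ["COMPLEX", "COMPLEX", "SIMPLE", "SIMPLE"],
    ["COMPLEX", "SIMPLE", "SIMPLE", "COMPLEX"],
)
print(acc, f1)  # 0.5 0.5
```
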

## Performance

| Metric            | Value                  |
|-------------------|------------------------|
| Inference Latency | 15.2 ms (CPU)          |
| Throughput        | 68.4 samples/sec (GPU) |
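
Note that the two rows are measured on different devices, so they are not directly comparable. The 15.2 ms CPU latency implies a single-stream CPU throughput of roughly 1000 / 15.2 ≈ 65.8 samples/sec, while the GPU figure presumably reflects batched inference:

```python
cpu_latency_ms = 15.2  # per-sample CPU latency from the table above
cpu_throughput = 1000 / cpu_latency_ms  # samples/sec, single stream, derived
print(round(cpu_throughput, 1))  # 65.8
```
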

## Ethical Considerations

This model is intended for educational content classification only. Developers should:

- Regularly audit performance across different question types
- Monitor for unintended bias in complexity assessments
- Provide human-review mechanisms for high-stakes classifications
- Validate classifications against the original context when used with RAG systems