library_name: transformers
tags: []
## Fine-tuned roberta-base for detecting paragraphs with eHRAF-assigned two-digit id '610'
## Description
This is a fine-tuned roberta-base model that detects whether a paragraph drawn from ethnographic source material classified under the main subject 'Marriage, Family, Kinship and Social Organization' is more specifically about '610'.
## Usage
The easiest way to run inference with this model is through the Hugging Face `pipeline` API.
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="gptmurdock/classifier-610")
classifier("Example text to classify")
```
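
To classify several paragraphs at once, the pipeline also accepts a list of strings and returns one `{"label": ..., "score": ...}` dict per input. The following is a minimal sketch rather than an official recipe: the helper names `load_classifier` and `flag_paragraphs`, the positive label name `"LABEL_1"`, and the `0.5` threshold are all assumptions for illustration. The actual label names should be checked in `classifier.model.config.id2label`.

```python
from typing import Callable, Dict, List


def load_classifier():
    """Load the pipeline shown in the Usage section (downloads the model on first call)."""
    # Imported lazily so flag_paragraphs itself has no hard dependency on transformers.
    from transformers import pipeline

    return pipeline("text-classification", model="gptmurdock/classifier-610")


def flag_paragraphs(
    classifier: Callable[..., List[Dict]],
    paragraphs: List[str],
    positive_label: str = "LABEL_1",  # ASSUMPTION: verify against classifier.model.config.id2label
    threshold: float = 0.5,  # ASSUMPTION: illustrative cutoff, not taken from this model card
) -> List[bool]:
    """Return True for each paragraph predicted as the positive class."""
    # truncation=True keeps long paragraphs within the model's maximum input length.
    results = classifier(paragraphs, truncation=True)
    return [
        r["label"] == positive_label and r["score"] >= threshold
        for r in results
    ]
```

With a loaded model this might be called as `flag_paragraphs(load_classifier(), ["First paragraph ...", "Second paragraph ..."])`.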
## Training data
...
## Training procedure
...
We used a 60-20-20 train-validation-test split and fine-tuned roberta-base for 5 epochs (learning rate = 2e-5, batch size = 40).
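
As a rough illustration of what these numbers imply, the sketch below shuffles a dataset into 60-20-20 partitions and counts the optimizer steps for 5 epochs at batch size 40. It is not the training script used for this model: the function names, the fixed seed, and the assumptions of a plain random (non-stratified) split, a single device, no gradient accumulation, and a kept final partial batch are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_60_20_20(items: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Shuffle items and split them into 60% train, 20% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # ASSUMPTION: random split; the seed is illustrative
    n = len(shuffled)
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, so no example is dropped
    return train, val, test


def optimizer_steps(n_train: int, batch_size: int = 40, epochs: int = 5) -> int:
    """Total optimizer steps, assuming one device and no gradient accumulation."""
    return epochs * math.ceil(n_train / batch_size)
```

For a hypothetical dataset of 1,000 paragraphs, `split_60_20_20(range(1000))` yields partitions of 600, 200, and 200 examples, and `optimizer_steps(600)` gives 75 steps.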
## Evaluation
Results on the held-out test set are reported below.
| Metric | Value |
|-----------|-------|
| Precision | 91.2 |
| Recall | 91.3 |
| F1 | 91.2 |
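
For readers who want to compute comparable metrics on their own labeled paragraphs, here is a minimal sketch of precision, recall, and F1 for a single positive class. This card does not state whether the reported values are for the positive class or averaged across classes, so the positive-class convention used here is an assumption, and `precision_recall_f1` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    positive: int = 1,  # ASSUMPTION: 1 marks the target category
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so it cannot exceed the larger of the two.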