---
library_name: transformers
tags: []
---

## Fine-tuned roberta-base for detecting paragraphs with eHRAF-assigned two-digit id '900'

## Description

This is a fine-tuned roberta-base model for detecting whether paragraphs drawn from ethnographic source material classified under the main subject 'Language and Communication' are more specifically about '900'.

## Usage

The easiest way to use this model at inference time is with the Hugging Face `pipeline` API.

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="gptmurdock/classifier-900")
classifier("Example text to classify")
```
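
The text-classification pipeline returns one dictionary per input, with `label` and `score` keys. The following is a minimal sketch of turning that output into a yes/no decision. It assumes the positive class is named `LABEL_1` (the default naming when no `id2label` mapping is set; check the model's config) and uses a hypothetical threshold of 0.5. Neither value comes from this model card.

```python
def is_positive(prediction, positive_label="LABEL_1", threshold=0.5):
    """Return True if a pipeline prediction dict indicates the positive class.

    `positive_label` and `threshold` are assumptions for illustration,
    not values documented for this model.
    """
    return prediction["label"] == positive_label and prediction["score"] >= threshold


# Hand-written stand-in shaped like the pipeline's output (not real model output)
sample_output = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.88},
]
decisions = [is_positive(p) for p in sample_output]
```

In practice, `sample_output` would be replaced by the result of calling `classifier` on a list of paragraphs.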

## Training data 

...

## Training procedure

...

We used a 60-20-20 train-val-test split and fine-tuned roberta-base for 5 epochs (learning rate = 2e-5, batch size = 40).
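
A 60-20-20 split of this kind can be sketched as follows. This is an illustration only: the function name `split_60_20_20`, the seed, and the use of a simple random shuffle are assumptions, and the actual splitting code (for example, whether it was stratified by label) is not described here. The hyperparameter dictionary just restates the values given above.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle `items` and split them into 60% train, 20% val, 20% test.

    Hypothetical helper; the seed and shuffle strategy are assumptions.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 60 // 100
    n_val = len(items) * 20 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hyperparameters as stated in this card
hyperparameters = {"epochs": 5, "learning_rate": 2e-5, "batch_size": 40}

train, val, test = split_60_20_20(range(100))
```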

## Evaluation 

Evaluation metrics on the held-out test set are reported below.

| Metric    | Value |
|-----------|-------|
| Precision | 98.0  |
| Recall    | 97.9  |
| F1        | 97.9  |
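
For reference, precision, recall, and F1 can be computed from true and predicted labels as sketched below. This card does not state which averaging was used for the reported numbers, so the sketch assumes the plain positive-class (binary) definitions. The labels in the example are made up for illustration and are not drawn from the test set.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute positive-class precision, recall, and F1 from two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```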