---
library_name: transformers
tags: []
---

## Fine-tuned roberta-base for detecting paragraphs with eHRAF-assigned two-digit id '900'

## Description

This is a fine-tuned roberta-base model for detecting whether paragraphs drawn from ethnographic source material classified under the main subject 'Language and Communication' are more specifically about '900'.

## Usage

The easiest way to use this model at inference time is with the Hugging Face `pipeline` API.

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="gptmurdock/classifier-900")
classifier("Example text to classify")
```
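When classifying many paragraphs at once, it can help to wrap the call and keep only the paragraphs flagged as positive. The helper below is a minimal sketch, not part of the model's API: the function name `flag_paragraphs`, the positive label name `"LABEL_1"`, and the 0.5 threshold are all assumptions, so check the label names in the model's config before relying on them. It only assumes that the classifier returns one `{"label": ..., "score": ...}` dict per input text, which is what a text-classification pipeline does for a list of strings.

```python
from typing import Callable, Dict, List


def flag_paragraphs(
    classifier: Callable[[List[str]], List[Dict]],
    paragraphs: List[str],
    positive_label: str = "LABEL_1",  # assumption: verify against the model config
    threshold: float = 0.5,           # assumption: illustrative cutoff only
) -> List[str]:
    """Return the paragraphs whose top prediction is the positive label."""
    # One result dict per paragraph, in the same order as the inputs.
    results = classifier(paragraphs)
    return [
        text
        for text, result in zip(paragraphs, results)
        if result["label"] == positive_label and result["score"] >= threshold
    ]
```

With the `classifier` object created above, this could be called as `flag_paragraphs(classifier, ["first paragraph", "second paragraph"])`.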

## Training data 

...

## Training procedure

...

We used a 60-20-20 train-validation-test split and fine-tuned roberta-base for 5 epochs (learning rate = 2e-5, batch size = 40).
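For readers who want to reproduce a split with the same proportions, the following is a rough sketch using only the standard library. The function name `split_60_20_20`, the shuffling strategy, and the seed are assumptions; the card does not say how the original split was drawn (for example, whether it was stratified by label). The hyperparameter values are the ones stated above.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hyperparameters as stated in this card.
NUM_EPOCHS = 5
LEARNING_RATE = 2e-5
BATCH_SIZE = 40


def split_60_20_20(
    examples: Sequence[T], seed: int = 0  # assumption: the seed is illustrative
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the examples and cut them into 60% / 20% / 20% parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 6 // 10
    n_val = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test
```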

## Evaluation 

Evaluation results on the test set are reported below.

| Metric    | Value (%) |
|-----------|-----------|
| Precision | 98.0      |
| Recall    | 97.9      |
| F1        | 97.9      |
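As a reminder of how these three metrics relate, here is a small sketch that computes them from confusion-matrix counts for a single class. The function name `precision_recall_f1` and the example counts are hypothetical; the card does not state whether its scores are for the positive class only or averaged across classes, so this is not necessarily how the values above were produced.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positives and false negatives."""
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```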