---
language: "english"
license: "mit"
datasets:
- race
metrics:
- accuracy
---

# Roberta Large Fine Tuned on RACE

## Model description

This model is a fine-tuned model of Roberta-large applied on RACE

#### How to use

```python

import datasets
from transformers import RobertaTokenizer
from transformers import  RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained(
"LIAMF-USP/roberta-large-finetuned-race")
model = RobertaForMultipleChoice.from_pretrained(
"LIAMF-USP/roberta-large-finetuned-race")
dataset = datasets.load_dataset(
    "race",
    "all",
    split=["train", "validation", "test"],
)training_examples = dataset[0]
evaluation_examples = dataset[1]
test_examples = dataset[2]

example=training_examples[0] 
example_id = example["example_id"]
question = example["question"]
context = example["article"]
options = example["options"]
label_example = example["answer"]
label_map = {label: i 
    for i, label in enumerate(["A", "B", "C", "D"])}
choices_inputs = []
for ending_idx, (_, ending) in enumerate(
                                zip(context, options)):
    if question.find("_") != -1:
        # fill in the banks questions
        question_option = question.replace("_", ending)
    else:
        question_option = question + " " + ending
    inputs = tokenizer(
        context,
        question_option,
        add_special_tokens=True,
        max_length=MAX_SEQ_LENGTH,
        padding="max_length",
        truncation=True,
        return_overflowing_tokens=False,
    )    
label = label_map[label_example]
input_ids = [x["input_ids"] for x in choices_inputs]
attention_mask = (
    [x["attention_mask"] for x in choices_inputs]
     # as the senteces follow the same structure, 
     #just one of them is necessary to check
    if "attention_mask" in choices_inputs[0]
    else None
)
example_encoded = {
    "example_id": example_id,
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "label": label,
}
output = model(**example_encoded)
```


## Training data

The initial model was [roberta large model](https://huggingface.co/roberta-large) which was then fine-tuned on [RACE dataset](https://www.cs.cmu.edu/~glai1/data/race/)

## Training procedure

It was necessary to preprocess the data with a method that is exemplified for a single instance in the _How to use_ section. The used hyperparameters were the following:

| Hyperparameter | Value |
|:----:|:----:|
| adam_beta1                  | 0.9      |
| adam_beta2                  | 0.98     |
| adam_epsilon                | 1.000e-8 |
| eval_batch_size             | 32       |
| train_batch_size            | 1        |
| fp16                        | True     |
| gradient_accumulation_steps | 16       |
| learning_rate               | 0.00001  |
| warmup_steps                | 1000     |
| max_length                  | 512      |
| epochs                      | 4        |

## Eval results:
| Dataset Acc | Eval | All Test |High School Test |Middle School Test |
|:----:|:----:|:----:|:----:|:----:|
|      | 85.2 | 84.9|83.5|88.0|

**The model was trained with a Tesla V100-PCIE-16GB**