xtremedistil-l6-h256-uncased fine-tuned on SQuAD

This model was developed as part of a project for the Deep Learning for NLP (DL4NLP) lecture at Technische Universität Darmstadt (2022). It uses xtremedistil-l6-h256-uncased as a base model and was fine-tuned on the SQuAD dataset for Question Answering. It makes no distinction between uppercase and lowercase words.
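For reference, a minimal inference sketch using the transformers question-answering pipeline is shown below. It assumes the fine-tuned checkpoint is published under the DL4NLP-Group11/xtremedistil-l6-h256-uncased-squad repository; the context and question in the example are illustrative only.

from transformers import pipeline

# assumed repository id of the fine-tuned checkpoint
qa = pipeline('question-answering', model='DL4NLP-Group11/xtremedistil-l6-h256-uncased-squad')

# illustrative context/question pair
result = qa(
    question='What dataset was the model fine-tuned on?',
    context='The model was fine-tuned on the SQuAD dataset for question answering.'
)
print(result['answer'], result['score'])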

Dataset

As mentioned previously, the SQuAD dataset was used to train and evaluate the model. It was downloaded from GitHub and is divided into the following splits.

Split        Number of examples
Training     86 588
Evaluation   10 507

The following script was used to download, prepare, and load the dataset so that it could be used by the model. Although the dataset was not downloaded directly from Hugging Face, it was formatted in exactly the same way as the version available on Hugging Face.

import os
import json

from datasets import load_dataset

dataset_directory = 'dataset'
train_file = 'train.json'
dev_file = 'dev.json'

if not os.path.exists(dataset_directory):
    print('Creating dataset directory\n')
    os.makedirs(dataset_directory)

# download train and dev splits from the dataset
!wget https://s3.us-east-2.amazonaws.com/mrqa/release/v2/train/SQuAD.jsonl.gz -O dataset/train.jsonl.gz
!wget https://s3.us-east-2.amazonaws.com/mrqa/release/v2/dev/SQuAD.jsonl.gz -O dataset/dev.jsonl.gz

# unpack the files
!gzip -d dataset/train.jsonl.gz
!gzip -d dataset/dev.jsonl.gz

def prepare_data(dir, file_name):
    """Convert the downloaded JSON Lines file into the SQuAD format used on Hugging Face."""
    data = []
    # the downloaded file has the same name with a trailing 'l' (e.g. 'train.json' -> 'train.jsonl')
    with open(f'{dir}/{file_name}l', 'r') as f:
        # skip the header line
        next(f)
        for line in f:
            entry = json.loads(line)
            for qas in entry['qas']:
                # collect the character offset of each detected answer span
                answer_start = []
                for answer in qas['detected_answers']:
                    answer_start.append(answer['char_spans'][0][0])

                data.append({
                    'id': qas['id'],
                    'context': entry['context'],
                    'question': qas['question'],
                    'answers': {
                        'text': qas['answers'],
                        'answer_start': answer_start
                    }
                })

    # write one example per line to the final JSON file
    with open(f'{dir}/{file_name}', 'w') as f:
        for entry in data:
            json.dump(entry, f)
            f.write('\n')

    # remove the raw .jsonl file
    os.remove(f'{dir}/{file_name}l')

prepare_data(dataset_directory, train_file)
prepare_data(dataset_directory, dev_file)

# load the prepared splits as a Hugging Face dataset
data_files = {'train': train_file, 'validation': dev_file}
dataset = load_dataset(dataset_directory, data_files=data_files)

Hyperparameters

The hyperparameters utilized to fine-tune the model are listed below.

  • epochs: 2
  • train_batch_size: 16
  • eval_batch_size: 32
  • optimizer: adamW
    • lr: 5e-5
    • weight_decay: 0.01
  • lr_scheduler: linear
    • num_warmup_steps: 0
  • max_length: 512
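
As a rough illustration (not the original training script), the optimizer and scheduler could be set up with these hyperparameters roughly as follows; train_dataloader is a placeholder name for a DataLoader built from the training split.

import torch
from transformers import AutoModelForQuestionAnswering, get_scheduler

# base model used for fine-tuning
model = AutoModelForQuestionAnswering.from_pretrained('microsoft/xtremedistil-l6-h256-uncased')

num_epochs = 2
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

# train_dataloader is assumed to yield batches of size 16 (evaluation uses batches of 32)
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    'linear',
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=num_training_steps,
)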

Fine-Tuning and Evaluation

Most of the code used to pre-process the dataset, define the training loop, and post-process the model's predictions was adapted from the Hugging Face question answering course.
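
The course notebook itself is not reproduced here, but its pre-processing tokenizes question/context pairs with a sliding window. A minimal sketch using this model's tokenizer and the max_length of 512 listed above might look as follows; the stride value and the function name preprocess are illustrative.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('microsoft/xtremedistil-l6-h256-uncased')

def preprocess(examples):
    # truncate only the context ('only_second') and create overlapping windows
    return tokenizer(
        examples['question'],
        examples['context'],
        max_length=512,
        truncation='only_second',
        stride=128,                      # illustrative stride
        return_overflowing_tokens=True,
        return_offsets_mapping=True,     # needed to map predictions back to character positions
        padding='max_length',
    )

tokenized_train = dataset['train'].map(preprocess, batched=True,
                                       remove_columns=dataset['train'].column_names)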

The model was fine-tuned with GPU acceleration on Google Colab. The entire training and evaluation process took approximately 1h10min; each training epoch completed in 17-18 minutes, and each evaluation pass took about 16-18 minutes.

After fine-tuning, the following results were achieved on the evaluation set (using the squad metric):

Metric              Value
Exact Match (EM)    61.91110688112687
F1-Score            77.2232806051733
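
For context, EM and F1 from the squad metric can be computed with the evaluate library roughly as follows; predictions and references are placeholder variables holding the post-processed model outputs and the gold answers from the evaluation split.

import evaluate

squad_metric = evaluate.load('squad')

# predictions: [{'id': ..., 'prediction_text': ...}, ...]
# references:  [{'id': ..., 'answers': {'text': [...], 'answer_start': [...]}}, ...]
results = squad_metric.compute(predictions=predictions, references=references)
print(results)  # {'exact_match': ..., 'f1': ...}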