---
license: apache-2.0
base_model: google/mt5-base
tags:
  - Question Answering
  - generated_from_trainer
metrics:
  - rouge
model-index:
  - name: mT5-base-turkish-qa
    results: []
language:
  - tr
pipeline_tag: text2text-generation
widget:
  - text: >-
      Soru: Nazım Hikmet ne zaman doğmuştur?

      Metin: Nâzım Hikmet, Mehmed Nâzım adıyla 15 Ocak 1902 tarihinde Selanik'te
      doğdu. O sırada Hariciye Nezareti memuru olarak Selanik'te çalışan Hikmet
      Bey, Nâzım'ın çocukluğunda memuriyetten ayrıldı ve ailesiyle birlikte,
      Halep'te bulunan babasının yanına gitti. Burada bulundukları sırada
      Hikmet-Celile çiftinin biri Ali İbrahim, diğeri Samiye adında iki çocuğu
      oldu, fakat Ali İbrahim dizanteriye yakalanıp öldü.
datasets:
  - ucsahin/TR-Extractive-QA-82K
---

# mT5-base-turkish-qa

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on the [ucsahin/TR-Extractive-QA-82K](https://huggingface.co/datasets/ucsahin/TR-Extractive-QA-82K) dataset. It achieves the following results on the evaluation set:

- Loss: 0.5109
- Rouge1: 79.3283
- Rouge2: 68.0845
- Rougel: 79.3474
- Rougelsum: 79.2937
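
The ROUGE scores above are on a 0-100 scale. The sketch below shows one way to recompute them with the `evaluate` library; the prediction and reference strings are hypothetical placeholders, and `evaluate` itself returns fractions in [0, 1]:

```python
import evaluate

# Minimal sketch (not from the original card): recomputing ROUGE and
# scaling to 0-100 to match the values reported above.
rouge = evaluate.load("rouge")
predictions = ["Cevap: 15 Ocak 1902"]  # hypothetical model output
references = ["Cevap: 15 Ocak 1902"]   # hypothetical gold answer
scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})
```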

## Model description

The mT5-base model was fine-tuned on a manually curated Turkish dataset consisting of 65K training samples of ("question", "answer", "context") triplets.

## Intended uses & limitations

The intended use of the model is extractive question answering.

To use the inference widget, enter your input in the following format:

```
Soru: question_text
Metin: context_text
```

The model generates a response of the form:

```
Cevap: answer_text
```
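
For a quick single-input test, the same format works with the `transformers` pipeline API. The sketch below is not part of the original card; joining the question and context with a newline mirrors the widget example, and the strings are taken from it:

```python
from transformers import pipeline

# Minimal sketch (assumption, not from the original card): querying the
# model through the text2text-generation pipeline. Joining question and
# context with a newline is an assumption about the expected format.
qa_pipe = pipeline("text2text-generation", model="ucsahin/mT5-base-turkish-qa")

question = "Soru: Nazım Hikmet ne zaman doğmuştur?"
context = "Metin: Nâzım Hikmet, Mehmed Nâzım adıyla 15 Ocak 1902 tarihinde Selanik'te doğdu."
print(qa_pipe(question + "\n" + context, max_new_tokens=32))
# Expected output shape: [{'generated_text': 'Cevap: ...'}]
```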

Use with Transformers:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_dataset

# Load the dataset
qa_tr_datasets = load_dataset("ucsahin/TR-Extractive-QA-82K")

# Load model and tokenizer
model_checkpoint = "ucsahin/mT5-base-turkish-qa"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)

# Run inference on the first ten test examples
inference_dataset = qa_tr_datasets["test"].select(range(10))

for example in inference_dataset:  # renamed from `input` to avoid shadowing the built-in
    input_question = "Soru: " + example["question"]
    input_context = "Metin: " + example["context"]

    # Encode question and context as a sentence pair, truncated to 512 tokens
    tokenized_inputs = tokenizer(input_question, input_context, max_length=512, truncation=True, return_tensors="pt")
    outputs = model.generate(input_ids=tokenized_inputs["input_ids"], max_new_tokens=32)
    output_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    print(f"Reference answer: {example['answer']}, Model answer: {output_text[0]}")
```
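
Since the model generates answers with the `Cevap:` prefix, a small post-processing step recovers the bare answer span. `strip_answer_prefix` below is a hypothetical helper, not part of the original example:

```python
def strip_answer_prefix(text: str) -> str:
    # Hypothetical helper: drop the leading "Cevap:" the model emits.
    return text.removeprefix("Cevap:").strip()

print(strip_answer_prefix("Cevap: 15 Ocak 1902"))  # -> 15 Ocak 1902
```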

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
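
For reference, these settings map onto `Seq2SeqTrainingArguments` roughly as in the sketch below. This is not the original training script; `output_dir` and `predict_with_generate` are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

# Sketch of the reported hyperparameters; `output_dir` and
# `predict_with_generate` are assumptions, not from the original run.
training_args = Seq2SeqTrainingArguments(
    output_dir="mT5-base-turkish-qa",   # hypothetical path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    predict_with_generate=True,         # needed to compute ROUGE during eval
)
```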

## Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|
| 2.0454        | 0.13  | 500  | 0.6771          | 73.1040 | 59.8915 | 73.1819 | 73.0558   |
| 0.8012        | 0.26  | 1000 | 0.6012          | 76.3357 | 64.1967 | 76.3796 | 76.2688   |
| 0.7703        | 0.39  | 1500 | 0.5844          | 76.8932 | 65.2509 | 76.9932 | 76.9418   |
| 0.6783        | 0.51  | 2000 | 0.5587          | 76.7284 | 64.8453 | 76.7416 | 76.6720   |
| 0.6546        | 0.64  | 2500 | 0.5362          | 78.2261 | 66.5893 | 78.2515 | 78.2142   |
| 0.6289        | 0.77  | 3000 | 0.5133          | 78.6917 | 67.1534 | 78.6852 | 78.6319   |
| 0.6292        | 0.9   | 3500 | 0.5109          | 79.3283 | 68.0845 | 79.3474 | 79.2937   |

## Framework versions

- Transformers 4.36.2
- Pytorch 2.1.0+cu118
- Datasets 2.16.1
- Tokenizers 0.15.0