# Turkish QNLI Model

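I fine-tuned the Turkish BERT model [dbmdz/bert-base-turkish-uncased](https://huggingface.co/dbmdz/bert-base-turkish-uncased) for the question-answering problem, framed as QNLI-style sentence-pair classification, using TQuAD, the Turkish version of SQuAD.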

# Data: TQuAD
I used the following TQuAD dataset:

https://github.com/TQuad/turkish-nlp-qa-dataset

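I converted the dataset into the QNLI format used by the GLUE data processors in `transformers` (SQuAD -> QNLI) with the following script. Each question is paired with its own answer (`entailment`) and with the answers to the other questions of the same paragraph (`not_entailment`):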

```
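import argparse
import json

# Usage (one run per split), e.g.:
#   python3 <script>.py train-v0.1.json > train.tsv
#   python3 <script>.py dev-v0.1.json > dev.tsv


def clean(text):
    # Replace ";" with ":" and turn tabs/newlines into spaces
    # so they cannot break the tab-separated columns.
    return text.replace(";", ":").replace("\t", " ").replace("\n", " ")


def main():
    parser = argparse.ArgumentParser(description="Convert TQuAD (SQuAD format) to GLUE QNLI TSV")
    parser.add_argument("input", help="TQuAD JSON file, e.g. train-v0.1.json or dev-v0.1.json")
    args = parser.parse_args()

    with open(args.input, encoding="utf-8") as f:
        dataset = json.load(f)

    # The GLUE QNLI processor skips the first line, so write a header row.
    print("index", "question", "sentence", "label", sep="\t")

    i = 0
    for article in dataset["data"]:
        for p in article["paragraphs"]:
            # All distinct answers in this paragraph are candidate "sentences".
            paragraph_answers = set(qa["answers"][0]["text"] for qa in p["qas"])
            for qa in p["qas"]:
                question = clean(qa["question"])
                answer = qa["answers"][0]["text"]
                # Positive pair: the question with its own answer.
                i += 1
                print(i, question, clean(answer), "entailment", sep="\t")
                # Negative pairs: the question with the other answers of the paragraph.
                for other in paragraph_answers - {answer}:
                    i += 1
                    print(i, question, clean(other), "not_entailment", sep="\t")


if __name__ == "__main__":
    main()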
```
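
To make the pairing rule concrete, here is a minimal sketch on a toy, made-up paragraph (not TQuAD data). `make_pairs` is a hypothetical helper that reproduces the script's logic in memory; the negatives are sorted only so the output order is deterministic.

```python
def make_pairs(paragraph):
    """Return (question, sentence, label) rows for one SQuAD-style paragraph.

    Each question is paired with its own answer (entailment) and with every
    other distinct answer in the same paragraph (not_entailment).
    """
    answers = {qa["answers"][0]["text"] for qa in paragraph["qas"]}
    rows = []
    for qa in paragraph["qas"]:
        answer = qa["answers"][0]["text"]
        rows.append((qa["question"], answer, "entailment"))
        # sorted() only makes the example output deterministic
        for other in sorted(answers - {answer}):
            rows.append((qa["question"], other, "not_entailment"))
    return rows


# Toy paragraph (hypothetical data, not taken from TQuAD)
paragraph = {
    "qas": [
        {"question": "Who wrote it?", "answers": [{"text": "Ali"}]},
        {"question": "When?", "answers": [{"text": "1923"}]},
        {"question": "Where?", "answers": [{"text": "Ankara"}]},
    ],
}

for row in make_pairs(paragraph):
    print(*row, sep="\t")
```

With three questions whose answers are all distinct, the paragraph yields 3 `entailment` rows and 6 `not_entailment` rows, so the negative label outnumbers the positive one.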


The QNLI folder contains the converted dev and train sets.
The training data looks like this:

| index | question | sentence | label |
|---|---|---|---|
| 613 | II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? | filozof, kimyacı, astrolog ve çevirmen | not_entailment |
| 614 | II.Friedrich’in bilginler arasındaki en önemli şahsiyet olarak belirttiği kişi kimdir? | kişisel eğilimi ve özel temaslar nedeniyle | not_entailment |
| 615 | Michael Scotus’un mesleği nedir? | filozof, kimyacı, astrolog ve çevirmen | entailment |
| 616 | Michael Scotus’un mesleği nedir? | Palermo’ya | not_entailment |
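
The GLUE QNLI processor in `transformers` (assumed here to match the `run_glue.py` version used for training) reads `train.tsv` and `dev.tsv` from `--data_dir`, drops the first line of each file as a header, and takes the question from column 1, the sentence from column 2 and the label from the last column. The sketch below, a hypothetical `read_qnli_tsv` helper that is not part of `transformers`, mimics that parsing so a converted file can be sanity-checked before training; a file without a header row would silently lose its first example.

```python
import csv
import io


def read_qnli_tsv(f):
    """Parse a QNLI-style TSV: skip the header line, then read
    guid (col 0), question (col 1), sentence (col 2) and label (last col)."""
    # QUOTE_NONE: quote characters are treated as ordinary text
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for i, line in enumerate(reader):
        if i == 0:
            continue  # header row is skipped
        examples.append({
            "guid": line[0],
            "question": line[1],
            "sentence": line[2],
            "label": line[-1],
        })
    return examples


sample = (
    "index\tquestion\tsentence\tlabel\n"
    "615\tMichael Scotus’un mesleği nedir?\tfilozof, kimyacı, astrolog ve çevirmen\tentailment\n"
    "616\tMichael Scotus’un mesleği nedir?\tPalermo’ya\tnot_entailment\n"
)
for ex in read_qnli_tsv(io.StringIO(sample)):
    print(ex["guid"], ex["label"])
```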





# Training

I trained the model with the following environment variables:
```
export GLUE_DIR=./glue/glue_dataTR/QNLI
export TASK_NAME=QNLI
```

```
python3 run_glue.py \
  --model_type bert \
  --model_name_or_path dbmdz/bert-base-turkish-uncased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR \
  --max_seq_length 128 \
  --per_gpu_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir /tmp/$TASK_NAME/
```
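
The batch-size and epoch flags above determine how many optimizer steps the run takes. The sketch below assumes the legacy `run_glue.py` behaviour (effective batch = `per_gpu_train_batch_size * max(1, n_gpu)`, last partial batch kept, no gradient accumulation unless given); `total_training_steps` is a hypothetical helper and 10,000 is a made-up dataset size, not the actual TQuAD-derived training set size.

```python
import math


def total_training_steps(num_examples, per_gpu_batch_size=32, n_gpu=1,
                         num_epochs=3, grad_accum_steps=1):
    """Number of optimizer steps for a run configured like the command above."""
    batch_size = per_gpu_batch_size * max(1, n_gpu)
    # The final, smaller batch of each epoch still counts as one batch
    batches_per_epoch = math.ceil(num_examples / batch_size)
    return batches_per_epoch // grad_accum_steps * num_epochs


print(total_training_steps(10000))  # 939 for a made-up 10,000-example set
```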


# Evaluation Results

Results on the dev set (`--do_eval`):

| Metric | Value |
|---|---|
| acc | 0.9124060613527165 |
| loss | 0.21582801340189717 |
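
Because every question gets exactly one `entailment` row plus a `not_entailment` row for each other distinct answer in its paragraph, the labels are imbalanced, so the accuracy above is easier to interpret next to a majority-class baseline on the same dev set. That baseline is not reported here; the sketch below uses hypothetical helpers `labels_from_tsv` and `majority_baseline` to compute it from the label column of a converted TSV (shown on made-up toy labels).

```python
from collections import Counter


def labels_from_tsv(path):
    """Return the label column (last field) of a QNLI TSV, skipping the header."""
    with open(path, encoding="utf-8") as f:
        next(f)  # header row
        return [line.rstrip("\n").split("\t")[-1] for line in f if line.strip()]


def majority_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Real usage would be e.g. majority_baseline(labels_from_tsv("<GLUE_DIR>/dev.tsv")).
# Toy labels (made up): one paragraph with three distinct answers.
toy_labels = ["entailment"] * 3 + ["not_entailment"] * 6
print(majority_baseline(toy_labels))  # ('not_entailment', 0.6666666666666666)
```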

> See all my models: https://huggingface.co/savasy