dudoxu committed on
Commit
0a2d0bd
1 Parent(s): e264502

Update README.md

  1. README.md +38 -13
README.md CHANGED
@@ -5,12 +5,20 @@ language: [de, en]
5
 
6
  # Bilingual English + German SQuAD2.0
7
 
8
- We created German Squad 2.0 (deQuAD) and merged with [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) into an English and German training data for question answering. The [**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md) is used to fine-tune bilingual QA downstream task.
9
 
10
- # Details of deQuAD 2.0
11
- [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) was auto-translated into German. We hired professional editors to proofread the translated transcripts, correct mistakes and double check the answers to further polish the text and enhance annotation quality. The final German dataset contains **130k** training and **11k** test samples.
12
 
13
- Evaluation on English SQuAD2.0
14
 
15
  ```
16
  HasAns_exact = 85.79622132253711
@@ -23,7 +31,19 @@ exact = 90.28889076054915
23
  f1 = 92.84713483219753
24
  total = 11873
25
  ```
 
26
 
27
  ## Use Model in Pipeline
28
 
29
 
@@ -36,19 +56,24 @@ qa_pipeline = pipeline(
36
  tokenizer="deutsche-telekom/bert-multi-english-german-squad2"
37
  )
38
 
39
- qa_pipeline({
40
- 'context': " ",
41
- 'question': " "})
 
 
 
42
 
43
  ```
44
 
45
  # Output:
46
 
47
  ```json
48
- {
49
- "score": 0.83,
50
- "start": 0,
51
- "end": 9,
52
- "answer": " "
53
- }
 
 
54
  ```
 
5
 
6
  # Bilingual English + German SQuAD2.0
7
 
8
+ We created German SQuAD 2.0 (**deQuAD 2.0**) and merged it with [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) into a bilingual English and German training set for question answering. [**bert-base-multilingual-cased**](https://github.com/google-research/bert/blob/master/multilingual.md) was fine-tuned on this data for the bilingual QA downstream task.
9
 
10
+ ## Details of deQuAD 2.0
11
+ [**SQuAD2.0**](https://rajpurkar.github.io/SQuAD-explorer/) was auto-translated into German. We hired professional editors to proofread the translated texts, correct mistakes, and double-check the answers to further polish the text and improve annotation quality. The final German deQuAD dataset contains **130k** training and **11k** test samples.
12
 
13
+ ## Overview
14
+ - **Language model:** bert-base-multilingual-cased
15
+ - **Language:** German, English
16
+ - **Training data:** deQuAD2.0 + SQuAD2.0 training set
17
+ - **Evaluation data:** SQuAD2.0 test set; deQuAD2.0 test set
18
+ - **Infrastructure:** 8xV100 GPU
19
+ - **Published:** July 9th, 2021
20
+
21
+ ## Evaluation on English SQuAD2.0
22
 
23
  ```
24
  HasAns_exact = 85.79622132253711
 
31
  f1 = 92.84713483219753
32
  total = 11873
33
  ```
34
+ ## Evaluation on German deQuAD2.0
35
 
36
+ ```
37
+ HasAns_exact = 63.80526406330638
38
+ HasAns_f1 = 72.47269140789888
39
+ HasAns_total = 5813
40
+ NoAns_exact = 82.0291893792861
41
+ NoAns_f1 = 82.0291893792861
42
+ NoAns_total = 5687
43
+ exact = 72.81739130434782
44
+ f1 = 77.19858740470603
45
+ total = 11500
46
+ ```
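
The overall `exact` and `f1` scores appear to be averages of the `HasAns` and `NoAns` scores weighted by question count, which is how the official SQuAD 2.0 evaluation script aggregates per-question results. The sketch below recomputes the overall German scores from the subset values reported above; the helper name `combine` is an illustrative assumption and not part of any evaluation script.

```python
# Values copied from the German deQuAD2.0 evaluation block above.
has_ans = {"exact": 63.80526406330638, "f1": 72.47269140789888, "total": 5813}
no_ans = {"exact": 82.0291893792861, "f1": 82.0291893792861, "total": 5687}


def combine(metric: str) -> float:
    """Average the two subset scores, weighted by their question counts.

    `combine` is a hypothetical helper written for this example.
    """
    weighted_sum = (
        has_ans[metric] * has_ans["total"] + no_ans[metric] * no_ans["total"]
    )
    return weighted_sum / (has_ans["total"] + no_ans["total"])


overall_exact = combine("exact")
overall_f1 = combine("f1")

# Compare these with the reported `exact` and `f1` lines.
print(f"exact = {overall_exact}")
print(f"f1 = {overall_f1}")
```

Because answerable questions make up roughly half of the 11,500 test samples, the lower `HasAns` scores pull the overall numbers well below the `NoAns` scores.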
47
  ## Use Model in Pipeline
48
 
49
 
 
56
  tokenizer="deutsche-telekom/bert-multi-english-german-squad2"
57
  )
58
 
59
+ contexts = ["Die Allianz Arena ist ein Fußballstadion im Norden von München und bietet bei Bundesligaspielen 75.021 Plätze, zusammengesetzt aus 57.343 Sitzplätzen, 13.794 Stehplätzen, 1.374 Logenplätzen, 2.152 Business Seats und 966 Sponsorenplätzen. In der Allianz Arena bestreitet der FC Bayern München seit der Saison 2005/06 seine Heimspiele. Bis zum Saisonende 2017 war die Allianz Arena auch Spielstätte des TSV 1860 München.",
60
+ "Harvard is a large, highly residential research university. It operates several arts, cultural, and scientific museums, alongside the Harvard Library, which is the world's largest academic and private library system, comprising 79 individual libraries with over 18 million volumes. "]
61
+ questions = ["Wo befindet sich die Allianz Arena?",
62
+ "What is the worlds largest academic and private library system?"]
63
+
64
+ qa_pipeline(context=contexts, question=questions)
65
 
66
  ```
67
 
68
  # Output:
69
 
70
  ```json
71
+ [{"score": 0.7290093898773193,
72
+ "start": 44,
73
+ "end": 62,
74
+ "answer": "Norden von München"},
75
+ {"score": 0.7979822754859924,
76
+ "start": 134,
77
+ "end": 149,
78
+ "answer": "Harvard Library"}]
79
  ```
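
The `start` and `end` fields in the output are character offsets into the corresponding context, so slicing the context with them should recover the `answer` string. The following is a minimal sketch of that check. The contexts and results are copied from this README rather than produced by running the model, so it needs no model download; the variable names `results` and `spans` are assumptions made for illustration.

```python
# Contexts copied from the pipeline example above.
contexts = [
    "Die Allianz Arena ist ein Fußballstadion im Norden von München und bietet "
    "bei Bundesligaspielen 75.021 Plätze, zusammengesetzt aus 57.343 Sitzplätzen, "
    "13.794 Stehplätzen, 1.374 Logenplätzen, 2.152 Business Seats und 966 "
    "Sponsorenplätzen. In der Allianz Arena bestreitet der FC Bayern München seit "
    "der Saison 2005/06 seine Heimspiele. Bis zum Saisonende 2017 war die Allianz "
    "Arena auch Spielstätte des TSV 1860 München.",
    "Harvard is a large, highly residential research university. It operates "
    "several arts, cultural, and scientific museums, alongside the Harvard "
    "Library, which is the world's largest academic and private library system, "
    "comprising 79 individual libraries with over 18 million volumes. ",
]

# Results copied from the "Output" block above (not recomputed here).
results = [
    {"score": 0.7290093898773193, "start": 44, "end": 62,
     "answer": "Norden von München"},
    {"score": 0.7979822754859924, "start": 134, "end": 149,
     "answer": "Harvard Library"},
]

# Slice each context with the reported character offsets.
spans = [ctx[res["start"]:res["end"]] for ctx, res in zip(contexts, results)]

for span, res in zip(spans, results):
    # Each slice should equal the answer text returned by the pipeline.
    print(repr(span), span == res["answer"])
```

The same slicing can be used to highlight an answer inside its source passage or to pull a window of surrounding text for display.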