nguyenvulebinh committed on
Commit 3ce9c96
1 Parent(s): 05b18be

add description

Files changed (1)
  1. README.md +8 -4
README.md CHANGED
@@ -32,10 +32,12 @@ widget:
 
 This model is intended to be used for QA in the Vietnamese language, so the validation set is Vietnamese only (though English also works fine). The evaluation results below use 10% of the Vietnamese dataset.
 
-Model | EM | F1 |
-|:---: |:---: |:---: |
-base | 76.43 | 84.16 |
-large | 77.32 | 85.46 |
+
+| Model | EM | F1 |
+| ------------- | ------------- | ------------- |
+| [base](https://huggingface.co/nguyenvulebinh/vi-mrc-base) | 76.43 | 84.16 |
+| [large](https://huggingface.co/nguyenvulebinh/vi-mrc-large) | 77.32 | 85.46 |
+
 
 [MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) uses [XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html) as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words. In my implementation, I re-combine the sub-word representations (after they are encoded by the BERT layer) into word representations using a sum strategy.
 
@@ -55,6 +57,7 @@ QA_input = {
 }
 res = nlp(QA_input)
 print('pipeline: {}'.format(res))
+#{'score': 0.5782045125961304, 'start': 45, 'end': 68, 'answer': 'xử lý ngôn ngữ tự nhiên'}
 ```
 
 - More accurate infer process ([**Using sum features strategy**](https://github.com/nguyenvulebinh/extractive-qa-mrc))
@@ -80,4 +83,5 @@ outputs = model(**inputs_ids)
 answer = extract_answer(inputs, outputs, tokenizer)
 
 print(answer)
+# answer: Google Developer Expert. Score start: 0.9926977753639221, Score end: 0.9909810423851013
 ```
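
The README paragraph shown in the diff describes re-combining XLM-RoBERTa sub-word representations into word representations with a sum strategy. The actual implementation lives in the [MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) repository; the snippet below is only a minimal sketch of that idea, assuming encoder outputs of shape `(seq_len, hidden_dim)` and a `word_ids`-style mapping from sub-word positions to word indices (as returned by a Hugging Face fast tokenizer's `word_ids()`). The helper name `sum_subwords_to_words` is made up for illustration and is not from the repository.

```python
# Sketch only: sum sub-word vectors into word vectors (not the repository's actual code).
import torch

def sum_subwords_to_words(hidden_states: torch.Tensor, word_ids: list) -> torch.Tensor:
    """Re-combine sub-word representations into word representations by summing.

    hidden_states: encoder output, shape (seq_len, hidden_dim)
    word_ids: word index for each sub-word position, None for special tokens
    """
    words = {}
    for pos, wid in enumerate(word_ids):
        if wid is None:          # skip special tokens such as <s> and </s>
            continue
        if wid not in words:
            words[wid] = hidden_states[pos].clone()
        else:
            words[wid] += hidden_states[pos]
    # One vector per original word, in word order.
    return torch.stack([words[i] for i in sorted(words)])

# Toy example: 7 sub-word positions, of which 5 belong to 3 words.
hidden = torch.randn(7, 768)
ids = [None, 0, 1, 1, 2, 2, None]
word_repr = sum_subwords_to_words(hidden, ids)
print(word_repr.shape)  # torch.Size([3, 768])
```

Summing (rather than taking only the first sub-word) keeps information from every sub-word piece in the resulting word vector, which is the motivation the README gives for the "sum features" strategy.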