Commit 4f637e4 by nguyenvulebinh (parent: ce1c777) — Update README.md
Dataset (combining English and Vietnamese):
- [Squad 2.0](https://rajpurkar.github.io/SQuAD-explorer/)
- [mailong25](https://github.com/mailong25/bert-vietnamese-question-answering/tree/master/dataset)
- [VLSP MRC 2021](https://vlsp.org.vn/vlsp2021/eval/mrc)
- [MultiLingual Question Answering](https://github.com/facebookresearch/MLQA)

This model is intended for QA in the Vietnamese language, so the validation set is Vietnamese only (but English works fine). The evaluation results below use the VLSP MRC 2021 test sets; this experiment achieved TOP 1 on the leaderboard.

| Model | EM | F1 |
| ------------- | ------------- | ------------- |
| [large](https://huggingface.co/nguyenvulebinh/vi-mrc-large) public_test_set | 85.847 | 83.826 |
| [large](https://huggingface.co/nguyenvulebinh/vi-mrc-large) private_test_set | 82.072 | 78.071 |

[MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) uses [XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html) as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words. In this implementation, the sub-word representations (after encoding by the BERT layers) are re-combined into word representations using a sum strategy.
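The sum strategy can be sketched as below. This is a minimal NumPy illustration, not the repository's actual code: `combine_subwords`, the toy vectors, and the `word_ids` mapping (in the style of what a fast tokenizer's `word_ids()` returns, with `None` for special tokens) are all hypothetical.

```python
import numpy as np

def combine_subwords(subword_embs, word_ids):
    """Sum the sub-word vectors that belong to the same word.

    subword_embs: (num_subwords, hidden) array of encoder outputs.
    word_ids: one entry per sub-word, giving its word index
              (None for special tokens such as <s> / </s>).
    Returns a (num_words, hidden) array of word representations.
    """
    hidden = subword_embs.shape[1]
    num_words = max(i for i in word_ids if i is not None) + 1
    word_embs = np.zeros((num_words, hidden))
    for row, wid in zip(subword_embs, word_ids):
        if wid is not None:          # skip special tokens
            word_embs[wid] += row    # sum strategy: add sub-word vectors
    return word_embs

# Toy example: the first two sub-words form word 0, the third is word 1,
# and the last sub-word is a special token that gets dropped.
subs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
ids = [0, 0, 1, None]
print(combine_subwords(subs, ids))  # [[4. 6.] [5. 6.]]
```

In the real model the same idea is applied to the hidden states coming out of the XLM-RoBERTa encoder, so the QA head scores whole words rather than sub-word pieces.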