nguyenvulebinh committed
Commit 732c309
Parent(s): e89f4c3
add colab
README.md CHANGED
@@ -45,6 +45,7 @@ Public leaderboard | Private leaderboard
 [MRCQuestionAnswering](https://github.com/nguyenvulebinh/extractive-qa-mrc) uses [XLM-RoBERTa](https://huggingface.co/transformers/model_doc/xlmroberta.html) as the pre-trained language model. By default, XLM-RoBERTa splits words into sub-words. In my implementation, I re-combine the sub-word representations (after encoding by the BERT layer) into word representations using a sum strategy.
 
 ## Using pre-trained model
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Yqgdfaca7L94OyQVnq5iQq8wRTFvVZjv?usp=sharing)
 
 - Hugging Face pipeline style (**NOT using sum features strategy**).
 
@@ -70,8 +71,8 @@ from infer import tokenize_function, data_collator, extract_answer
 from model.mrc_model import MRCQuestionAnswering
 from transformers import AutoTokenizer
 
-
-model_checkpoint = "nguyenvulebinh/vi-mrc-base"
+model_checkpoint = "nguyenvulebinh/vi-mrc-large"
+#model_checkpoint = "nguyenvulebinh/vi-mrc-base"
 tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
 model = MRCQuestionAnswering.from_pretrained(model_checkpoint)
 
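The README section touched by this diff mentions a "Hugging Face pipeline style" usage (which does NOT apply the sum-features strategy) and switches the default checkpoint to vi-mrc-large. The following is a minimal sketch of that pipeline-style usage, assuming the standard transformers question-answering pipeline loads these checkpoints as-is; the Vietnamese question/context strings are invented for illustration.

```python
from transformers import pipeline

# New default checkpoint introduced by this commit; the base checkpoint remains available.
model_checkpoint = "nguyenvulebinh/vi-mrc-large"
# model_checkpoint = "nguyenvulebinh/vi-mrc-base"

# Standard extractive-QA pipeline; per the README, this path does not use the
# sum strategy that re-combines sub-word representations into word representations.
nlp = pipeline("question-answering", model=model_checkpoint, tokenizer=model_checkpoint)

# Hypothetical question/context pair, for illustration only.
result = nlp(
    question="Mô hình này dùng cho ngôn ngữ nào?",
    context="Mô hình vi-mrc được huấn luyện cho bài toán hỏi đáp tiếng Việt.",
)
print(result["answer"], result["score"])
```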