patrickvonplaten committed
Commit 4a4fda7 · Parent(s): f56866a
Update README.md
README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
 
 The model was pre-trained using T5's denoising objective on [C4](https://huggingface.co/datasets/c4), subsequently additionally pre-trained using [REALM](https://arxiv.org/pdf/2002.08909.pdf)'s salient span masking objective on [Wikipedia](https://huggingface.co/datasets/wikipedia), and finally fine-tuned on [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions).
 
-**Note**: The model was fine-tuned on 90% of the train split of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps.
+**Note**: The model was fine-tuned on 90% of the train split of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps and validated on the held-out 10% of the train split.
 
 Other community checkpoints: [here](https://huggingface.co/models?search=ssm)
 
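For context on the objective mentioned in the hunk above: salient span masking, as used in REALM, masks named entities and dates rather than random spans. Below is a minimal sketch of what such a training pair looks like in T5's sentinel format; the sentence and the choice of spans are invented for illustration, not taken from the actual pre-training data.

```python
# Illustrative sketch of salient span masking (SSM) in T5's sentinel
# format. Unlike random-span denoising, SSM masks salient spans: named
# entities and dates. The example sentence and spans are invented.
original = "Franklin D. Roosevelt was born in January 1882."

# The named entity and the date are replaced by sentinel tokens ...
ssm_input = "<extra_id_0> was born in <extra_id_1>."

# ... and the target reconstructs the masked spans in order.
ssm_target = "<extra_id_0> Franklin D. Roosevelt <extra_id_1> January 1882 <extra_id_2>"
```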
@@ -20,6 +20,17 @@ Paper: [How Much Knowledge Can You Pack
 Into the Parameters of a Language Model?](https://arxiv.org/abs/1910.10683.pdf)
 
 Authors: *Adam Roberts, Colin Raffel, Noam Shazeer*
+
+
+## Results on Natural Questions - Test Set
+
+| Id | Link | Exact Match |
+|---|---|---|
+|**T5-large**|**https://huggingface.co/google/t5-large-ssm-nqo**|**29.0**|
+|T5-xxl|https://huggingface.co/google/t5-xxl-ssm-nqo|35.2|
+|T5-3b|https://huggingface.co/google/t5-3b-ssm-nqo|31.7|
+|T5-11b|https://huggingface.co/google/t5-11b-ssm-nqo|34.8|
+
 ## Usage
 
 The model can be used as follows for **closed book question answering**:
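The Exact Match numbers in the added table follow the usual open-domain QA convention: a prediction counts as correct only if its normalized form equals a normalized gold answer. A minimal sketch of one common (SQuAD-style) normalization follows; the paper's exact evaluation script may differ.

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """True if the prediction matches any of the gold answers after normalization."""
    return any(normalize(prediction) == normalize(ans) for ans in gold_answers)

# e.g. exact_match("January 30, 1882", ["January 30, 1882"]) -> True
```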
@@ -34,8 +45,6 @@ input_ids = t5_tok("When was Franklin D. Roosevelt born?", return_tensors="pt").
 gen_output = t5_qa_model.generate(input_ids)[0]
 
 print(t5_tok.decode(gen_output, skip_special_tokens=True))
-
-# should give "On February 13, 1904" => not correct sadly.
 ```
 
 ## Abstract
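The hunk above shows only the tail of the usage snippet. For reference, here is a self-contained version; the `from_pretrained` lines are an assumption based on the bolded checkpoint in the results table, and the remaining lines mirror those visible in the diff.

```python
# Self-contained sketch of the closed-book QA usage shown in the diff.
# Assumption: the checkpoint is google/t5-large-ssm-nqo, the bolded
# entry in the results table above.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

t5_qa_model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nqo")
t5_tok = AutoTokenizer.from_pretrained("google/t5-large-ssm-nqo")

input_ids = t5_tok("When was Franklin D. Roosevelt born?", return_tensors="pt").input_ids
gen_output = t5_qa_model.generate(input_ids)[0]

print(t5_tok.decode(gen_output, skip_special_tokens=True))
```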