patrickvonplaten committed on
Commit 4a4fda7
1 Parent(s): f56866a

Update README.md

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -12,7 +12,7 @@ license: apache-2.0
 
 The model was pre-trained using T5's denoising objective on [C4](https://huggingface.co/datasets/c4), subsequently additionally pre-trained using [REALM](https://arxiv.org/pdf/2002.08909.pdf)'s salient span masking objective on [Wikipedia](https://huggingface.co/datasets/wikipedia), and finally fine-tuned on [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions).
 
-**Note**: The model was fine-tuned on 90% of the train splits of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps.
+**Note**: The model was fine-tuned on 90% of the train splits of [Natural Questions (NQ)](https://huggingface.co/datasets/natural_questions) for 20k steps and validated on the held-out 10% of the train split.
 
 Other community Checkpoints: [here](https://huggingface.co/models?search=ssm)
 
@@ -20,6 +20,17 @@ Paper: [How Much Knowledge Can You Pack
 Into the Parameters of a Language Model?](https://arxiv.org/abs/1910.10683.pdf)
 
 Authors: *Adam Roberts, Colin Raffel, Noam Shazeer*
+
+
+## Results on Natural Questions - Test Set
+
+|Id | link | Exact Match |
+|---|---|---|
+|**T5-large**|**https://huggingface.co/google/t5-large-ssm-nqo**|**29.0**|
+|T5-xxl|https://huggingface.co/google/t5-xxl-ssm-nqo|35.2|
+|T5-3b|https://huggingface.co/google/t5-3b-ssm-nqo|31.7|
+|T5-11b|https://huggingface.co/google/t5-11b-ssm-nqo|34.8|
+
 ## Usage
 
 The model can be used as follows for **closed book question answering**:
@@ -34,8 +45,6 @@ input_ids = t5_tok("When was Franklin D. Roosevelt born?", return_tensors="pt").
 gen_output = t5_qa_model.generate(input_ids)[0]
 
 print(t5_tok.decode(gen_output, skip_special_tokens=True))
-
-# should give "On February 13, 1904" => not correct sadly.
 ```
 
 ## Abstract
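
The diff only shows the tail of the README's usage snippet. For reference, here is a self-contained sketch of the full snippet: the `generate`/`decode` lines appear verbatim in the diff above, while the imports, the `from_pretrained` calls, and the checkpoint id `google/t5-large-ssm-nqo` are assumptions (the id is inferred from the bolded row of the added results table).

```python
# Self-contained sketch of the usage snippet shown in the diff above.
# Assumed: the checkpoint id "google/t5-large-ssm-nqo" (inferred from the
# bolded row of the results table); swap in the checkpoint you intend to use.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

t5_qa_model = AutoModelForSeq2SeqLM.from_pretrained("google/t5-large-ssm-nqo")
t5_tok = AutoTokenizer.from_pretrained("google/t5-large-ssm-nqo")

# Closed book QA: the question alone is the input; no context passage is given.
input_ids = t5_tok("When was Franklin D. Roosevelt born?", return_tensors="pt").input_ids
gen_output = t5_qa_model.generate(input_ids)[0]

print(t5_tok.decode(gen_output, skip_special_tokens=True))
```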
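
The added table reports Exact Match on the NQ test set. As a reference point only, below is a minimal sketch of the SQuAD-style EM normalization commonly used for NQ short answers (lowercase, strip punctuation and articles, collapse whitespace); the exact evaluation recipe behind the table's numbers is not specified in this README and may differ.

```python
# Minimal sketch of SQuAD-style Exact Match, the metric family reported in
# the results table above. Assumption: the common normalization (lowercase,
# drop punctuation and the articles a/an/the, collapse whitespace).
import re
import string

def normalize(text: str) -> str:
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, answers: list[str]) -> bool:
    # NQ questions can have several acceptable answers; EM counts a hit if
    # the normalized prediction equals any normalized reference answer.
    return any(normalize(prediction) == normalize(a) for a in answers)

print(exact_match("February 13, 1904", ["January 30, 1882"]))  # False
```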