ybelkada committed on
Commit
75a0ff3
1 Parent(s): b5c5730

Update README.md

Files changed (1)
  1. README.md +28 -4
README.md CHANGED
@@ -60,11 +60,15 @@ datasets:
license: apache-2.0
---

- # TL;DR FLan-UL2
+
+ # Model card for FLan-UL2
+
+ ![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)
+
Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year. It was fine-tuned using the "Flan" prompt tuning
and dataset collection.

- According ot the original [blog]() here are the notable improvements:
+ According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b), here are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning (see the sketch after this hunk).
- The original UL2 model also had mode switch tokens that were rather mandatory to get good performance. However, they were a little cumbersome, as this often required changes during inference or fine-tuning. In this update, UL2 20B was trained for an additional 100k steps (with a small batch) to forget the “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.
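The context-length bullets above are why few-shot prompting is practical with this checkpoint. As a minimal editor's sketch (not part of this commit) using only the `transformers` API the card already relies on, a few-shot prompt is just demonstrations concatenated ahead of the query; the demo texts and variable names below are illustrative assumptions:

```python
# Editor's sketch (not from the model card): few-shot prompting with Flan-UL2.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")

# A few (input, answer) demonstrations, then the real query at the end.
demos = [
    ("Review: An instant classic. Sentiment:", "positive"),
    ("Review: A waste of two hours. Sentiment:", "negative"),
]
query = "Review: Slow start, but a great finish. Sentiment:"
prompt = "\n".join(f"{q} {a}" for q, a in demos) + "\n" + query

# With a 2048-token receptive field, far more demonstrations would still fit.
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With the original UL2 checkpoint, such a prompt would also have needed a mode-token prefix; per the list above, Flan-UL2 drops that requirement.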
@@ -86,10 +90,13 @@ The reported results are the following :

# Using the model

+ For more efficient memory usage, we advise you to load the model in `8bit` using the `load_in_8bit` flag as follows:
+
```python
+ # pip install accelerate transformers bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
- model = AutoModelForConditionalGeneration.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bits = True)
+ model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"
@@ -99,7 +106,24 @@ outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
+ ```
+
+ Otherwise, you can load and run the model in `bfloat16` as follows:

+ ```python
+ # pip install accelerate transformers
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+ import torch
+ model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
+
+ input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"
+
+ inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(inputs, max_length=200)
+
+ print(tokenizer.decode(outputs[0]))
+ # <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```
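A rough editor's note on why the commit recommends the `load_in_8bit` path first: at roughly 20B parameters, the weights alone take about 20 GB in int8 versus about 40 GB in `bfloat16`. A minimal sketch (not part of this commit) to check the actual footprint, assuming `get_memory_footprint()` is available on `transformers` models in your installed version:

```python
# Editor's sketch (not from the model card): inspect weight memory for the
# 8bit loading mode shown above. Requires a machine that can hold the model.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2", device_map="auto", load_in_8bit=True
)
# Back-of-envelope: ~20e9 params * 1 byte/param ≈ 20 GB in int8; bfloat16 doubles that.
print(f"footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```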
 
@@ -193,7 +217,7 @@ In total, the model was trained for 2.65 million steps.

## Contribution

- This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).
+ This model was originally contributed by [Yi Tay](https://www.yitay.net/?author=636616684c5e64780328eece), and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).

## Examples

 
223