ybelkada committed on
Commit
75a0ff3
1 Parent(s): b5c5730

Update README.md

Files changed (1)
  1. README.md +28 -4
README.md CHANGED
@@ -60,11 +60,15 @@ datasets:
license: apache-2.0
---

- # TL;DR FLan-UL2
+
+ # Model card for FLan-UL2
+
+ ![model image](https://raw.githubusercontent.com/google-research/google-research/master/ul2/figs/ul2.png)
+
Flan-UL2 is an encoder-decoder model based on the `T5` architecture. It uses the same configuration as the [`UL2 model`](https://huggingface.co/google/ul2) released earlier last year. It was fine-tuned using the "Flan" prompt tuning
and dataset collection.

- According ot the original [blog]() here are the notable improvements:
+ According to the original [blog](https://www.yitay.net/blog/flan-ul2-20b), here are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning (see the sketch after this hunk).
- The original UL2 model also had mode switch tokens that were rather mandatory to get good performance. However, they were a little cumbersome, as this often required changes during inference or fine-tuning. In this update, UL2 20B was trained for an additional 100k steps (with a small batch) to forget the “mode tokens” before applying Flan instruction tuning. This Flan-UL2 checkpoint does not require mode tokens anymore.
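The context-length bullets above are why few-shot prompting is practical with this checkpoint. As a minimal editor's sketch (not part of this commit) using only the `transformers` API the card already relies on, a few-shot prompt is just demonstrations concatenated ahead of the query; the demo texts and variable names below are illustrative assumptions:

```python
# Editor's sketch (not from the model card): few-shot prompting with Flan-UL2.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto")

# A few (input, answer) demonstrations, then the real query at the end.
demos = [
    ("Review: An instant classic. Sentiment:", "positive"),
    ("Review: A waste of two hours. Sentiment:", "negative"),
]
query = "Review: Slow start, but a great finish. Sentiment:"
prompt = "\n".join(f"{q} {a}" for q, a in demos) + "\n" + query

# With a 2048-token receptive field, far more demonstrations would still fit.
inputs = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(inputs, max_length=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With the original UL2 checkpoint, such a prompt would also have needed a mode-token prefix; per the list above, Flan-UL2 drops that requirement.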
@@ -86,10 +90,13 @@ The reported results are the following :

# Using the model

+ For more efficient memory usage, we advise you to load the model in `8bit` using the `load_in_8bit` flag as follows:
+
```python
+ # pip install accelerate transformers bitsandbytes
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
- model = AutoModelForConditionalGeneration.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bits = True)
+ model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"
@@ -99,7 +106,24 @@ outputs = model.generate(inputs, max_length=200)

print(tokenizer.decode(outputs[0]))
# <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
+ ```
+
+ Otherwise, you can load and run the model in `bfloat16` as follows:

+ ```python
+ # pip install accelerate transformers
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+ import torch
+ model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")
+
+ input_string = "Answer the following question by reasoning step by step. The cafeteria had 23 apples. If they used 20 for lunch, and bought 6 more, how many apples do they have?"
+
+ inputs = tokenizer(input_string, return_tensors="pt").input_ids.to("cuda")
+ outputs = model.generate(inputs, max_length=200)
+
+ print(tokenizer.decode(outputs[0]))
+ # <pad> They have 23 - 20 = 3 apples left. They have 3 + 6 = 9 apples. Therefore, the answer is 9.</s>
```
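A rough editor's note on why the commit recommends the `load_in_8bit` path first: at roughly 20B parameters, the weights alone take about 20 GB in int8 versus about 40 GB in `bfloat16`. A minimal sketch (not part of this commit) to check the actual footprint, assuming `get_memory_footprint()` is available on `transformers` models in your installed version:

```python
# Editor's sketch (not from the model card): inspect weight memory for the
# 8bit loading mode shown above. Requires a machine that can hold the model.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2", device_map="auto", load_in_8bit=True
)
# Back-of-envelope: ~20e9 params * 1 byte/param ≈ 20 GB in int8; bfloat16 doubles that.
print(f"footprint: {model.get_memory_footprint() / 1e9:.1f} GB")
```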
 
@@ -193,7 +217,7 @@ In total, the model was trained for 2.65 million steps.

## Contribution

- This model was contributed by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).
+ This model was originally contributed by [Yi Tay](https://www.yitay.net/?author=636616684c5e64780328eece), and added to the Hugging Face ecosystem by [Younes Belkada](https://huggingface.co/ybelkada) & [Arthur Zucker](https://huggingface.co/ArthurZ).

## Examples

 
223