mtensor committed
Commit 860995a
1 Parent(s): 03212a3

add examples to readme

Files changed (1)
  1. README.md +58 -0
README.md CHANGED
@@ -38,6 +38,64 @@ Though not the focus of this model, we did evaluate it on standard image underst
  | COCO Captions | 141 | 138 | n/a | n/a | 149 | 135 | 138 |
  | AI2D | 64.5 | 73.7 | n/a | 62.3 | 81.2 | n/a | n/a |
 
+ ## How to Use
+
+ You can load the model and perform inference as follows:
+ ```python
+ from transformers import FuyuForCausalLM, AutoTokenizer, FuyuProcessor, FuyuImageProcessor
+ from PIL import Image
+
+ # load model, tokenizer, and processor
+ pretrained_path = "adept/fuyu-8b"
+ tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
+
+ image_processor = FuyuImageProcessor()
+ processor = FuyuProcessor(image_processor=image_processor, tokenizer=tokenizer)
+
+ model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map="cuda:0")
+
+ # test inference
+ text_prompt = "Generate a coco-style caption.\n"
+ image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
+ image_pil = Image.open(image_path)
+
+ model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+ for k, v in model_inputs.items():
+     model_inputs[k] = v.to("cuda:0")
+
+ generation_output = model.generate(**model_inputs, max_new_tokens=8)
+ generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-38:]
+ assert generation_text == "A bus parked on the side of a road.<s>"
+ ```
+
+ Fuyu can also perform some question answering on natural images and charts:
+ ```python
+ text_prompt = "What color is the bus?\n"
+ image_path = "bus.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/bus.png
+ image_pil = Image.open(image_path)
+
+ model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+ for k, v in model_inputs.items():
+     model_inputs[k] = v.to("cuda:0")
+
+ generation_output = model.generate(**model_inputs, max_new_tokens=6)
+ generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-17:]
+ assert generation_text == "The bus is blue.\n"
+
+
+ text_prompt = "What is the highest life expectancy at birth of male?\n"
+ image_path = "chart.png"  # https://huggingface.co/adept-hf-collab/fuyu-8b/blob/main/chart.png
+ image_pil = Image.open(image_path)
+
+ model_inputs = processor(text=text_prompt, images=[image_pil], device="cuda:0")
+ for k, v in model_inputs.items():
+     model_inputs[k] = v.to("cuda:0")
+
+ generation_output = model.generate(**model_inputs, max_new_tokens=16)
+ generation_text = processor.batch_decode(generation_output, skip_special_tokens=True)[0][-55:]
+ assert generation_text == "The life expectancy at birth of males in 2018 is 80.7.\n"
+ ```
+
  ## Uses

  ### Direct Use
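For quick experimentation, a minimal sketch that wraps the load-and-generate pattern from the committed example into a reusable helper is shown below. It assumes the `adept/fuyu-8b` checkpoint and a single CUDA device; the `caption_image` name is illustrative (not part of the repository), and only the processor and model calls that already appear in the diff are used.

```python
from transformers import FuyuForCausalLM, AutoTokenizer, FuyuProcessor, FuyuImageProcessor
from PIL import Image

# Same setup as the committed example: tokenizer + image processor feed a FuyuProcessor,
# and the model is placed on a single GPU.
pretrained_path = "adept/fuyu-8b"
tokenizer = AutoTokenizer.from_pretrained(pretrained_path)
processor = FuyuProcessor(image_processor=FuyuImageProcessor(), tokenizer=tokenizer)
model = FuyuForCausalLM.from_pretrained(pretrained_path, device_map="cuda:0")

def caption_image(image_path: str,
                  prompt: str = "Generate a coco-style caption.\n",
                  max_new_tokens: int = 16) -> str:
    """Run one prompt/image pair through Fuyu and return the decoded text (prompt + continuation)."""
    image = Image.open(image_path)
    inputs = processor(text=prompt, images=[image], device="cuda:0")
    # Move every tensor the processor produced onto the GPU, as in the committed snippet.
    inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]

# Usage: caption the sample image from the repository root.
print(caption_image("bus.png"))
```

The helper returns the full decoded sequence (prompt plus continuation), so callers can post-process it as needed rather than relying on fixed character offsets as the asserts above do.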