Update README.md
To make inference more efficient, run with autocast:

```python
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )
```

We did most of our evaluation in this setting (autocast on, but float32 weights).
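If you want to quantify the efficiency gain, one option is to measure peak GPU memory around generation. This is a minimal sketch, not part of the original README; it assumes the `model`, `inputs`, `processor`, and `GenerationConfig` objects from the setup above:

```python
# Hypothetical sketch: measure peak GPU memory used during generation.
# Assumes model, inputs, processor, and GenerationConfig from the setup above.
import torch

torch.cuda.reset_peak_memory_stats()
with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
    output = model.generate_from_batch(
        inputs,
        GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
        tokenizer=processor.tokenizer
    )
print(f"Peak GPU memory: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
```

Running the same measurement with and without autocast (or with bfloat16 weights, below) makes the savings directly comparable.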
To further reduce memory requirements, the model can be run with bfloat16 weights:

```python
model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer
)
```
Note that we have observed that this can change the output of the model compared to running with float32 weights.
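To check whether bfloat16 weights change the output for a particular input, a simple comparison is to decode the generation under both settings. This is a hypothetical sketch, not part of the original README; it reuses the decode pattern from the earlier example (`output[0, inputs['input_ids'].size(1):]`) and assumes `model` and `inputs` start in float32:

```python
# Hypothetical sketch: compare generations with float32 vs. bfloat16 weights.
# Assumes model, inputs, processor, and GenerationConfig from the setup above,
# with model and inputs initially in float32.
import torch

def generate_text():
    with torch.autocast(device_type="cuda", enabled=True, dtype=torch.bfloat16):
        output = model.generate_from_batch(
            inputs,
            GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
            tokenizer=processor.tokenizer
        )
    generated_tokens = output[0, inputs["input_ids"].size(1):]
    return processor.tokenizer.decode(generated_tokens, skip_special_tokens=True)

text_fp32 = generate_text()  # float32 weights (autocast on)

model.to(dtype=torch.bfloat16)
inputs["images"] = inputs["images"].to(torch.bfloat16)
text_bf16 = generate_text()  # bfloat16 weights

print("Outputs match:", text_fp32 == text_bf16)
```

Since this `GenerationConfig` uses the default (greedy) decoding, a mismatch reflects the precision change rather than sampling randomness.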
## Evaluations