Mixed precision on Nvidia/CUDA device snippet #3
opened by panopstor
This should enable using torch.float16 or torch.bfloat16 (snippet extracted from a larger script based on the example):
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--dtype", type=str, default="fp16", help="force a different dtype if using GPU (fp16, bf16, fp32) (default: fp16)")
...

# Map the CLI flag to a torch dtype (fp32 is the fallback).
dtype = torch.float32
if args.dtype == "fp16":
    dtype = torch.float16
elif args.dtype == "bf16":
    dtype = torch.bfloat16

# Cast the weights; only move the model to the GPU when not running on CPU,
# so the model and the inputs below end up on the same device.
model = model.to(dtype=dtype)
if not args.cpu:
    model = model.cuda()
...

# autocast is disabled for fp32, so that path runs at full precision.
with torch.cuda.amp.autocast(enabled=args.dtype != "fp32", dtype=dtype):
    generated_ids = model.generate(
        pixel_values=inputs["pixel_values"].cuda() if not args.cpu else inputs["pixel_values"],
        input_ids=inputs["input_ids"].cuda() if not args.cpu else inputs["input_ids"],
        attention_mask=inputs["attention_mask"].cuda() if not args.cpu else inputs["attention_mask"],
        image_embeds=None,
        image_embeds_position_mask=inputs["image_embeds_position_mask"].cuda() if not args.cpu else inputs["image_embeds_position_mask"],
        use_cache=True,
        max_new_tokens=args.max_new_tokens,
    )
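For reference, invocation would look something like this (the script name is hypothetical, and I'm assuming a --cpu flag exists to match the args.cpu checks above; --dtype and --max_new_tokens come from the argparse setup):

# bf16 inference on GPU (hypothetical script name)
python run_kosmos2.py --dtype bf16 --max_new_tokens 128

# full-precision inference on CPU; autocast is disabled for fp32
python run_kosmos2.py --dtype fp32 --cpu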
Seems to work fine with either float16 or bfloat16.
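One caveat: pre-Ampere GPUs (compute capability below 8.0) lack native bfloat16 support. A small guard like this (my addition, not part of the original script) could fall back to fp16 after the dtype selection above:

import torch

# Assumption: fall back to fp16 when the GPU lacks native bf16 support
# (e.g. pre-Ampere cards). args and dtype come from the snippet above.
if args.dtype == "bf16" and not args.cpu and not torch.cuda.is_bf16_supported():
    print("bf16 not supported on this GPU, falling back to fp16")
    dtype = torch.float16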
panopstor changed discussion status to closed