Load model in bfloat16

#2

The model is currently loaded without specifying a torch_dtype, which in my testing means it is loaded in torch.float32 by default.

This PR loads the model in torch.bfloat16, the same dtype used during training. That should roughly halve memory requirements and, more importantly, speed up generation by about the same factor of 2 without loss of quality.
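For reference, a minimal sketch of the change (the repo id below is a placeholder, not the actual model):

```python
import torch
from transformers import AutoModelForCausalLM

# Passing torch_dtype explicitly keeps the weights in bfloat16,
# the dtype the model was trained in, instead of the float32 default.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",  # placeholder repo id
    torch_dtype=torch.bfloat16,
)
```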

Rijgersberg changed pull request title from Load model in bloat16 to Load model in bfloat16
BramVanroy changed pull request status to merged

Thanks! I ignorantly assumed that dtype=auto would take care of this when the safetensors metadata is all BF16.
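For completeness, a sketch of the `torch_dtype="auto"` variant mentioned above: as I understand it, when it is passed explicitly, transformers derives the dtype from the checkpoint (typically the torch_dtype recorded in config.json) rather than falling back to float32; if nothing is passed, float32 remains the default.

```python
from transformers import AutoModelForCausalLM

# With torch_dtype="auto", transformers picks up the dtype stored with the
# checkpoint instead of using the float32 default.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",  # placeholder repo id
    torch_dtype="auto",
)
```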
