
Update `README.md`

#1 by alvarobartt - opened

Add `input_ids` mapping to device. Since we're using `device_map="auto"` from πŸ€— Accelerate, the previous code would only work on CPU; on any other device we would hit a `RuntimeError: Expected all tensors to be on the same device` exception. With this change the code is safer and works on any device. Additionally, I've also removed some extra line breaks.
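For reference, here's a minimal sketch of the corrected README snippet with the device mapping applied; the `open_llama_3b` checkpoint name and the prompt are placeholders, so substitute whatever this repo actually hosts:

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Hypothetical checkpoint name; substitute the repo this PR targets.
model_path = "openlm-research/open_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(model_path)
# device_map="auto" lets πŸ€— Accelerate place the weights on the available device(s).
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
# The fix: move the input IDs to the same device the model was dispatched to,
# so generation works on GPU as well as on CPU.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

generation_output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(generation_output[0]))
```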

Maybe it would also be nice to add a line with the requirements, `pip install transformers[torch] einops accelerate sentencepiece`. Even though most of this may already be covered at https://github.com/openlm-research/open_llama, I think it would be nice to include the requirements in the `README.md` too.
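As a sketch, that requirements line could appear in the README like this (the quotes around the extras spec are an addition so the command also works in shells like zsh):

```bash
pip install "transformers[torch]" einops accelerate sentencepiece
```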

BTW, great work @young-geng πŸ‘πŸ»! I'm also happy to open a PR at https://github.com/openlm-research/open_llama to address https://github.com/openlm-research/open_llama/issues/76; let me know if that makes sense to you!

Had to change this line in the example to:

```python
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
```

Otherwise it complains that the model and the input IDs are on different devices.

(PyTorch 2, Windows, 4090 GPU)
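If it helps to verify the fix, a quick check (assuming the `model` and `input_ids` names from the snippet above) is to compare the devices directly:

```python
# Both should print the same device, e.g. cuda:0, once .to(model.device) is applied.
print(model.device)
print(input_ids.device)
```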

