Generation does not terminate on the eos type used in prompting

by bpop - opened Jan 22, 2024

bpop

Jan 22, 2024

Hello again,

I failed to reproduce the TowerInstruct generation example shown on the model page. While the example output terminates after generating a single sentence, I did not find a simple way to get the model to do so. I suspect the reason has to do with a mismatch between the model's generation_config (which specifies eos_token_id=2) and what the model actually uses as an end-of-sequence marker ("<|im_end|>", token_id=32005). Since they don't match, generation does not stop when it reaches an <|im_end|>, which means it continues to generate until it hits the max length.

Overriding the default generation config would probably solve this issue (I can't test because I'm waiting for a free GPU), but this seems like a slightly clunky fix. Any idea what we should do about it?
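
For reference, the per-call override mentioned above could look roughly like the sketch below. It is untested against the real model: `generate_until_im_end` is a hypothetical helper name, `model` and `tokenizer` are assumed to be the usual transformers objects already loaded for TowerInstruct, and the inputs are assumed to be on the same device as the model.

```python
IM_END = "<|im_end|>"  # end-of-turn marker named in the post above


def generate_until_im_end(model, tokenizer, prompt, max_new_tokens=256):
    """Generate from `prompt`, stopping at <|im_end|> rather than the config default.

    Hypothetical helper: `model` and `tokenizer` are assumed to be an already
    loaded transformers causal LM and its tokenizer, both on the same device.
    """
    # Look the ID up from the tokenizer instead of hard-coding 32005.
    im_end_id = tokenizer.convert_tokens_to_ids(IM_END)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        eos_token_id=im_end_id,  # per-call override of generation_config.eos_token_id
        max_new_tokens=max_new_tokens,
    )

    # Drop the prompt tokens so that only the newly generated text is returned.
    prompt_len = len(inputs["input_ids"][0])
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Passing `eos_token_id` to `generate` only affects that call, so the stored generation config stays untouched.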

DuarteMRAlves

Unbabel org Jan 22, 2024

edited Jan 22, 2024

Hi,

Thank you for noticing!
I have fixed the generation config and tested it. It should work as expected.

Can you check on your side if it works now (you may need to redownload it)?
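
One way to check a re-downloaded copy is to read the local `generation_config.json` and see which EOS IDs it declares. This is only a sketch: the helper names are hypothetical, the path to the file is whatever your local download uses, and it assumes the fix adds the `<|im_end|>` ID (32005 according to the first post) to `eos_token_id`.

```python
import json


def eos_ids_from_generation_config(path):
    """Return the set of EOS token IDs declared in a generation_config.json file."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    eos = config.get("eos_token_id")
    if eos is None:
        return set()
    # The field may hold a single ID or a list of IDs.
    return set(eos) if isinstance(eos, list) else {eos}


def stops_on(path, token_id):
    """True if generation configured by the file at `path` treats `token_id` as EOS."""
    return token_id in eos_ids_from_generation_config(path)
```

For example, `stops_on("generation_config.json", 32005)` should return `True` for a copy where generation ends at `<|im_end|>`, and `False` for a stale copy that still lists only ID 2.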

JaimeLugo

Jan 23, 2024

Hey Duarte and bpop, thanks for the comments... I am traveling but will share with you my findings in three days. Will download the generation config and test it.

setmiddle

Jan 28, 2024

Trying it as is, I get "torch.cuda.OutOfMemoryError: CUDA out of memory",
so I decided to quantize in my Docker container:

tgi-towerinstruct-gpu:
  image: ghcr.io/huggingface/text-generation-inference:1.4
  command: --model-id Unbabel/TowerInstruct-7B-v0.1 --quantize eetq --num-shard 1 --max-batch-prefill-tokens 512 --max-input-length 512
  volumes:
    - ./models:/data
  ports:
    - 8102:80
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [ gpu ]

but the quantization process returns:

2024-01-28 00:11:40 2024-01-28T00:11:40.004314Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-01-28 00:11:49 2024-01-28T00:11:49.839464Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-01-28 00:11:49
2024-01-28 00:11:49 2024-01-28T00:11:49.925051Z INFO shard-manager: text_generation_launcher: Shard ready in 811.549244253s rank=0
2024-01-28 00:11:50 2024-01-28T00:11:50.021987Z INFO text_generation_launcher: Starting Webserver
2024-01-28 00:11:50 2024-01-28T00:11:50.053023Z INFO text_generation_router: router/src/main.rs:181: Using the Hugging Face API
2024-01-28 00:11:50 2024-01-28T00:11:50.053123Z INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-01-28 00:11:50 2024-01-28T00:11:50.237200Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.14.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32000' but was given ID 'None'
2024-01-28 00:11:50 2024-01-28T00:11:50.237383Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.14.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32001' but was given ID 'None'
2024-01-28 00:11:50 2024-01-28T00:11:50.237391Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.14.1/src/tokenizer/serialization.rs:159: Warning: Token '' was expected to have ID '32002' but was given ID 'None'
2024-01-28 00:11:50 2024-01-28T00:11:50.237398Z WARN tokenizers::tokenizer::serialization: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokenizers-0.14.1/src/tokenizer/serialization.rs:159: Warning: Token '
