Custom Templates

#5
by nudelbrot - opened

Hi. The model (or code) errors when I skip apply_chat_template and instead feed a manually tokenized template to the .generate() function.

How is the apply_chat_template even integrated into the model?

I'd like to change the trailing phrase "the following is..."

Same error on Python 3.8, 3.9, and 3.11, with up-to-date transformers, accelerate, and bitsandbytes.

inputs = tokenizer(prompt, return_tensors="pt").cuda()

res = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]


File ~/micromamba/envs/env311_flash_attn/lib/python3.11/site-packages/transformers/generation/utils.py:1376, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
   1368 # 3. Define model inputs
   1369 # inputs_tensor has to be defined
   1370 # model_input_name is defined if model-specific keyword input is passed
   1371 # otherwise model_input_name is None
   1372 # all model-specific keyword inputs are removed from `model_kwargs`
   1373 inputs_tensor, model_input_name, model_kwargs = self._prepare_model_inputs(
   1374     inputs, generation_config.bos_token_id, model_kwargs
   1375 )
-> 1376 batch_size = inputs_tensor.shape[0]
   1378 # 4. Define other model kwargs
   1379 model_kwargs["output_attentions"] = generation_config.output_attentions

File ~/micromamba/envs/env311_flash_attn/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:268, in BatchEncoding.__getattr__(self, item)
    266     return self.data[item]
    267 except KeyError:
--> 268     raise AttributeError

AttributeError: 

(no more output)

NousResearch org
edited Mar 8
inputs = tokenizer(prompt, return_tensors="pt").cuda()

res = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]

This code is incorrect. tokenizer() returns a BatchEncoding, so I don't believe you can call .cuda() on it.

Try the following instead:

inputs = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

res = tokenizer.decode(model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]
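As a side note, if you'd rather keep the whole BatchEncoding (so the attention mask gets passed to generate() as well), you can move it to the GPU with .to() and unpack it. A minimal sketch, assuming a CUDA device is available:

# Moves both input_ids and attention_mask to the GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Unpacking passes input_ids and attention_mask together
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)

res = tokenizer.decode(output_ids[0]).split(tokenizer.eos_token)[0]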

Custom templates can be added as described here.
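To answer the other question: apply_chat_template is not part of the model at all. The template is a Jinja string stored on the tokenizer (in tokenizer_config.json), so you can override it before formatting your prompt. A rough sketch, where the repo id and the ChatML-style template are placeholders, not necessarily this model's actual template:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/your-model")  # placeholder repo id

# Override the stored template; apply_chat_template renders this string from now on
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
)

messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)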

@euclaise
You are correct, it works with the modification. Thanks!

nudelbrot changed discussion status to closed
