Custom Templates
#5 by nudelbrot - opened
Hi. The model (or the code below) errors when I don't use apply_chat_template and instead feed a manually tokenized template to the .generate() function.
How is apply_chat_template even integrated into the model?
I'd like to change the trailing phrase "the following is...".
The same happens on Python 3.8, 3.9, and 3.11, with up-to-date transformers, accelerate, and bitsandbytes.
inputs = tokenizer(prompt, return_tensors="pt").cuda()
res = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]
File ~/micromamba/envs/env311_flash_attn/lib/python3.11/site-packages/transformers/generation/utils.py:1376, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, negative_prompt_ids, negative_prompt_attention_mask, **kwargs)
1368 # 3. Define model inputs
1369 # inputs_tensor has to be defined
1370 # model_input_name is defined if model-specific keyword input is passed
1371 # otherwise model_input_name is None
1372 # all model-specific keyword inputs are removed from `model_kwargs`
1373 inputs_tensor, model_input_name, model_kwargs = self._prepare_model_inputs(
1374 inputs, generation_config.bos_token_id, model_kwargs
1375 )
-> 1376 batch_size = inputs_tensor.shape[0]
1378 # 4. Define other model kwargs
1379 model_kwargs["output_attentions"] = generation_config.output_attentions
File ~/micromamba/envs/env311_flash_attn/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:268, in BatchEncoding.__getattr__(self, item)
266 return self.data[item]
267 except KeyError:
--> 268 raise AttributeError
AttributeError:
(no more output)
inputs = tokenizer(prompt, return_tensors="pt").cuda()
res = tokenizer.decode(model.generate(inputs['input_ids'], max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]
This code is incorrect. tokenizer() returns a BatchEncoding, so I don't believe you can call .cuda() on it.
Try the following instead:
inputs = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
res = tokenizer.decode(model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)[0]).split(tokenizer.eos_token)[0]
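As a side note, you could also keep the whole BatchEncoding and move it to the GPU with its .to() method, then unpack it into generate() so the attention mask is passed along as well. A minimal sketch, assuming a CUDA device and the model/tokenizer already loaded as in your snippet:
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")  # BatchEncoding.to() moves all contained tensors
# Unpacking passes input_ids and attention_mask together
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
res = tokenizer.decode(output_ids[0]).split(tokenizer.eos_token)[0]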
Custom templates can be added as described here.
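For illustration, a minimal sketch of what overriding the template could look like; the Jinja string and messages below are made-up placeholders, not this model's actual template:
# The chat template is a Jinja string stored on the tokenizer; replacing it
# changes what apply_chat_template renders, including any trailing phrase.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}assistant: {% endif %}"
)
messages = [{"role": "user", "content": "Hello!"}]
# Renders and tokenizes the conversation so it can go straight into generate()
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()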
nudelbrot changed discussion status to closed