
Attentions are all None

#16
by joshlevy89 - opened

I used the code from discussion #1 (https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ/discussions/1#644cc8af97a3b0904a481e3e) but modified it to output hidden_states and attentions. The textual output looks good, and the hidden_states look good, but every element of the attentions output is None. The output has length 1007 (generated length) x 40 (number of layers), but the tensors within are all None. Any ideas why this would be?

The generate call, updated to return the internals:

with torch.no_grad():
    model_output = model.generate(
        input_ids,
        do_sample=False,  # Set to False for now
        min_length=min_length,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_dict_in_generate=True,
        output_scores=True,
        output_attentions=True,
        output_hidden_states=True,
    )
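Not part of the original post, but a minimal diagnostic sketch (assuming the same `model` and `input_ids` as in the snippet above) that isolates whether the attention weights are dropped by generate() or by the model's own forward pass:

import torch

# Plain forward pass on the prompt only, bypassing generate().
with torch.no_grad():
    fwd_out = model(input_ids, output_attentions=True, output_hidden_states=True)

# fwd_out.attentions holds one entry per decoder layer; each entry should be a
# tensor of shape (batch, num_heads, seq_len, seq_len) if the attention module
# actually materializes and returns the attention probabilities.
for layer_idx, attn in enumerate(fwd_out.attentions):
    print(layer_idx, None if attn is None else tuple(attn.shape))

If the entries are None here as well, the issue is in the attention implementation rather than in generate(): fused or SDPA/Flash-style attention kernels (common in quantized/GPTQ setups) never build the full attention matrix, so there is nothing to return. In recent transformers versions, loading with attn_implementation="eager" is the usual way to get attention weights back, though whether that applies here depends on how this GPTQ checkpoint is loaded.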

Sorry, no, no clue. Never tried to do that before.

What is the point of outputting the hidden states and attentions?
