
Attentions are all None

#16
by joshlevy89 - opened

I used the code from discussion #1 (https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ/discussions/1#644cc8af97a3b0904a481e3e) but modified it to output hidden_states and attentions. The textual output looks good, and the hidden_states look good, but every element of the attentions output is None. The output has length 1007 (generated length) x 40 (number of layers), but the tensors within are all None. Any ideas why this would be?

The generate call, updated to return the internals:

with torch.no_grad():
    model_output = model.generate(
        input_ids,
        do_sample=False,  # Set to False for now
        min_length=min_length,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_dict_in_generate=True,
        output_scores=True,
        output_attentions=True,
        output_hidden_states=True,
    )
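Not part of the original post, but a minimal diagnostic sketch (assuming the same `model` and `input_ids` as in the snippet above) that isolates whether the attention weights are dropped by generate() or by the model's own forward pass:

import torch

# Plain forward pass on the prompt only, bypassing generate().
with torch.no_grad():
    fwd_out = model(input_ids, output_attentions=True, output_hidden_states=True)

# fwd_out.attentions holds one entry per decoder layer; each entry should be a
# tensor of shape (batch, num_heads, seq_len, seq_len) if the attention module
# actually materializes and returns the attention probabilities.
for layer_idx, attn in enumerate(fwd_out.attentions):
    print(layer_idx, None if attn is None else tuple(attn.shape))

If the entries are None here as well, the issue is in the attention implementation rather than in generate(): fused or SDPA/Flash-style attention kernels (common in quantized/GPTQ setups) never build the full attention matrix, so there is nothing to return. In recent transformers versions, loading with attn_implementation="eager" is the usual way to get attention weights back, though whether that applies here depends on how this GPTQ checkpoint is loaded.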

Sorry, no, no clue. Never tried to do that before.

What is the point of outputting the hidden states and attentions?
