does not return hidden states

#15
by wassname - opened

Phi overrides the pretrained transformer but does not extend its capability to return hidden states


here's a modified version that returns attention and hidden states https://huggingface.co/wassname/phi-2-GPTQ_w_hidden_states/blob/main/configuration_phi.py

@wassname is there any plan to actually change Phi-2?
Because the following warning remains on the main page:
"Remark: In the generation function, our model currently does not support beam search (num_beams > 1). Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings."
I personally use custom input embeddings a lot, and this makes Phi unusable for many use cases in my opinion.
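For reference, the standard Hugging Face API the thread is asking for looks like the sketch below. It uses a tiny randomly initialised GPT-2 as a stand-in (so no weights need downloading; the call signature is the same across HF causal LMs) — with the upstream Phi integration you would load `microsoft/phi-2` via `AutoModelForCausalLM.from_pretrained` instead:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny randomly initialised GPT-2 as a stand-in for Phi-2: same forward-pass
# API, no weight download needed for this sketch.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=100)
model = GPT2LMHeadModel(config).eval()

input_ids = torch.tensor([[1, 2, 3, 4]])

with torch.no_grad():
    out = model(
        input_ids,
        output_hidden_states=True,  # what the original Phi-2 code ignored
        output_attentions=True,
    )

# One embedding-layer output plus one tensor per transformer block
print(len(out.hidden_states))       # n_layer + 1 = 3
print(out.hidden_states[-1].shape)  # torch.Size([1, 4, 64])

# Custom input embeddings: pass `inputs_embeds` instead of `input_ids`
embeds = model.get_input_embeddings()(input_ids)
with torch.no_grad():
    out2 = model(inputs_embeds=embeds, output_hidden_states=True)
print(out2.hidden_states[-1].shape)  # torch.Size([1, 4, 64])
```

`hidden_states` is a tuple of length `n_layer + 1` (embeddings plus one entry per block), and `inputs_embeds` bypasses the embedding lookup entirely, which is the use case @edmond describes.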

Microsoft org

Hello @edmond and @wassname !

This will be updated once we integrate with the Phi implementation in HF.

Best regards,
Gustavo.

gugarosa changed discussion status to closed

Thanks Gustavo, much appreciated. Phi-2 is an awesome model for research as it fits on consumer GPUs even when doing strange experiments (VAEs, adapters, probing).

Amazing! I can't wait.

@wassname Do you know if the latest commits include the changes we were hoping for? It's hard for me to tell without reading their code.

Yay, glad to hear it, as Gemma still isn't that great compared to Phi-2 xD
