configuration_mixformer_sequential.py deleted

#59
by tantanchen - opened

I was trying to use Open-Orca/oo-phi-1_5 and I got an error: Entry Not Found for url: https://huggingface.co/microsoft/phi-1_5/resolve/main/configuration_mixformer_sequential.py.

Looking at the file history, that file was deleted last week. I don't understand the transformer wrappers very well, but it looks like there was an attempt to improve the wrapper, and I think it broke things. I've also tried using microsoft/phi-1_5, and during inference it gave a strange warning:

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:50256 for open-end generation.
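In case it helps, this is roughly the kind of call that triggers that message; I'm reconstructing it here, so the prompt and arguments are just an illustration:

```python
# Rough reconstruction (illustrative only): loading phi-1_5 with its custom
# remote code and calling generate() with nothing but input_ids.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")

# No attention_mask and no pad_token_id are passed here, which is what
# produces the "attention mask and the pad token id were not set" warning.
output_ids = model.generate(inputs["input_ids"], max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```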

I was doing this on colab if anyone wants to see the full notebook: https://colab.research.google.com/drive/1o_fKb-P_2u-QwggQzzQPNaOj6-PkjVTb#scrollTo=3SGgTfikxC-z

@gugarosa Would you mind giving some pointers as to what is wrong?

Microsoft org

Hello @tantanchen !

Regarding the issue with the Open-Orca/oo-phi-1_5 model, this looks like a problem related to the cache system. Could you please delete .cache and re-download that model? We updated our model interface; however, this only applies to microsoft/phi-1_5, so other repositories should have their own copy of the model file.
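For example (just an illustration, the exact call in your notebook may differ), forcing a fresh download bypasses whatever is sitting in the local cache:

```python
# Illustrative sketch: force_download=True re-fetches the repository files
# (including the remote-code .py files) instead of reusing the local cache.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Open-Orca/oo-phi-1_5",
    trust_remote_code=True,
    force_download=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Open-Orca/oo-phi-1_5",
    trust_remote_code=True,
    force_download=True,
)
```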

Regarding the attention_mask warning, this is expected behavior. Since the tokenizer we use for this model does not have a pad_token_id, we mimic one with a special token and use it as the padding token when doing batched inference/generation; in this case, it mimics the eos_token_id.
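If you want to make the warning go away, something like the following should work (a sketch, not code from the model repository): tokenize with padding, pass the attention_mask, and set pad_token_id explicitly.

```python
# Sketch only: explicit attention_mask and pad_token_id for batched generation.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # mimic a pad token with EOS

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)

prompts = ["def fibonacci(n):", "Write a haiku about tensors."]
batch = tokenizer(prompts, return_tensors="pt", padding=True)

output_ids = model.generate(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    pad_token_id=tokenizer.eos_token_id,
    max_new_tokens=64,
)
for ids in output_ids:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```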

Hope this helps to clear some things up.

Best regards,
Gustavo.

Hmm, that doesn't quite make sense, because I'm running this on Colab and nothing is cached between sessions. But it looks like the problem is on the Open-Orca side. Thanks!

tantanchen changed discussion status to closed
