rinna/nekomata-14b · Differences between modeling_qwen.py in the nekomata-14b and Qwen-14B Repositories

Jan 9, 2024

The modeling_qwen.py file in the nekomata-14b repository appears to differ from the one in the Qwen-14B repository. The relevant lines are here:
https://huggingface.co/Qwen/Qwen-14B/blob/main/modeling_qwen.py#L522-L525
https://huggingface.co/rinna/nekomata-14b/blob/main/modeling_qwen.py#L522-L527
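For anyone who wants to check the two line ranges locally, here is a minimal sketch using Python's standard difflib. The helper name diff_line_range and the toy strings are hypothetical and only for illustration; they are not the actual contents of either modeling_qwen.py. In practice you would read the two downloaded files and pass their text in.

```python
import difflib


def diff_line_range(text_a, text_b, start, end_a, end_b,
                    name_a="Qwen-14B", name_b="nekomata-14b"):
    """Return a unified diff of lines start..end (1-indexed, inclusive).

    end_a and end_b may differ, since the linked ranges have
    different lengths (L522-L525 vs. L522-L527).
    """
    lines_a = text_a.splitlines()[start - 1:end_a]
    lines_b = text_b.splitlines()[start - 1:end_b]
    return list(difflib.unified_diff(
        lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""))


# Toy stand-ins for the two files (NOT the real file contents).
toy_a = "x = 1\ny = 2\nz = 3\n"
toy_b = "x = 1\ny = 5\nz = 3\n"

for line in diff_line_range(toy_a, toy_b, start=2, end_a=2, end_b=2):
    print(line)
```

With local copies of the real files, the call would be something like diff_line_range(text_qwen, text_nekomata, 522, 525, 527), where text_qwen and text_nekomata hold the contents of the two files.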

This discrepancy may affect using nekomata-14b with the LoRA fine-tuning implementation in the latest https://github.com/QwenLM/Qwen repository under PyTorch 2.
When I tried this, I got the following error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation.
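For context, this class of error can be reproduced in isolation. The snippet below is a generic PyTorch illustration, not the code from modeling_qwen.py: it assumes a tensor that autograd has saved for the backward pass is then modified in place, which is what the message describes. The function names are made up for this sketch.

```python
import torch


def trigger_inplace_error():
    """Modify a tensor in place after autograd saved it; return the error text."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()        # exp saves its output y to compute the gradient later
    y.add_(1)          # in-place edit changes the saved tensor
    try:
        y.sum().backward()
    except RuntimeError as err:
        return str(err)
    return None


def out_of_place_version():
    """Same computation, but the addition creates a new tensor."""
    x = torch.ones(3, requires_grad=True)
    y = x.exp()
    y = y + 1          # out-of-place: the saved output of exp is untouched
    y.sum().backward()
    return x.grad


print(trigger_inplace_error())
print(out_of_place_version())
```

If the error shows up inside a larger model, wrapping the forward and backward pass in torch.autograd.set_detect_anomaly(True) can help locate the operation whose saved tensor was overwritten.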

tianyuz

Jan 9, 2024

Hi @shoey-ucci, thank you for pointing this out.
I have just synced the modeling code with the latest official code.

tianyuz changed discussion status to closed Jan 9, 2024