fix: modeling_deepseek.py should use `deepseek` instead of `deepseek_v2` architecture
#1 opened by llllvvuu
I copied the file from https://huggingface.co/deepseek-ai/deepseek-moe-16b-chat/edit/main/modeling_deepseek.py. I believe it is the correct one, since its parameter names match the keys in the model's weight dict (which use the original `self_attn` architecture).
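The check described above, whether the modeling file's parameter names line up with the checkpoint's keys, comes down to a set difference. Below is a minimal sketch that assumes you already have both key lists as plain strings. The helper name `diff_state_dict_keys` and the example key names are hypothetical and only illustrate the idea. They are not taken from the actual DeepSeek checkpoints.

```python
from typing import Iterable


def diff_state_dict_keys(
    checkpoint_keys: Iterable[str], model_keys: Iterable[str]
) -> dict[str, list[str]]:
    """Compare parameter names stored in a checkpoint with those a model class expects.

    "missing":    keys the model expects but the checkpoint does not contain.
    "unexpected": keys the checkpoint contains but the model does not define.
    """
    ckpt = set(checkpoint_keys)
    model = set(model_keys)
    return {
        "missing": sorted(model - ckpt),
        "unexpected": sorted(ckpt - model),
    }


# Hypothetical key names for a single attention block, for illustration only.
checkpoint = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
]
# A model class with a different attention layout (also hypothetical).
other_attention_model = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.kv_a_proj_with_mqa.weight",
    "model.layers.0.self_attn.kv_b_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
]

result = diff_state_dict_keys(checkpoint, other_attention_model)
print(result)
```

If both lists come back empty, the modeling file and the weights agree. With a real PyTorch model, `model.load_state_dict(state_dict, strict=False)` returns the same information as its `missing_keys` and `unexpected_keys` fields.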
llllvvuu changed pull request status to closed