fix: modeling_deepseek.py should use `deepseek` instead of `deepseek_v2` architecture

by llllvvuu - opened Aug 16, 2024

Aug 16, 2024

I believe `deepseek` is the correct architecture, since the keys in the model's weight dict match it (the weights use the original `self_attn` architecture).
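
For anyone who wants to reproduce this kind of check, here is a minimal sketch of comparing a checkpoint's keys against the keys an architecture expects. The helper `diff_keys` and every key name below are hypothetical and made up for illustration; they are not taken from the actual checkpoint or from either modeling file.

```python
def diff_keys(checkpoint_keys, model_keys):
    """Compare checkpoint keys with the keys a model definition expects.

    Returns (missing, unexpected), both sorted:
      missing    -- expected by the model but absent from the checkpoint
      unexpected -- present in the checkpoint but not expected by the model
    """
    ckpt = set(checkpoint_keys)
    model = set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


# Hypothetical key names, for illustration only.
checkpoint_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
]

# Candidate A expects exactly the keys found in the checkpoint.
candidate_a = list(checkpoint_keys)

# Candidate B expects a differently named attention projection.
candidate_b = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.kv_proj.weight",
]

print(diff_keys(checkpoint_keys, candidate_a))
print(diff_keys(checkpoint_keys, candidate_b))
```

An architecture whose expected keys leave both lists empty is the one that matches the weights. With real models, the two key lists would typically come from the checkpoint's weight dict and from the instantiated model's `state_dict().keys()`.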

Aug 21, 2024

Closing in favor of #3

llllvvuu changed pull request status to closed Aug 21, 2024
