Architectural Issue

#3
by cloudyu - opened

Thanks to DeepSeek for releasing this model.
I believe the top-k = 6 design (the number of routed experts activated per token) lacks rigor and elegance.
Changing top-k to 4 does not affect inference performance.
My preliminary test is below:
https://huggingface.co/autotrust/DeepSeek-V4-Flash-DSpark-4E
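For readers unfamiliar with what "top-k" controls, here is a minimal sketch of generic top-k mixture-of-experts gating. It assumes plain softmax scores over expert logits with renormalization among the selected experts. This is an illustrative assumption, not DeepSeek's actual router, which may differ in scoring function, normalization, and use of shared experts. The function names `top_k_route` and `moe_output` and the example logits are hypothetical.

```python
import math

def top_k_route(logits, k):
    """Pick the k highest-scoring experts for one token and renormalize their weights.

    Hypothetical generic softmax top-k gating (an assumption, not DeepSeek's router).
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Softmax over all expert logits (subtract the max for numerical stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' gate weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_output(expert_outputs, routing):
    """Gate-weighted sum of the selected experts' output vectors."""
    dim = len(expert_outputs[0])
    out = [0.0] * dim
    for idx, weight in routing:
        for d in range(dim):
            out[d] += weight * expert_outputs[idx][d]
    return out

# Hypothetical router logits for one token over 8 experts.
logits = [2.0, 1.5, 0.3, -1.0, 0.9, 0.1, -0.5, 1.2]
print([i for i, _ in top_k_route(logits, 6)])  # [0, 1, 7, 4, 2, 5]
print([i for i, _ in top_k_route(logits, 4)])  # [0, 1, 7, 4]
```

In this sketch, going from k = 6 to k = 4 drops the two lowest-scoring of the six selected experts and renormalizes the remaining weights, so each token evaluates 4 experts instead of 6. Whether that change matters in practice is what the linked test examines.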
