DeepSeek Training Support

#34
by SuperXr - opened

Twinkle update: Support for DeepSeek v4 post-training is now live, including optimized support for Ascend NPU!
🔗 Read more: https://mp.weixin.qq.com/s/5AvzBlZe-BQ5hk_hdoXamw
💻 Cookbook: https://github.com/modelscope/twinkle/blob/main/cookbook/transformers/deepseek_v4_flash.py
