Great article. I have been trying to deploy deepseek-ai/DeepSeek-R1-Distill-Qwen-32B on Inferentia with a context window larger than 4096 (say, MAX_TOTAL_TOKENS=8192), but there seems to be no pre-compiled model for that configuration. It would be great if you could add instructions for compiling these models.
Keerthan Vasist
kvasist
Recent Activity
commented on an article 9 days ago: How to deploy and fine-tune DeepSeek models on AWS
new activity 9 days ago: aws-neuron/optimum-neuron-cache: [Cache Request] deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
kvasist's activity
commented on How to deploy and fine-tune DeepSeek models on AWS, 9 days ago
[Cache Request] deepseek-ai/DeepSeek-R1-Distill-Qwen-32B (#359, opened 9 days ago by kvasist)