how did you do it?
1
#4 opened about 1 month ago by ehartford

compare to qwen3-8b and qwen3-14b
6
#3 opened about 1 month ago by decem

Could the same distillation technology be used to create a draft model for DeepSeek R1 0528?
1
#2 opened about 1 month ago by BernardH

Multilingual?
#1 opened about 1 month ago by AaronFeng753