Fine Tuning
#28 opened about 21 hours ago
by
Carulus
Error: Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered
1
#27 opened 8 days ago
by
fffutr30
Distillation dataset released?
#26 opened 13 days ago
by
Winmodel
bos_token_id mismatch between model config and tokenizer
2
#25 opened 13 days ago
by
guangy10

JAX Implementation!
1
#24 opened 18 days ago
by
jrosseruk
Step by step guide for Distillation
#23 opened 20 days ago
by
Pradeep1995

max_position_embeddings and tokenizer max discrepancies
1
#22 opened 22 days ago
by
ghpu

R1 not putting out the full model response with transformers pipeline
1
#21 opened 24 days ago
by
rcgalbo

Using it on mobile
1
#20 opened 25 days ago
by
chandan867
Got an error after updating Qwen/Qwen2.5-1.5B-Instruct to deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
#19 opened 26 days ago
by
fengkai-llm
Tensors in model.safetensors are all odd-aligned, causing performance issues when mmap'ing the file in place
#18 opened 29 days ago
by
iankronquist

TypeError: forward() missing 1 required positional argument: 'attention_masks'
1
#16 opened about 1 month ago
by
lmlmvxi

Using the Model
1
#14 opened about 1 month ago
by
joel1610-hon
Failed to load the model
1
#13 opened about 1 month ago
by
Raulol19
Upload IMG_1608.jpeg
#12 opened about 1 month ago
by
Ayomide223
How to fine-tune?
2
#10 opened about 1 month ago
by
NickyNicky

How to turn off the R1 mode when running it with the Hugging Face API?
4
#9 opened about 1 month ago
by
securealex
Add pipeline tag, link to paper
#7 opened about 1 month ago
by
nielsr

comfyui-deepseek-r1
#6 opened about 1 month ago
by
zwpython

Is `config.json` correct?
#4 opened about 1 month ago
by
J22
System Prompt
3
#3 opened about 1 month ago
by
Wanfq

YAML Metadata Warning: empty or missing yaml metadata in repo card
#2 opened about 1 month ago
by
JLouisBiz
