SPSD-RL

Full-parameter fine-tuned checkpoint of Qwen3-4B-Base, trained with prompt/completion supervision on the LorMolf/SPSD-RL conversation dataset.
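
The card does not say how "prompt/completion supervision" is implemented in the training script. A common convention, assumed here, is that the loss is computed only on completion tokens, with prompt positions set to -100, the value PyTorch's cross-entropy loss ignores by default. The sketch below illustrates that convention on plain token-id lists; `build_labels` and the example ids are hypothetical and not taken from the training code.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(prompt_ids, completion_ids):
    """Return (input_ids, labels) for one prompt/completion pair.

    Assumption: only completion tokens contribute to the loss, so every
    prompt position in `labels` is replaced by IGNORE_INDEX.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


# Hypothetical token ids: three prompt tokens followed by two completion tokens.
input_ids, labels = build_labels([11, 12, 13], [21, 22])
print(input_ids)
print(labels)
```

Under this assumption the model still reads the prompt as context, but gradient updates come only from predicting the completion.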

Source artifact: outputs/qwen3_4b_base_spsd_rl_sft_prompt_completion_4gpu_20260603/final_bs20_accum4_ddp7200_wandb_localcache.
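
The checkpoint is published as F32 safetensors with roughly 4B parameters, so loading it at full precision takes about twice the memory of a 16-bit load. The sketch below estimates the weight footprint and shows one way to load the model with Hugging Face `transformers`. It assumes that `LorMolf/SPSD-RL` is the model's Hub repository id, that the repository ships a tokenizer, and that "4B" means a nominal 4,000,000,000 parameters; `estimate_weight_gib` and `load_model` are hypothetical helper names.

```python
def estimate_weight_gib(num_params, bytes_per_param):
    """Approximate memory for the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bytes_per_param / 1024**3


def load_model(repo_id="LorMolf/SPSD-RL"):
    """Load tokenizer and model in bfloat16 to halve weight memory versus F32.

    Imports are local so the estimate above can be used without torch installed.
    Assumption: repo_id is the Hub id of this checkpoint.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
    return tokenizer, model


NOMINAL_PARAMS = 4_000_000_000  # assumption: "4B params" taken as a round number
print(f"F32 weights:  ~{estimate_weight_gib(NOMINAL_PARAMS, 4):.1f} GiB")
print(f"BF16 weights: ~{estimate_weight_gib(NOMINAL_PARAMS, 2):.1f} GiB")

# Usage (downloads the checkpoint, so it is left commented out):
# tokenizer, model = load_model()
# inputs = tokenizer("Hello", return_tensors="pt")
# output_ids = model.generate(**inputs, max_new_tokens=32)
# print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Casting to bfloat16 at load time is a memory trade-off, not something the card prescribes; keep F32 if exact parity with the stored weights matters.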

Format: Safetensors. Model size: 4B params. Tensor type: F32.
