S1-M-7B-Beta

Homepage | Our Official Code Repo | S1-M Dataset (Beta)

S1-M-7B-Beta is used for developing the algorithm "Simple Test-time Scaling in Multimodal Reasoning". It was obtained by fine-tuning the base model Qwen/Qwen2-VL-7B-Instruct on data containing the thinking tags <think> and </think>. Through this fine-tuning the model acquired a "think first, then respond" paradigm, which makes it suitable for experiments on test-time scaling.
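Because the model is trained to emit its reasoning between <think> and </think> before the final answer, downstream experiments usually need to separate the two parts. The following is a minimal sketch of such a parser, not code from the official repository. The helper names `split_thinking` and `thinking_length` are hypothetical, and the sketch assumes the generated text contains at most one literal <think>...</think> span followed by the response.

```python
import re
from typing import NamedTuple, Optional


class ParsedOutput(NamedTuple):
    thinking: Optional[str]  # text between <think> and </think>, or None if absent
    response: str            # text after the closing </think> tag


# Non-greedy match so only the first <think>...</think> span is captured.
# re.DOTALL lets the reasoning span run across multiple lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(generated: str) -> ParsedOutput:
    """Split generated text into its reasoning part and its final response.

    Hypothetical helper: assumes the tags appear literally in the decoded text.
    If no complete tag pair is found, the whole text is treated as the response.
    """
    match = _THINK_RE.search(generated)
    if match is None:
        return ParsedOutput(thinking=None, response=generated.strip())
    return ParsedOutput(
        thinking=match.group(1).strip(),
        response=generated[match.end():].strip(),
    )


def thinking_length(generated: str) -> int:
    """Count whitespace-separated words in the reasoning span.

    A crude stand-in for a token count; a real experiment would
    likely use the model's own tokenizer instead.
    """
    parsed = split_thinking(generated)
    return len(parsed.thinking.split()) if parsed.thinking else 0


if __name__ == "__main__":
    sample = "<think>The image shows 2 cats and 1 dog.</think>There are 3 animals."
    parsed = split_thinking(sample)
    print("thinking:", parsed.thinking)
    print("response:", parsed.response)
    print("thinking words:", thinking_length(sample))
```

Measuring the reasoning span separately from the response is one simple way to relate the amount of "thinking" to answer quality when studying test-time scaling.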

Note: The current model is a development version, not the final official version.


Model tree for PKU-Alignment/s1-m_7b_beta

Base model: Qwen/Qwen2-VL-7B
Finetuned (221): this model