Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 14 days ago • 59
ValueFX9507/Tifa-Deepsex-14b-CoT-GGUF-Q4 Reinforcement Learning • Updated 10 days ago • 197k • 723
Running on Zero • 1.81k • Chat With Janus-Pro-7B • A unified multimodal understanding and generation model.
Running on Zero • 397 • Chat with DeepSeek-VL2-small • Generate responses using images and text input