Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published 3 days ago • 35
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 3 days ago • 35
Zephyr 7B Gemma Collection Models, dataset, and demo for Zephyr 7B Gemma. For code to train the models, see: https://github.com/huggingface/alignment-handbook • 5 items • Updated Apr 12 • 15