CoffeeBliss

CoffeeBliss
·

AI & ML interests

None yet

Recent Activity

liked a model 7 days ago
bartowski/HuatuoGPT-o1-8B-GGUF
View all activity

Organizations

None yet

CoffeeBliss's activity

replied to lewtun's post 5 days ago
reacted to lewtun's post with 🔥 5 days ago
view post
Post
1968
This paper ( HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (2412.18925)) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training. They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement.

Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
  • 1 reply
·