Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ Source code for [Offline Reinforcement Learning for LLM Multi-Step Reasoning](ht
 
 Model: [Policy](https://huggingface.co/jwhj/Qwen2.5-Math-1.5B-OREO) | [Value](https://huggingface.co/jwhj/Qwen2.5-Math-1.5B-OREO-Value)
 
-<img src="
+<img src="https://raw.githubusercontent.com/jwhj/OREO/refs/heads/main/OREO.png" alt="Image description" width="50%" />
 
 
 # Installation