Commit 24d609d · committed by peakji · verified
1 Parent(s): 278989e

Create README.md

Files changed (1): README.md (added, +87 -0)

---
license: apache-2.0
language:
- en
- zh
---

# Steiner-preview

**For more details, please refer to the [announcement blog post](https://medium.com/@peakji/b9a756a00855).**

Steiner is a series of reasoning models trained on synthetic data using reinforcement learning. These models can explore multiple reasoning paths in an autoregressive manner during inference and autonomously verify or backtrack when necessary, enabling a linear traversal of the implicit search tree.

Steiner is a personal interest project by Yichao 'Peak' Ji, inspired by OpenAI o1. The ultimate goal is to reproduce o1 and validate the inference-time scaling curves. The Steiner-preview model is currently a work-in-progress. The reason for open-sourcing it is that I’ve found automated evaluation methods, primarily based on multiple-choice questions, struggle to fully reflect the progress of reasoning models. In fact, the assumption that "the correct answer is always among the options" doesn’t align well with real-world reasoning scenarios, as it encourages models to perform substitution-based validation rather than open-ended exploration. For this reason, I’ve chosen to open-source these intermediate results and, when time permits, to build in public. This approach allows me to share knowledge while also gathering more evaluations and feedback from real human users.

⚠️ Disclaimer: While Steiner has been able to achieve high-quality zero-shot results without relying on Chain of Thought (CoT) prompting or an agent framework, it has not yet replicated the inference-time scaling capabilities demonstrated by o1. In experiments using a [specialized logits processor](https://gist.github.com/peakji/f81c032b6c24b358054ed763c426a46f) to intervene on reasoning tokens, increasing the number of reasoning steps did not improve performance; in fact, it led to a decline in benchmarks such as MMLU-Pro and GPQA. As a result, Steiner cannot currently be considered a successful reproduction of OpenAI o1. There may be deficiencies in both the training methods and data quality, so please interpret the results with caution.

## Deployment

Steiner works with existing inference services; [vLLM](https://github.com/vllm-project/vllm) is the recommended choice for deployment.

### vLLM

Deploying Steiner is no different from deploying other LLMs, except that each inference request must include the following two parameters:

```
"skip_special_tokens": false,
"spaces_between_special_tokens": false,
```

For example:

```json
{
  "model": "steiner",
  "skip_special_tokens": false,
  "spaces_between_special_tokens": false,
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ]
}
```

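The request body above can also be built and sent with only the Python standard library. The following is a minimal sketch, not part of the original README: the helper names `build_payload` and `send` are hypothetical, and the URL assumes a vLLM OpenAI-compatible server running locally on port 8000 that serves the model under the name `steiner`.

```python
import json
import urllib.request

# Assumption: a vLLM OpenAI-compatible server is listening at this address.
ASSUMED_URL = "http://localhost:8000/v1/chat/completions"


def build_payload(prompt, model="steiner"):
    """Hypothetical helper: build a single-turn chat request body
    that carries the two required parameters."""
    return {
        "model": model,
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
        "messages": [{"role": "user", "content": prompt}],
    }


def send(payload, url=ASSUMED_URL):
    """Hypothetical helper: POST the payload as JSON and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_payload("Hello"))` would return the parsed JSON response; `build_payload` alone can be used to check the request body without any network access.
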
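If you are using the Python client provided by OpenAI, pass the two parameters through `extra_body` (the `base_url` and `api_key` below are placeholders for your own deployment):

```python
from openai import OpenAI

# Assumed local vLLM endpoint and placeholder key; adjust to your deployment.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="steiner",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
    # Parameters outside the standard OpenAI schema go in extra_body.
    extra_body={
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
    },
)
```
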
## Benchmarks

### GPQA Diamond

| Subdomain | Accuracy (0-shot w/o CoT) |
| --- | --- |
| Physics (general) | 63.16% |
| Organic Chemistry | 40.28% |
| Quantum Mechanics | 76.00% |
| Electromagnetism and Photonics | 50.00% |
| High-energy particle physics | 57.14% |
| Genetics | 25.00% |
| Astrophysics | 53.85% |
| Molecular Biology | 80.00% |
| Chemistry (general) | 50.00% |
| Relativistic Mechanics | 57.14% |
| Inorganic Chemistry | 0.00% |
| Optics and Acoustics | 0.00% |
| Condensed Matter Physics | 100.00% |
| **All** | **53.54%** |

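When reading the table, note that the **All** row is not the plain average of the subdomain rows. The short sketch below, which is an illustration rather than part of the original evaluation, computes the unweighted (macro) average from the percentages in the table and compares it with the reported figure. The gap suggests that the overall score is weighted by the number of questions per subdomain, although the README does not state those counts.

```python
# Per-subdomain accuracies (in percent), copied from the table above.
subdomain_accuracy = {
    "Physics (general)": 63.16,
    "Organic Chemistry": 40.28,
    "Quantum Mechanics": 76.00,
    "Electromagnetism and Photonics": 50.00,
    "High-energy particle physics": 57.14,
    "Genetics": 25.00,
    "Astrophysics": 53.85,
    "Molecular Biology": 80.00,
    "Chemistry (general)": 50.00,
    "Relativistic Mechanics": 57.14,
    "Inorganic Chemistry": 0.00,
    "Optics and Acoustics": 0.00,
    "Condensed Matter Physics": 100.00,
}

# Overall accuracy as reported in the "All" row.
reported_overall = 53.54

# Unweighted mean: every subdomain counts equally, regardless of its size.
macro_average = sum(subdomain_accuracy.values()) / len(subdomain_accuracy)

print(f"macro average over subdomains: {macro_average:.2f}%")
print(f"reported overall accuracy:     {reported_overall:.2f}%")
```

Subdomains scoring 0.00% or 100.00% likely contain very few questions, so their rows should be read with particular caution.
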
## Limitations

* Steiner’s current post-training data does not include examples of multi-turn dialogues. The best-performing version of the Steiner model (based on [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B)) lacks the ability to handle multi-turn conversations. The open-source Steiner-preview model (based on [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct)) is compatible with chat formats but is still not recommended for multi-turn dialogues.
* As with OpenAI o1-2024-09-12, custom system prompts and modifications to sampling parameters such as temperature are not recommended. Steiner has not yet been trained on a diverse set of system prompts, and altering other parameters may lead to errors in the formatting of reasoning tokens.
* The language composition of Steiner's post-training data is approximately 90% English and 10% Chinese, but English was used almost exclusively during the reasoning path data augmentation process. Therefore, while the model's final responses demonstrate a certain degree of language-following ability, the reasoning tokens may predominantly be generated in English.
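
One way to stay within the first two limitations is to reduce any chat history to a single user turn before sending it, with no system prompt and no sampling overrides. The helper below is a hypothetical sketch under that assumption; `to_single_turn_request` is not part of Steiner or vLLM, and keeping only the latest user message is just one possible policy.

```python
def to_single_turn_request(messages, model="steiner"):
    """Hypothetical helper: keep only the latest user message.

    System prompts and earlier turns are dropped, and no sampling
    parameters (such as temperature) are set, so server defaults apply.
    """
    user_turns = [m for m in messages if m.get("role") == "user"]
    if not user_turns:
        raise ValueError("at least one user message is required")
    return {
        "model": model,
        "skip_special_tokens": False,
        "spaces_between_special_tokens": False,
        "messages": [{"role": "user", "content": user_turns[-1]["content"]}],
    }


history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "What is 17 * 23?"},
]
request_body = to_single_turn_request(history)
```

Dropping earlier turns loses context, so an application that needs it would have to fold the relevant details into the final user message itself.
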