Daemontatox committed: Update README.md
---
base_model: unsloth/qwen2.5-7b-instruct-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- trl
- COT
- Reasoning
license: apache-2.0
language:
- en
datasets:
- Daemontatox/LongCOT-Reason
metrics:
- accuracy
- bleu
- character
- bleurt
library_name: transformers
---

![image](./image.webp)

## Model Overview

The **Super Strong Reasoning Model** is an advanced AI system optimized for logical reasoning, multi-step problem-solving, and decision-making tasks. Designed with efficiency and accuracy in mind, it employs a structured system prompt to ensure high-quality answers through a transparent and iterative thought process.
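As a minimal sketch of how that structured system prompt could be wired into a chat-style request: the exact prompt text ships with the model and is not reproduced in this card, so the paraphrased prompt below and the helper name `build_messages` are illustrative assumptions, not the model's actual prompt.

```python
# Illustrative sketch only: the real system prompt ships with the model.
# This paraphrase condenses the tag workflow described in this card.
SYSTEM_PROMPT = (
    "Reason step-by-step inside <Thinking> tags, critique yourself inside "
    "<Critique> tags, refine your answer inside <Revising> tags, and present "
    "the final answer inside <Final> tags."
)

def build_messages(user_request: str) -> list[dict]:
    """Build a chat-format message list, as consumed by transformers chat templates."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_request},
    ]

messages = build_messages(
    "If all Bloops are Razzies and all Razzies are Lazzies, are all Bloops Lazzies?"
)
print(messages[0]["role"])  # system
```

The message list can then be passed to any chat-templated text-generation backend.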

### System Prompt and Workflow

This model operates using an innovative reasoning framework structured around the following steps:

1. **Initial Thought:** The model uses `<Thinking>` tags to reason step-by-step and craft its best possible response.

2. **Self-Critique:** It evaluates its initial response within `<Critique>` tags, focusing on:
   - **Accuracy:** Is it factually correct and verifiable?
   - **Clarity:** Is it clear and free of ambiguity?
   - **Completeness:** Does it fully address the request?
   - **Improvement:** What can be enhanced?

3. **Revision:** Based on the critique, the model refines its response within `<Revising>` tags.

4. **Final Response:** The revised response is presented clearly within `<Final>` tags.

5. **Tag Innovation:** When needed, the model creates and defines new tags for better structuring or clarity, ensuring consistent usage.
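The tagged stages above can be pulled apart mechanically on the consumer side. A minimal sketch, assuming the model emits well-formed open/close tag pairs (the sample response text is invented for illustration):

```python
import re

# The four workflow tags described in this card.
TAGS = ("Thinking", "Critique", "Revising", "Final")

def extract_stages(response: str) -> dict[str, str]:
    """Return the text inside each workflow tag, or an empty string if absent."""
    stages = {}
    for tag in TAGS:
        m = re.search(rf"<{tag}>(.*?)</{tag}>", response, re.DOTALL)
        stages[tag] = m.group(1).strip() if m else ""
    return stages

# Invented sample output, shaped like the workflow above.
sample = (
    "<Thinking>2 + 2 = 4</Thinking>"
    "<Critique>Arithmetic checks out.</Critique>"
    "<Revising>No changes needed.</Revising>"
    "<Final>4</Final>"
)
print(extract_stages(sample)["Final"])  # 4
```

In practice an application would show the user only the `<Final>` stage and keep the rest for auditing.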

### Key Features

- **Structured Reasoning:** Transparent, multi-step approach for generating and refining answers.
- **Self-Improvement:** Built-in critique and revision ensure continuous response enhancement.
- **Clarity and Adaptability:** The tagging system provides organized, adaptable responses tailored to user needs.
- **Creative Flexibility:** Supports dynamic problem-solving with the ability to introduce new tags and concepts.

---

## Use Cases

The model is designed for various domains, including:

1. **Research and Analysis:** Extracting insights and providing structured explanations.
2. **Education:** Assisting with tutoring by breaking down complex problems step-by-step.
3. **Problem-Solving:** Offering logical and actionable solutions for multi-step challenges.
4. **Content Generation:** Producing clear, well-organized creative or professional content.

---

## Training Details

- **Frameworks:**
  - [Unsloth](https://github.com/unslothai/unsloth) for accelerated training.
  - Hugging Face Transformers and the TRL library for reinforcement learning from human feedback (RLHF).
- **Dataset:** Finetuned on diverse reasoning-focused tasks, including logical puzzles, mathematical problems, and commonsense reasoning scenarios.
- **Hardware Efficiency:**
  - Trained with bnb-4bit precision for reduced memory usage.
  - Optimized training pipeline achieving 2x faster development cycles.
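To see why bnb-4bit precision matters for a 7B-parameter base model, here is a back-of-the-envelope arithmetic sketch of weight storage alone. Real usage adds activations, optimizer state, and quantization overhead, so treat these figures as lower bounds:

```python
# Approximate weight memory for a 7B-parameter model at different precisions.
PARAMS = 7_000_000_000

def weight_memory_gb(params: int, bits_per_param: float) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes); weights only."""
    return params * bits_per_param / 8 / 1e9

fp16 = weight_memory_gb(PARAMS, 16)  # 16-bit floats: 14.0 GB
nf4 = weight_memory_gb(PARAMS, 4)    # 4-bit quantized: 3.5 GB
print(f"fp16: {fp16:.1f} GB, 4-bit: {nf4:.1f} GB, ratio: {fp16 / nf4:.0f}x")
```

The 4x reduction in weight memory is what lets a 7B model fit comfortably on a single consumer GPU during finetuning.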

---

## Performance Metrics

The model excels in reasoning benchmarks:

- **ARC (AI2 Reasoning Challenge):** High accuracy in logical and commonsense tasks.
- **GSM8K (Math Reasoning):** Superior results in multi-step problem-solving.
- **CommonsenseQA:** Strong comprehension of everyday reasoning tasks.

---

## Ethical Considerations

- **Transparency:** Responses are structured for verifiability through tagging.
- **Bias Mitigation:** Includes self-critique to minimize biases and ensure fairness.
- **Safe Deployment:** Users are encouraged to evaluate outputs to prevent harm or misinformation.

---

## License

This model is distributed under the Apache 2.0 license, allowing users to use, modify, and share it in compliance with the license terms.

---

## Acknowledgments

Special thanks to:

- [Unsloth](https://github.com/unslothai/unsloth) for accelerated training workflows.
- Hugging Face for their powerful tools and libraries.

---

Experience the **Super Strong Reasoning Model**, leveraging its structured reasoning and self-improvement capabilities for any task requiring advanced AI reasoning.