prithivMLmods committed (verified)
Commit e517775 · 1 Parent(s): 066a99e

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED

@@ -15,7 +15,9 @@ tags:
 ![r999.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/kUiwtiUef-r4wWzdRGxG2.png)
 # **Magellanic-Llama-70B-r999**
 
-Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from the DeepSeek R1 Distill 70B FT Llama, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. This model has demonstrated remarkable performance in reasoning. With RL, it has been trained on nearly 1 million entries of data, leading to increased improvements in safety and ensuring retention of factual accuracy. Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, improving reasoning patterns, and aligning with human preferences. Furthermore, two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.
+Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from the DeepSeek R1 Distill 70B FT Llama, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. This model has demonstrated remarkable performance in reasoning. With RL, it has been trained on nearly 1 million entries of data, leading to increased improvements in safety and ensuring retention of factual accuracy.
+
+Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, improving reasoning patterns, and aligning with human preferences. Furthermore, two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.
 
 # **Use with Transformers**
 
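The body of the "Use with Transformers" section is not shown in this diff. As a rough sketch only, loading a model like this with the Transformers library typically follows the pattern below. The repo id `prithivMLmods/Magellanic-Llama-70B-r999` is assumed from the model name, and the loading step is gated behind a flag because a 70B model needs on the order of 140 GB of memory in bf16:

```python
MODEL_ID = "prithivMLmods/Magellanic-Llama-70B-r999"  # assumed repo id, not confirmed by this commit
LOAD_MODEL = False  # flip to True only on hardware that can hold a 70B model


def build_messages(prompt: str) -> list:
    """Build a chat-format message list as expected by Llama-style chat templates."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


if LOAD_MODEL:
    # Imported lazily so the sketch can be read/run without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("Explain chain-of-thought reasoning in one sentence."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

This mirrors the standard chat-template flow for Llama-family models; the actual README section may use a different prompt format or generation settings.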