prithivMLmods committed (verified)
Commit e517775 · 1 Parent(s): 066a99e

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED

@@ -15,7 +15,9 @@ tags:
 ![r999.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/kUiwtiUef-r4wWzdRGxG2.png)
 # **Magellanic-Llama-70B-r999**
 
-Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from the DeepSeek R1 Distill 70B FT Llama, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. This model has demonstrated remarkable performance in reasoning. With RL, it has been trained on nearly 1 million entries of data, leading to increased improvements in safety and ensuring retention of factual accuracy. Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, improving reasoning patterns, and aligning with human preferences. Furthermore, two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.
+Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from the DeepSeek R1 Distill 70B FT Llama, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. This model has demonstrated remarkable performance in reasoning. With RL, it has been trained on nearly 1 million entries of data, leading to increased improvements in safety and ensuring retention of factual accuracy.
+
+Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning for solving complex problems, improving reasoning patterns, and aligning with human preferences. Furthermore, two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.
 
 # **Use with Transformers**
 
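The body of the "Use with Transformers" section is not shown in this diff. As a rough sketch only, loading a model like this with the Transformers library typically follows the pattern below. The repo id `prithivMLmods/Magellanic-Llama-70B-r999` is assumed from the model name, and the loading step is gated behind a flag because a 70B model needs on the order of 140 GB of memory in bf16:

```python
MODEL_ID = "prithivMLmods/Magellanic-Llama-70B-r999"  # assumed repo id, not confirmed by this commit
LOAD_MODEL = False  # flip to True only on hardware that can hold a 70B model


def build_messages(prompt: str) -> list:
    """Build a chat-format message list as expected by Llama-style chat templates."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ]


if LOAD_MODEL:
    # Imported lazily so the sketch can be read/run without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("Explain chain-of-thought reasoning in one sentence."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

This mirrors the standard chat-template flow for Llama-family models; the actual README section may use a different prompt format or generation settings.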