DeepSeek-R1-1.5B-Altashir-Uyghur

This is a fine-tuned version of DeepSeek-R1-Distill-Qwen-1.5B. It is optimized for multilingual instruction following, with a heavy emphasis on cross-domain reasoning in Uyghur (ئۇيغۇرچە), Chinese, and English.

📊 Dataset Stats (Data synthesized with DeepSeek-V4-Flash)

The model was trained on a high-diversity corpus of 53,273 samples, covering 40+ domains including daily life, traditional crafts, history, reasoning, and AI basics.

Category	Sample Count	Percentage
Dialog (پاراڭلىشىش)	16,385	30.76%
Reasoning (تەپەككۇر)	15,484	29.07%
QA (سوئال-جاۋاب)	15,409	28.92%
Creative (ئىجادىيەت)	5,076	9.53%
Translation (تەرجىمە)	919	1.73%
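
The percentages in the table can be re-derived from the sample counts. The following is a minimal sketch in Python, with the counts copied from the table above; the variable names are illustrative only.

```python
# Sample counts copied from the dataset table above.
counts = {
    "Dialog": 16_385,
    "Reasoning": 15_484,
    "QA": 15_409,
    "Creative": 5_076,
    "Translation": 919,
}

# Should match the stated corpus size of 53,273 samples.
total = sum(counts.values())

# Share of each category in percent, rounded to two decimals like the table.
shares = {name: round(100 * n / total, 2) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name:<12} {counts[name]:>7,} {pct:>6.2f}%")
```

The five categories sum to the stated corpus size, so no samples fall outside the listed categories.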

⚙️ Training Configuration (Hardware & Costs)

Teacher Model: DeepSeek-V4-Flash (Used for data synthesis)
Student Model: DeepSeek-R1-Distill-Qwen-1.5B
Hardware: 2 x NVIDIA RTX 4090
Training Tool: llamafactory-cli
Total Epochs: 3.0
Total Steps: 10,017
Training Runtime: 08:25:32

Training Metrics

Final Train Loss: 0.7544 (lowest loss observed during training: ~0.48)
Samples/sec: 5.284
Total FLOPs: 609,157,078 GFLOPs (≈ 6.09 × 10^17 FLOPs)
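
The reported figures can be cross-checked against each other with some simple arithmetic. The sketch below uses only numbers stated in this card. The effective batch size is not stated anywhere, so the value derived here is an inference that assumes every optimizer step consumes the same number of samples.

```python
# Figures copied from the configuration and metrics above.
num_samples = 53_273
epochs = 3
total_steps = 10_017
runtime_hms = "08:25:32"
samples_per_sec = 5.284
total_gflops = 609_157_078  # reported in GF (1 GF = 1e9 FLOPs)

# Convert the HH:MM:SS runtime to seconds.
h, m, s = (int(part) for part in runtime_hms.split(":"))
runtime_sec = h * 3600 + m * 60 + s

# Optimizer steps per epoch.
steps_per_epoch = total_steps / epochs

# Assumption: every step consumes the same number of samples.
# The result is an implied effective batch size, not a stated setting.
samples_per_step = num_samples / steps_per_epoch

# Throughput times runtime should be close to `epochs` passes over the data.
samples_processed = samples_per_sec * runtime_sec
passes_over_data = samples_processed / num_samples

# Average compute rate across both GPUs, in TFLOP/s.
total_flops = total_gflops * 1e9
avg_tflops_per_sec = total_flops / runtime_sec / 1e12

print(f"runtime: {runtime_sec} s")
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"implied samples per step: {samples_per_step:.2f}")
print(f"passes over the data: {passes_over_data:.2f}")
print(f"average compute rate: {avg_tflops_per_sec:.1f} TFLOP/s")
```

Under that assumption the implied effective batch size is close to 16, and the throughput figure agrees with three passes over the corpus to within about one percent.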

⚠️ Known Issue: "Over-learning" (ئەسكەرتىش)

Observation: Training ran for 3 full epochs (10,017 steps), but the loss had already plateaued by around step 7,000, roughly 2.1 epochs in. Testing indicates the model is overfitted ("over-baked").

Behavior: The model is rigid about its identity (it self-identifies as DeepSeek-R1) and occasionally repeats itself in multilingual contexts.
Uyghur Support: The model is highly capable in Uyghur, but it may occasionally mix languages because of the high training intensity relative to its 1.5B-parameter size.
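
To spot the two behaviors described above in generated text, a simple heuristic check can help. The sketch below is not part of the model or its training pipeline. The function names, the Unicode ranges used to classify scripts, and the choice of word trigrams are all illustrative assumptions.

```python
from collections import Counter


def script_of(ch: str) -> str:
    """Classify one character by Unicode block (coarse, illustrative ranges)."""
    cp = ord(ch)
    if 0x0600 <= cp <= 0x06FF or 0xFB50 <= cp <= 0xFDFF or 0xFE70 <= cp <= 0xFEFF:
        return "arabic"  # Uyghur is written in Arabic script
    if 0x4E00 <= cp <= 0x9FFF:
        return "cjk"  # common Chinese characters
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"  # digits, punctuation, whitespace, and so on


def script_shares(text: str) -> dict:
    """Fraction of letters in `text` that belong to each script."""
    tally = Counter(script_of(ch) for ch in text)
    tally.pop("other", None)
    letters = sum(tally.values())
    return {k: v / letters for k, v in tally.items()} if letters else {}


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Share of word n-grams that occur more than once (0.0 means none)."""
    words = text.split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    return sum(c for c in counts.values() if c > 1) / len(grams)


if __name__ == "__main__":
    sample = "ئۇيغۇرچە hello 你好"
    print(script_shares(sample))
    print(repeated_ngram_ratio("a b c a b c a b c"))
```

A response to a Uyghur prompt whose Arabic-script share is well below 1.0 suggests language mixing, and a high repeated n-gram ratio suggests looping output. Any threshold would need tuning on real outputs. The repetition check splits on whitespace, so it does not apply to unsegmented Chinese text.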

License: This model is licensed under the Apache 2.0 License, following the base model's licensing terms.
