Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128 K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks—balancing **accuracy** and **efficiency**.

---

### Flagship-Level Efficient Reasoning
In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

---

### Aesthetic Understanding and Front-End Generation
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, Ling-1T ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.
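
The reward details are not spelled out in this card; as a rough illustration of the idea, such a hybrid reward can be sketched as a weighted combination of three normalized scores. The weights, score names, and the `hybrid_reward` helper below are illustrative assumptions, not the actual Ling-1T reward model.

```python
# Illustrative sketch only: a hybrid "Syntax-Function-Aesthetics" reward as a
# weighted sum of three normalized component scores in [0, 1]. The component
# scorers and weights are placeholders, not Ling-1T's actual reward model.
from dataclasses import dataclass


@dataclass
class RewardWeights:
    syntax: float = 0.3      # does the generated code parse / compile?
    function: float = 0.4    # does it behave correctly (e.g. pass checks)?
    aesthetics: float = 0.3  # visual-quality score from a learned judge


def hybrid_reward(scores: dict, w: RewardWeights = RewardWeights()) -> float:
    """Combine per-aspect scores (each in [0, 1]) into one scalar reward."""
    return (
        w.syntax * scores["syntax"]
        + w.function * scores["function"]
        + w.aesthetics * scores["aesthetics"]
    )


# Example: code that parses and works but is visually plain.
print(f'{hybrid_reward({"syntax": 1.0, "function": 0.9, "aesthetics": 0.4}):.2f}')  # 0.78
```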
---

### Emergent Intelligence at Trillion-Scale
These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T’s release.

---

### Pre-Training at Trillion Scale
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-train checkpoint merging simulates LR decay and boosts generalization.
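
The WSM recipe is only summarized above. A minimal sketch of the idea, assuming a linear warmup into a constant learning rate (no decay phase) and uniform averaging of recent checkpoints in place of explicit decay, might look like the following; the function names and hyperparameters are illustrative, not the actual training configuration.

```python
# Minimal sketch of a Warmup-Stable-Merge (WSM) style recipe, assuming:
#   * linear warmup to a constant ("stable") learning rate, with no decay phase;
#   * periodic checkpoint averaging, which stands in for explicit LR decay.
# Hyperparameters and structure are illustrative, not Ling-1T's training code.
import copy
import torch


def wsm_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 2000) -> float:
    """Warmup-Stable schedule: ramp up linearly, then hold the peak LR."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr  # stable phase: no decay; "decay" is simulated by merging


@torch.no_grad()
def merge_checkpoints(state_dicts: list) -> dict:
    """Uniformly average a window of recent checkpoints (the "Merge" step)."""
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged


# Usage sketch: keep the last few checkpoints and evaluate/continue from the merge.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=wsm_lr(0))
recent_ckpts = []

for step in range(10_000):
    for group in optimizer.param_groups:
        group["lr"] = wsm_lr(step)
    # ... forward / backward / optimizer.step() on a training batch ...
    if step % 2_000 == 0:
        recent_ckpts = (recent_ckpts + [copy.deepcopy(model.state_dict())])[-4:]

model.load_state_dict(merge_checkpoints(recent_ckpts))
```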
---

### Post-Training and Evo-CoT Optimization
Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency—ideal for reflexive (“non-thinking”) models.

For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
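
The full LPO objective is not given here. As a rough sketch of the core idea, the snippet below aggregates token log-probabilities into sentence units and applies a PPO-style clipped surrogate at the sentence level (rather than per token as in GRPO or per sequence as in GSPO); the segmentation, clip range, and use of a single sequence-level advantage are assumptions for illustration.

```python
# Rough sketch of a sentence-level ("linguistics-unit") clipped policy loss.
# Token log-probs are summed within each sentence, and the importance ratio
# and clipping are applied per sentence rather than per token (GRPO) or per
# whole sequence (GSPO). Shapes, segmentation, and the PPO-style clipping
# are illustrative assumptions, not the published LPO objective.
import torch
import torch.nn.functional as F


def lpo_loss(new_logprobs, old_logprobs, sentence_ids, advantage, clip_eps=0.2):
    num_sentences = int(sentence_ids.max().item()) + 1
    one_hot = F.one_hot(sentence_ids, num_sentences).float()  # (T, S)

    # One importance ratio per sentence: sum token log-probs inside each sentence.
    ratio = torch.exp(one_hot.T @ new_logprobs - one_hot.T @ old_logprobs)  # (S,)

    # Clipped surrogate applied at the sentence level, with a shared advantage.
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.minimum(unclipped, clipped).mean()


# Toy usage: 6 tokens forming 2 sentences, one scalar advantage per response.
new_lp = torch.tensor([-1.0, -0.5, -0.7, -1.2, -0.9, -0.4], requires_grad=True)
old_lp = torch.tensor([-1.1, -0.6, -0.7, -1.0, -1.0, -0.5])
sent_id = torch.tensor([0, 0, 0, 1, 1, 1])
lpo_loss(new_lp, old_lp, sent_id, advantage=torch.tensor(1.0)).backward()
```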
---

## Evaluation