Pre-trained on **20 trillion+ high-quality, reasoning-dense tokens**, Ling-1T-base supports up to **128 K context length** and adopts an **evolutionary chain-of-thought (Evo-CoT)** process across mid-training and post-training.
This curriculum greatly enhances the model’s efficiency and reasoning depth, allowing Ling-1T to achieve **state-of-the-art performance** on multiple complex reasoning benchmarks—balancing **accuracy** and **efficiency**.

---

### Flagship-Level Efficient Reasoning
In the **AIME 25** benchmark, Ling-1T extends the **Pareto frontier** of reasoning accuracy vs. reasoning length, showcasing its strength in **“efficient thinking and precise reasoning.”**

---

### Aesthetic Understanding and Front-End Generation
We introduce a hybrid *Syntax–Function–Aesthetics* reward mechanism, enabling the model to not only generate correct and functional code but also demonstrate a refined sense of **visual aesthetics**.
On **ArtifactsBench**, Ling-1T ranks **first among open-source models**, and the benchmark visualizations in this card were, in fact, *generated by Ling-1T itself*.
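
The reward details are not spelled out in this card; as a rough illustration of the idea, such a hybrid reward can be sketched as a weighted combination of three normalized scores. The weights, score names, and the `hybrid_reward` helper below are illustrative assumptions, not the actual Ling-1T reward model.

```python
# Illustrative sketch only: a hybrid "Syntax-Function-Aesthetics" reward as a
# weighted sum of three normalized component scores in [0, 1]. The component
# scorers and weights are placeholders, not Ling-1T's actual reward model.
from dataclasses import dataclass


@dataclass
class RewardWeights:
    syntax: float = 0.3      # does the generated code parse / compile?
    function: float = 0.4    # does it behave correctly (e.g. pass checks)?
    aesthetics: float = 0.3  # visual-quality score from a learned judge


def hybrid_reward(scores: dict, w: RewardWeights = RewardWeights()) -> float:
    """Combine per-aspect scores (each in [0, 1]) into one scalar reward."""
    return (
        w.syntax * scores["syntax"]
        + w.function * scores["function"]
        + w.aesthetics * scores["aesthetics"]
    )


# Example: code that parses and works but is visually plain.
print(f'{hybrid_reward({"syntax": 1.0, "function": 0.9, "aesthetics": 0.4}):.2f}')  # 0.78
```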
---

### Emergent Intelligence at Trillion-Scale
These capabilities form the foundation for **general, collaborative human–AI intelligence**, which we aim to advance together with the open-source community through Ling-1T’s release.

---

### Pre-Training at Trillion Scale
Mid-training introduced **curated chain-of-thought corpora** for “**reasoning pre-activation**”, improving downstream reasoning stability.
A custom **WSM (Warmup–Stable–Merge)** LR scheduler with mid-train checkpoint merging simulates LR decay and boosts generalization.
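
The WSM recipe is only summarized above. A minimal sketch of the idea, assuming a linear warmup into a constant learning rate (no decay phase) and uniform averaging of recent checkpoints in place of explicit decay, might look like the following; the function names and hyperparameters are illustrative, not the actual training configuration.

```python
# Minimal sketch of a Warmup-Stable-Merge (WSM) style recipe, assuming:
#   * linear warmup to a constant ("stable") learning rate, with no decay phase;
#   * periodic checkpoint averaging, which stands in for explicit LR decay.
# Hyperparameters and structure are illustrative, not Ling-1T's training code.
import copy
import torch


def wsm_lr(step: int, peak_lr: float = 3e-4, warmup_steps: int = 2000) -> float:
    """Warmup-Stable schedule: ramp up linearly, then hold the peak LR."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    return peak_lr  # stable phase: no decay; "decay" is simulated by merging


@torch.no_grad()
def merge_checkpoints(state_dicts: list) -> dict:
    """Uniformly average a window of recent checkpoints (the "Merge" step)."""
    merged = copy.deepcopy(state_dicts[0])
    for key in merged:
        merged[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return merged


# Usage sketch: keep the last few checkpoints and evaluate/continue from the merge.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=wsm_lr(0))
recent_ckpts = []

for step in range(10_000):
    for group in optimizer.param_groups:
        group["lr"] = wsm_lr(step)
    # ... forward / backward / optimizer.step() on a training batch ...
    if step % 2_000 == 0:
        recent_ckpts = (recent_ckpts + [copy.deepcopy(model.state_dict())])[-4:]

model.load_state_dict(merge_checkpoints(recent_ckpts))
```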
---

### Post-Training and Evo-CoT Optimization
Built upon mid-training reasoning activation, post-training adopts **Evo-CoT (Evolutionary Chain-of-Thought)** for progressive reasoning enhancement under controllable cost.
This approach continually expands the **Pareto frontier** of reasoning accuracy vs. efficiency—ideal for reflexive (“non-thinking”) models.

For reinforcement learning, we introduce **LPO (Linguistics-Unit Policy Optimization)**, a novel sentence-level policy optimization method.
Unlike GRPO (token-level) or GSPO (sequence-level) algorithms, LPO treats *sentences* as the natural semantic action units, enabling precise alignment between rewards and reasoning behavior.
Empirically, LPO offers superior **training stability** and **generalization** across reasoning tasks.
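
The full LPO objective is not given here. As a rough sketch of the core idea, the snippet below aggregates token log-probabilities into sentence units and applies a PPO-style clipped surrogate at the sentence level (rather than per token as in GRPO or per sequence as in GSPO); the segmentation, clip range, and use of a single sequence-level advantage are assumptions for illustration.

```python
# Rough sketch of a sentence-level ("linguistics-unit") clipped policy loss.
# Token log-probs are summed within each sentence, and the importance ratio
# and clipping are applied per sentence rather than per token (GRPO) or per
# whole sequence (GSPO). Shapes, segmentation, and the PPO-style clipping
# are illustrative assumptions, not the published LPO objective.
import torch
import torch.nn.functional as F


def lpo_loss(new_logprobs, old_logprobs, sentence_ids, advantage, clip_eps=0.2):
    num_sentences = int(sentence_ids.max().item()) + 1
    one_hot = F.one_hot(sentence_ids, num_sentences).float()  # (T, S)

    # One importance ratio per sentence: sum token log-probs inside each sentence.
    ratio = torch.exp(one_hot.T @ new_logprobs - one_hot.T @ old_logprobs)  # (S,)

    # Clipped surrogate applied at the sentence level, with a shared advantage.
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    return -torch.minimum(unclipped, clipped).mean()


# Toy usage: 6 tokens forming 2 sentences, one scalar advantage per response.
new_lp = torch.tensor([-1.0, -0.5, -0.7, -1.2, -0.9, -0.4], requires_grad=True)
old_lp = torch.tensor([-1.1, -0.6, -0.7, -1.0, -1.0, -0.5])
sent_id = torch.tensor([0, 0, 0, 1, 1, 1])
lpo_loss(new_lp, old_lp, sent_id, advantage=torch.tensor(1.0)).backward()
```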
---

## Evaluation