Our framework, **AscendKernelGen (AKGen)**, bridges the gap between general-purpose …
* **Domain-Adaptive Post-Training:** A two-stage optimization process that yields **KernelGen-LM**. We first employ **Supervised Fine-Tuning (SFT)** with error-derived supervision (correcting API misuse and numerical errors). This is followed by **Reinforcement Learning (RL)** using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
* **Hardware-Grounded Evaluation:** Validated using **NPUKernelBench**, a comprehensive benchmark that assesses compilation success, functional correctness, and performance (latency) on real Ascend hardware across varying complexity levels.
* **Performance:** The model demonstrates significant improvement on complex Level-2 kernels over baselines, and effectively solves tasks where general-purpose models (such as Qwen3 and Llama3.1) fail completely.
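The repository's training code is not shown here, but the preference objective behind the DPO stage can be sketched in plain Python. The loss rewards the policy for raising the likelihood of a kernel that passed execution checks relative to one that failed, measured against a frozen reference model. The function name and signature below are illustrative, not the project's actual API:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a complete kernel
    generation under the trainable policy or the frozen reference model.
    `beta` scales how strongly the policy may drift from the reference.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# A pair where the policy already prefers the passing kernel ...
low = dpo_loss(-10.0, -20.0, -15.0, -15.0)
# ... incurs a lower loss than one where it prefers the failing kernel.
high = dpo_loss(-20.0, -10.0, -15.0, -15.0)
```

Minimizing this loss over many (passing, failing) kernel pairs is what lets execution-based correctness and performance signals shape the model without an explicit reward model.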
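The three NPUKernelBench axes can be illustrated with a minimal scoring loop. The real harness compiles and times kernels on Ascend hardware; here `compile_fn` and `reference_fn` are hypothetical stand-ins supplied by the caller, and the sketch only shows how the three metrics gate one another:

```python
import time

def evaluate_kernel(compile_fn, reference_fn, inputs, atol=1e-5):
    """Score one generated kernel on three axes (illustrative sketch):
    compilation success, functional correctness, and latency.

    compile_fn   -- builds the kernel and returns a callable,
                    or raises on build failure
    reference_fn -- trusted implementation producing expected outputs
    """
    report = {"compiles": False, "correct": False, "latency_s": None}
    try:
        kernel = compile_fn()
    except Exception:
        return report            # build failure ends scoring early
    report["compiles"] = True
    start = time.perf_counter()
    output = kernel(*inputs)
    report["latency_s"] = time.perf_counter() - start
    expected = reference_fn(*inputs)
    report["correct"] = all(abs(a - b) <= atol
                            for a, b in zip(output, expected))
    return report

# Stand-in "kernel" for illustration: elementwise add over two lists.
good = evaluate_kernel(
    lambda: (lambda a, b: [x + y for x, y in zip(a, b)]),
    lambda a, b: [x + y for x, y in zip(a, b)],
    ([1.0, 2.0], [3.0, 4.0]))
```

Gating correctness and latency behind compilation mirrors how the benchmark reports results at each complexity level: a kernel that fails to build scores zero on the remaining axes.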
## Citation

```bibtex
@article{cao2026ascendkernelgen,
  title={AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units},
  author={Xinzi Cao and Jianyang Zhai and Pengfei Li and Zhiheng Hu and Cen Yan and Bingxu Mu and Guanghuan Fang and Bin She and Jiayu Li and Yihan Su and Dongyang Tao and Xiansong Huang and Fan Xu and Feidiao Yang and Yao Lu and Chang-Dong Wang and Yutong Lu and Weicheng Xue and Bin Zhou and Yonghong Tian},
  journal={arXiv preprint arXiv:2601.07160},
  year={2026},
  url={https://arxiv.org/abs/2601.07160}
}
```