Our framework, **AscendKernelGen (AKGen)**, bridges the gap between general-purpose …
* **Domain-Adaptive Post-Training:** A two-stage optimization process that yields **KernelGen-LM**. We first employ **Supervised Fine-Tuning (SFT)** with error-derived supervision (correcting API misuse and numerical errors). This is followed by **Reinforcement Learning (RL)** using Direct Preference Optimization (DPO), driven by execution-based correctness and performance signals.
* **Hardware-Grounded Evaluation:** Validated using **NPUKernelBench**, a comprehensive benchmark that assesses compilation success, functional correctness, and performance (latency) on real Ascend hardware across varying complexity levels.
* **Performance:** The model demonstrates significant improvement on complex Level-2 kernels over baselines, and effectively solves tasks where general-purpose models (such as Qwen3 and Llama3.1) fail completely.
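The repository's training code is not shown here, but the preference objective behind the DPO stage can be sketched in plain Python. The loss rewards the policy for raising the likelihood of a kernel that passed execution checks relative to one that failed, measured against a frozen reference model. The function name and signature below are illustrative, not the project's actual API:

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of a complete kernel
    generation under the trainable policy or the frozen reference model.
    `beta` scales how strongly the policy may drift from the reference.
    """
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), computed stably as softplus(-margin)
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# A pair where the policy already prefers the passing kernel ...
low = dpo_loss(-10.0, -20.0, -15.0, -15.0)
# ... incurs a lower loss than one where it prefers the failing kernel.
high = dpo_loss(-20.0, -10.0, -15.0, -15.0)
```

Minimizing this loss over many (passing, failing) kernel pairs is what lets execution-based correctness and performance signals shape the model without an explicit reward model.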
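The three NPUKernelBench axes can be illustrated with a minimal scoring loop. The real harness compiles and times kernels on Ascend hardware; here `compile_fn` and `reference_fn` are hypothetical stand-ins supplied by the caller, and the sketch only shows how the three metrics gate one another:

```python
import time

def evaluate_kernel(compile_fn, reference_fn, inputs, atol=1e-5):
    """Score one generated kernel on three axes (illustrative sketch):
    compilation success, functional correctness, and latency.

    compile_fn   -- builds the kernel and returns a callable,
                    or raises on build failure
    reference_fn -- trusted implementation producing expected outputs
    """
    report = {"compiles": False, "correct": False, "latency_s": None}
    try:
        kernel = compile_fn()
    except Exception:
        return report            # build failure ends scoring early
    report["compiles"] = True
    start = time.perf_counter()
    output = kernel(*inputs)
    report["latency_s"] = time.perf_counter() - start
    expected = reference_fn(*inputs)
    report["correct"] = all(abs(a - b) <= atol
                            for a, b in zip(output, expected))
    return report

# Stand-in "kernel" for illustration: elementwise add over two lists.
good = evaluate_kernel(
    lambda: (lambda a, b: [x + y for x, y in zip(a, b)]),
    lambda a, b: [x + y for x, y in zip(a, b)],
    ([1.0, 2.0], [3.0, 4.0]))
```

Gating correctness and latency behind compilation mirrors how the benchmark reports results at each complexity level: a kernel that fails to build scores zero on the remaining axes.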
## Citation

```bibtex
@article{cao2026ascendkernelgen,
  title={AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units},
  author={Xinzi Cao and Jianyang Zhai and Pengfei Li and Zhiheng Hu and Cen Yan and Bingxu Mu and Guanghuan Fang and Bin She and Jiayu Li and Yihan Su and Dongyang Tao and Xiansong Huang and Fan Xu and Feidiao Yang and Yao Lu and Chang-Dong Wang and Yutong Lu and Weicheng Xue and Bin Zhou and Yonghong Tian},
  journal={arXiv preprint arXiv:2601.07160},
  year={2026},
  url={https://arxiv.org/abs/2601.07160}
}
```