
openPangu-R-7B-2512

中文 | English

1. Introduction

openPangu-R-7B-2512 is an efficient large language model trained from scratch on Ascend NPUs, with 7B parameters (excluding the vocabulary embedding). It supports long-context processing up to 128k tokens. The model is trained on approximately 30 trillion tokens and is capable of switching between fast and slow reasoning modes.

2. Model Architecture

openPangu-R-7B-2512 introduces the following optimizations to improve both model efficiency and performance:

  • Hybrid sliding-window attention mechanism: We adopt a 1:1 hybrid scheme combining sliding-window attention and full attention, which significantly reduces KV cache usage and improves inference speed without compromising model accuracy. In addition, we introduce an Attention Sink strategy in all layers to ensure the stability of the hybrid attention mechanism.
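The hybrid scheme above can be illustrated with a toy mask construction. The function name, window size, and the single-sink-token default below are illustrative assumptions, not the model's actual configuration:

```python
import numpy as np

def attention_mask(seq_len, window=None, num_sink=1):
    """Build a causal attention mask. `window` limits each query to the
    last `window` keys (sliding-window attention); the first `num_sink`
    tokens remain visible to every query (attention sink)."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    mask = k <= q                      # causal: no attending to the future
    if window is not None:
        in_window = k > q - window     # only the most recent `window` keys
        sink = k < num_sink            # sink tokens are always visible
        mask &= in_window | sink
    return mask

# A 1:1 hybrid stacks layers that alternate between the two mask types:
full_mask = attention_mask(8)              # full causal attention
local_mask = attention_mask(8, window=4)   # sliding window + sink
```

The windowed layers keep only a fixed number of keys per query in the KV cache, which is where the memory and speed savings come from; the sink tokens stabilize attention when early tokens fall outside the window.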

  • Attention layer optimizations: We introduce a GroupNorm-based Gated Attention strategy. Building on Gated Attention, we apply head-wise RMSNorm (with shared parameters) to normalize the attention outputs. This approach balances the magnitude of features across heads while preserving representational diversity, effectively improving training stability and overall model performance. We also introduce a Partial RoPE mechanism, applying positional encoding to only one quarter of the dimensions of the Query and Key, which enhances performance on both long-context and short-context tasks.
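The Partial RoPE idea can be sketched as follows: rotary position embedding is applied to only the first quarter of the head dimension, and the remaining dimensions pass through unchanged. The interleaved layout, frequency base, and function name here are assumptions for illustration:

```python
import numpy as np

def partial_rope(x, rotary_frac=0.25, base=10000.0):
    """Apply RoPE to only the first `rotary_frac` of the head dimension.
    x: (seq_len, head_dim). The untouched tail carries position-free features."""
    seq_len, head_dim = x.shape
    rot_dim = int(head_dim * rotary_frac)
    assert rot_dim % 2 == 0, "rotated dims come in (cos, sin) pairs"
    pos = np.arange(seq_len)[:, None]
    inv_freq = 1.0 / base ** (np.arange(0, rot_dim, 2) / rot_dim)
    angles = pos * inv_freq[None, :]           # (seq_len, rot_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :rot_dim:2], x[:, 1:rot_dim:2]
    out = np.empty_like(x)
    out[:, :rot_dim:2] = x1 * cos - x2 * sin   # rotate each 2-D pair
    out[:, 1:rot_dim:2] = x1 * sin + x2 * cos
    out[:, rot_dim:] = x[:, rot_dim:]          # remaining 3/4: pass-through
    return out
```

At position 0 the rotation is the identity, and the pass-through dimensions are identical at every position; only the first quarter of each head encodes relative position.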

  • Causal convolution: We insert a one-dimensional causal convolution before the input of the FFN layers. By enabling weighted information interaction across tokens, this enhances the expressive capacity of the FFN layers and further improves model performance.
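The causal convolution can be sketched as a 1-D convolution over the token axis with left padding, so the output at position t mixes only tokens at positions ≤ t. The kernel size and the depthwise (per-channel) form below are assumptions:

```python
import numpy as np

def causal_conv1d(x, weight):
    """Depthwise 1-D causal convolution over tokens.
    x: (seq_len, dim); weight: (k, dim) taps, weight[-1] is the current token.
    Left-padding by k - 1 zeros keeps the operation causal."""
    k, dim = weight.shape
    xp = np.concatenate([np.zeros((k - 1, dim)), x], axis=0)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = (xp[t:t + k] * weight).sum(axis=0)  # weighted mix of <= t
    return out
```

Placing this before the FFN lets each FFN input aggregate a short, weighted history of preceding tokens rather than a single token's hidden state.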

The detailed architectural parameters are as follows:

| | openPangu-R-7B-2512 |
|---|---|
| Architecture | Dense |
| Parameters (Non-Embedding) | 7B |
| Number of Layers | 27 |
| Hidden Dimension | 4096 |
| Intermediate Dimension | 18432 |
| Attention Mechanism | GQA |
| Number of Attention Heads | 32 for Q, 8 for KV |
| Number of MTP Modules | 1 |
| Vocabulary Size | 153k |
| Context Length (Native) | 128k |
| Pretraining Tokens | 30T |

3. Results

| Benchmark | Metric | openPangu-R-7B-2512 Slow-thinking | openPangu-R-7B-2512 Fast-thinking |
|---|---|---|---|
| **General** | | | |
| Livebench | Acc (2024-11-25) | 58.1 | 44.5 |
| MMLU-Pro | Exact Match | 79.1 | 76.6 |
| MMLU-ProX | Acc | 68.7 | 61.2 |
| RULER | Acc | 83.2 | 83.4 |
| LongBench V2 | Acc | 33.4 | 30.4 |
| IF-Eval | Prompt Strict | 72.8 | 78.0 |
| Hallucination-LeaderBoard | 1-HHEM | 96.4 | 96.8 |
| GPQA-Diamond | Avg@4 | 75.4 | 63.1 |
| SuperGPQA | Acc | 53.1 | 48.7 |
| **Math** | | | |
| AIME24 | Avg@16 | 86.5 | 65.4 |
| AIME25 | Avg@16 | 75.2 | 56.9 |
| CNMO24 | Avg@32 | 78.5 | 67.0 |
| HMMT 2025 | Avg@16 (February) | 62.9 | 34.0 |
| **Coding** | | | |
| LiveCodeBench V6 | Avg@3 (01/25~05/25) | 57.1 | 35.8 |
| Codeforces | Elo Avg@3 (02/25~09/25) | 1411.6 | 774.4 |
| **Agent Tool Use** | | | |
| Ace-Bench | Acc (Prompt) | 61.8 | 49.8 |
| Tau-Bench (airline) | Avg@3 (FC) | 50.0 | 42.7 |
| Tau-Bench (retail) | Avg@3 (FC) | 69.0 | 61.7 |
| Tau2-Bench (airline) | Avg@3 (FC) | 58.0 | 59.3 |
| Tau2-Bench (retail) | Avg@3 (FC) | 71.3 | 66.4 |
| Tau2-Bench (telecom) | Avg@3 (FC) | 45.0 | 43.0 |
| BFCL-v3 | Acc (Prompt) | 70.6 | 62.7 |

Note: The evaluation is conducted using a 128k sequence length and a greedy decoding strategy.

4. Deployment

4.1 Environment

Hardware Requirements

Atlas 800T A2 (64GB). Please refer to [Atlas 800T A2] to obtain the driver and firmware installation packages.

System Requirements & Dependencies

  • System: Linux (OpenEuler ≥ 24.03 recommended)
  • CANN==8.1.RC1: [CANN Install]
  • python==3.10
  • torch==2.1.0
  • torch-npu==2.1.0.post12
  • transformers==4.53.2

The software environment above has been verified; newer versions are expected to work as well. For any questions, please submit an issue.

4.2 Inference Examples

The following is a simple inference example for openPangu-R-7B-2512 based on the transformers framework. Before running, edit generate.py to set the model path:

```shell
cd inference
python generate.py
```

The openPangu-R-7B-2512 model runs in slow-thinking mode by default and can be switched to fast-thinking mode as follows:

  • The definition of the no_thinking_prompt variable in generate.py demonstrates the switch: appending the /no_think tag to the end of the user input switches the current turn to fast-thinking mode.
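For illustration, a minimal sketch of building a user turn with the /no_think suffix; the helper name and message structure here are assumptions (see generate.py for the actual implementation):

```python
def build_user_turn(text, fast_thinking=False):
    """Append the /no_think tag when fast-thinking mode is wanted for this turn."""
    content = text + " /no_think" if fast_thinking else text
    return {"role": "user", "content": content}

# Slow thinking (default): no tag is added.
slow = build_user_turn("Prove that sqrt(2) is irrational.")
# Fast thinking: the /no_think tag ends the user input.
fast = build_user_turn("What is 2 + 2?", fast_thinking=True)
```

Because the tag is appended per message, the switch applies only to the current turn; subsequent turns without the tag fall back to slow-thinking mode.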

4.3 Using an Inference Framework

vllm_ascend: [README_EN.md]

5. Model License

Unless otherwise noted, the openPangu-R-7B-2512 model is licensed under the terms and conditions of the OPENPANGU MODEL LICENSE AGREEMENT VERSION 1.0, which is intended to be used permissively and to enable the further development of artificial intelligence technologies. Please refer to the LICENSE file located in the root directory of the model repository for details.

6. Disclaimer

Due to technical limitations inherent in the technology on which openPangu-R-7B-2512 ("Model") relies, and because the content is generated automatically by the Model, Huawei cannot make any guarantees regarding the following:

  • The output of this Model is automatically generated by AI algorithms; some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint;
  • There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure, safe, error-free, uninterrupted, continuously stable, or free of any faults;
  • The output of this Model does not constitute any advice or decision for you, and there is no guarantee of the authenticity, completeness, accuracy, timeliness, legality, functionality, or practicality of the generated content. The generated content cannot replace professionals in medical, legal, and other fields in answering your questions. The generated content is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibility.

7. Contact Us

If you have any comments or suggestions, please submit an issue or contact openPangu@huawei.com.