# openPangu-R-7B-2512

[中文](README.md) | English

## 1. Introduction

**openPangu-R-7B-2512** is an efficient large language model trained from scratch on Ascend NPUs, with 7B parameters (excluding the vocabulary embedding). It supports long-context processing of up to 128k tokens, was trained on approximately 30 trillion tokens, and can switch between fast and slow reasoning modes.
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
|
| 9 |
+
## 2. Model Architecture
|
| 10 |
+
**openPangu-R-7B-2512** introduces the following optimizations to improve both model efficiency and performance:
|
| 11 |
+
|
| 12 |
+
* **Hybrid sliding-window attention mechanism**: We adopt a 1:1 hybrid scheme combining sliding-window attention and full attention, which significantly reduces KV cache usage and improves inference speed without compromising model accuracy. In addition, we introduce an Attention Sink strategy in all layers to ensure the stability of the hybrid attention mechanism.
|
| 13 |
+
|
| 14 |
+
* **Attention layer optimizations**: We introduce a GroupNorm-based Gated Attention strategy. Building on Gated Attention, we apply head-wise RMSNorm (with shared parameters) to normalize the attention outputs. This approach balances the magnitude of features across heads while preserving representational diversity, effectively improving training stability and overall model performance. We also introduce a Partial RoPE mechanism, applying positional encoding to only one quarter of the dimensions of the Query and Key, which enhances performance on both long-context and short-context tasks.
|
| 15 |
+
|
| 16 |
+
* **Causal convolution**: We insert a one-dimensional causal convolution before the input of the FFN layers. By enabling weighted information interaction across tokens, this enhances the expressive capacity of the FFN layers and further improves model performance.
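To make the last point concrete, here is a minimal PyTorch sketch of a causal 1-D convolution placed in front of an FFN block. This is an illustrative assumption, not openPangu's actual implementation: the class name, kernel size, and activation are invented for the example.

```python
# Illustrative sketch (assumption, not openPangu's code): a depthwise causal
# 1-D convolution mixes information from earlier tokens before the FFN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvFFN(nn.Module):
    def __init__(self, hidden: int, intermediate: int, kernel: int = 4):
        super().__init__()
        self.pad = kernel - 1  # left-only padding keeps the convolution causal
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=kernel, groups=hidden)
        self.ffn = nn.Sequential(
            nn.Linear(hidden, intermediate),
            nn.SiLU(),
            nn.Linear(intermediate, hidden),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [batch, seq, hidden]
        h = x.transpose(1, 2)             # -> [batch, hidden, seq] for Conv1d
        h = F.pad(h, (self.pad, 0))       # pad on the left: token t sees only <= t
        h = self.conv(h).transpose(1, 2)  # -> [batch, seq, hidden]
        return self.ffn(h)
```

Because the padding is applied only on the left, the output at position `t` depends on tokens `t - kernel + 1 .. t`, so autoregressive decoding remains valid.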

The detailed architectural parameters are as follows:

| | openPangu-R-7B-2512 |
| :---------------------------: | :----------------: |
| **Architecture** | Dense |
| **Parameters (Non-Embedding)** | 7B |
| **Number of Layers** | 27 |
| **Hidden Dimension** | 4096 |
| **Intermediate Dimension** | 18432 |
| **Attention Mechanism** | GQA |
| **Number of Attention Heads** | 32 for Q, 8 for KV |
| **Number of MTP Modules** | 1 |
| **Vocabulary Size** | 153k |
| **Context Length (Native)** | 128k |
| **Pretraining Tokens** | 30T |
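As one concrete illustration of the attention design above, the Partial RoPE idea (positional rotation on only a quarter of the Query/Key dimensions) can be sketched as follows. This is a hedged sketch under assumptions: the function name, rotation layout, and `base` value are invented for the example and are not taken from the model's actual code.

```python
# Illustrative Partial RoPE sketch (assumption, not openPangu's actual code):
# apply rotary position embedding to only the first quarter of the head dim.
import torch

def partial_rope(x: torch.Tensor, positions: torch.Tensor,
                 frac: float = 0.25, base: float = 10000.0) -> torch.Tensor:
    """x: [batch, seq, heads, head_dim]; rotate only head_dim * frac dims."""
    rot_dim = int(x.shape[-1] * frac)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]
    half = rot_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = positions[:, None].float() * inv_freq[None, :]  # [seq, half]
    cos = angles.cos()[None, :, None, :]                      # broadcast over batch/heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
    return torch.cat([rotated, x_pass], dim=-1)  # remaining 3/4 dims unrotated
```

The unrotated dimensions carry position-independent content features, which is one intuition for why partial rotation can help on both short and long contexts.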

## 3. Results

| **Benchmark** | **Metric** | **openPangu-R-7B-2512 Slow-thinking** | **openPangu-R-7B-2512 Fast-thinking** |
| ------------------------- | ----------------------- | ------------------------------------- | ------------------------------------- |
| **General** | | | |
| LiveBench | Acc (2024-11-25) | 58.1 | 44.5 |
| MMLU-Pro | Exact Match | 79.1 | 76.6 |
| MMLU-ProX | Acc | 68.7 | 61.2 |
| RULER | Acc | 83.2 | 83.4 |
| LongBench v2 | Acc | 33.4 | 30.4 |
| IFEval | Prompt Strict | 72.8 | 78.0 |
| Hallucination Leaderboard | 1-HHEM | 96.4 | 96.8 |
| GPQA-Diamond | Avg@4 | 75.4 | 63.1 |
| SuperGPQA | Acc | 53.1 | 48.7 |
| **Math** | | | |
| AIME24 | Avg@16 | 86.5 | 65.4 |
| AIME25 | Avg@16 | 75.2 | 56.9 |
| CNMO24 | Avg@32 | 78.5 | 67.0 |
| HMMT 2025 | Avg@16 (February) | 62.9 | 34.0 |
| **Coding** | | | |
| LiveCodeBench v6 | Avg@3 (01/25~05/25) | 57.1 | 35.8 |
| Codeforces | Elo Avg@3 (02/25~09/25) | 1411.6 | 774.4 |
| **Agent Tool Use** | | | |
| ACEBench | Acc (Prompt) | 61.8 | 49.8 |
| Tau-Bench (airline) | Avg@3 (FC) | 50.0 | 42.7 |
| Tau-Bench (retail) | Avg@3 (FC) | 69.0 | 61.7 |
| Tau2-Bench (airline) | Avg@3 (FC) | 58.0 | 59.3 |
| Tau2-Bench (retail) | Avg@3 (FC) | 71.3 | 66.4 |
| Tau2-Bench (telecom) | Avg@3 (FC) | 45.0 | 43.0 |
| BFCL-v3 | Acc (Prompt) | 70.6 | 62.7 |

**Note:** All evaluations use a 128k sequence length and greedy decoding.

## 4. Deployment

### 4.1 Environment

#### Hardware Requirements

Atlas 800T A2 (64GB). Please refer to [Atlas 800T A2](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.2.RC1.alpha003&driver=Ascend+HDK+25.0.RC1) for the driver and firmware installation packages.

#### System Requirements & Dependencies

- System: Linux (openEuler ≥ 24.03 recommended)
- CANN==8.1.RC1: [CANN Install](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/82RC1alpha002/softwareinst/instg/instg_0001.html?Mode=PmIns&OS=Ubuntu&Software=cannToolKit)
- python==3.10
- torch==2.1.0
- torch-npu==2.1.0.post12
- transformers==4.53.2

The software environment above has been verified; newer versions should work in principle. If you encounter any problems, please submit an issue.

### 4.2 Inference Examples

The following is a simple inference example for openPangu-R-7B-2512 based on the `transformers` framework:

> Set the model path in `generate.py` before running.

```bash
cd inference
python generate.py
```

openPangu-R-7B-2512 runs in slow-thinking mode by default. It can be switched to fast-thinking mode as follows:

- In the example code `generate.py`, the `no_thinking_prompt` variable demonstrates the switch: appending the ` /no_think` tag to the end of the user input puts the current turn into fast-thinking mode.
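For reference, the mode switch can be sketched with plain `transformers` APIs. This is a hypothetical sketch, not the repository's `generate.py`: the `build_messages` helper, the `chat` function, and the generation parameters are assumptions made for illustration; only the ` /no_think` tag comes from the README.

```python
# Hypothetical sketch of chat inference with the " /no_think" fast-thinking tag.
# Not the repository's generate.py; helper names and parameters are assumptions.

def build_messages(user_text: str, fast_thinking: bool = False) -> list:
    """Append the ' /no_think' tag when fast-thinking mode is requested."""
    if fast_thinking:
        user_text = user_text + " /no_think"
    return [{"role": "user", "content": user_text}]

def chat(model_path: str, user_text: str, fast_thinking: bool = False) -> str:
    # Imported here so the prompt helper above stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, trust_remote_code=True, device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_text, fast_thinking),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

The tag applies per turn, so a multi-turn conversation can mix fast- and slow-thinking responses.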

### 4.3 Using an Inference Framework

vllm_ascend: [README_EN.md](inference/README_EN.md)

## 5. Model License

Unless otherwise noted, the openPangu-R-7B-2512 model is licensed under the terms and conditions of the OPENPANGU MODEL LICENSE AGREEMENT VERSION 1.0, which is intended to be permissive and to enable the further development of artificial intelligence technologies. Please refer to the [LICENSE](LICENSE) file in the root directory of the model repository for details.

## 6. Disclaimer

Because of the technical limitations inherent in the technology on which openPangu-R-7B-2512 ("Model") relies, and because AI-generated content is produced automatically by the Model, Huawei cannot make any guarantees regarding the following:

- The output of this Model is generated automatically by AI algorithms; some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint;
- There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure, safe, error-free, uninterrupted, continuously stable, or free of any faults;
- The output of this Model does not constitute any advice or decision for you, and there is no guarantee that the generated content is authentic, complete, accurate, timely, legal, functional, or practical. The generated content cannot replace professionals in medicine, law, or other fields in answering your questions. It is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibility.

## 7. Contact Us

If you have any comments or suggestions, please submit an issue or contact openPangu@huawei.com.