Commit 0640ef5
Parent: 9070482
Update Technical Report
README.md CHANGED
@@ -34,7 +34,7 @@ In the EXAONE 4.0 architecture, we apply new architectural changes compared to p
 1. **Hybrid Attention**: For the 32B model, we adopt a hybrid attention scheme, which combines *Local attention (sliding window attention)* with *Global attention (full attention)* in a 3:1 ratio. We do not use RoPE (Rotary Positional Embedding) for global attention, for better global context understanding.
 2. **QK-Reorder-Norm**: We adopt the Post-LN (LayerNorm) scheme for transformer blocks instead of Pre-LN, and we add RMS normalization right after the Q and K projections. This helps yield better performance on downstream tasks despite consuming more computation.
 
-For more details, please refer to our [technical report](https://
+For more details, please refer to our [technical report](https://arxiv.org/abs/2507.11407), [blog](https://www.lgresearch.ai/blog/view?seq=576), and [GitHub](https://github.com/LG-AI-EXAONE/EXAONE-4.0).
 
 
 ### Model Configuration
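The 3:1 interleaving of local and global attention described in the hunk above can be sketched as a per-layer plan. This is a minimal illustration under assumptions (that every fourth layer is the global one, and that the window size is a placeholder), not the released EXAONE modeling code:

```python
# Minimal sketch of the hybrid attention layout: 3 local (sliding window)
# layers for every 1 global (full attention) layer, with RoPE disabled on
# the global layers. The layer ordering and the window size are assumptions
# for illustration only.

def is_global_layer(layer_idx: int) -> bool:
    """Assume every 4th layer is global, giving the 3:1 local:global ratio."""
    return (layer_idx + 1) % 4 == 0

def attention_plan(num_layers: int, window: int = 4096):
    """Return (attention_kind, uses_rope, window) for each layer."""
    plan = []
    for i in range(num_layers):
        if is_global_layer(i):
            plan.append(("global", False, None))  # full attention, no RoPE
        else:
            plan.append(("local", True, window))  # sliding window + RoPE
    return plan

print(attention_plan(8))
# [('local', True, 4096), ('local', True, 4096), ('local', True, 4096),
#  ('global', False, None), ...]
```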
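Likewise, the QK-Reorder-Norm bullet can be made concrete with a small PyTorch sketch: RMS normalization applied right after the Q and K projections, inside an attention module. Module names, per-head normalization, and shapes are assumptions for illustration (`nn.RMSNorm` needs PyTorch >= 2.4); this is not the released EXAONE code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKReorderNormAttention(nn.Module):
    """Sketch of QK-Reorder-Norm: RMS-normalize Q and K right after their
    projections. Per-head normalization is an assumption for illustration."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.o_proj = nn.Linear(d_model, d_model, bias=False)
        # The extra RMS norms placed right after the Q and K projections.
        self.q_norm = nn.RMSNorm(self.d_head)
        self.k_norm = nn.RMSNorm(self.d_head)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head)
        v = self.v_proj(x).view(b, t, self.n_heads, self.d_head)
        q, k = self.q_norm(q), self.k_norm(k)  # QK-Reorder-Norm step
        q, k, v = (z.transpose(1, 2) for z in (q, k, v))
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(y.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 512)
print(QKReorderNormAttention(512, 8)(x).shape)  # torch.Size([2, 16, 512])
```

In the Post-LN arrangement mentioned above, the block's LayerNorm sits after the residual addition rather than before the sub-layer; the sketch omits the surrounding residual and LayerNorm wiring for brevity.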
@@ -211,7 +211,7 @@ For more details, please refer to [the documentation](https://github.com/NVIDIA/
 
 ## Performance
 
-The following tables show the evaluation results of each model, with reasoning and non-reasoning mode. The evaluation details can be found in the [technical report](https://
+The following tables show the evaluation results of each model, in both reasoning and non-reasoning modes. The evaluation details can be found in the [technical report](https://arxiv.org/abs/2507.11407).
 
 - ✅ denotes that the model has hybrid reasoning capability, evaluated by selecting the reasoning / non-reasoning mode depending on the purpose.
 - To assess Korean **practical** and **professional** knowledge, we adopt both the [KMMLU-Redux](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Redux) and [KMMLU-Pro](https://huggingface.co/datasets/LGAI-EXAONE/KMMLU-Pro) benchmarks. Both datasets are publicly released!
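For the ✅ models, the mode is selected when the prompt is built, at evaluation and inference time alike. A minimal sketch, assuming the chat template accepts an `enable_thinking` flag (verify against the model card; the model id below is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch of selecting reasoning vs. non-reasoning mode at inference time.
# Assumption: the chat template accepts an `enable_thinking` flag, as many
# hybrid-reasoning model cards do; check the EXAONE 4.0 card to confirm.
model_name = "LGAI-EXAONE/EXAONE-4.0-32B"  # illustrative model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="bfloat16", device_map="auto"
)

messages = [{"role": "user", "content": "Which is bigger, 3.9 or 3.11?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    enable_thinking=True,  # True -> reasoning mode; False -> non-reasoning
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```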
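Since both Korean benchmarks are on the Hugging Face Hub, they can be pulled with the `datasets` library; the split name below is an unverified assumption, so inspect the dataset cards first:

```python
from datasets import load_dataset

# Load the publicly released Korean knowledge benchmarks from the Hub.
# The split names are assumptions for illustration; check the dataset cards.
kmmlu_redux = load_dataset("LGAI-EXAONE/KMMLU-Redux", split="test")
kmmlu_pro = load_dataset("LGAI-EXAONE/KMMLU-Pro", split="test")

print(kmmlu_redux)     # features and number of examples
print(kmmlu_redux[0])  # one sample question
```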
@@ -1159,7 +1159,14 @@ The model is licensed under [EXAONE AI Model License Agreement 1.2 - NC](./LICEN
 
 ## Citation
 
-
+```
+@article{exaone-4.0,
+  title={EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes},
+  author={{LG AI Research}},
+  journal={arXiv preprint arXiv:2507.11407},
+  year={2025}
+}
+```
 
 
 ## Contact

