---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# Model Card: OLAIR/ko-r1-7b-v2.0.3

This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, benchmark performance, and known limitations.

---

## 1. Overview

**Model Name:** OLAIR/ko-r1-7b-v2.0.3  
**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning  
**Version:** 2.0.3

This model provides Korean language capabilities with a focus on reasoning tasks. It belongs to the second (v2) generation of the Ko-R1 series, building on previous iterations with improvements in training data and fine-tuning methodology, and is fine-tuned from [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
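
A minimal usage sketch follows, assuming the model inherits the standard `transformers` chat-template interface from its DeepSeek-R1-Distill-Qwen-7B base; the prompt and generation settings are illustrative, not prescribed by the model authors:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-7b-v2.0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map requires accelerate
)

# A simple Korean reasoning prompt: "What is the sum of all integers from 1 to 100?"
messages = [{"role": "user", "content": "1부터 100까지 모든 정수의 합은 얼마인가요?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since R1-style models emit a reasoning trace before the final answer, allow a generous `max_new_tokens` budget.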


## 2. Training Data

The model was fine-tuned on the [Open-R1-Ko-SFT-v2.0](https://huggingface.co/datasets/OLAIR/Open-R1-Ko-SFT-v2.0) dataset released by OLAIR, a curated collection of Korean-language data optimized for supervised fine-tuning (SFT) to strengthen reasoning and natural language understanding in Korean.
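
For reference, the dataset can be inspected with the `datasets` library; a short sketch (the `"train"` split name is an assumption):

```python
from datasets import load_dataset

ds = load_dataset("OLAIR/Open-R1-Ko-SFT-v2.0", split="train")
print(ds)     # column names and number of rows
print(ds[0])  # one example record
```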


## 3. Benchmark Performance

The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:

| Model                                        | Chemistry | Math     | Physics  | Physics Word Puzzles | Puzzles | Average  |
|----------------------------------------------|-----------|----------|----------|----------------------|---------|----------|
| o1-2024-12-17                                | 42.9      | 74.5     | 77.8     | 70.0                 | 30.8    | 59.2     |
| o3-mini-high                                 | 35.7      | 72.7     | 70.4     | 70.0                 | 23.1    | 54.4     |
| o3-mini-2025-01-31                           | 35.7      | 74.5     | 74.1     | 60.0                 | 7.7     | 50.4     |
| o1-mini-2024-09-12                           | 35.7      | 54.5     | 63.0     | 60.0                 | 0.0     | 42.6     |
| Deepseek-R1                                  | 35.7      | 52.7     | 51.9     | 60.0                 | 0.0     | 40.1     |
| gpt-4o-2024-11-20                            | 28.6      | 21.8     | 37.0     | 50.0                 | 0.0     | 27.5     |
| **Ko-R1-7B-v2.0.3**                          | **7.1**   | **56.4** | **29.6** | **40.0**             | **0.0** | **26.6** |
| Qwen2.5-72B-Instruct                         | 35.7      | 29.1     | 37.0     | 30.0                 | 0.0     | 26.4     |
| Ko-R1-7B-v1                                  | 0.0       | 60.0     | 22.2     | 40.0                 | 0.0     | 24.4     |
| Exaone-3.5-32B-Instruct                      | 28.6      | 27.3     | 22.2     | 40.0                 | 0.0     | 23.6     |
| gpt-4o-mini-2024-07-18                       | 7.1       | 29.1     | 22.2     | 50.0                 | 0.0     | 21.7     |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3      | 10.9     | 33.3     | 0.0                  | 0.0     | 11.7     |

*Note:* The table reflects performance across multiple reasoning domains. While OLAIR/ko-r1-7b-v2.0.3 is competitive in some areas (e.g., Math, where it outperforms Deepseek-R1), it lags higher-performing models in Chemistry and Physics-related tasks.


## 4. Limitations

- The model can still fall into endless reasoning loops on certain Korean inputs, generating repetitive chains of thought without terminating. We are working on a fix; a decode-time workaround is sketched below.
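
Until a fix lands, one common mitigation is to bound generation and penalize repetition at decode time. A hedged sketch, continuing the usage example from Section 1 (parameter values are illustrative):

```python
# Workaround sketch, not a fix: cap output length and discourage repetition
# so a runaway reasoning trace still terminates.
output_ids = model.generate(
    input_ids,
    max_new_tokens=2048,      # hard upper bound on the generated trace
    repetition_penalty=1.1,   # penalize tokens that keep re-appearing
    eos_token_id=tokenizer.eos_token_id,
)
```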

## 5. ETC

### How to Cite

```
To be added
```

### Contact

```
spthsrbwls123@yonsei.ac.kr
```