seong67360
/

Qwen2.5-7B-Instruct_v3

PEFT

Safetensors

krx

Model card Files Files and versions Community

South-Korea-Shuan commited on Nov 7, 2024

Commit

f7ecbbe

•

1 Parent(s): 96e08bb

Update README.md

Browse files

Files changed (1) hide show

README.md +68 -72

README.md CHANGED Viewed

@@ -8,12 +8,11 @@ tags:
 - krx
 ---
-# Qwen 2.5 7B Instruct Model Fine-tuning
-This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model using Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.
-## Project Structure
 ```
 .
@@ -26,33 +25,33 @@ This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model usi
 └── README.md
 ```
-## Prerequisites
-- Amazon SageMaker access
-- Hugging Face account and access token
-- AWS credentials configured
 - Python 3.10+
-## Environment Setup
-The project uses the following key dependencies:
 - PyTorch 2.1.0
-- Transformers (latest from main branch)
 - Accelerate >= 0.27.0
 - PEFT >= 0.6.0
 - BitsAndBytes >= 0.41.0
-## Model Configuration
-- Base Model: `Qwen/Qwen2.5-7B-Instruct`
-- Training Method: QLoRA (4-bit quantization)
-- Instance Type: ml.p5.48xlarge
-- Distribution Strategy: PyTorch DDP
-## Training Configuration
-### Hyperparameters
 ```python
 {
@@ -72,80 +71,77 @@ The project uses the following key dependencies:
 }
 ```
-### Environment Variables
-The training environment is configured with optimizations for distributed training and memory management:
-- CUDA device configuration
-- Memory optimization settings
-- EFA (Elastic Fabric Adapter) configuration for distributed training
-- Hugging Face token and cache settings
-## Training Process
-1. **Environment Preparation**:
-   - Creates `requirements.txt` with necessary dependencies
-   - Generates `bootstrap.sh` for Transformers installation
-   - Sets up SageMaker training configuration
-2. **Model Loading**:
-   - Loads the base Qwen 2.5 7B model with 4-bit quantization
-   - Configures BitsAndBytes for quantization
-   - Prepares model for k-bit training
-3. **Dataset Processing**:
-   - Uses the Sujet Finance dataset
-   - Formats conversations in Qwen2 format
-   - Applies tokenization with maximum length of 2048 tokens
-   - Implements data preprocessing with parallel processing
-4. **Training**:
-   - Implements gradient checkpointing for memory efficiency
-   - Uses cosine learning rate schedule with warmup
-   - Saves checkpoints every 50 steps
-   - Logs training metrics every 10 steps
-## Monitoring and Metrics
-The training process tracks the following metrics:
-- Training loss
-- Evaluation loss
-## Error Handling
-The implementation includes comprehensive error handling and logging:
-- Environment validation
-- Dataset preparation verification
-- Training process monitoring
-- Detailed error messages and stack traces
-## Usage
-1. Configure AWS credentials and SageMaker role
-2. Set up Hugging Face token
-3. Run the training script:
 ```bash
 python sagemaker_train.py
 ```
-## Custom Components
-### Custom Tokenizer
-The project includes a custom implementation of the Qwen2 tokenizer (`tokenization_qwen2.py`) with:
-- Special token handling
-- Unicode normalization
-- Vocabulary management
-- Input preparation for model training
-## Notes
-- The training script is optimized for the ml.p5.48xlarge instance type
-- Uses PyTorch Distributed Data Parallel for training
-- Implements gradient checkpointing for memory optimization
-- Includes automatic retry mechanism for training failures
-## License
-[Add License Information]

 - krx
 ---
+# Qwen 2.5 7B Instruct 모델 파인튜닝
+이 저장소는 Amazon SageMaker를 사용하여 Qwen 2.5 7B Instruct 모델을 파인튜닝하는 코드를 포함하고 있습니다. 이 프로젝트는 대규모 언어 모델의 효율적인 파인튜닝을 위해 QLoRA(Quantized Low-Rank Adaptation)를 사용합니다.
+## 프로젝트 구조
 ```
 .
 └── README.md
 ```
+## 사전 요구사항
+- Amazon SageMaker 접근 권한
+- Hugging Face 계정 및 접근 토큰
+- AWS 자격 증명 구성
 - Python 3.10+
+## 환경 설정
+프로젝트에서 사용하는 주요 의존성:
 - PyTorch 2.1.0
+- Transformers (main 브랜치의 최신 버전)
 - Accelerate >= 0.27.0
 - PEFT >= 0.6.0
 - BitsAndBytes >= 0.41.0
+## 모델 구성
+- 기본 모델: `Qwen/Qwen2.5-7B-Instruct`
+- 학습 방법: QLoRA (4비트 양자화)
+- 인스턴스 유형: ml.p5.48xlarge
+- 분산 전략: PyTorch DDP
+## 학습 구성
+### 하이퍼파라미터
 ```python
 {
 }
 ```
+### 환경 변수
+학습 환경은 분산 학습 및 메모리 관리를 위한 최적화로 구성되어 있습니다:
+- CUDA 장치 구성
+- 메모리 최적화 설정
+- 분산 학습을 위한 EFA(Elastic Fabric Adapter) 구성
+- Hugging Face 토큰 및 캐시 설정
+## 학습 프로세스
+1. **환경 준비**:
+   - 필요한 의존성이 포함된 `requirements.txt` 생성
+   - Transformers 설치를 위한 `bootstrap.sh` 생성
+   - SageMaker 학습 구성 설정
+2. **모델 로딩**:
+   - 4비트 양자화로 기본 Qwen 2.5 7B 모델 로드
+   - 양자화를 위한 BitsAndBytes 구성
+   - k-bit 학습을 위한 모델 준비
+3. **데이터셋 처리**:
+   - Sujet Finance 데이터셋 사용
+   - Qwen2 형식으로 대화 포맷팅
+   - 최대 2048 토큰 길이로 토크나이징
+   - 병렬 처리를 통한 데이터 전처리 구현
+4. **학습**:
+   - 메모리 효율성을 위한 gradient checkpointing 구현
+   - 웜업이 포함된 코사인 학습률 스케줄 사용
+   - 50 스텝마다 체크포인트 저장
+   - 10 스텝마다 학습 메트릭 로깅
+## 모니터링 및 메트릭
+학습 과정에서 다음 메트릭을 추적합니다:
+- 학습 손실(Training loss)
+- 평가 손실(Evaluation loss)
+## 오류 처리
+구현에는 포괄적인 오류 처리 및 로깅이 포함되어 있습니다:
+- 환경 유효성 검사
+- 데이터셋 준비 검증
+- 학습 프로세스 모니터링
+- 자세한 오류 메시지 및 스택 추적
+## 사용 방법
+1. AWS 자격 ��명 및 SageMaker 역할 구성
+2. Hugging Face 토큰 설정
+3. 학습 스크립트 실행:
 ```bash
 python sagemaker_train.py
 ```
+## 커스텀 컴포넌트
+### 커스텀 토크나이저
+프로젝트는 다음 기능이 포함된 Qwen2 토크나이저의 커스텀 구현(`tokenization_qwen2.py`)을 포함합니다:
+- 특수 토큰 처리
+- 유니코드 정규화
+- 어휘 관리
+- 모델 학습을 위한 입력 준비
+## 주의사항
+- 학습 스크립트는 ml.p5.48xlarge 인스턴스 타입에 최적화되어 있습니다
+- PyTorch Distributed Data Parallel을 사용한 학습
+- 메모리 최적화를 위한 gradient checkpointing 구현
+- 학습 실패에 대한 자동 재시도 메커니즘 포함