seong67360/Qwen2.5-7B-Instruct_v4


South-Korea-Shuan committed on Nov 7, 2024

Commit 37fe333 · verified · 1 Parent(s): bdc8a19

Update README.md

Files changed (1)

README.md +39 -5

README.md CHANGED

@@ -11,11 +11,26 @@ tags:
 This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model using Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.
 ## How to use the model
-The fine-tuned model can be used as follows:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 # Load the model and tokenizer
 model = AutoModelForCausalLM.from_pretrained(
     "seong67360/Qwen2.5-7B-Instruct_v4",
@@ -27,20 +42,38 @@ tokenizer = AutoTokenizer.from_pretrained(
     trust_remote_code=True
 )
-# Example in chat format
 messages = [
     {"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "What is quantum computing?"}
 ]
-# Tokenize and generate
 response = model.chat(tokenizer, messages)
 print(response)
 ```
-# Key parameters
-The following parameters can be adjusted during generation:
 ```python
 response = model.chat(
     tokenizer,
@@ -52,6 +85,7 @@ response = model.chat(
 )
 ```
 ## Project structure

 This repository contains code for fine-tuning the Qwen 2.5 7B Instruct model using Amazon SageMaker. The project uses QLoRA (Quantized Low-Rank Adaptation) for efficient fine-tuning of large language models.
 ## How to use the model
+### Requirements
+- Python 3.8 or later
+- CUDA-capable GPU (at least 24GB VRAM recommended)
+- Required libraries:
+```bash
+pip install torch transformers accelerate
+```
+## Basic usage example
 ```python
+import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
+# Check whether CUDA is available
+if torch.cuda.is_available():
+    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
+else:
+    print("Warning: CUDA not available, using CPU")
 # Load the model and tokenizer
 model = AutoModelForCausalLM.from_pretrained(
     "seong67360/Qwen2.5-7B-Instruct_v4",
     trust_remote_code=True
 )
+# Example conversation
 messages = [
     {"role": "system", "content": "You are a helpful assistant."},
     {"role": "user", "content": "What is quantum computing?"}
 ]
+# Generate a response
 response = model.chat(tokenizer, messages)
 print(response)
 ```
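Reviewer sketch (not part of the commit): the example above relies on `model.chat(...)`, a helper that shipped with the remote code of first-generation Qwen checkpoints. The diff does not confirm that this repository exposes it. As a hedged alternative, assuming the checkpoint includes a standard chat template, the same conversation can be run through the stock `transformers` calls `apply_chat_template` and `generate`. The names `MODEL_ID`, `build_messages`, and `generate_reply` are hypothetical helpers introduced for illustration.

```python
from typing import Dict, List

# Repository name taken from the README; everything else here is illustrative.
MODEL_ID = "seong67360/Qwen2.5-7B-Instruct_v4"


def build_messages(
    user_prompt: str,
    system_prompt: str = "You are a helpful assistant.",
) -> List[Dict[str, str]]:
    """Return a two-turn conversation in the role/content format used above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(messages: List[Dict[str, str]], max_new_tokens: int = 256) -> str:
    """Load the model and generate one assistant reply (downloads the weights)."""
    # Heavy imports are deferred so build_messages() works without a GPU stack.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Render the conversation with the tokenizer's chat template (assumed present).
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer([text], return_tensors="pt").to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens so only the newly generated reply is decoded.
    new_tokens = output_ids[0][inputs.input_ids.shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `generate_reply(build_messages("What is quantum computing?"))` would then return the reply text. The loading step needs the GPU and libraries listed under the requirements.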
+## Memory optimization options
+If GPU memory is limited, you can use 8-bit or 4-bit quantization:
+```python
+# 8-bit quantization
+model = AutoModelForCausalLM.from_pretrained(
+    "seong67360/Qwen2.5-7B-Instruct_v4",
+    device_map="auto",
+    trust_remote_code=True,
+    load_in_8bit=True
+)
+# Or 4-bit quantization
+model = AutoModelForCausalLM.from_pretrained(
+    "seong67360/Qwen2.5-7B-Instruct_v4",
+    device_map="auto",
+    trust_remote_code=True,
+    load_in_4bit=True
+)
+```
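Reviewer sketch (not part of the commit): recent `transformers` releases prefer passing a `BitsAndBytesConfig` through `quantization_config` over the bare `load_in_8bit` / `load_in_4bit` keyword arguments, and both routes need the `bitsandbytes` package, which the install command in this README does not list. The following is a minimal sketch under those assumptions. `quantization_settings` and `load_quantized` are hypothetical helper names, and the NF4 / bfloat16 choices are illustrative defaults, not settings taken from the README.

```python
from typing import Any, Dict

MODEL_ID = "seong67360/Qwen2.5-7B-Instruct_v4"


def quantization_settings(bits: int) -> Dict[str, Any]:
    """Return BitsAndBytesConfig keyword arguments for 8-bit or 4-bit loading."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        return {
            "load_in_4bit": True,
            # Illustrative choices: NF4 quantization with bfloat16 compute.
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": "bfloat16",
        }
    raise ValueError(f"bits must be 4 or 8, got {bits}")


def load_quantized(bits: int, model_id: str = MODEL_ID):
    """Load the model with bitsandbytes quantization (requires a CUDA GPU)."""
    # Deferred import: only needed when the model is actually loaded.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    config = BitsAndBytesConfig(**quantization_settings(bits))
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=config,
        device_map="auto",
    )
```

With `pip install bitsandbytes` added to the requirements, `load_quantized(4)` would stand in for the second snippet above.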
+## Generation parameter settings
 ```python
 response = model.chat(
     tokenizer,
 )
 ```
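Reviewer sketch (not part of the commit): the diff elides the actual parameter list, so the README's values are not reproduced here. As an illustration only, the helper below collects a few standard `generate()` sampling arguments (`max_new_tokens`, `temperature`, `top_p`, `repetition_penalty`) with light validation. The function name `sampling_kwargs` and all default values are assumptions chosen for the example.

```python
from typing import Any, Dict


def sampling_kwargs(
    max_new_tokens: int = 512,
    temperature: float = 0.7,
    top_p: float = 0.9,
    repetition_penalty: float = 1.1,
) -> Dict[str, Any]:
    """Build keyword arguments for model.generate() with sampling enabled."""
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")
    return {
        "do_sample": True,  # temperature and top_p only apply when sampling
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "repetition_penalty": repetition_penalty,
    }
```

Given a tokenized prompt in `inputs`, the call would look like `model.generate(**inputs, **sampling_kwargs(temperature=0.3))`. Lower temperatures make the output more deterministic.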
+---
 ## Project structure