# 🚀 GPU Environment Deployment Guide
## 📋 Prerequisites
### 1. Hardware Requirements
- **GPU**: NVIDIA GPU (RTX 3060 or better recommended)
- **Memory**: 16GB RAM minimum, 32GB recommended
- **Storage**: at least 50GB of free space
### 2. Software Requirements
#### NVIDIA Driver Installation
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-470
# Windows
# Download the latest driver from the NVIDIA website
```
#### CUDA Installation
```bash
# Install CUDA 11.8 (recommended)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
```
#### Docker Installation
```bash
# Ubuntu/Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Windows
# Install Docker Desktop
```
#### NVIDIA Docker Installation
```bash
# Install the NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
```
## 🔧 Environment Setup
### 1. Check the GPU Environment
```bash
cd C:\Project\lily_generate_project\lily_generate_package
python check_gpu_environment.py
```
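The contents of `check_gpu_environment.py` are not shown in this guide. As a rough sketch of the kind of check such a script might perform, the following queries `nvidia-smi` for each GPU's name, total memory, and driver version. The function names `parse_gpu_csv` and `list_gpus` are hypothetical; the `--query-gpu` and `--format` options are standard `nvidia-smi` flags.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY = "name,memory.total,driver_version"


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib, driver = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "memory_mib": int(mem_mib), "driver": driver})
    return gpus


def list_gpus():
    """Return one dict per visible GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []
```

An empty list from `list_gpus()` suggests that the driver is missing or that no NVIDIA GPU is visible to the current user.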
### 2. Hugging Face Setup
```bash
# Set the Hugging Face token
huggingface-cli login
# Or configure it with the Python script
python huggingface_gpu_setup.py
```
## 🚀 Running the Deployment
### 1. Automatic Deployment (recommended)
```bash
# Run the deployment script
chmod +x deploy_gpu_huggingface.sh
./deploy_gpu_huggingface.sh
```
### 2. Manual Deployment
```bash
# 1. Clean up existing containers
docker-compose -f docker-compose.gpu.yml down --volumes --remove-orphans
# 2. Reset the GPU to free its memory (requires root and no processes using the GPU)
sudo nvidia-smi --gpu-reset
# 3. Build the image
docker-compose -f docker-compose.gpu.yml build --no-cache
# 4. Start the containers
docker-compose -f docker-compose.gpu.yml up -d
# 5. Check the service status
docker-compose -f docker-compose.gpu.yml logs -f
```
## 🧪 Testing
### 1. GPU Deployment Test
```bash
python test_gpu_deployment.py
```
### 2. Hugging Face Model Test
```bash
python huggingface_gpu_setup.py
```
### 3. API Test
```bash
curl http://localhost:8001/health
```
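Right after `docker-compose up -d` the API may still be loading the model, so a single `curl` can fail even when the deployment is fine. A minimal sketch of a polling check, using only the standard library, is shown below. The helper names and the default retry count and delay are assumptions for illustration; only the `/health` URL on port 8001 comes from this guide.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for a GET request, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None


def wait_until_healthy(url, attempts=10, delay=3.0, fetch=fetch_status):
    """Poll `url` until it answers 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:8001/health")` returns `True` as soon as the endpoint responds with 200, and `False` if it never does within the configured attempts.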
## 📊 Monitoring
### 1. Check GPU Usage
```bash
nvidia-smi
nvidia-smi -l 1 # refresh every second
```
### 2. Check Container Status
```bash
docker ps
docker stats
```
### 3. Check Logs
```bash
# All logs
docker-compose -f docker-compose.gpu.yml logs -f
# Logs for a specific service
docker-compose -f docker-compose.gpu.yml logs -f lily-llm-api-gpu
```
## 🔧 Troubleshooting
### 1. Out of GPU Memory
```bash
# Reset the GPU to free its memory (requires root and no processes using the GPU)
sudo nvidia-smi --gpu-reset
# Restart the containers
docker-compose -f docker-compose.gpu.yml restart
```
### 2. CUDA Version Conflict
```bash
# Check the CUDA toolkit version
nvcc --version
# Check the CUDA version PyTorch was built with
python -c "import torch; print(torch.version.cuda)"
```
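The two commands above print versions in different formats, so comparing them by eye is error-prone. The sketch below parses the `release X.Y` field from `nvcc --version` output and compares it with the string reported by `torch.version.cuda`. The function names are hypothetical. Note that PyTorch binary wheels bundle their own CUDA runtime, so a mismatch with the system toolkit matters mainly when compiling CUDA extensions.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Extract the 'major.minor' release from `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions_match(nvcc_output, torch_cuda):
    """Compare the toolkit and PyTorch CUDA versions on major.minor.

    `torch_cuda` is the value of `torch.version.cuda`, which is None
    for CPU-only PyTorch builds.
    """
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None or torch_cuda is None:
        return False
    return toolkit.split(".")[:2] == torch_cuda.split(".")[:2]
```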
### 3. Docker Permission Issues
```bash
# Add the user to the docker group
sudo usermod -aG docker $USER
# Log out and back in, then verify
docker ps
```
### 4. Hugging Face Token Issues
```bash
# Reset the token
huggingface-cli logout
huggingface-cli login
```
## 📈 Performance Optimization
### 1. Memory Optimization
```bash
# Apply 4-bit quantization
python huggingface_gpu_setup.py
# Apply performance optimizations
python performance_optimization.py
```
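What `huggingface_gpu_setup.py` does internally is not shown in this guide. For orientation, one common way to load a model with 4-bit weights is the `BitsAndBytesConfig` option in the `transformers` library, sketched below. This is an assumption about the approach, not a copy of the script: `load_model_4bit` and its `model_id` parameter are hypothetical, and the code needs `torch`, `transformers`, `accelerate`, and `bitsandbytes` installed plus a CUDA GPU.

```python
def load_model_4bit(model_id):
    """Load a causal language model with 4-bit weights via bitsandbytes."""
    # Imported lazily so this module can be imported on machines
    # that do not have the GPU stack installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                     # store weights in 4-bit
        bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
        bnb_4bit_compute_dtype=torch.float16,  # run matmuls in FP16
        bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers on the available GPU(s)
    )
    return tokenizer, model
```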
### 2. Adjusting the Batch Size
```yaml
# Adjust the batch size in config.yaml
batch_size: 4 # tune to the available GPU memory
```
### 3. Model Caching
```bash
# Set the Hugging Face cache location
export HF_HOME="/path/to/cache"
export TRANSFORMERS_CACHE="/path/to/cache"
```
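The same setting can be applied from Python when exporting shell variables is inconvenient, for example inside a launcher script. The sketch below is a hypothetical helper that creates the cache directory and sets `HF_HOME`. It has to run before `transformers` or `huggingface_hub` is imported, because those libraries read the variable at import time. Recent `transformers` releases also treat `TRANSFORMERS_CACHE` as deprecated in favor of `HF_HOME`, so only the latter is set here.

```python
import os
from pathlib import Path


def configure_hf_cache(cache_dir):
    """Create `cache_dir` and point the Hugging Face libraries at it."""
    path = Path(cache_dir).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    # Must be set before importing transformers or huggingface_hub.
    os.environ["HF_HOME"] = str(path)
    return path
```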
## 🔄 Updates
### 1. Updating the Model
```bash
# Download the latest model
python huggingface_gpu_setup.py
# Restart the containers
docker-compose -f docker-compose.gpu.yml restart
```
### 2. Updating the Code
```bash
# Rebuild after code changes
docker-compose -f docker-compose.gpu.yml build --no-cache
docker-compose -f docker-compose.gpu.yml up -d
```
## 📞 Support
### What to Check When Problems Occur
1. GPU driver version
2. CUDA version
3. Docker version
4. System memory usage
5. GPU memory usage
### Log Locations
- Docker logs: `docker-compose -f docker-compose.gpu.yml logs`
- Application logs: the `logs/` directory
- GPU logs: `nvidia-smi`
## 🎯 Performance Benchmarks
### Capability by Recommended GPU
- **RTX 3060 (12GB)**: can run basic models
- **RTX 3080 (10GB)**: can run medium-sized models
- **RTX 3090 (24GB)**: can run large models
- **RTX 4090 (24GB)**: best performance; can run all models
### Memory Usage Guide
- **4-bit quantization**: about 25% of the FP16 model size
- **8-bit quantization**: about 50% of the FP16 model size
- **16-bit (FP16)**: about 100% (the baseline)
- **32-bit (FP32)**: about 200% of the FP16 model size
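These ratios follow from the number of bytes stored per parameter: 0.5 for 4-bit, 1 for 8-bit, 2 for FP16, and 4 for FP32. A small sketch of the arithmetic is shown below. The function names and the 20% overhead allowance for activations and the KV cache are illustrative assumptions, not measured values.

```python
# Bytes stored per model parameter at each precision.
BYTES_PER_PARAM = {"4bit": 0.5, "8bit": 1.0, "fp16": 2.0, "fp32": 4.0}


def estimate_weight_gb(num_params, precision):
    """Estimate weight memory in GB (10^9 bytes) for a given parameter count."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_in_vram(num_params, precision, vram_gb, overhead=1.2):
    """Rough fit check; `overhead` is an assumed allowance, not a measurement."""
    return estimate_weight_gb(num_params, precision) * overhead <= vram_gb
```

For a 7-billion-parameter model this gives roughly 14 GB of weights in FP16 and 3.5 GB in 4-bit, which is why quantization makes such models practical on a 12GB card.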