
🚀 GPU Environment Deployment Guide

📋 Prerequisites

1. Hardware Requirements

  • GPU: NVIDIA GPU (RTX 3060 or better recommended)
  • Memory: at least 16GB RAM, 32GB recommended
  • Storage: at least 50GB of free space

2. Software Requirements

NVIDIA Driver Installation

# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-470

# Windows
# Download the latest driver from the NVIDIA website

CUDA Installation

# Install CUDA 11.8 (recommended)
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

Docker Installation

# Ubuntu/Debian
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER

# Windows
# Install Docker Desktop

NVIDIA Docker Installation

# Install the NVIDIA Container Toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

🔧 Environment Setup

1. Check the GPU Environment

cd C:\Project\lily_generate_project\lily_generate_package
python check_gpu_environment.py
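
If check_gpu_environment.py reports problems (or is unavailable), the same basic check can be run directly with PyTorch. This is only an illustrative sketch, not the repository script:

# Minimal GPU check sketch (assumes PyTorch is installed)
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
    print(f"CUDA (PyTorch build): {torch.version.cuda}")
else:
    print("No CUDA-capable GPU detected - check the driver and CUDA installation")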

2. Hugging Face Setup

# Set up your Hugging Face token
huggingface-cli login

# Or configure it via the Python script
python huggingface_gpu_setup.py
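
huggingface_gpu_setup.py is the repository's own script; as a rough sketch, the same token setup can also be done programmatically with the huggingface_hub library (the token and model ID below are placeholders):

# Sketch only - replace the placeholder token and repo_id
from huggingface_hub import login, snapshot_download

login(token="hf_xxx")  # or set the HF_TOKEN environment variable instead

# Optionally pre-download model weights into the local cache
snapshot_download(repo_id="your-org/your-model")  # placeholder repo_id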

🚀 Deployment

1. Automated Deployment (recommended)

# Run the deployment script
chmod +x deploy_gpu_huggingface.sh
./deploy_gpu_huggingface.sh

2. Manual Deployment

# 1. Clean up existing containers
docker-compose -f docker-compose.gpu.yml down --volumes --remove-orphans

# 2. Clear GPU memory (GPU reset)
nvidia-smi --gpu-reset

# 3. Build the images
docker-compose -f docker-compose.gpu.yml build --no-cache

# 4. Start the containers
docker-compose -f docker-compose.gpu.yml up -d

# 5. Check the service status
docker-compose -f docker-compose.gpu.yml logs -f

🧪 Testing

1. GPU Deployment Test

python test_gpu_deployment.py

2. Hugging Face Model Test

python huggingface_gpu_setup.py

3. API Test

curl http://localhost:8001/health
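
The same health check can be scripted, for example with Python's requests library. The port and /health path come from the curl command above; the response body format is not specified here:

# Health check sketch for the API container
import requests

resp = requests.get("http://localhost:8001/health", timeout=5)
resp.raise_for_status()  # raises if the service returned an error status
print(resp.status_code, resp.text)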

📊 Monitoring

1. Check GPU Usage

nvidia-smi
nvidia-smi -l 1  # refresh every second
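
GPU usage can also be polled from Python through the NVML bindings (the nvidia-ml-py package). This is an optional sketch, not part of the deployment scripts:

# Optional monitoring sketch using NVML (pip install nvidia-ml-py)
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU

for _ in range(5):
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"used {mem.used / 1024**2:.0f}/{mem.total / 1024**2:.0f} MiB, util {util.gpu}%")
    time.sleep(1)

pynvml.nvmlShutdown()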

2. Check Container Status

docker ps
docker stats

3. Check Logs

# All logs
docker-compose -f docker-compose.gpu.yml logs -f

# Logs for a specific service
docker-compose -f docker-compose.gpu.yml logs -f lily-llm-api-gpu

🔧 Troubleshooting

1. GPU Out of Memory

# Free GPU memory
nvidia-smi --gpu-reset

# Restart the containers
docker-compose -f docker-compose.gpu.yml restart

2. CUDA Version Conflict

# Check the installed CUDA toolkit version
nvcc --version

# Check the CUDA version PyTorch was built with
python -c "import torch; print(torch.version.cuda)"

3. Docker Permission Issues

# Add your user to the docker group
sudo usermod -aG docker $USER

# Verify after logging out and back in
docker ps

4. Hugging Face Token Issues

# Reset the token
huggingface-cli logout
huggingface-cli login

📈 Performance Optimization

1. Memory Optimization

# Apply 4-bit quantization
python huggingface_gpu_setup.py

# Apply performance optimizations
python performance_optimization.py
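
For reference, 4-bit loading with Transformers is typically configured through bitsandbytes. The snippet below is a generic sketch with a placeholder model ID; the exact options used by huggingface_gpu_setup.py may differ:

# Generic 4-bit loading sketch (placeholder model ID)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/your-model",          # placeholder
    quantization_config=bnb_config,
    device_map="auto",              # place layers on the available GPU(s)
)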

2. Batch Size Tuning

# Adjust the batch size in config.yaml
batch_size: 4  # adjust to fit your GPU memory
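
How this value is consumed is defined by the project's own code; purely as an illustration, a config.yaml entry like the one above could be read with PyYAML (only the batch_size key is assumed from the snippet):

# Illustrative config read (pip install pyyaml)
import yaml

with open("config.yaml") as f:
    config = yaml.safe_load(f)

batch_size = config.get("batch_size", 1)  # fall back to 1 if the key is missing
print("batch size:", batch_size)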

3. Model Caching

# Configure the Hugging Face cache
export HF_HOME="/path/to/cache"
export TRANSFORMERS_CACHE="/path/to/cache"
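
These variables should be exported before the libraries are imported. The Python equivalent, using the same placeholder path, looks like this:

# Set the cache location before importing transformers
import os

os.environ.setdefault("HF_HOME", "/path/to/cache")  # placeholder path from above

from transformers import AutoTokenizer  # imported only after the cache path is set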

🔄 Updates

1. Model Update

# Download the latest model
python huggingface_gpu_setup.py

# Restart the containers
docker-compose -f docker-compose.gpu.yml restart

2. Code Update

# Rebuild after code changes
docker-compose -f docker-compose.gpu.yml build --no-cache
docker-compose -f docker-compose.gpu.yml up -d

📞 Support

What to check when a problem occurs

  1. GPU driver version
  2. CUDA version
  3. Docker version
  4. System memory usage
  5. GPU memory usage

Log file locations

  • Docker logs: docker-compose -f docker-compose.gpu.yml logs
  • Application logs: the logs/ directory
  • GPU logs: nvidia-smi

🎯 Performance Benchmarks

Expected performance by GPU

  • RTX 3060 (12GB): can run the base models
  • RTX 3080 (10GB): can run medium-sized models
  • RTX 3090 (24GB): can run large models
  • RTX 4090 (24GB): best performance, can run all models

Memory usage guide

  • 4-bit quantization: about 25% of the model size
  • 8-bit quantization: about 50% of the model size
  • 16-bit (FP16): about 100% of the model size
  • 32-bit (FP32): about 200% of the model size (see the worked example below)
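
As a rough worked example using the percentages above (FP16 as the 100% baseline): a 7B-parameter model needs about 14 GB for weights in FP16, about 7 GB in 8-bit, and about 3.5 GB in 4-bit, before activations and KV cache. The arithmetic as a small sketch:

# Rough weight-memory estimate (excludes activations and KV cache)
def estimate_gb(num_params: float, bits_per_param: float) -> float:
    return num_params * bits_per_param / 8 / 1e9  # decimal GB

for label, bits in [("4-bit", 4), ("8-bit", 8), ("FP16", 16), ("FP32", 32)]:
    print(f"7B model, {label}: ~{estimate_gb(7e9, bits):.1f} GB")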