# Lily LLM API User Guide

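## 📋 Table of Contents

1. [Getting Started](#-getting-started)
2. [Basic Features](#-basic-features)
3. [Advanced Features](#-advanced-features)
4. [Troubleshooting](#-troubleshooting)
5. [Best Practices](#-best-practices)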

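## 🚀 Getting Started

### System Requirements

- **Minimum**:
  - CPU: 4 cores or more
  - RAM: 8 GB or more
  - Storage: 20 GB or more
  - GPU: optional (CUDA support improves performance)

- **Recommended**:
  - CPU: 8 cores or more
  - RAM: 16 GB or more
  - Storage: 50 GB or more
  - GPU: NVIDIA RTX 3060 or better (CUDA support)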
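
The CPU and storage figures above can be checked before installation with a short script. This is a minimal sketch that uses only the Python standard library. The function name `check_minimum_requirements` and its arguments are illustrative and not part of Lily LLM. RAM is left out because the standard library has no portable way to read it, and `os.cpu_count()` reports logical cores, which may be higher than the physical core count.

```python
import os
import shutil

def check_minimum_requirements(path=".", min_cores=4, min_free_gb=20):
    """Compare this machine against the minimum CPU and storage figures."""
    cores = os.cpu_count() or 0  # cpu_count() may return None on some platforms
    free_gb = shutil.disk_usage(path).free / (1024 ** 3)
    return {
        "cpu_cores": cores,              # logical cores, not physical ones
        "cpu_ok": cores >= min_cores,
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }

if __name__ == "__main__":
    for key, value in check_minimum_requirements().items():
        print(f"{key}: {value}")
```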

### Installation and Running

#### 1. Deploying with Docker (recommended)

```bash
# Clone the repository
git clone <repository-url>
cd lily_generate_package

# Run the deployment
chmod +x scripts/deploy.sh
./scripts/deploy.sh deploy

# Check the status
./scripts/deploy.sh status
```

#### 2. Local Development Environment

```bash
# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download NLTK data
python -c "import nltk; nltk.download('punkt'); nltk.download('punkt_tab')"

# Start the server
python run_server_v2.py
```

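### Your First Request

```bash
# Check the server status
curl http://localhost:8001/health

# List the available models
curl http://localhost:8001/models

# Generate a short text (the prompt means "Hello!")
# Single quotes keep "!" from triggering shell history expansion,
# and --data-urlencode percent-encodes the non-ASCII prompt.
curl -X POST http://localhost:8001/generate \
  -H "Content-Type: application/x-www-form-urlencoded" \
  --data-urlencode 'prompt=안녕하세요!' \
  -d 'model_id=polyglot-ko-1.3b-chat' \
  -d 'max_length=100'
```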

## 🤖 Basic Features

### 1. Text Generation

#### Simple Text Generation

```python
import requests

def generate_text(prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
    
    response = requests.post(url, data=data)
    return response.json()

# Usage example (the prompt means "Please explain the future of artificial intelligence.")
result = generate_text("인공지능의 미래에 대해 설명해주세요.")
print(result["generated_text"])
```

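#### Parameters

| Parameter | Description | Default | Range |
|-----------|-------------|---------|-------|
| `prompt` | Input text | required | - |
| `model_id` | Model to use | polyglot-ko-1.3b-chat | Any available model (see `/models`) |
| `max_length` | Maximum number of tokens | 200 | 1-4000 |
| `temperature` | Sampling randomness (higher values give more varied output) | 0.7 | 0.0-2.0 |
| `top_p` | Cumulative probability threshold for nucleus sampling | 0.9 | 0.0-1.0 |
| `do_sample` | Whether to use sampling | True | True/False |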
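
The ranges in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper and not part of the Lily LLM API. How the server itself reacts to out-of-range values is not documented here, so this only mirrors the table.

```python
def validate_generation_params(prompt, max_length=200, temperature=0.7, top_p=0.9):
    """Return a list of problems found; an empty list means all values are in range."""
    problems = []
    if not prompt:
        problems.append("prompt is required")
    if not 1 <= max_length <= 4000:
        problems.append("max_length must be between 1 and 4000")
    if not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    if not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")
    return problems

print(validate_generation_params("안녕하세요!"))        # []
print(validate_generation_params("", temperature=3.0))
# ['prompt is required', 'temperature must be between 0.0 and 2.0']
```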

### 2. Multimodal Processing

#### Processing Images and Text Together

```python
from contextlib import ExitStack

def generate_multimodal(prompt, image_files, model_id="kanana-1.5-v-3b-instruct"):
    url = "http://localhost:8001/generate-multimodal"

    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7
    }

    # ExitStack closes every opened image file once the request completes
    with ExitStack() as stack:
        files = [
            ('image_files', (f'image_{i}.jpg', stack.enter_context(open(path, 'rb')), 'image/jpeg'))
            for i, path in enumerate(image_files)
        ]
        response = requests.post(url, files=files, data=data)
    return response.json()

# Usage example (the prompt means "Please describe this image.")
result = generate_multimodal(
    "이 이미지에 대해 설명해주세요.",
    ["image1.jpg", "image2.jpg"]
)
print(result["generated_text"])
```

### 3. User Management

#### Registration and Login

```python
def register_user(username, email, password):
    url = "http://localhost:8001/auth/register"
    data = {
        "username": username,
        "email": email,
        "password": password
    }
    
    response = requests.post(url, data=data)
    return response.json()

def login_user(username, password):
    url = "http://localhost:8001/auth/login"
    data = {
        "username": username,
        "password": password
    }
    
    response = requests.post(url, data=data)
    return response.json()

# Usage example
# 1. Register a user
register_result = register_user("testuser", "test@example.com", "password123")
access_token = register_result["access_token"]

# 2. Log in
login_result = login_user("testuser", "password123")
access_token = login_result["access_token"]
```

#### Authenticated Requests

```python
def authenticated_request(url, data, token):
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example (the prompt means "Hello!")
result = authenticated_request(
    "http://localhost:8001/generate",
    {"prompt": "안녕하세요!", "model_id": "polyglot-ko-1.3b-chat"},
    access_token
)
```

## 📄 Advanced Features

### 1. Document Processing (RAG)

#### Uploading a Document

```python
def upload_document(file_path, user_id, token=None):
    url = "http://localhost:8001/document/upload"
    
    with open(file_path, 'rb') as f:
        files = {'file': f}
        data = {'user_id': user_id}
        headers = {"Authorization": f"Bearer {token}"} if token else {}
        
        response = requests.post(url, files=files, data=data, headers=headers)
        return response.json()

# Usage example
result = upload_document("document.pdf", "user123", access_token)
document_id = result["document_id"]
```

#### RAG Query

```python
def rag_query(query, user_id, token=None):
    url = "http://localhost:8001/rag/generate"
    
    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.post(url, data=data, headers=headers)
    return response.json()

# Usage example (the query means "Tell me about the future of artificial intelligence.")
result = rag_query("인공지능의 미래에 대해 알려주세요.", "user123", access_token)
print(result["response"])
print("Sources:", result["sources"])
```

#### Hybrid RAG (Images + Documents)

```python
from contextlib import ExitStack

def hybrid_rag_query(query, image_files, user_id, token=None):
    url = "http://localhost:8001/rag/generate-hybrid"

    data = {
        "query": query,
        "user_id": user_id,
        "max_length": 300,
        "temperature": 0.7
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}

    # ExitStack closes every opened image file once the request completes
    with ExitStack() as stack:
        files = [
            ('image_files', (f'image_{i}.jpg', stack.enter_context(open(path, 'rb')), 'image/jpeg'))
            for i, path in enumerate(image_files)
        ]
        response = requests.post(url, files=files, data=data, headers=headers)
    return response.json()
```

### 2. Chat Session Management

#### Creating Sessions and Managing Messages

```python
def create_chat_session(user_id, session_name, token=None):
    url = "http://localhost:8001/session/create"
    
    data = {
        "user_id": user_id,
        "session_name": session_name
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def add_chat_message(session_id, user_id, content, token=None):
    url = "http://localhost:8001/chat/message"
    
    data = {
        "session_id": session_id,
        "user_id": user_id,
        "message_type": "text",
        "content": content
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def get_chat_history(session_id, token=None):
    url = f"http://localhost:8001/chat/history/{session_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Create a session (the session name means "AI consultation")
session_result = create_chat_session("user123", "AI 상담", access_token)
session_id = session_result["session_id"]

# 2. Add a message (the message means "Hello!")
add_chat_message(session_id, "user123", "안녕하세요!", access_token)

# 3. Retrieve the chat history
history = get_chat_history(session_id, access_token)
for message in history:
    print(f"{message['timestamp']}: {message['content']}")
```

### 3. Background Tasks

#### Document Processing Tasks

```python
def start_document_processing(file_path, user_id, token=None):
    url = "http://localhost:8001/tasks/document/process"
    
    data = {
        "file_path": file_path,
        "user_id": user_id
    }
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.post(url, data=data, headers=headers)
    return response.json()

def check_task_status(task_id, token=None):
    url = f"http://localhost:8001/tasks/{task_id}"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    
    response = requests.get(url, headers=headers)
    return response.json()

# Usage example
# 1. Start the task
task_result = start_document_processing("/path/to/document.pdf", "user123", access_token)
task_id = task_result["task_id"]

# 2. Poll the task status every 5 seconds until it finishes
import time
while True:
    status = check_task_status(task_id, access_token)
    print(f"Status: {status['status']}, progress: {status.get('progress', 0)}%")
    
    if status['status'] in ['SUCCESS', 'FAILURE']:
        break
    
    time.sleep(5)
```
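
The polling loop above runs forever if a task never reaches `SUCCESS` or `FAILURE`. The sketch below adds an overall timeout. The helper name `wait_for_task` and its `fetch_status` argument are hypothetical; pass any zero-argument callable that returns the status dictionary, such as `lambda: check_task_status(task_id, access_token)`.

```python
import time

def wait_for_task(fetch_status, timeout=300.0, interval=5.0):
    """Poll fetch_status() until the task finishes or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in ("SUCCESS", "FAILURE"):
            return status
        # Give up if the next poll would land after the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"task still {status.get('status')!r} after {timeout} seconds"
            )
        time.sleep(interval)
```

A caller can catch `TimeoutError` to report a stuck task instead of blocking indefinitely.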

### 4. Monitoring

#### Performance Monitoring

```python
def start_monitoring():
    url = "http://localhost:8001/monitoring/start"
    response = requests.post(url)
    return response.json()

def get_monitoring_status():
    url = "http://localhost:8001/monitoring/status"
    response = requests.get(url)
    return response.json()

def get_system_health():
    url = "http://localhost:8001/monitoring/health"
    response = requests.get(url)
    return response.json()

# Usage example
# 1. Start monitoring
start_monitoring()

# 2. Check the current metrics
status = get_monitoring_status()
print(f"CPU usage: {status['current_metrics']['cpu_percent']}%")
print(f"Memory usage: {status['current_metrics']['memory_percent']}%")

# 3. Check system health
health = get_system_health()
print(f"System status: {health['status']}")
for recommendation in health['recommendations']:
    print(f"Recommendation: {recommendation}")
```
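
The `current_metrics` values can also drive a simple client-side alert. The sketch below is illustrative only: the helper name and the 90% and 85% limits are assumptions, not values defined by the API.

```python
def find_overloaded_metrics(current_metrics, limits=None):
    """Return the metrics from a monitoring snapshot that exceed their limit."""
    # Hypothetical default limits; tune them for your own deployment
    limits = limits or {"cpu_percent": 90.0, "memory_percent": 85.0}
    return {
        name: value
        for name, value in current_metrics.items()
        if name in limits and value > limits[name]
    }

print(find_overloaded_metrics({"cpu_percent": 95.2, "memory_percent": 40.0}))
# {'cpu_percent': 95.2}
```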

## 🔌 WebSocket Real-Time Chat

### WebSocket Client

```javascript
class LilyLLMWebSocket {
    constructor(userId) {
        this.userId = userId;
        this.ws = null;
        this.messageHandlers = [];
    }
    
    connect() {
        this.ws = new WebSocket(`ws://localhost:8001/ws/${this.userId}`);
        
        this.ws.onopen = () => {
            console.log('WebSocket connected');
        };
        
        this.ws.onmessage = (event) => {
            const data = JSON.parse(event.data);
            this.handleMessage(data);
        };
        
        this.ws.onclose = () => {
            console.log('WebSocket connection closed');
        };
        
        this.ws.onerror = (error) => {
            console.error('WebSocket error:', error);
        };
    }
    
    sendMessage(message, sessionId) {
        if (this.ws && this.ws.readyState === WebSocket.OPEN) {
            this.ws.send(JSON.stringify({
                type: 'chat',
                message: message,
                session_id: sessionId
            }));
        }
    }
    
    addMessageHandler(handler) {
        this.messageHandlers.push(handler);
    }
    
    handleMessage(data) {
        this.messageHandlers.forEach(handler => handler(data));
    }
    
    disconnect() {
        if (this.ws) {
            this.ws.close();
        }
    }
}

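// Usage example
const wsClient = new LilyLLMWebSocket('user123');
wsClient.connect();

wsClient.addMessageHandler((data) => {
    console.log('Message received:', data);
});

// sendMessage() silently drops messages until the socket is open,
// so send the first message from the 'open' event (the message means "Hello!")
wsClient.ws.addEventListener('open', () => {
    wsClient.sendMessage('안녕하세요!', 'session123');
});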
```

## 🚨 Troubleshooting

### Common Problems

#### 1. Server Connection Failure

**Symptom**: `Connection refused` or `Failed to establish a new connection`

**Solution**:
```bash
# Check the server status
curl http://localhost:8001/health

# Restart the server
./scripts/deploy.sh restart

# Check the logs
./scripts/deploy.sh logs
```
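
After a restart the server may need some time before it accepts connections, so a client can wait for the health endpoint instead of failing on the first refused connection. The sketch below uses only the standard library. The helper names are hypothetical, and treating HTTP 200 from `/health` as "healthy" is an assumption about the endpoint.

```python
import time
import urllib.error
import urllib.request

def probe_health(url="http://localhost:8001/health", timeout=2.0):
    """Return True if the health endpoint answers with HTTP 200 (assumed healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response
        return False

def wait_until_healthy(probe=probe_health, attempts=10, interval=3.0):
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

Calling `wait_until_healthy()` right after `./scripts/deploy.sh restart` blocks for at most about 30 seconds of sleeping, plus probe timeouts, with the defaults shown.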

#### 2. Out of Memory

**Symptom**: `Out of memory` errors or slow responses

**Solution**:
```bash
# Check memory usage
docker stats

# Remove stopped containers and other unused Docker data
docker system prune -f
```

Set a resource limit in `docker-compose.yml`:

```yaml
services:
  lily-llm-api:
    deploy:
      resources:
        limits:
          memory: 4G
```

#### 3. Model Loading Failure

**Symptom**: `Model not found` or other model-related errors

**Solution**:
```bash
# List the available models
curl http://localhost:8001/models

# Check the model files
ls -la models/

# Restart the server
./scripts/deploy.sh restart
```

#### 4. Authentication Errors

**Symptom**: `401 Unauthorized` or `403 Forbidden`

**Solution**:
```python
# Refresh the token
def refresh_token(refresh_token):
    url = "http://localhost:8001/auth/refresh"
    data = {"refresh_token": refresh_token}
    response = requests.post(url, data=data)
    return response.json()

# Make requests with the new token
new_tokens = refresh_token(old_refresh_token)
access_token = new_tokens["access_token"]
```

### Performance Optimization

#### 1. Batch Processing

```python
def batch_generate_texts(prompts, model_id="polyglot-ko-1.3b-chat"):
    results = []
    for prompt in prompts:
        result = generate_text(prompt, model_id)
        results.append(result)
    return results

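# Usage example
prompts = [
    "첫 번째 질문입니다.",  # "This is the first question."
    "두 번째 질문입니다.",  # "This is the second question."
    "세 번째 질문입니다."   # "This is the third question."
]
results = batch_generate_texts(prompts)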
```
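
The loop above sends the prompts one after another. If the server can handle several requests at once (an assumption, since a single model instance may still process them serially), a thread pool lets the client overlap the network waits. The helper name `batch_generate_parallel` is hypothetical, and `generate` stands for any function that takes a prompt, such as `generate_text`.

```python
from concurrent.futures import ThreadPoolExecutor

def batch_generate_parallel(prompts, generate, max_workers=4):
    """Run generate(prompt) for each prompt on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input prompts
        return list(pool.map(generate, prompts))

# Example call, assuming generate_text from the text generation section:
# results = batch_generate_parallel(prompts, generate_text)
```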

#### 2. Using a Cache

```python
import hashlib
import json
import redis

class CachedLilyLLMClient:
    def __init__(self, base_url="http://localhost:8001"):
        self.base_url = base_url
        self.redis_client = redis.Redis(host='localhost', port=6379, db=0)
    
    def generate_text_with_cache(self, prompt, model_id="polyglot-ko-1.3b-chat"):
        # Build the cache key from a SHA-256 digest; the built-in hash() is
        # randomized per process for strings, so its keys would not survive a restart
        digest = hashlib.sha256(f"{model_id}\n{prompt}".encode("utf-8")).hexdigest()
        cache_key = f"text_gen:{digest}"
        
        # Look up the cache first
        cached_result = self.redis_client.get(cache_key)
        if cached_result:
            return json.loads(cached_result)
        
        # Call the API
        result = generate_text(prompt, model_id)
        
        # Store the result in the cache for 1 hour
        self.redis_client.setex(cache_key, 3600, json.dumps(result))
        
        return result
```

## 📚 Best Practices

### 1. Error Handling

```python
import requests
from requests.exceptions import RequestException

def safe_api_call(func, *args, **kwargs):
    try:
        return func(*args, **kwargs)
    except RequestException as e:
        print(f"Network error: {e}")
        return None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None

# Usage example (the prompt means "Hello!")
result = safe_api_call(generate_text, "안녕하세요!")
if result:
    print(result["generated_text"])
```

### 2. Retry Logic

```python
import time
from functools import wraps

def retry_on_failure(max_retries=3, delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_retries - 1:
                        raise  # re-raise the last error with its original traceback
                    print(f"Attempt {attempt + 1} failed, retrying in {delay} seconds...")
                    time.sleep(delay)
            return None
        return wrapper
    return decorator

# Usage example
@retry_on_failure(max_retries=3, delay=2)
def robust_generate_text(prompt):
    return generate_text(prompt)
```
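
The decorator above waits a fixed `delay` between attempts. A common variation is exponential backoff, where each wait doubles up to a cap so that a struggling server gets progressively more breathing room. The sketch below only computes the wait schedule. The function name and its parameters are illustrative, not part of any Lily LLM client.

```python
import random

def backoff_delays(max_retries=3, base=1.0, cap=30.0, jitter=0.0, rng=random.random):
    """Return the wait before each retry: base * 2**attempt, capped at `cap`.

    jitter adds up to that fraction of random extra delay (0.0 disables it).
    """
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        delays.append(delay * (1.0 + jitter * rng()))
    return delays

print(backoff_delays(max_retries=4, base=1.0))  # [1.0, 2.0, 4.0, 8.0]
```

To combine the two, a retry loop could sleep for `backoff_delays(...)[attempt]` instead of the constant `delay`.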

### 3. Asynchronous Processing

```python
import asyncio
import aiohttp

async def async_generate_text(session, prompt, model_id="polyglot-ko-1.3b-chat"):
    url = "http://localhost:8001/generate"
    data = {
        "prompt": prompt,
        "model_id": model_id,
        "max_length": 200,
        "temperature": 0.7
    }
    
    async with session.post(url, data=data) as response:
        return await response.json()

async def batch_generate_async(prompts):
    async with aiohttp.ClientSession() as session:
        tasks = [async_generate_text(session, prompt) for prompt in prompts]
        results = await asyncio.gather(*tasks)
        return results

# Usage example (the prompts mean "Question 1", "Question 2", "Question 3")
prompts = ["질문1", "질문2", "질문3"]
results = asyncio.run(batch_generate_async(prompts))
```
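
`asyncio.gather` starts every request at once, which can overwhelm the server when the prompt list is long. One way to cap the number of in-flight requests is an `asyncio.Semaphore`. The sketch below is a minimal illustration: `gather_limited` is a hypothetical helper, and `fake_generate` stands in for a real call such as `async_generate_text`.

```python
import asyncio

async def gather_limited(coroutine_fn, items, limit=4):
    """Await coroutine_fn(item) for every item with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await coroutine_fn(item)

    # gather returns results in the same order as the input items
    return await asyncio.gather(*(run_one(item) for item in items))

async def fake_generate(prompt):
    await asyncio.sleep(0.01)  # stands in for the HTTP round trip
    return {"generated_text": prompt.upper()}

results = asyncio.run(gather_limited(fake_generate, ["a", "b", "c"], limit=2))
print([r["generated_text"] for r in results])  # ['A', 'B', 'C']
```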

### 4. Logging

```python
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('lily_llm_client.log'),
        logging.StreamHandler()
    ]
)

logger = logging.getLogger(__name__)

def generate_text_with_logging(prompt, model_id="polyglot-ko-1.3b-chat"):
    logger.info(f"Starting text generation: {prompt[:50]}...")
    
    try:
        result = generate_text(prompt, model_id)
        logger.info(f"Text generation succeeded: {len(result['generated_text'])} characters")
        return result
    except Exception as e:
        logger.error(f"Text generation failed: {e}")
        raise
```

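## 📞 Support

### Help Resources

- **API documentation**: `http://localhost:8001/docs`
- **ReDoc documentation**: `http://localhost:8001/redoc`
- **GitHub Issues**: the Issues section of the project repository
- **Log files**: the `./logs/` directory

### Debugging Tips

1. **Check the logs**: Always check the logs first.
2. **Test step by step**: Break complex requests into smaller units and test each one.
3. **Check the network**: Verify firewall and proxy settings.
4. **Monitor resources**: Check CPU, memory, and disk usage regularly.

### Performance Tips

1. **Choose an appropriate model**: Pick a model that fits the task.
2. **Batch processing**: Process multiple requests together.
3. **Use caching**: Cache responses for repeated requests.
4. **Asynchronous processing**: Handle large volumes of requests asynchronously.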