Instructions for using rudalson/Llama-3.2-3B-Instruct-KoAlpaca with libraries and local apps.

Libraries

How to use rudalson/Llama-3.2-3B-Instruct-KoAlpaca with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rudalson/Llama-3.2-3B-Instruct-KoAlpaca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("rudalson/Llama-3.2-3B-Instruct-KoAlpaca")
model = AutoModelForCausalLM.from_pretrained("rudalson/Llama-3.2-3B-Instruct-KoAlpaca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use rudalson/Llama-3.2-3B-Instruct-KoAlpaca with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rudalson/Llama-3.2-3B-Instruct-KoAlpaca"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rudalson/Llama-3.2-3B-Instruct-KoAlpaca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/rudalson/Llama-3.2-3B-Instruct-KoAlpaca

SGLang

How to use rudalson/Llama-3.2-3B-Instruct-KoAlpaca with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rudalson/Llama-3.2-3B-Instruct-KoAlpaca" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rudalson/Llama-3.2-3B-Instruct-KoAlpaca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rudalson/Llama-3.2-3B-Instruct-KoAlpaca" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rudalson/Llama-3.2-3B-Instruct-KoAlpaca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
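
The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` and the `max_tokens` value are illustrative assumptions, not part of vLLM or SGLang. It assumes one of the servers above is already running (port 8000 for vLLM, port 30000 for SGLang).

```python
import json
import urllib.request

MODEL_ID = "rudalson/Llama-3.2-3B-Instruct-KoAlpaca"


def build_chat_request(base_url, user_content, max_tokens=128):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": user_content}],
        "max_tokens": max_tokens,  # illustrative cap on the reply length
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_content):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, user_content)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
#   chat("http://localhost:8000", "What is the capital of France?")   # vLLM
#   chat("http://localhost:30000", "What is the capital of France?")  # SGLang
```

Building the request separately from sending it makes it possible to inspect the payload without a live server.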

Docker Model Runner
How to use rudalson/Llama-3.2-3B-Instruct-KoAlpaca with Docker Model Runner:
```
docker model run hf.co/rudalson/Llama-3.2-3B-Instruct-KoAlpaca
```

Llama-3.2-3B-KoAlpaca-Merged

This model is based on Meta's Llama-3.2-3B-Instruct, fine-tuned on the Korean instruction dataset KoAlpaca v1.1a and then merged. It focuses on improving Korean question answering and instruction following.

Model Details

Model Description

Model type: Causal Language Model
Language(s) (NLP): Korean
License: Llama 3.2 Community License
Finetuned from model: meta-llama/Llama-3.2-3B-Instruct
Dataset: Taegyuu/KoAlpaca-v1.1a

Uses

Direct Use

This model can be used directly to answer questions in Korean or to generate text that follows a given instruction.

Prompt Format (Llama 3.2)

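Because the model was trained on a conversational structure, the following message format is recommended.

messages = [
    # System prompt: "You are an expert in Korean question answering. Provide accurate and concise answers based on the given context."
    {"role": "system", "content": "당신은 한국어 질의응답 전문가입니다. 주어진 문맥을 바탕으로 정확하고 간결한 답변을 제공하세요."},
    # User prompt: "What is artificial intelligence?"
    {"role": "user", "content": "인공지능이란 무엇인가요?"}
]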
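
As a rough sketch, the recommended format can be wrapped in a small helper and passed to the Transformers pipeline shown earlier. The function names `build_messages` and `answer` and the `max_new_tokens` default are hypothetical choices made for illustration. Calling `answer` downloads the model weights, so the import is deferred until then.

```python
MODEL_ID = "rudalson/Llama-3.2-3B-Instruct-KoAlpaca"

# System prompt taken from the model card's recommended format.
DEFAULT_SYSTEM_PROMPT = (
    "당신은 한국어 질의응답 전문가입니다. "
    "주어진 문맥을 바탕으로 정확하고 간결한 답변을 제공하세요."
)


def build_messages(question, system_prompt=DEFAULT_SYSTEM_PROMPT):
    """Return a system + user message list in the recommended chat format."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]


def answer(question, max_new_tokens=256):
    """Generate a reply with the text-generation pipeline (downloads the model)."""
    from transformers import pipeline  # deferred: heavy import

    pipe = pipeline("text-generation", model=MODEL_ID)
    result = pipe(build_messages(question), max_new_tokens=max_new_tokens)
    # For chat input, the pipeline returns the conversation with the
    # assistant's reply appended as the last message.
    return result[0]["generated_text"][-1]["content"]


# Usage: answer("인공지능이란 무엇인가요?")
```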

Training Hyperparameters

The following hyperparameters were used during training:

LoRA Specifics:
- r: 32
- lora_alpha: 64
- target_modules: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
- lora_dropout: 0.05
Optimization:
- learning_rate: 2e-04
- lr_scheduler_type: cosine
- warmup_steps: 100
- optimizer: AdamW
- weight_decay: default (0.0)
Training Strategy:
- num_train_epochs: 1
- total_batch_size: 8 (2 per device * 4 accumulation steps)
- gradient_checkpointing: True
- fp16: True
- neftune_noise_alpha: 5
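
Two quantities follow directly from the values above: the LoRA scaling factor (lora_alpha / r in standard LoRA) and the effective batch size. The sketch below restates the listed hyperparameters as plain dictionaries and derives both. The card does not say which training library was used, so the `build_lora_config` helper, which maps the values onto PEFT's `LoraConfig`, is an assumption shown only for illustration.

```python
# LoRA settings as listed in the model card.
LORA = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
}

# Batch settings as listed in the model card.
PER_DEVICE_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 4

# Standard LoRA scales the low-rank update by lora_alpha / r.
lora_scaling = LORA["lora_alpha"] / LORA["r"]

# Examples consumed per optimizer step on a single device.
effective_batch_size = PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS


def build_lora_config():
    """Assumption: express the same settings as a PEFT LoraConfig."""
    from peft import LoraConfig  # deferred: optional dependency

    return LoraConfig(task_type="CAUSAL_LM", **LORA)


print(f"LoRA scaling: {lora_scaling}")
print(f"Effective batch size: {effective_batch_size}")
```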

Evaluation

Metric	Score
F1 Score	11.40%
ROUGE-1	4.94%
ROUGE-L	4.62%
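
The card does not say how these scores were computed. For orientation only, the sketch below shows one common definition of F1 for generated answers: token-overlap F1 between a prediction and a reference. The whitespace tokenization and the function name `token_f1` are assumptions and may differ from what was used to produce the table.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings, using whitespace tokens."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this definition, a long free-form answer that shares only a few tokens with a short reference scores low even when it is correct in substance.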

Support

SSAFY Tesla V100-PCIE-32GB

Safetensors

Model size: 3B params
Tensor type: F16