Libraries

How to use lablup/gemma-2-2b-it-xaas-kie with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lablup/gemma-2-2b-it-xaas-kie")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lablup/gemma-2-2b-it-xaas-kie")
model = AutoModelForCausalLM.from_pretrained("lablup/gemma-2-2b-it-xaas-kie")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use lablup/gemma-2-2b-it-xaas-kie with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "lablup/gemma-2-2b-it-xaas-kie"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lablup/gemma-2-2b-it-xaas-kie",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/lablup/gemma-2-2b-it-xaas-kie

SGLang

How to use lablup/gemma-2-2b-it-xaas-kie with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "lablup/gemma-2-2b-it-xaas-kie" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lablup/gemma-2-2b-it-xaas-kie",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "lablup/gemma-2-2b-it-xaas-kie" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "lablup/gemma-2-2b-it-xaas-kie",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


XaaS Gemma 2 2B — Stage 3: KIE Fine-Tuning (Production Model)

Stage 3 of 4 in the XaaS fine-tuning pipeline for Korean international trade.

This model is fine-tuned from the QA model (lablup/gemma-2-2b-it-xaas-qa) for Key Information Extraction (KIE) from B2B supply-chain email threads. Given a multi-turn email conversation between a Korean buyer and an overseas supplier, it extracts structured trade information (contract terms, parties, dates, prices, delivery schedule) as YAML. This is the merged production model, served with vLLM in the XaaS API.

Pipeline Position

google/gemma-2-2b-it
    ↓
lablup/gemma-2-2b-it-xaas-cpt
    ↓
lablup/gemma-2-2b-it-xaas-qa
    ↓  [this model]
lablup/gemma-2-2b-it-xaas-kie  ← you are here  (production)

Training Details

| Parameter | Value |
| --- | --- |
| Base model | `lablup/gemma-2-2b-it-xaas-qa` |
| Method | Supervised fine-tuning (SFT) with LoRA, then merged |
| LoRA rank (r) | 256 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 2e-3 |
| Max sequence length | 6,000 tokens |
| Batch size (effective) | 64 (2 per GPU × 16 gradient accumulation × 2 nodes) |
| Optimizer | paged_adamw_8bit |
| Precision | bfloat16 |
| Distributed training | DeepSpeed ZeRO-3, 2 nodes |
| Framework | HuggingFace TRL SFTTrainer + DeepSpeed |

The LoRA adapter has been merged into the base weights. Load directly with AutoModelForCausalLM (no PEFT dependency required).
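
As a quick sanity check on the training parameters above, the sketch below recomputes two derived quantities: the effective batch size and the LoRA scaling factor. It assumes one GPU per node (the card lists 2 nodes but not the GPU count) and the standard LoRA scaling of alpha / r rather than a rank-stabilized variant. Neither assumption is stated in this card.

```python
# Hyperparameters copied from the Training Details section.
lora_rank = 256
lora_alpha = 32
per_gpu_batch = 2
grad_accum_steps = 16
num_nodes = 2
gpus_per_node = 1  # assumption: not stated in the card

# Effective batch size = samples contributing to one optimizer step.
effective_batch = per_gpu_batch * grad_accum_steps * num_nodes * gpus_per_node

# Standard LoRA multiplies the low-rank update by alpha / r (assumed here).
lora_scaling = lora_alpha / lora_rank

print(effective_batch)  # 64
print(lora_scaling)     # 0.125
```

Under these assumptions the numbers agree with the stated effective batch size of 64, and the low-rank update is scaled by 0.125.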

Training Data

lablup/tariff_trade_domain.synthetic_trade_email_kie_kr — 1,188 synthetic B2B supply-chain email threads, each paired with a structured YAML extraction of:

계약 및 조건 (contract terms, payment conditions)
참여자 (buyer/supplier parties)
날짜 / 이벤트 (dates, key milestones)
가격 / 배송 조건 (pricing, delivery schedule)

Generated by GPT-4o-mini across 20 industries (Aerospace, Technology, Manufacturing, Healthcare, ...).

How to Use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "lablup/gemma-2-2b-it-xaas-kie"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

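def extract_kie(email_thread: str) -> str:
    """Return the model's YAML extraction for one email thread."""
    prompt_text = (
        "다음 이메일 대화에서 계약 관련 정보를 YAML 형식으로 추출하세요.\n\n"
        f"{email_thread}"
    )
    messages = [{"role": "user", "content": prompt_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The Gemma chat template already prepends <bos>; skip the tokenizer's own
    # special tokens so the sequence does not start with a duplicated <bos>.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        # Greedy decoding keeps the extraction deterministic.
        outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)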

email = """
**Buyer Details:**
- Name: 박지훈
- Company: SkyLine Aerospace Ltd.

**Email Exchange:**
From: jihoon.park@skylineaerospace.kr
Subject: 항공용 알루미늄 부품 100개 견적 요청
...
"""
print(extract_kie(email))
# ```yaml
# 계약 및 조건:
#     결제 조건: 배송 시 결제
#     배송 일정: 주문 확인일로부터 2주 이내
# ...
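
The sample output above begins with a yaml code fence, so a caller will likely want to strip that wrapper before handing the text to a YAML parser (for example PyYAML's `yaml.safe_load`). The following is a minimal standard-library sketch. `strip_yaml_fence` is a hypothetical helper name, not part of this model's tooling, and it also tolerates a missing closing fence in case generation stops at the token limit.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so the snippet stays fence-safe

def strip_yaml_fence(text: str) -> str:
    """Return the body of a fenced yaml block, or the text itself if unfenced."""
    pattern = (
        re.escape(FENCE) + r"(?:yaml|yml)?[ \t]*\n"   # opening fence line
        r"(.*?)"                                       # block body (lazy)
        r"(?:\n" + re.escape(FENCE) + r"|\Z)"          # closing fence or end of text
    )
    match = re.search(pattern, text, flags=re.DOTALL)
    return (match.group(1) if match else text).strip("\n")

raw = FENCE + "yaml\n참여자:\n    공급업체: GlobalParts Inc.\n" + FENCE
print(strip_yaml_fence(raw))
```

The returned string keeps the original indentation, so it can be parsed as-is.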

OpenAI-compatible API (vLLM)

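from openai import OpenAI

# Placeholder: replace with the full buyer/supplier email thread.
email_thread = "..."

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
response = client.chat.completions.create(
    # Must match the name the server exposes (e.g. vLLM's --served-model-name).
    model="xaas-gemma-2-2b-it-lora128",
    messages=[{
        "role": "user",
        # f-string so the thread is actually substituted into the prompt.
        "content": f"다음 이메일 대화에서 계약 관련 정보를 YAML 형식으로 추출하세요.\n\n{email_thread}"
    }],
    max_tokens=1024,
)
print(response.choices[0].message.content)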

Production Deployment

Served with vLLM at --max-model-len 8128 and --tensor-parallel-size 1. Model weights are in float16, ~5 GB.
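
Because the prompt and the completion share the context window, the `--max-model-len` setting caps how long an email thread can be. The sketch below works out that budget, assuming the `max_tokens=1024` used in the client example. `prompt_budget` and `fits_context` are hypothetical helper names, and the prompt token count would in practice come from the model's tokenizer.

```python
MAX_MODEL_LEN = 8128   # from the vLLM --max-model-len setting above
MAX_NEW_TOKENS = 1024  # max_tokens used in the client example

def prompt_budget(max_model_len: int = MAX_MODEL_LEN,
                  max_new_tokens: int = MAX_NEW_TOKENS) -> int:
    """Tokens left for the prompt once the completion budget is reserved."""
    return max_model_len - max_new_tokens

def fits_context(prompt_tokens: int) -> bool:
    """True if a prompt of this length leaves room for the full completion."""
    return prompt_tokens <= prompt_budget()

print(prompt_budget())     # 7104
print(fits_context(6000))  # True
print(fits_context(7500))  # False
```

Note that training used a maximum sequence length of 6,000 tokens, so threads near the 7,104-token budget are longer than anything seen during fine-tuning.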

Expected Output Format

계약 및 조건:
    결제 조건: 선불 50%, 잔금 배송 시
    배송 일정: 계약 체결 후 4주
    보증: 12개월
참여자:
    구매자: 박지훈, SkyLine Aerospace Ltd.
    공급업체: GlobalParts Inc.
날짜:
    문의일: 2024-07-26
    예상 납기: 2024-08-23
이벤트:
    - 초기 문의 및 사양 확인
    - 가격 협상 (10% 대량 할인 적용)
    - 최종 계약 합의
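
A downstream consumer may want to confirm that all four top-level sections shown above are present before using an extraction. The following is a deliberately naive, standard-library sketch that only scans unindented `key:` lines. It is not a YAML parser, and a real parser such as PyYAML's `yaml.safe_load` would be more robust. `top_level_keys` and `missing_sections` are hypothetical helper names.

```python
EXPECTED_SECTIONS = ["계약 및 조건", "참여자", "날짜", "이벤트"]

def top_level_keys(yaml_text: str) -> list[str]:
    """Collect keys from unindented 'key:' lines (naive; not a YAML parser)."""
    keys = []
    for line in yaml_text.splitlines():
        if not line or line[0] in " \t-#":
            continue  # skip blank, indented, list-item, and comment lines
        key, sep, _ = line.partition(":")
        if sep:
            keys.append(key.strip())
    return keys

def missing_sections(yaml_text: str) -> list[str]:
    """Return the expected sections that do not appear at the top level."""
    found = set(top_level_keys(yaml_text))
    return [name for name in EXPECTED_SECTIONS if name not in found]

sample = "계약 및 조건:\n    보증: 12개월\n참여자:\n    구매자: 박지훈\n"
print(missing_sections(sample))  # ['날짜', '이벤트']
```

An empty result means every expected section was found. A non-empty result is one way to flag the incomplete extractions mentioned under Limitations.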

Limitations

Training data is LLM-generated; extraction accuracy on real emails has not been independently verified.
The YAML schema is fixed to the training format; highly irregular email structures may produce incomplete extractions.
The model is optimized for Korean-buyer / English-supplier email threads; pure-Korean or pure-English threads may work but were less represented in training.