XaaS Gemma 2 2B — Stage 3: KIE Fine-Tuning (Production Model)
Stage 3 of 4 in the XaaS fine-tuning pipeline for Korean international trade.
Fine-tuned from the QA model (lablup/gemma-2-2b-it-xaas-qa) for Key Information Extraction (KIE) from B2B supply-chain email threads. Given a multi-turn email conversation between a Korean buyer and an overseas supplier, the model extracts structured trade information (contract terms, parties, dates, prices, delivery schedule) as YAML. This is the production merged model deployed via vLLM in the XaaS API.
Pipeline Position
google/gemma-2-2b-it
↓
lablup/gemma-2-2b-it-xaas-cpt
↓
lablup/gemma-2-2b-it-xaas-qa
↓ [this model]
lablup/gemma-2-2b-it-xaas-kie ← you are here (production)
Training Details
| Parameter | Value |
|---|---|
| Base model | lablup/gemma-2-2b-it-xaas-qa |
| Method | Supervised fine-tuning (SFT) with LoRA, then merged |
| LoRA rank (r) | 256 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 2e-3 |
| Max sequence length | 6,000 tokens |
| Batch size (effective) | 64 (2 per GPU × 16 gradient accumulation × 2 nodes) |
| Optimizer | paged_adamw_8bit |
| Precision | bfloat16 |
| Distributed training | DeepSpeed ZeRO-3, 2 nodes |
| Framework | HuggingFace TRL SFTTrainer + DeepSpeed |
The LoRA adapter has been merged into the base weights. Load directly with AutoModelForCausalLM (no PEFT dependency required).
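The batch-size and LoRA figures in the table can be checked with a few lines of arithmetic. This is a minimal sketch: the one-GPU-per-node value is an assumption (the table does not state it, but it is what makes 2 × 16 × 2 equal 64), and the `alpha / r` scale assumes the standard LoRA scaling rather than a rank-stabilized variant.

```python
# Values copied from the Training Details table.
per_gpu_batch = 2
grad_accum_steps = 16
num_nodes = 2
gpus_per_node = 1  # assumption: not stated in the table, implied by 2 x 16 x 2 = 64

effective_batch = per_gpu_batch * grad_accum_steps * num_nodes * gpus_per_node
print(effective_batch)  # 64

lora_rank = 256
lora_alpha = 32
# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scale = lora_alpha / lora_rank
print(lora_scale)  # 0.125
```

With alpha smaller than the rank, the adapter update is scaled down (here by a factor of 0.125) before it is added to the base weights.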
Training Data
lablup/tariff_trade_domain.synthetic_trade_email_kie_kr — 1,188 synthetic B2B supply-chain email threads, each paired with a structured YAML extraction of:
- 계약 및 조건 (contract terms, payment conditions)
- 참여자 (buyer/supplier parties)
- 날짜 / 이벤트 (dates, key milestones)
- 가격 / 배송 조건 (pricing, delivery schedule)
Generated by GPT-4o-mini across 20 industries (Aerospace, Technology, Manufacturing, Healthcare, ...).
How to Use
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "lablup/gemma-2-2b-it-xaas-kie"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The LoRA adapter is already merged, so the checkpoint loads as a plain causal LM.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

def extract_kie(email_thread: str) -> str:
    # Korean instruction: "Extract the contract-related information from the
    # following email conversation in YAML format."
    prompt_text = (
        "다음 이메일 대화에서 계약 관련 정보를 YAML 형식으로 추출하세요.\n\n"
        f"{email_thread}"
    )
    messages = [{"role": "user", "content": prompt_text}]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The chat template already inserts <bos>, so do not add special tokens again.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    with torch.no_grad():
        outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

email = """
**Buyer Details:**
- Name: 박지훈
- Company: SkyLine Aerospace Ltd.
**Email Exchange:**
From: jihoon.park@skylineaerospace.kr
Subject: 항공용 알루미늄 부품 100개 견적 요청
...
"""

print(extract_kie(email))
# Example output:
# ```yaml
# 계약 및 조건:
#   결제 조건: 배송 시 결제
#   배송 일정: 주문 확인일로부터 2주 이내
# ...
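The sample output above begins with a Markdown-style yaml fence, so downstream code will probably need to remove it before parsing. The helper below is a minimal sketch of that step; `strip_yaml_fence` is a hypothetical name, not part of the model's tooling, and it also tolerates a missing closing fence in case generation stops at the token limit.

```python
import re

def strip_yaml_fence(text: str) -> str:
    """Return the body of the first ```yaml block, or the text itself if no fence is found."""
    # Opening fence (optionally tagged "yaml"), then everything up to the
    # closing fence or, if the output was cut off, the end of the string.
    match = re.search(r"```(?:yaml)?[ \t]*\n(.*?)(?:\n```|\Z)", text, flags=re.DOTALL)
    body = match.group(1) if match else text
    return body.strip("\n")

# Hypothetical raw model output, shaped like the sample above.
raw_output = "```yaml\n참여자:\n  구매자: 박지훈, SkyLine Aerospace Ltd.\n```"
print(strip_yaml_fence(raw_output))
```

The cleaned string can then be passed to a YAML parser such as PyYAML's `yaml.safe_load`.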
OpenAI-compatible API (vLLM)
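from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

email_thread = "..."  # the email conversation to analyze

response = client.chat.completions.create(
    # Must match the model name that the vLLM server exposes.
    model="xaas-gemma-2-2b-it-lora128",
    messages=[{
        "role": "user",
        # f-string, so the thread text is substituted into the prompt.
        "content": f"다음 이메일 대화에서 계약 관련 정보를 YAML 형식으로 추출하세요.\n\n{email_thread}",
    }],
    max_tokens=1024,
)
print(response.choices[0].message.content)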
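Both usage examples send the same Korean instruction, a blank line, and then the email thread as a single user turn. One way to keep the two code paths in sync is a small shared helper; the sketch below assumes nothing beyond that prompt layout, and `KIE_INSTRUCTION` and `build_kie_messages` are hypothetical names introduced for illustration.

```python
# Instruction string copied from the usage examples.
KIE_INSTRUCTION = "다음 이메일 대화에서 계약 관련 정보를 YAML 형식으로 추출하세요."

def build_kie_messages(email_thread: str) -> list[dict[str, str]]:
    """Build the single-turn chat payload: instruction, blank line, email thread."""
    return [{"role": "user", "content": f"{KIE_INSTRUCTION}\n\n{email_thread}"}]

# Hypothetical two-line thread, for illustration only.
messages = build_kie_messages("From: buyer@example.com\nSubject: 견적 요청")
print(messages[0]["content"])
```

The returned list can be passed either to `tokenizer.apply_chat_template` or as the `messages` argument of `client.chat.completions.create`.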
Production Deployment
Served with vLLM using --max-model-len 8128 and --tensor-parallel-size 1. The model weights are stored in float16 and take up about 5 GB.
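These settings imply a token budget for each request. As a rough sketch, assuming the prompt and the generated tokens share the same context window and the request uses the `max_tokens=1024` from the API example, the space left for the email thread works out as follows; `fits_context` is a hypothetical helper.

```python
MAX_MODEL_LEN = 8128           # --max-model-len from the deployment settings
MAX_COMPLETION_TOKENS = 1024   # max_tokens used in the API example
TRAIN_MAX_SEQ_LEN = 6000       # max sequence length from Training Details

# Tokens left for the chat-formatted prompt (instruction plus email thread).
prompt_budget = MAX_MODEL_LEN - MAX_COMPLETION_TOKENS
print(prompt_budget)  # 7104

def fits_context(prompt_tokens: int, max_tokens: int = MAX_COMPLETION_TOKENS) -> bool:
    """True if the prompt plus the requested completion fits in the served context."""
    return prompt_tokens + max_tokens <= MAX_MODEL_LEN
```

Training sequences were capped at 6,000 tokens, so threads that push the total well past `TRAIN_MAX_SEQ_LEN` may be handled less reliably even when they fit in the served context.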
Expected Output Format
계약 및 조건:
  결제 조건: 선불 50%, 잔금 배송 시
  배송 일정: 계약 체결 후 4주
  보증: 12개월
참여자:
  구매자: 박지훈, SkyLine Aerospace Ltd.
  공급업체: GlobalParts Inc.
날짜:
  문의일: 2024-07-26
  예상 납기: 2024-08-23
이벤트:
  - 초기 문의 및 사양 확인
  - 가격 협상 (10% 대량 할인 적용)
  - 최종 계약 합의
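A cheap sanity check on model output is to confirm that the expected sections are present. The sketch below assumes the four section names in the example are top-level keys and that nested entries are indented, as in standard YAML. It is a rough line scan rather than a YAML parser, and `top_level_keys` and `missing_sections` are hypothetical helpers.

```python
# Section names taken from the example output; treating them as top-level
# keys is an assumption.
EXPECTED_SECTIONS = ["계약 및 조건", "참여자", "날짜", "이벤트"]

def top_level_keys(yaml_text: str) -> list[str]:
    """Collect unindented `key:` lines, skipping list items and comments."""
    keys = []
    for line in yaml_text.splitlines():
        if line and not line[0].isspace() and not line.startswith(("-", "#")) and ":" in line:
            keys.append(line.split(":", 1)[0].strip())
    return keys

def missing_sections(yaml_text: str) -> list[str]:
    """Return the expected sections that do not appear as top-level keys."""
    found = set(top_level_keys(yaml_text))
    return [name for name in EXPECTED_SECTIONS if name not in found]

# Hypothetical partial extraction containing only two of the four sections.
sample = "계약 및 조건:\n  보증: 12개월\n참여자:\n  구매자: 박지훈\n"
print(missing_sections(sample))  # ['날짜', '이벤트']
```

For anything beyond a quick check, parsing the text with a real YAML library and inspecting the resulting dictionary is the more robust route.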
Limitations
- Training data is LLM-generated; extraction accuracy on real emails has not been independently verified
- YAML schema is fixed to the training format; highly irregular email structures may produce incomplete extractions
- Optimized for Korean-buyer / English-supplier email threads; pure Korean or pure English threads may work but were less represented in training
License
Built on Google Gemma 2 and subject to the Gemma Terms of Use.