XaaS Gemma 2 2B - Stage 1: Continual Pre-Training (CPT)
Stage 1 of 4 in the XaaS fine-tuning pipeline for Korean international trade.
This model adapts google/gemma-2-2b-it to the Korean trade domain through continual pre-training on a curated corpus of Korean customs, HS code classification, Incoterms, and international trade regulatory text. It serves as the foundation for all downstream XaaS task-specific fine-tunes.
Pipeline Position
google/gemma-2-2b-it
↓ [this model]
lablup/gemma-2-2b-it-xaas-cpt ← you are here
↓
lablup/gemma-2-2b-it-xaas-qa (trade domain QA)
↓
lablup/gemma-2-2b-it-xaas-kie (KIE from B2B emails)
lablup/gemma-2-2b-it-xaas-sum-tag (email summarization + tagging)
Training Details
| Parameter | Value |
|---|---|
| Base model | google/gemma-2-2b-it |
| Method | Continual pre-training with LoRA |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 1 |
| Learning rate | 4e-4 |
| Max sequence length | 2,500 tokens |
| Optimizer | AdamW |
| Precision | float32 |
| Framework | HuggingFace Transformers + PEFT + Accelerate |
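To give a rough sense of what these LoRA settings imply, the sketch below computes the adapter scaling factor and an approximate count of trainable adapter parameters. The rank, alpha, and target modules come from the table above. The projection shapes and layer count are assumptions about the Gemma 2 2B architecture that this card does not state, so check them against the base model's `config.json`. The sketch also assumes standard LoRA scaling (`alpha / r`), not a variant such as rsLoRA.

```python
# LoRA hyperparameters taken from the Training Details table.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

# ASSUMPTION: (in_features, out_features) of each projection and the number
# of decoder layers in Gemma 2 2B. These values are not stated in this card;
# verify them against the base model's config.json before relying on them.
ASSUMED_SHAPES = {
    "q_proj": (2304, 2048),
    "k_proj": (2304, 1024),
    "v_proj": (2304, 1024),
    "o_proj": (2048, 2304),
    "gate_proj": (2304, 9216),
    "up_proj": (2304, 9216),
    "down_proj": (9216, 2304),
}
ASSUMED_NUM_LAYERS = 26


def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scaling applied to the low-rank update."""
    return alpha / r


def lora_params_per_module(in_features: int, out_features: int, r: int) -> int:
    """A is (r x in_features) and B is (out_features x r)."""
    return r * (in_features + out_features)


def total_lora_params(shapes, target_modules, num_layers: int, r: int) -> int:
    """Sum adapter parameters over all targeted modules in every layer."""
    per_layer = sum(
        lora_params_per_module(*shapes[name], r) for name in target_modules
    )
    return per_layer * num_layers


scaling = lora_scaling(LORA_R, LORA_ALPHA)
total = total_lora_params(ASSUMED_SHAPES, TARGET_MODULES, ASSUMED_NUM_LAYERS, LORA_R)
print(f"LoRA scaling (alpha / r): {scaling}")
print(f"Approx. trainable adapter parameters: {total:,}")
```

In PEFT, the table's values would typically be passed to `LoraConfig` as `r`, `lora_alpha`, `lora_dropout`, and `target_modules`. The exact configuration used for this model is not published here.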
Training Data
Internal Korean trade-domain text corpus (XaaS/train_dataset/cpt_dataset/concatenated_dataset) covering:
- Korean Customs Act (관세법) and trade regulations
- HS code classification explanatory notes (관세율표 해설서)
- Incoterms and international trade terminology
- Trade finance and letter-of-credit documentation
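This card does not describe how the corpus was prepared for training. The dataset path only hints that documents were concatenated. As an illustration of one common approach to continual pre-training data, the sketch below joins tokenized documents with an end-of-sequence separator and cuts the stream into fixed-size blocks. The function name `pack_token_ids`, the toy token IDs, and the choice to drop the trailing partial block are all hypothetical and are not taken from the XaaS pipeline.

```python
from typing import Iterable, Iterator, List

MAX_SEQ_LEN = 2500  # "Max sequence length" from the Training Details table


def pack_token_ids(
    docs: Iterable[List[int]], block_size: int, eos_id: int
) -> Iterator[List[int]]:
    """Concatenate tokenized documents (EOS-separated) into fixed-size blocks.

    Hypothetical helper: a generic packing scheme, not the authors' code.
    Any trailing tokens that do not fill a whole block are dropped.
    """
    buffer: List[int] = []
    for ids in docs:
        buffer.extend(ids)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]


# Toy demonstration with made-up token IDs and a tiny block size.
toy_docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
blocks = list(pack_token_ids(toy_docs, block_size=4, eos_id=0))
print(blocks)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

With a real tokenizer, the same helper would be called with `block_size=MAX_SEQ_LEN` and the tokenizer's EOS token ID.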
How to Use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lablup/gemma-2-2b-it-xaas-cpt"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Gemma 2 chat format
# Prompt (Korean): "Please explain the procedure for opening a letter of credit (L/C)."
messages = [{"role": "user", "content": "신용장(L/C)의 개설 절차를 설명해주세요."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# The templated prompt already begins with <bos>, so skip special tokens here to avoid a duplicate.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
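For readers who want to see what the "Gemma 2 chat format" looks like as plain text, here is a minimal sketch that builds the prompt string by hand. The helper `build_gemma_prompt` is hypothetical and only approximates the turn markers used by Gemma's chat template. The tokenizer's own `apply_chat_template` remains the authoritative source, and this sketch does not enforce strict user/model alternation.

```python
from typing import Dict, List


def build_gemma_prompt(
    messages: List[Dict[str, str]], add_generation_prompt: bool = True
) -> str:
    """Hypothetical helper: hand-built approximation of the Gemma chat format."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the assistant turn "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        if role not in ("user", "model"):
            raise ValueError(f"Unsupported role for Gemma chat format: {role!r}")
        content = message["content"].strip()
        parts.append(f"<start_of_turn>{role}\n{content}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so generation continues as the assistant.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = build_gemma_prompt([{"role": "user", "content": "What is an L/C?"}])
print(repr(prompt))
```

Because the string already starts with `<bos>`, tokenizing it again with special tokens enabled would typically prepend a second one.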
Downstream Models
| Model | Task |
|---|---|
| lablup/gemma-2-2b-it-xaas-qa | Korean trade QA (21,399 QA pairs) |
| lablup/gemma-2-2b-it-xaas-kie | B2B email key-information extraction |
| lablup/gemma-2-2b-it-xaas-sum-tag | Email summarization + tagging |
Limitations
- Adapted to the Korean trade domain through continual pre-training; general-purpose performance may be degraded compared to the base Gemma 2 model
- Knowledge cutoff is inherited from google/gemma-2-2b-it; recent regulatory changes are not covered
- CPT corpus is domain-specific and does not cover all Korean language use cases
License
This model is built on Google Gemma 2 and is subject to the Gemma Terms of Use. Fine-tuned weights are released under the same terms.