Gemma‑4 HS Code Classifier (Cambodia Customs)

A Gemma‑4‑E4B‑it model fine‑tuned with QLoRA to classify product descriptions into 8‑digit HS codes and return corresponding Cambodian trade rates (Customs Duty, Special Tax, VAT, Excise Tax).

Built with Unsloth for fast, memory‑efficient fine‑tuning on a single T4 GPU.

🎯 What it does

Given a plain‑English product description, the model generates:

HS Code: 61091000
Unit: PIECE
Customs Duty: 25%
Special Tax: 0%
VAT: 10%
Excise Tax: 0%

⚠️ Important: The rates in the generated text come from the model's memory and may be wrong.
For production, always take rates from the included lookup table (hs_code_lookup.json); see Production use below.

🚀 Quick start (in Colab or locally)

This repository contains only the LoRA adapter, not the full model.
Loading it automatically downloads the base model (unsloth/gemma-4-E4B-it) in 4-bit and applies the adapter on top.


# %% [Install]
%%capture
import os, re
# Install everything needed for the T4 Colab environment
!pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
!pip install --no-deps unsloth_zoo bitsandbytes accelerate xformers peft trl triton unsloth
!pip install --no-deps --upgrade "torchao>=0.16.0"
!pip install --no-deps transformers==5.5.0 "tokenizers>=0.22.0,<=0.23.0"
!pip install torchcodec
import torch
torch._dynamo.config.recompile_limit = 64


import warnings

# Suppress the specific PyTorch size check warning from bitsandbytes
warnings.filterwarnings(
    "ignore", 
    category=FutureWarning, 
    message=".*_check_is_size will be removed in a future PyTorch release.*"
)

# %% [Load model]

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    "Sothay/gemma4-hscode-classifier",   # LoRA adapter on Hugging Face
    load_in_4bit = True,                 # required – the adapter was trained in 4-bit
    max_seq_length = 1024,
)

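# ---------- Inference with the authoritative lookup table (recommended) ----------
import json, re
from huggingface_hub import hf_hub_download

# Fetch the lookup table from the model repo (cached locally after the first call)
lookup_path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
with open(lookup_path, encoding="utf-8") as f:
    rate_lookup = json.load(f)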

def predict_hs_code(description: str) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user",   "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
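    # return_dict=True yields input_ids and attention_mask, which generate() takes as keyword arguments
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")
    out = model.generate(**inputs, max_new_tokens=80, do_sample=False)
    # Decode only the newly generated tokens, not the prompt
    text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)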

    m = re.search(r"HS Code:\s*([0-9]{4,10})", text)
    code = m.group(1) if m else None
    if code and code in rate_lookup:
        return {"hs_code": code, "source": "lookup_table", **rate_lookup[code]}
    return {"hs_code": code, "source": "model_only_UNVERIFIED", "raw_output": text}

print(predict_hs_code("Men's cotton knitted T-shirt"))
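The two functions in this README parse the code differently: predict_hs_code accepts 4 to 10 digits, while the raw-output regex also accepts dots. A minimal sketch of a normalization step that could run before the table lookup is shown below. The helper name normalize_hs_code is hypothetical (it is not part of this repository), and it assumes the lookup table is keyed by plain 8-digit strings.

```python
import re
from typing import Optional

def normalize_hs_code(raw: Optional[str], digits: int = 8) -> Optional[str]:
    """Strip separators; return the code only if it has exactly `digits` digits."""
    if raw is None:
        return None
    cleaned = re.sub(r"[^0-9]", "", raw)  # drop dots, spaces, and other separators
    return cleaned if len(cleaned) == digits else None

print(normalize_hs_code("6109.10.00"))  # 61091000
print(normalize_hs_code("6109"))        # None (too short to be an 8-digit code)
```

Returning None for anything that is not exactly 8 digits lets the caller fall through to the unverified branch instead of looking up a truncated code.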

🔍 Raw model output (debugging)

If you want to see exactly what the model generated (including the rates it predicted) without the lookup table, use the raw‑output function below.
Do not use these rates in production – they are only for debugging or confidence evaluation.

def predict_hs_code_raw(description: str, max_new_tokens=100) -> dict:
    system_prompt = (
        "You are a customs compliance AI. Classify the product description to its "
        "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
        "Special Tax, VAT, Excise Tax) and unit."
    )
    messages = [
        {"role": "system", "content": [{"type": "text", "text": system_prompt}]},
        {"role": "user",   "content": [{"type": "text", "text": f"Description: {description}"}]},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to("cuda")

    out = model.generate(**inputs, max_new_tokens=max_new_tokens, use_cache=True, do_sample=False)
    raw_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

    def extract(pattern, text):
        m = re.search(pattern, text)
        return m.group(1).strip() if m else None

    return {
        "hs_code":   extract(r"HS Code:\s*([0-9.]+)", raw_text),
        "unit":      extract(r"Unit:\s*(.*)", raw_text),
        "cd_rate":   extract(r"Customs Duty:\s*([\d.]+)%?", raw_text),
        "st_rate":   extract(r"Special Tax:\s*([\d.]+)%?", raw_text),
        "vat_rate":  extract(r"VAT:\s*([\d.]+)%?", raw_text),
        "et_rate":   extract(r"Excise Tax:\s*([\d.]+)%?", raw_text),
        "raw_output": raw_text
    }

# Example
raw = predict_hs_code_raw("Men's cotton knitted T-shirt")
print(raw["raw_output"])
print(raw["hs_code"])   # model’s guess
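For the confidence evaluation mentioned above, one option is to compare the rates the model generated against the lookup table entry for the same code. The sketch below is an assumption-heavy illustration: the helper names are hypothetical, and it assumes the lookup entry uses the same field names as predict_hs_code_raw (cd_rate, st_rate, vat_rate, et_rate), which may not match the actual layout of hs_code_lookup.json.

```python
from typing import Optional

# Assumed field names, mirroring the keys returned by predict_hs_code_raw
RATE_FIELDS = ("cd_rate", "st_rate", "vat_rate", "et_rate")

def to_percent(value) -> Optional[float]:
    """Convert '25', '25%', or 25 to 25.0; return None if it cannot be parsed."""
    if value is None:
        return None
    try:
        return float(str(value).strip().rstrip("%"))
    except ValueError:
        return None

def rate_mismatches(model_rates: dict, table_rates: dict) -> dict:
    """Return {field: (model_value, table_value)} for every field that disagrees."""
    diffs = {}
    for field in RATE_FIELDS:
        got = to_percent(model_rates.get(field))
        want = to_percent(table_rates.get(field))
        if got != want:
            diffs[field] = (got, want)
    return diffs

# Toy values for illustration only, not real tariff data
model_rates = {"cd_rate": "25", "st_rate": "0", "vat_rate": "10", "et_rate": "5"}
table_rates = {"cd_rate": "25%", "st_rate": 0, "vat_rate": "10", "et_rate": "0"}
print(rate_mismatches(model_rates, table_rates))  # {'et_rate': (5.0, 0.0)}
```

A non-empty result could serve as a signal that the model is unsure about the classification, even though the table rates remain the ones to report.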

🧠 Training details

Base model: unsloth/gemma-4-E4B-it (4‑bit QLoRA)
Adapter rank: r=16, alpha=16, targeting all language & attention layers
Gradient checkpointing: Unsloth’s own implementation (avoids Gemma‑4 KV‑shared layer bug)
Dataset: Custom Cambodian HS‑code dataset (hs_code.csv) with descriptions, codes, and official rates
- Cleaned, deduplicated, split into 90/10 train/validation
- Chat roles fixed to system/user/assistant (Gemma‑4 standard)
Training config: 3 epochs, effective batch size 8, learning rate 2e‑4, linear schedule, eval & save every epoch, best model loaded
Hardware: Google Colab T4 (16 GB) – peak memory ~10 GB thanks to QLoRA
Accuracy: Evaluated on held‑out examples (exact HS‑code match) – see model card for current numbers
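
The exact-match metric named above can be sketched in a few lines. This is an illustrative implementation, not the author's evaluation script; the function name is hypothetical and the codes in the usage example are toy values, not results from this model.

```python
from typing import Optional, Sequence

def exact_match_accuracy(predicted: Sequence[Optional[str]], expected: Sequence[str]) -> float:
    """Fraction of examples whose predicted HS code equals the expected code exactly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    # A missing prediction (None) never equals a code string, so it counts as a miss
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)

# Toy data for illustration only
preds = ["61091000", "87042110", None, "22030010"]
gold = ["61091000", "87042129", "85171200", "22030010"]
print(exact_match_accuracy(preds, gold))  # 0.5
```

Exact match is strict: a prediction that gets the 6-digit subheading right but the last two digits wrong still counts as an error.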

⚖️ Production use

Always use the lookup table – never trust the model’s generated rates.

The model is a classifier: description → HS code.
Rates are fetched deterministically from hs_code_lookup.json, a file extracted from the same official tariff data used during training.

Why?

A causal LM recalling a rate from memory will occasionally hallucinate – a customs tool with confident, wrong numbers is worse than one that says “I don’t know”.
The lookup table returns exactly the rates recorded in the source tariff data, so a rate error can only come from a wrong HS code or an outdated table.

The hs_code_lookup.json file is included in this repository and can be downloaded via:

from huggingface_hub import hf_hub_download
hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
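
hf_hub_download returns the local path of the cached file, so the table still has to be read from disk. A minimal sketch of loading it and abstaining on unknown codes follows; the helper names are hypothetical, and the table is assumed to be a JSON object keyed by 8-digit code strings, as the quick-start code implies.

```python
import json
from typing import Optional

def load_rate_table(path: str) -> dict:
    """Read the lookup table from a local JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def lookup_rates(table: dict, hs_code: Optional[str]) -> Optional[dict]:
    """Return the table entry for hs_code, or None so the caller can abstain."""
    if hs_code is None:
        return None
    return table.get(hs_code)

# Typical use (requires network access on the first call):
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download("Sothay/gemma4-hscode-classifier", "hs_code_lookup.json")
#   table = load_rate_table(path)
#   rates = lookup_rates(table, "61091000")
```

Returning None rather than a default rate keeps the "I don't know" case explicit for downstream code.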

📦 Files in this repository

| File | Description |
| --- | --- |
| `adapter_model.safetensors` | LoRA adapter weights (few MB) |
| `adapter_config.json` | Adapter configuration (references base model) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer files |
| `hs_code_lookup.json` | Authoritative rate table for production inference |
| `README.md` | This file |

Note: Only the adapter is stored here. The Gemma‑4 base model (unsloth/gemma-4-E4B-it) is fetched automatically from the Hugging Face Hub when you call FastModel.from_pretrained.
If you need a merged 16‑bit model (for vLLM, TGI, etc.), generate it locally with Unsloth:
model.save_pretrained_merged("merged_fp16", tokenizer, save_method="merged_16bit")

🦙 Ollama / llama.cpp (GGUF)

Export a quantized GGUF directly from the loaded adapter:

model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")

Then load the GGUF in Ollama through a Modelfile, setting temperature to 0 so that output is deterministic.
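
As a rough sketch of such a Modelfile, the Python helper below builds one from a GGUF path and the system prompt used in the quick start. The helper name and the GGUF path are placeholders (the actual filename written by save_pretrained_gguf may differ), and no TEMPLATE directive is included, so a chat template may still need to be added for Gemma‑4.

```python
def build_modelfile(gguf_path: str, system_prompt: str) -> str:
    """Return Ollama Modelfile text with greedy (temperature 0) decoding."""
    lines = [
        f"FROM {gguf_path}",                 # local GGUF file to load
        "PARAMETER temperature 0",           # deterministic output
        f'SYSTEM """{system_prompt}"""',     # same instruction as in the Python examples
    ]
    return "\n".join(lines) + "\n"

system_prompt = (
    "You are a customs compliance AI. Classify the product description to its "
    "correct 8-digit HS code and output the corresponding trade rates (Customs Duty, "
    "Special Tax, VAT, Excise Tax) and unit."
)

# Placeholder path: point this at the .gguf file actually produced by the export
print(build_modelfile("./gguf_model/model.gguf", system_prompt))
```

The printed text can be saved as a file named Modelfile and registered with `ollama create`, after which rates should still be taken from the lookup table rather than from the generated text.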

📊 Example predictions

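| Description | Predicted HS Code | Unit | CD | ST | VAT | ET |
| --- | --- | --- | --- | --- | --- | --- |
| Toyota Hilux pickup, diesel 2.8L | 87042110 | UNIT | 35% | 50% | 10% | 0% |
| iPhone 15 Pro Max 256GB | 85171200 | UNIT | 0% | 0% | 10% | 0% |
| Heineken beer 330ml can | 22030010 | LTR | 35% | 30% | 10% | 0% |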

(Rates from lookup table – not generated by the model.)

⚠️ Limitations

The model may output incorrect HS codes for ambiguous, misspelled, or region‑specific descriptions.
It was trained on a fixed set of Cambodian HS codes; revisions after the training data cutoff are not covered.
Duty rates can become outdated – always cross‑check with the latest official tariff schedule.
The model is a classifier, not a legal authority. For binding decisions, consult a customs professional.

📝 License

This model is a derivative of Gemma‑4‑E4B‑it and is subject to the Gemma license.
The HS‑code dataset and lookup table are the property of their respective owners.

🙏 Acknowledgments

Unsloth – made QLoRA + Gemma‑4 on a T4 effortless
Google DeepMind – for the Gemma family of models

📚 Citation

If you use this model, please cite:

@misc{gemma4-hscode-classifier,
  author = {Sothay},
  title = {Gemma‑4 HS Code Classifier (Cambodia Customs)},
  year = 2025,
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Sothay/gemma4-hscode-classifier}}
}

Author: Sothay
Model card version: 1.2
