cycleinstruct-gemma4-supervisor

A fully merged google/gemma-4-12B-it checkpoint (11.96 B parameters), fine-tuned in two stages for the LG-Electronics customer-service quality-supervisor task. Given a (Category, Conversation Transcript, Retrieved Document) triplet, the model emits a reasoning block followed by a JSON verdict:

<think>
[Query-Document Alignment] …
[Response-Document Consistency] …
[Response Completeness] …
</think>
{"label": "correct" | "incorrect", "reason": "…"}
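The output format above can be split into its reasoning and verdict parts with a few lines of standard-library Python. This is a minimal sketch, not the evaluation code behind the reported numbers: the function name `parse_supervisor_output` is hypothetical, and treating any output without a `<think>` block or a valid label as a parse failure is an assumption about how such failures might be counted.

```python
import json
import re

# Matches the reasoning block shown in the output format above.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def parse_supervisor_output(text):
    """Return (reasoning, verdict) or None if the output is malformed."""
    match = _THINK_RE.search(text)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    # Everything after </think> should contain the JSON verdict.
    tail = text[match.end():]
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end < start:
        return None
    try:
        verdict = json.loads(tail[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Only the two labels named in the output format are accepted.
    if not isinstance(verdict, dict) or verdict.get("label") not in ("correct", "incorrect"):
        return None
    return reasoning, verdict
```

Returning `None` instead of raising keeps a batch evaluation loop simple: malformed generations, such as outputs truncated before the closing brace, can be counted rather than aborting the run.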

This repo contains a single-file, ready-to-use checkpoint — no adapter merging required at load time.

Training pipeline (CycleInstruct-motivated, two-stage SFT)

The two-stage design is motivated by the augmentation strategy of the CycleInstruct paper (EMNLP 2025):

Stage 1 — CS-chatbot SFT on 9,868 natural (question, answer) pairs built from LG feedback + general-inquiry data. LoRA r=16 α=32, Muon @ lr=2e-3, seed=17, 7 epochs.
Stage 2 — Supervisor SFT on 3,771 human-annotated supervisor judgements. Stage-1 LoRA is merged into the base first, then a fresh LoRA r=16 α=32 is added and trained with Muon @ lr=1e-3, seed=42, 7 epochs on 4,096-token sequences.

The uploaded checkpoint is the result of merging both LoRA stages into the base weights and re-saving with save_pretrained.

Metrics — 199-item held-out supervisor test set (T=0, `max_new_tokens=1200`)

Metric	Stage-1 only	This model (full merged)
Parse-fail rate	97.49 %	0.50 %
Accuracy	1.01 %	70.35 %
Macro-F1	0.025	0.652
chrF	6.01	41.29
ROUGE-L	0.044	0.881
BLEU-4	0.35	22.94
BERTScore-F1	0.813	0.902
SBERT-cos (multi-mpnet)	0.427	0.829

Per-class:

Class	Precision	Recall	F1	Support
correct	0.456	0.596	0.517	52
incorrect	0.838	0.741	0.787	147
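The summary metrics can be cross-checked against the per-class table with plain arithmetic. The sketch below recomputes F1 per class, the macro average, and accuracy from the precision, recall, and support values above. Recovering hit counts by rounding recall × support is an assumption about how the table was produced.

```python
# Per-class precision, recall and support copied from the table above.
per_class = {
    "correct": {"precision": 0.456, "recall": 0.596, "support": 52},
    "incorrect": {"precision": 0.838, "recall": 0.741, "support": 147},
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {name: f1(row["precision"], row["recall"]) for name, row in per_class.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# recall * support recovers the (rounded) number of correctly judged items per class.
hits = sum(round(row["recall"] * row["support"]) for row in per_class.values())
total = sum(row["support"] for row in per_class.values())
accuracy = hits / total

print({name: round(score, 3) for name, score in f1_scores.items()})
print(round(macro_f1, 3), round(100 * accuracy, 2))
```

Under that assumption the script reproduces the reported figures: F1 of 0.517 and 0.787, macro-F1 of 0.652, and 140 of 199 items judged correctly (70.35 %).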

Loading

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

REPO = "shareit/cycleinstruct-gemma4-supervisor"

tok   = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO, torch_dtype=torch.bfloat16,
    attn_implementation="sdpa", device_map="auto").eval()

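# System prompt (Korean): "You are a supervisor who evaluates the quality of an electronics CS chatbot."
SYSTEM = "당신은 전자제품 CS 챗봇의 품질을 평가하는 수퍼바이저입니다."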
USER   = "[Category] W/M\n[Conversation Transcript] …\n[Retrieved Document] …"

# Gemma-4's default chat template appends <|channel>thought<channel|> on
# add_generation_prompt=True — bypass that with a manual <|turn>model\n so
# training and inference prompts match byte-for-byte.
msgs = [{"role":"system", "content": SYSTEM},
        {"role":"user",   "content": USER}]
prompt = tok.apply_chat_template(msgs, tokenize=False,
                                 add_generation_prompt=False)
prompt = prompt + "<|turn>model\n"

out = model.generate(
    **tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device),
    do_sample=False, max_new_tokens=1200,
    pad_token_id=tok.pad_token_id,
)
print(tok.decode(out[0], skip_special_tokens=False))

max_new_tokens=1200 matters — the <think> block usually consumes 500-900 tokens before the final JSON verdict.

Training details (stage 2, on top of stage-1-merged base)

PEFT: LoRA r=16, α=32, dropout 0.05, target_modules=all-linear, bias='none'
Optimizer: Muon on 2D matrices (Newton-Schulz orthogonalisation) + AdamW on 1D params
LR: 1e-3 (matrix) / 1e-4 (aux), cosine decay with 3 % warmup, grad-clip 1.0
Batch: per-device 1 × grad-accum 16 (effective 16)
Seq len: 4096 (user text is clipped by characters if the sequence exceeds this; the assistant turn is always preserved)
Seed: 42, Epochs: 7
Attention: SDPA (bf16 native on H200)
Wall clock: 6h02m on a half-H200 (48 GB active)
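To make the schedule concrete, the sketch below derives the approximate number of optimizer steps from the stage-2 figures (3,771 rows, effective batch 16, 7 epochs) and evaluates a cosine schedule with 3 % warmup at the matrix learning rate. It is an illustration, not the training script: linear warmup, decay to zero, and counting a trailing partial batch as a full step are all assumptions, and `lr_at` is a hypothetical helper.

```python
import math

TRAIN_ROWS = 3771          # stage-2 training rows
EFFECTIVE_BATCH = 1 * 16   # per-device batch x gradient accumulation
EPOCHS = 7
PEAK_LR = 1e-3             # matrix (Muon) learning rate
WARMUP_FRACTION = 0.03

# Assumption: a trailing partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(TRAIN_ROWS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(WARMUP_FRACTION * total_steps)


def lr_at(step, peak_lr=PEAK_LR):
    """Linear warmup followed by cosine decay to zero (both assumed)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps, warmup_steps)
```

With these assumptions the run has 236 steps per epoch, 1,652 steps in total, and 50 warmup steps. The auxiliary AdamW rate would follow the same shape with `peak_lr=1e-4`.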

Data

Stage-1 train: 9,868 (q, a) pairs from data/processed/train_pairs.jsonl (multilingual: ~50 % English, ~15 % German, then FR/ES/IT/JA/ZH…)
Stage-2 train: 3,771 supervisor-annotated rows {"conversations": [{"from":"system", …}, {"from":"user", …}, {"from":"assistant", …}]} with the assistant response being a <think>…</think>{"label":…,"reason":…} judgement.
Test: 199 held-out supervisor rows (unseen during either stage).
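The stage-2 row layout described above maps naturally onto the role/content message format expected by chat templates. The following sketch shows one way to do that conversion. The card elides the field holding each turn's text, so the `text_key="value"` default is a hypothetical choice, as are the function names.

```python
ROLE_MAP = {"system": "system", "user": "user", "assistant": "assistant"}


def to_chat_messages(row, text_key="value"):
    """Map a stage-2 row to role/content messages.

    `text_key` is hypothetical: the card does not name the text field.
    """
    messages = []
    for turn in row["conversations"]:
        role = ROLE_MAP[turn["from"]]  # KeyError on an unexpected speaker
        messages.append({"role": role, "content": turn[text_key]})
    return messages


def split_prompt_and_target(messages):
    """Separate the prompt turns from the final assistant judgement."""
    if not messages or messages[-1]["role"] != "assistant":
        raise ValueError("expected the last turn to be the assistant judgement")
    return messages[:-1], messages[-1]["content"]
```

Splitting off the final assistant turn keeps the supervised target separate from the prompt, which is also the part that would be clipped when a sequence is too long.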

Intended use / limitations

Intended for research reproduction of CycleInstruct-style continuation training on labeled downstream tasks.
The correct class has substantially lower F1 (0.517) than incorrect (0.787), reflecting the 39/61 % class imbalance in the training data. Class-weighted loss or balanced sampling would likely help.
The <think> reasoning is written in Korean; input transcripts may be in any language.

License

This model is a derivative of google/gemma-4-12B-it and is distributed under the Gemma Terms of Use (https://ai.google.dev/gemma/docs/gemma_4_license). By using this model you agree to the Gemma Prohibited Use Policy. Powered by Gemma.
