Libraries

How to use shareit/cycleinstruct-phi4-supervisor with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="shareit/cycleinstruct-phi4-supervisor")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("shareit/cycleinstruct-phi4-supervisor")
model = AutoModelForCausalLM.from_pretrained("shareit/cycleinstruct-phi4-supervisor")
messages = [
    {"role": "user", "content": "Who are you?"},
]
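inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# The <think> block usually takes 500-900 tokens, so a 40-token budget
# would cut generation off long before the JSON verdict.
outputs = model.generate(**inputs, max_new_tokens=1200)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))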

vLLM

How to use shareit/cycleinstruct-phi4-supervisor with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "shareit/cycleinstruct-phi4-supervisor"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shareit/cycleinstruct-phi4-supervisor",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
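
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server above is listening on `localhost:8000`; the helper names `build_chat_request` and `chat` are hypothetical. It sets `max_tokens=1200` and `temperature=0` to mirror the evaluation settings reported in the model card. Note that the server applies the model's own chat template, which may differ from the hand-built prompt shown in the Loading section.

```python
import json
import urllib.request

MODEL = "shareit/cycleinstruct-phi4-supervisor"

def build_chat_request(prompt, base_url="http://localhost:8000/v1", max_tokens=1200):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # leave room for the <think> block
        "temperature": 0,          # greedy decoding, as in the reported evaluation
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt, **kwargs):
    """Send the request and return the assistant message text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For the SGLang server, the same helper should work with `base_url="http://localhost:30000/v1"`.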

Use Docker

docker model run hf.co/shareit/cycleinstruct-phi4-supervisor

SGLang

How to use shareit/cycleinstruct-phi4-supervisor with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "shareit/cycleinstruct-phi4-supervisor" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "shareit/cycleinstruct-phi4-supervisor",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "shareit/cycleinstruct-phi4-supervisor" \
        --host 0.0.0.0 \
        --port 30000

cycleinstruct-phi4-supervisor

A fully merged fine-tune of microsoft/Phi-4-reasoning (14.66 B parameters), trained in two stages for the LG Electronics customer-service quality-supervisor task. Given a (Category, Conversation Transcript, Retrieved Document) triplet, the model emits:

<think>
[Query-Document Alignment] …
[Response-Document Consistency] …
[Response Completeness] …
</think>
{"label": "correct" | "incorrect", "reason": "…"}
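
A downstream consumer has to separate the reasoning from the verdict. The following is a minimal sketch of such a parser, assuming the output follows the format above; `parse_supervisor_output` is a hypothetical helper, not part of this repo, and it treats anything without a valid `label` as a parse failure.

```python
import json
import re

# Hypothetical helper for illustration; not shipped with the checkpoint.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def parse_supervisor_output(text):
    """Return (reasoning, verdict). verdict is None when the JSON cannot be parsed."""
    match = THINK_RE.search(text)
    reasoning = match.group(1).strip() if match else None
    # Look for the JSON object after the closing </think> tag (or anywhere if it is missing).
    tail = text[match.end():] if match else text
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end < start:
        return reasoning, None
    try:
        verdict = json.loads(tail[start:end + 1])
    except json.JSONDecodeError:
        return reasoning, None
    if not isinstance(verdict, dict) or verdict.get("label") not in ("correct", "incorrect"):
        return reasoning, None
    return reasoning, verdict

sample = (
    "<think>\n[Query-Document Alignment] ok\n</think>\n"
    '{"label": "incorrect", "reason": "example"}'
)
reasoning, verdict = parse_supervisor_output(sample)
print(verdict["label"])  # incorrect
```

Slicing from the first `{` to the last `}` keeps the parser tolerant of trailing special tokens after the JSON object.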

This repo contains a single-file, ready-to-use checkpoint; no adapter merging is required at load time.

Training pipeline (CycleInstruct-motivated, two-stage SFT)

The pipeline takes the CycleInstruct paper (EMNLP 2025) as the motivation for its augmentation strategy:

Stage 1: CS-chatbot SFT on 9,868 natural (question, answer) pairs built from LG feedback and general-inquiry data. LoRA r=16, α=32; Muon at lr=2e-3; seed=1337; 8 epochs.
Stage 2: Supervisor SFT on 3,771 human-annotated supervisor judgements. The Stage-1 LoRA is first merged into the base, then a fresh LoRA (r=16, α=32) is added and trained with Muon at lr=1e-3, seed=42, for 7 epochs on 4,096-token sequences.

The uploaded checkpoint is the result of merging both LoRA stages into the base weights and re-saving with save_pretrained.
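
To make "merging a LoRA stage into the base weights" concrete, here is a toy sketch of the standard LoRA merge rule, W' = W + (α / r) · B · A, in plain Python. It is an illustration under assumptions, not the authors' merge script: the 2×2 weight and the rank-1 factors are made up for readability, and only the scale α / r = 32 / 16 = 2 comes from the settings above.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, scale):
    """Return W + scale * (B @ A) for one linear layer."""
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

scale = 32 / 16                # alpha / r from the model card

# Toy values (hypothetical): a 2x2 base weight and rank-1 LoRA factors.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]        # shape (out_features, rank)
lora_a = [[0.5, -0.5]]         # shape (rank, in_features)

merged = merge_lora(w, lora_a, lora_b, scale)
print(merged)  # [[2.0, -1.0], [2.0, -1.0]]
```

After the merge the adapter matrices are no longer needed, which is why a second, fresh LoRA can be trained on top of the merged weights in Stage 2.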

Metrics — 199-item held-out supervisor test set (T=0, `max_new_tokens=1200`)

| Metric | Stage-1 only | This model (full merged) |
|---|---|---|
| Parse-fail rate | 95.98 % | 0.00 % |
| Accuracy | 1.01 % | 68.84 % |
| Macro-F1 | 0.033 | 0.615 |
| chrF | 6.55 | 40.92 |
| ROUGE-L | 0.062 | 0.885 |
| BLEU-4 | 0.37 | 22.41 |
| BERTScore-F1 | 0.826 | 0.901 |
| SBERT-cos (multi-mpnet) | 0.437 | 0.830 |

Per-class:

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| correct | 0.417 | 0.481 | 0.446 | 52 |
| incorrect | 0.806 | 0.762 | 0.783 | 147 |
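
As a consistency check, the headline accuracy and Macro-F1 can be recomputed from the per-class recall and support alone. The sketch below infers the confusion counts by rounding recall × support to whole items; this is a reconstruction, not the authors' evaluation script, and it relies on the 0.00 % parse-fail rate, meaning every one of the 199 items received a label.

```python
support = {"correct": 52, "incorrect": 147}
recall = {"correct": 0.481, "incorrect": 0.762}

# True positives per class, rounded to whole items (25 and 112).
tp = {c: round(recall[c] * support[c]) for c in support}
# In a binary task, the misses of one class are the false positives of the other.
fn = {c: support[c] - tp[c] for c in support}
fp = {"correct": fn["incorrect"], "incorrect": fn["correct"]}

precision = {c: tp[c] / (tp[c] + fp[c]) for c in support}
f1 = {c: 2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in support}

accuracy = sum(tp.values()) / sum(support.values())
macro_f1 = sum(f1.values()) / len(f1)

print(f"accuracy = {accuracy:.2%}")  # accuracy = 68.84%
print(f"macro-F1 = {macro_f1:.3f}")  # macro-F1 = 0.615
```

The recomputed precision values (0.417 and 0.806) also match the table, so the reported numbers are internally consistent.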

Loading

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

REPO = "shareit/cycleinstruct-phi4-supervisor"

tok   = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO, torch_dtype=torch.bfloat16,
    attn_implementation="sdpa", device_map="auto").eval()

SYSTEM = "당신은 전자제품 CS 챗봇의 품질을 평가하는 수퍼바이저입니다."
USER   = "[Category] W/M\n[Conversation Transcript] …\n[Retrieved Document] …"

# Phi-4-reasoning ChatML with our clean system prompt (skip default Thought scaffold)
prompt = (
    f"<|im_start|>system<|im_sep|>{SYSTEM}<|im_end|>"
    f"<|im_start|>user<|im_sep|>{USER}<|im_end|>"
    f"<|im_start|>assistant<|im_sep|>"
)
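# Tokenize without adding special tokens: the ChatML markers are already in the prompt string.
inputs = tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(
    **inputs,
    do_sample=False, max_new_tokens=1200,  # greedy decoding, matching the T=0 evaluation
    pad_token_id=tok.pad_token_id,
)
# Prints the prompt followed by the generation, special tokens included.
print(tok.decode(out[0], skip_special_tokens=False))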

max_new_tokens=1200 matters: the <think> block usually consumes 500-900 tokens before the final JSON verdict, so a smaller budget can truncate the output before the verdict is emitted.
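
For batch evaluation it can help to build the user message and prompt from the triplet programmatically. This is a small sketch that reuses the field markers and ChatML layout from the loading example; the function names `build_user_message` and `build_prompt` are hypothetical, and the exact field formatting used in training beyond what is shown above is an assumption.

```python
def build_user_message(category, transcript, document):
    """Join the (Category, Conversation Transcript, Retrieved Document) triplet."""
    return (
        f"[Category] {category}\n"
        f"[Conversation Transcript] {transcript}\n"
        f"[Retrieved Document] {document}"
    )

def build_prompt(system, user):
    """Wrap system and user text in the ChatML layout used in the loading example."""
    return (
        f"<|im_start|>system<|im_sep|>{system}<|im_end|>"
        f"<|im_start|>user<|im_sep|>{user}<|im_end|>"
        f"<|im_start|>assistant<|im_sep|>"
    )

# Placeholder inputs for illustration only.
user = build_user_message("W/M", "customer: ... agent: ...", "manual excerpt ...")
prompt = build_prompt("system prompt text", user)
print(prompt.endswith("<|im_start|>assistant<|im_sep|>"))  # True
```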

Training details (stage 2, on top of stage-1-merged base)

PEFT: LoRA r=16, α=32, dropout 0.05, target_modules=all-linear, bias='none'
Optimizer: Muon on 2D matrices (Newton-Schulz orthogonalisation) + AdamW on 1D params
LR: 1e-3 (matrix) / 1e-4 (aux), cosine decay with 3 % warmup, grad-clip 1.0
Batch: per-device 1 × grad-accum 16 (effective 16)
Seq len: 4096 (user text is clipped by characters if the sequence exceeds this length; the assistant turn is always preserved)
Seed: 42, Epochs: 7
Attention: SDPA (bf16 native on H200)
Wall clock: 5h48m on a half-H200 (48 GB active)
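
The learning-rate settings above can be turned into a concrete schedule. The sketch below is an illustration under assumptions rather than the training code: it assumes linear warmup, cosine decay to zero, a single device, and that the last partial batch of each epoch still counts as an optimizer step. Only the dataset size (3,771), effective batch (16), epochs (7), peak LR (1e-3), and warmup fraction (3 %) come from the card.

```python
import math

def lr_at(step, total_steps, peak_lr, warmup_frac=0.03):
    """Linear warmup followed by cosine decay to zero (assumed schedule shape)."""
    warmup_steps = max(1, math.ceil(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

steps_per_epoch = math.ceil(3771 / 16)  # 236 optimizer steps per epoch
total_steps = steps_per_epoch * 7       # 1652 steps over 7 epochs

for step in (0, 49, total_steps // 2, total_steps):
    print(step, f"{lr_at(step, total_steps, peak_lr=1e-3):.2e}")
```

Under these assumptions warmup lasts 50 steps; the auxiliary AdamW group would follow the same shape with a peak of 1e-4.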

Data

Stage-1 train: 9,868 (q, a) pairs from data/processed/train_pairs.jsonl (multilingual: ~50 % English, ~15 % German, then FR/ES/IT/JA/ZH…)
Stage-2 train: 3,771 supervisor-annotated rows {"conversations": [{"from":"system", …}, {"from":"user", …}, {"from":"assistant", …}]} with the assistant response being a <think>…</think>{"label":…,"reason":…} judgement.
Test: 199 held-out supervisor rows (unseen during either stage).

Intended use / limitations

Intended for research reproduction of CycleInstruct-style continuation training on labeled downstream tasks.
The correct class has substantially lower F1 (0.446) than incorrect (0.783), which likely reflects the 39 %/61 % class imbalance in the training data. Class-weighted loss or balanced sampling would likely help.
The <think> reasoning is in Korean; input transcripts may be in any language.
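
As a rough illustration of the class-weighted loss suggested above, the following sketch computes "balanced" inverse-frequency weights from the stated 39 %/61 % split. The card does not say which label is the 39 % class, so the keys `minority` and `majority` are placeholders, and `inverse_frequency_weights` is a hypothetical helper.

```python
def inverse_frequency_weights(class_fractions):
    """Balanced weights: 1 / (n_classes * class_fraction) for each class."""
    n_classes = len(class_fractions)
    return {c: 1.0 / (n_classes * frac) for c, frac in class_fractions.items()}

# Fractions from the model card; which label is the minority is not stated there.
weights = inverse_frequency_weights({"minority": 0.39, "majority": 0.61})
print({c: round(w, 3) for c, w in weights.items()})
# {'minority': 1.282, 'majority': 0.82}
```

These weights would scale each example's loss so that both classes contribute roughly equally in expectation.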

License

MIT (inherited from the microsoft/Phi-4-reasoning base model).
