cycleinstruct-phi4-supervisor
A fully merged microsoft/Phi-4-reasoning checkpoint (14.66 B parameters), fine-tuned in two
stages for the LG Electronics customer-service quality-supervisor task.
Given a (Category, Conversation Transcript, Retrieved Document) triplet,
the model emits:
<think>
[Query-Document Alignment] …
[Response-Document Consistency] …
[Response Completeness] …
</think>
{"label": "correct" | "incorrect", "reason": "…"}
This repo contains a single-file, ready-to-use checkpoint: no adapter merging is required at load time.
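The output format above can be split into its reasoning and verdict parts with a few lines of standard-library Python. This is a minimal sketch, not the parser used to compute the parse-fail rate reported below; the function name `parse_supervisor_output` and the rule "first JSON object after `</think>`" are assumptions made for illustration.

```python
import json
import re

# Assumption: the verdict is the first JSON object after the closing </think> tag.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
VALID_LABELS = {"correct", "incorrect"}


def parse_supervisor_output(text):
    """Return (reasoning, verdict), or (None, None) when the output cannot be parsed."""
    match = THINK_RE.search(text)
    if match is None:
        return None, None
    reasoning = match.group(1).strip()
    tail = text[match.end():]
    start = tail.find("{")
    if start == -1:
        return None, None
    try:
        # raw_decode tolerates trailing text such as an end-of-turn token.
        verdict, _ = json.JSONDecoder().raw_decode(tail[start:])
    except json.JSONDecodeError:
        return None, None
    if not isinstance(verdict, dict) or verdict.get("label") not in VALID_LABELS:
        return None, None
    return reasoning, verdict
```

A caller would typically count a `(None, None)` result as a parse failure and otherwise read `verdict["label"]` and `verdict["reason"]`.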
Training pipeline (CycleInstruct-motivated, two-stage SFT)
The augmentation strategy is motivated by the CycleInstruct paper (EMNLP 2025):
- Stage 1: CS-chatbot SFT on 9,868 natural (question, answer) pairs built from LG feedback and general-inquiry data. LoRA r=16, α=32; Muon at lr=2e-3; seed=1337; 8 epochs.
- Stage 2: Supervisor SFT on 3,771 human-annotated supervisor judgements. The Stage-1 LoRA is first merged into the base model; a fresh LoRA (r=16, α=32) is then added and trained with Muon at lr=1e-3, seed=42, for 7 epochs on 4,096-token sequences.
The uploaded checkpoint is the result of merging both LoRA stages into
the base weights and re-saving with save_pretrained.
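For readers unfamiliar with what "merging a LoRA into the base weights" means numerically, the sketch below shows the arithmetic on a toy matrix. It assumes the standard LoRA scaling of alpha / rank (so r=16, α=32 gives a factor of 2.0); the helper names `matmul` and `merge_lora` are hypothetical, and this is not the script used to build the checkpoint.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) without mutating the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy shapes: a 2x3 weight with a rank-1 update.
# alpha=2, rank=1 reproduces the same scale (2.0) as r=16, alpha=32.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # (out_features x rank)
A = [[0.5, 0.0, 1.0]]     # (rank x in_features)
merged = merge_lora(W, B, A, alpha=2, rank=1)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the checkpoint loads as a plain causal LM.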
Metrics — 199-item held-out supervisor test set (T=0, max_new_tokens=1200)
| Metric | Stage-1 only | This model (full merged) |
|---|---|---|
| Parse-fail rate | 95.98 % | 0.00 % |
| Accuracy | 1.01 % | 68.84 % |
| Macro-F1 | 0.033 | 0.615 |
| chrF | 6.55 | 40.92 |
| ROUGE-L | 0.062 | 0.885 |
| BLEU-4 | 0.37 | 22.41 |
| BERTScore-F1 | 0.826 | 0.901 |
| SBERT-cos (multi-mpnet) | 0.437 | 0.830 |
Per-class:
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| correct | 0.417 | 0.481 | 0.446 | 52 |
| incorrect | 0.806 | 0.762 | 0.783 | 147 |
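The headline accuracy and macro-F1 can be cross-checked against the per-class table. The sketch below uses a confusion matrix back-derived from the rounded precision, recall, and support values; those counts are a reconstruction made for this example, not figures published by the authors.

```python
# Counts inferred from the rounded per-class table (an assumption, not published data).
# Layout: confusion[gold_label][predicted_label]
confusion = {
    "correct":   {"correct": 25, "incorrect": 27},
    "incorrect": {"correct": 35, "incorrect": 112},
}


def per_class_scores(confusion, label):
    """Return (precision, recall, f1, support) for one class."""
    tp = confusion[label][label]
    support = sum(confusion[label].values())
    predicted = sum(row[label] for row in confusion.values())
    precision = tp / predicted if predicted else 0.0
    recall = tp / support if support else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1, support


labels = list(confusion)
scores = {lab: per_class_scores(confusion, lab) for lab in labels}
total = sum(s[3] for s in scores.values())
accuracy = sum(confusion[lab][lab] for lab in labels) / total
macro_f1 = sum(s[2] for s in scores.values()) / len(labels)

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.3f}")
```

Under these inferred counts the two tables agree with each other, and every one of the 199 items receives a prediction, which is consistent with the 0.00 % parse-fail rate.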
Loading
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
REPO = "shareit/cycleinstruct-phi4-supervisor"
tok = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO, torch_dtype=torch.bfloat16,
    attn_implementation="sdpa", device_map="auto").eval()
# Korean system prompt: "You are a supervisor who evaluates the quality of an electronics CS chatbot."
SYSTEM = "당신은 전자제품 CS 챗봇의 품질을 평가하는 수퍼바이저입니다."
USER = "[Category] W/M\n[Conversation Transcript] …\n[Retrieved Document] …"
# Phi-4-reasoning ChatML with our clean system prompt (skip default Thought scaffold)
prompt = (
    f"<|im_start|>system<|im_sep|>{SYSTEM}<|im_end|>"
    f"<|im_start|>user<|im_sep|>{USER}<|im_end|>"
    f"<|im_start|>assistant<|im_sep|>"
)
out = model.generate(
    **tok(prompt, return_tensors="pt", add_special_tokens=False).to(model.device),
    do_sample=False, max_new_tokens=1200,
    pad_token_id=tok.pad_token_id,
)
print(tok.decode(out[0], skip_special_tokens=False))
Keep max_new_tokens=1200: the <think> block usually consumes
500-900 tokens before the final JSON verdict, so a smaller budget can cut generation off before the verdict is emitted.
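To apply the same prompt layout to rows in the conversations format used for stage 2 (see Data), something like the following could work. It is a sketch under assumptions: the model card elides the field that holds each turn's text, so the `text_key="value"` default is a guess borrowed from ShareGPT-style datasets, and `row_to_prompt` is a hypothetical helper.

```python
def row_to_prompt(row, include_answer=False, text_key="value"):
    """Render a {"conversations": [...]} row with the ChatML markers from the Loading section.

    text_key is an assumption: the card does not state which field holds the turn text.
    """
    parts = []
    for turn in row["conversations"]:
        role = turn["from"]
        if role == "assistant" and not include_answer:
            break  # stop before the gold judgement when building an inference prompt
        parts.append(f"<|im_start|>{role}<|im_sep|>{turn[text_key]}<|im_end|>")
    if not include_answer:
        # Open the assistant turn so the model continues with its <think> block.
        parts.append("<|im_start|>assistant<|im_sep|>")
    return "".join(parts)


example_row = {"conversations": [
    {"from": "system", "value": "S"},
    {"from": "user", "value": "U"},
    {"from": "assistant", "value": "A"},
]}
prompt_for_inference = row_to_prompt(example_row)
```

With `include_answer=True` the same helper renders the full training-style sequence, including the assistant judgement.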
Training details (stage 2, on top of stage-1-merged base)
- PEFT: LoRA r=16, α=32, dropout 0.05, target_modules=all-linear, bias='none'
- Optimizer: Muon on 2D matrices (Newton-Schulz orthogonalisation) + AdamW on 1D params
- LR: 1e-3 (matrix) / 1e-4 (aux), cosine decay with 3 % warmup, grad-clip 1.0
- Batch: per-device 1 × grad-accum 16 (effective 16)
- Seq len: 4096 (user text is clipped by characters when a sample exceeds this; the assistant turn is always preserved)
- Seed: 42, Epochs: 7
- Attention: SDPA (bf16 native on H200)
- Wall clock: 5h48m on a half-H200 (48 GB active)
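The batch and schedule settings above imply a rough step budget, sketched below. Several details are assumptions not stated in the card: linear warmup, decay to exactly zero, one sample per row with no packing, and the last partial batch being kept. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, peak_lr, warmup_frac=0.03):
    """Linear warmup to peak_lr, then cosine decay to zero (both shapes are assumptions)."""
    warmup_steps = max(1, round(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


rows, effective_batch, epochs = 3771, 1 * 16, 7
steps_per_epoch = math.ceil(rows / effective_batch)  # assumes the last partial batch is kept
total_steps = steps_per_epoch * epochs

for name, peak in (("matrix (Muon)", 1e-3), ("aux (AdamW)", 1e-4)):
    mid = lr_at(total_steps // 2, total_steps, peak)
    print(f"{name}: peak={peak:g}, lr at midpoint={mid:.2e}")
```

Under these assumptions stage 2 runs for roughly 1,650 optimizer steps, of which about 50 are warmup.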
Data
- Stage-1 train: 9,868 (q, a) pairs from data/processed/train_pairs.jsonl (multilingual: ~50 % English, ~15 % German, then FR/ES/IT/JA/ZH…)
- Stage-2 train: 3,771 supervisor-annotated rows of the form {"conversations": [{"from":"system", …}, {"from":"user", …}, {"from":"assistant", …}]}, where the assistant response is a <think>…</think>{"label":…,"reason":…} judgement.
- Test: 199 held-out supervisor rows (unseen during either stage).
Intended use / limitations
- Intended for research reproduction of CycleInstruct-style continuation training on labeled downstream tasks.
- The "correct" class has substantially lower F1 (0.446) than "incorrect" (0.783), reflecting the 39/61 % class imbalance in the training data. Class-weighted loss or balanced sampling would likely help.
- The <think> reasoning is Korean; input transcripts may be any language.
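As one illustration of the balanced-sampling idea mentioned above, the sketch below computes per-example sampling weights so that each class carries equal total probability. It uses a toy 100-row label list with the quoted 39/61 split and assumes "correct" is the minority class; it has not been tried on this model, and `balanced_sample_weights` is a hypothetical helper.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Per-example weights that give every class the same total sampling mass."""
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[lab]) for lab in labels]


# Toy label list with the 39/61 split (not the real training data).
labels = ["correct"] * 39 + ["incorrect"] * 61
weights = balanced_sample_weights(labels)

mass_correct = sum(w for w, lab in zip(weights, labels) if lab == "correct")
print(f"sampling mass for 'correct': {mass_correct:.2f}")
```

Weights of this shape could be passed to a weighted sampler such as PyTorch's `torch.utils.data.WeightedRandomSampler`, so that each batch sees the two labels about equally often.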
License
MIT (inherits from the microsoft/Phi-4-reasoning base model).