Libraries

How to use srivarenya/MoM-python-slm-grpo with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="srivarenya/MoM-python-slm-grpo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm-grpo")
model = AutoModelForCausalLM.from_pretrained("srivarenya/MoM-python-slm-grpo")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use srivarenya/MoM-python-slm-grpo with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "srivarenya/MoM-python-slm-grpo"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm-grpo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
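
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat`, the `max_tokens` default, and the 60-second timeout are illustrative assumptions, not part of the model card. It assumes a server started as above is listening on `localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "srivarenya/MoM-python-slm-grpo"


def build_chat_request(prompt, model=MODEL_ID, max_tokens=256):
    """Build an OpenAI-style chat-completions payload (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST the payload to the server and return the assistant's text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

With the server running, `print(chat("Write a Python function that reverses a string."))` would print the reply. Passing `base_url="http://localhost:30000/v1"` should work the same way against a server on port 30000.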

Use Docker

docker model run hf.co/srivarenya/MoM-python-slm-grpo

SGLang

How to use srivarenya/MoM-python-slm-grpo with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "srivarenya/MoM-python-slm-grpo" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm-grpo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "srivarenya/MoM-python-slm-grpo" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "srivarenya/MoM-python-slm-grpo",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use srivarenya/MoM-python-slm-grpo with Docker Model Runner:
```
docker model run hf.co/srivarenya/MoM-python-slm-grpo
```

MoM-Python-SLM-GRPO (1.5B)

The spec-driven code-generation node of a Mixture-of-Models (MoM) mesh — the GRPO/RLVR-tuned successor to srivarenya/MoM-python-slm. Given a Python task (optionally with an upstream context packet), it returns reasoning followed by a function. It shares the Qwen2.5-Coder tokenizer with the other generative nodes, which is what makes logit-space fusion across the mesh valid.

Warm-started from: srivarenya/MoM-python-slm (DoRA r=64 SFT of Qwen2.5-Coder-1.5B-Instruct)
Method: GRPO (Group Relative Policy Optimization, RLVR) — a fresh DoRA r=64 adapter trained 500 steps, then merged.
Reward: 0.8 · execution + 0.1 · format + 0.1 · LLM-judge. Execution reward runs each completion against the problem's assert tests in a sandbox (binary pass/fail) — this is the load-bearing signal. Two-sided abstention: NEED_INPUT is rewarded only on underspecified prompts.
GRPO config: β=0 (no KL), asymmetric clip ε=[0.2, 0.25], G=8 completions/prompt, temp=0.9, top_p=0.95, lr=1e-5. Problems: 6k execution-verifiable (problem_solving + spec_to_code) + abstention records.
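
The reward and GRPO settings above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the function names are hypothetical, a subprocess with a timeout stands in for a real sandbox, the advantage uses the common mean/std group normalization, and ε=[0.2, 0.25] is read as lower and upper clip bounds. The format and judge scores are taken as inputs in [0, 1].

```python
import statistics
import subprocess
import sys


def execution_reward(completion_code, assert_tests, timeout_s=5.0):
    """Binary pass/fail: 1.0 if the code plus its assert tests exits cleanly.

    A bare subprocess is NOT a security sandbox; it only isolates crashes and hangs.
    """
    program = completion_code + "\n\n" + assert_tests + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def composite_reward(exec_r, format_r, judge_r):
    """Weights from the model card: 0.8 execution, 0.1 format, 0.1 LLM-judge."""
    return 0.8 * exec_r + 0.1 * format_r + 0.1 * judge_r


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group (G completions of one prompt)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_term(ratio, advantage, eps_low=0.2, eps_high=0.25):
    """Per-token surrogate with asymmetric clipping; no KL term since beta=0."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)
```

Because the execution term carries 0.8 of the weight, a completion that fails its tests can score at most 0.2, which is why the card calls execution the load-bearing signal. A group in which all eight completions receive the same reward yields zero advantages and so contributes no gradient.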

Benchmarks (greedy pass@1, same Colab/evalplus harness for all three)

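| Metric | base | SFT (`MoM-python-slm`) | this model (GRPO) | GRPO vs SFT (points) |
|---|---|---|---|---|
| MBPP | 66.7 | 69.6 | 72.5 | +2.9 |
| MBPP+ | — | — | 62.7 | — |
| domain `problem_solving` (exec) | 0.700 | 0.713 | 0.767 | +5.4 |
| domain `spec_to_code` (exec) | 0.632 | 0.714 | 0.729 | +1.5 |
| domain `api_usage` (application) | — | 0.855 | 0.900 | +4.5 |
| HumanEval | 68.9 | 70.7 | 67.7 | −3.0 |
| HumanEval+ | — | — | 62.2 | — |
| domain `api_signature` (param-recall) | 0.217 | 0.299 | 0.301 | +0.2 |

MBPP and HumanEval scores are percentages; domain scores are fractions. All deltas are in percentage points.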
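
The table mixes two scales: MBPP and HumanEval are reported as percentages, the domain rows as fractions, and the last column in percentage points. As a quick sanity check, the sketch below recomputes that column from the SFT and GRPO values copied from the table; the names `PERCENT_ROWS`, `FRACTION_ROWS`, and `delta_points` are illustrative.

```python
# (SFT, GRPO) scores copied from the benchmark table.
PERCENT_ROWS = {
    "MBPP": (69.6, 72.5),
    "HumanEval": (70.7, 67.7),
}
FRACTION_ROWS = {
    "problem_solving": (0.713, 0.767),
    "spec_to_code": (0.714, 0.729),
    "api_usage": (0.855, 0.900),
    "api_signature": (0.299, 0.301),
}


def delta_points(sft, grpo, is_fraction):
    """GRPO minus SFT, expressed in percentage points and rounded to 0.1."""
    scale = 100.0 if is_fraction else 1.0
    return round((grpo - sft) * scale, 1)


deltas = {name: delta_points(s, g, False) for name, (s, g) in PERCENT_ROWS.items()}
deltas.update({name: delta_points(s, g, True) for name, (s, g) in FRACTION_ROWS.items()})

for name, value in deltas.items():
    print(f"{name:16s} {value:+.1f}")
```

From the rounded table entries, the `api_signature` gap works out to 0.2 points, so that dimension is essentially unchanged by GRPO.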

What GRPO did (load-bearing read)

GRPO is a specialization trade, not a free lunch. Gains land on exactly the execution-rewarded, spec-driven dimensions (MBPP +2.9 and domain problem_solving +5.4 points over SFT), while the un-reinforced HumanEval completion format gives back 3.0 points, landing at 67.7, which is 1.2 below base (68.9). That's the textbook RLVR signature: the model sharpens "write a correct function from a spec" (what the MoM node actually does) at a small cost to "graft a body under a fixed signature" (a format it never saw a reward for).

Use this model for the spec-driven node role — it's the strongest on MBPP and the held-out domain eval.
Use the SFT sibling if HumanEval-completion is a hard gate — it remains the HumanEval-strongest checkpoint (70.7).

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
tok = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm-grpo")
model = AutoModelForCausalLM.from_pretrained(
    "srivarenya/MoM-python-slm-grpo", dtype="bfloat16", device_map="auto")

Prompt with the training system prompt + a Python task; the model returns reasoning then code. Reward, training recipe, and the self-contained GRPO Colab notebook are in the project repository.
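
A fuller generation loop might look like the sketch below. Several parts are assumptions: the training system prompt is not reproduced in this card, so it is passed in as a parameter and must be taken from the project repository; the helpers `build_messages`, `extract_code`, and `is_abstention` are hypothetical; and the sketch assumes the code arrives in a fenced block and that an abstention contains the literal string `NEED_INPUT`. Greedy decoding mirrors the benchmark setting.

```python
import re

MODEL_ID = "srivarenya/MoM-python-slm-grpo"
FENCE = "`" * 3  # Markdown code-fence delimiter, built indirectly for readability


def build_messages(task, system_prompt):
    """Chat messages: the training system prompt followed by the Python task."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]


def extract_code(response):
    """Return the last fenced code block in the response, or None if absent."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    blocks = re.findall(pattern, response, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else None


def is_abstention(response):
    """Assumes an abstaining reply contains the literal NEED_INPUT marker."""
    return "NEED_INPUT" in response


def generate(task, system_prompt, max_new_tokens=512):
    """Load the model and decode greedily; needs transformers and the weights."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, dtype="bfloat16", device_map="auto")
    inputs = tok.apply_chat_template(
        build_messages(task, system_prompt),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

A caller would typically check `is_abstention(reply)` first and fall back to `extract_code(reply)` otherwise. On older transformers releases the `dtype` argument may need to be spelled `torch_dtype`.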

Next cross-check: LiveCodeBench (contamination-resistant), before/after vs the SFT sibling.

Safetensors: model size 2B params, tensor type BF16

Model tree for srivarenya/MoM-python-slm-grpo

Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-Coder-1.5B
Finetuned: Qwen/Qwen2.5-Coder-1.5B-Instruct
Finetuned: srivarenya/MoM-python-slm
Finetuned: this model