MoM-Python-SLM-GRPO (1.5B)
The spec-driven code-generation node of a Mixture-of-Models (MoM) mesh, and the GRPO/RLVR-tuned successor to srivarenya/MoM-python-slm. Given a Python task (optionally with an upstream context packet), it returns reasoning followed by a function. It shares the Qwen2.5-Coder tokenizer with the other generative nodes, which is what makes logit-space fusion across the mesh valid: a given token id means the same token in every node, so next-token scores can be combined position by position.
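The card does not say how the mesh combines its nodes' outputs. As a hedged illustration of why the shared tokenizer matters, the sketch below uses weighted log-linear fusion (a weighted average of each model's next-token log-probabilities), one common choice. The function names, weights, and toy three-token vocabulary are assumptions for illustration only; the point is that fusion indexes every distribution by the same token id, so the function refuses mismatched vocabularies.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one next-token distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def fuse_next_token(
    logits_per_model: Sequence[Sequence[float]],
    weights: Sequence[float],
    vocab_per_model: Sequence[Sequence[str]],
) -> int:
    """Hypothetical weighted log-linear fusion: average the models'
    log-probabilities position by position and return the argmax token id.

    Only meaningful when every model uses the same vocabulary, because
    index i must denote the same token in every distribution.
    """
    shared = list(vocab_per_model[0])
    if any(list(v) != shared for v in vocab_per_model[1:]):
        raise ValueError("models do not share a tokenizer vocabulary")
    log_probs = [log_softmax(row) for row in logits_per_model]
    fused = [
        sum(w * lp[i] for w, lp in zip(weights, log_probs))
        for i in range(len(shared))
    ]
    # Greedy pick over the fused distribution.
    return max(range(len(fused)), key=fused.__getitem__)
```

With `vocab = ["a", "b", "c"]`, `fuse_next_token([[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]], [0.5, 0.5], [vocab, vocab])` returns 1 (token `b`), whereas the first model alone would pick index 0. Reordering one vocabulary raises `ValueError` instead of silently mixing unrelated tokens.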
- Warm-started from: srivarenya/MoM-python-slm (DoRA r=64 SFT of Qwen2.5-Coder-1.5B-Instruct)
- Method: GRPO (Group Relative Policy Optimization), an RLVR method: a fresh DoRA r=64 adapter trained for 500 steps, then merged.
- Reward: 0.8 · execution + 0.1 · format + 0.1 · LLM-judge. The execution reward runs each completion against the problem's `assert` tests in a sandbox (binary pass/fail); this is the load-bearing signal. Abstention is two-sided: `NEED_INPUT` is rewarded only on underspecified prompts.
- GRPO config: β=0 (no KL), asymmetric clip ε=[0.2, 0.25], G=8 completions/prompt, temperature=0.9, top_p=0.95, lr=1e-5. Problems: 6k execution-verifiable problems (problem_solving + spec_to_code) plus abstention records.
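The reward and update rule above can be sketched as follows. This is a minimal illustration, not the project's training code. The reward weights, the binary execution check, the G-way group normalization, the asymmetric clip range, and β=0 come from the card; the rest is assumed: a subprocess stands in for the sandbox, the format and LLM-judge scores are treated as precomputed values in [0, 1], the standard GRPO mean/std normalization is used (the card does not name a variant), and the ratio is taken per completion rather than per token. Abstention handling is not modeled.

```python
import statistics
import subprocess
import sys

# Reward weights from the card: 0.8 * execution + 0.1 * format + 0.1 * LLM-judge.
W_EXEC, W_FORMAT, W_JUDGE = 0.8, 0.1, 0.1


def execution_reward(code: str, tests: str, timeout_s: float = 5.0) -> float:
    """Binary pass/fail: 1.0 if the completion plus the problem's assert
    tests exit cleanly, else 0.0 (including on timeout).

    Assumption: a fresh interpreter process stands in for the sandbox.
    It isolates crashes and hangs but is NOT a security boundary.
    """
    program = f"{code}\n\n{tests}\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


def total_reward(exec_r: float, format_r: float, judge_r: float) -> float:
    """Weighted sum; format and judge scores are assumed precomputed in [0, 1]."""
    return W_EXEC * exec_r + W_FORMAT * format_r + W_JUDGE * judge_r


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages for the G completions of one prompt:
    (r_i - mean) / (std + eps). Identical rewards give all-zero advantages."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.25) -> float:
    """PPO-style clipped objective with the asymmetric range
    [1 - eps_low, 1 + eps_high]. With beta = 0 no KL penalty is added."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)
```

For one prompt, score each of the G=8 completions with `total_reward`, pass the eight scores to `group_advantages`, and weight each completion's clipped objective by its advantage. A group in which every completion gets the same reward yields zero advantage everywhere and so contributes no learning signal.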
Benchmarks (greedy pass@1, same Colab/evalplus harness for all three)
| Metric | base | SFT (MoM-python-slm) | this model (GRPO) | GRPO vs SFT (pts) |
|---|---|---|---|---|
| MBPP | 66.7 | 69.6 | 72.5 | +2.9 |
| MBPP+ | — | — | 62.7 | — |
| domain problem_solving (exec) | 0.700 | 0.713 | 0.767 | +5.4 |
| domain spec_to_code (exec) | 0.632 | 0.714 | 0.729 | +1.5 |
| domain api_usage (application) | — | 0.855 | 0.900 | +4.5 |
| HumanEval | 68.9 | 70.7 | 67.7 | −3.0 |
| HumanEval+ | — | — | 62.2 | — |
| domain api_signature (param-recall) | 0.217 | 0.299 | 0.301 | +0.2 |

Domain rows are reported as fractions (0 to 1); the GRPO vs SFT column is in percentage points for every row.
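The delta column can be recomputed from the SFT and GRPO columns. This small check assumes nothing beyond the values in the table: domain rows are fractions, so they are scaled by 100 to put every delta in percentage points, matching the MBPP and HumanEval rows.

```python
# (metric, SFT score, GRPO score, scale to percentage points), from the table.
ROWS = [
    ("MBPP", 69.6, 72.5, 1),
    ("domain problem_solving (exec)", 0.713, 0.767, 100),
    ("domain spec_to_code (exec)", 0.714, 0.729, 100),
    ("domain api_usage (application)", 0.855, 0.900, 100),
    ("HumanEval", 70.7, 67.7, 1),
    ("domain api_signature (param-recall)", 0.299, 0.301, 100),
]


def delta_points(sft: float, grpo: float, scale: int) -> float:
    """GRPO minus SFT, in percentage points, rounded to one decimal."""
    return round((grpo - sft) * scale, 1)


if __name__ == "__main__":
    for name, sft, grpo, scale in ROWS:
        print(f"{name}: {delta_points(sft, grpo, scale):+.1f}")
```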
What GRPO did (load-bearing read)
GRPO is a specialization trade, not a free lunch. The gains land exactly on the execution-rewarded, spec-driven dimensions (MBPP +2.9 and domain problem_solving +5.4 over SFT), while the un-reinforced HumanEval completion format gives back 3.0 points, leaving it 1.2 points below base. That is the textbook RLVR signature: the model sharpens "write a correct function from a spec" (what the MoM node actually does) at a small cost to "graft a body under a fixed signature" (a format it never received a reward for).
- Use this model for the spec-driven node role — it's the strongest on MBPP and the held-out domain eval.
- Use the SFT sibling if HumanEval-completion is a hard gate — it remains the HumanEval-strongest checkpoint (70.7).
Usage
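```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("srivarenya/MoM-python-slm-grpo")
# `dtype` is the newer name for `torch_dtype` in recent transformers releases;
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(
    "srivarenya/MoM-python-slm-grpo", dtype="bfloat16", device_map="auto")
```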
Prompt with the training system prompt + a Python task; the model returns reasoning then code. Reward, training recipe, and the self-contained GRPO Colab notebook are in the project repository.
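A sketch of the prompt-and-parse glue around the model, under stated assumptions: the system prompt is a placeholder (the real one lives in the project repository), the way an upstream context packet is spliced into the user turn is hypothetical, the function is assumed to arrive in a fenced Python block after the reasoning, and an abstention is assumed to contain the literal `NEED_INPUT`. The helper names are illustrative, not part of the project.

```python
import re
from typing import Optional

# Three backticks, built programmatically so this snippet can sit inside
# a markdown code fence without terminating it.
FENCE = "`" * 3

# Hypothetical placeholder: substitute the actual training system prompt
# published in the project repository.
SYSTEM_PROMPT = "<training system prompt>"

# Matches a fenced block, optionally tagged "python", and captures its body.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)


def build_messages(task: str, context_packet: Optional[str] = None) -> list[dict]:
    """System prompt + Python task. How the optional upstream context
    packet is prepended to the user turn is an assumption."""
    user = task
    if context_packet is not None:
        user = f"Context:\n{context_packet}\n\nTask:\n{task}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


def is_abstention(response: str) -> bool:
    """Assumes an abstention contains the literal NEED_INPUT marker."""
    return "NEED_INPUT" in response


def extract_code(response: str) -> Optional[str]:
    """Return the last fenced code block in a 'reasoning, then code'
    response, or None if no fenced block is present."""
    blocks = _CODE_BLOCK.findall(response)
    return blocks[-1].strip() if blocks else None
```

To run it end to end, pass `build_messages(task)` to `tok.apply_chat_template(..., add_generation_prompt=True, return_tensors="pt")`, call `model.generate` with greedy decoding (`do_sample=False`, matching the benchmark setting), decode only the newly generated tokens, and hand the text to `is_abstention` and `extract_code`.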
Next cross-check: LiveCodeBench (contamination-resistant), before/after vs the SFT sibling.
Model tree for srivarenya/MoM-python-slm-grpo: base model Qwen/Qwen2.5-1.5B