Qwen2.5-Coder-7B Agentic SLM v5 Merged
This repository contains the merged 7B model:
Qwen/Qwen2.5-Coder-7B-Instruct + v5 LoRA adapter.
It is the deployable dense 7B component of the v5 agentic coding system. The best measured result comes from running this model inside a deterministic verifier/rescue harness, not from raw chat usage alone.
Current Proof Gate
Kaggle proof kernel: holykeys/qwen25-coder-agentic-slm-v5-rescue
Evaluation set: 50 HumanEval/MBPP-style tasks used for fast iteration.
| Phase | Greedy pass@1 | Coverage@K | Selected@K | Repair | Final |
|---|---|---|---|---|---|
| Qwen2.5-Coder-7B reference harness | 37/50 | 40/50 | 40/50 | 2/50 | 42/50 |
| v5 7B adapter/merged primary | 37/50 | 42/50 | 42/50 | 2/50 | 44/50 |
| 14B rescue on primary misses | 1/6 | 3/6 | 3/6 | 1/6 | 4/6 |
| v5 combined rescue system | 38/50 | 45/50 | 45/50 | 3/50 | 48/50 |
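One way to read the table is that the combined row is the primary row plus the rescue row, since the 14B model only runs on the 6 tasks the 7B primary missed. The check below uses only the numbers in the table. The additive reading, and the reading that Final equals Selected@K plus Repair, are assumptions that the table is consistent with but does not state.

```python
# Passing-task counts copied from the table above.
TOTAL_TASKS = 50
primary = {"greedy": 37, "coverage": 42, "selected": 42, "repair": 2, "final": 44}
rescue = {"greedy": 1, "coverage": 3, "selected": 3, "repair": 1, "final": 4}
combined = {"greedy": 38, "coverage": 45, "selected": 45, "repair": 3, "final": 48}

# The rescue model is only run on tasks the primary failed.
primary_misses = TOTAL_TASKS - primary["final"]

# Assumption: each combined column is the primary count plus the rescue count.
derived = {column: primary[column] + rescue[column] for column in primary}

# Assumption: Final = Selected@K + Repair within each row.
final_is_selected_plus_repair = all(
    row["selected"] + row["repair"] == row["final"]
    for row in (primary, rescue, combined)
)

print("primary misses:", primary_misses)
print("combined == primary + rescue:", derived == combined)
print("final == selected + repair:", final_is_selected_plus_repair)
```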
Lift Summary
The 7B merged model alone improved the final harness score from 42/50 to 44/50.
That is:
- `+2/50` absolute tasks.
- `+4` percentage points.
- `+4.76%` relative improvement over the `42/50` reference.
The full v5 rescue system improved from 42/50 to 48/50.
That is:
- `+6/50` absolute tasks.
- `+12` percentage points.
- `+14.29%` relative improvement.
- `75%` failure reduction, from `8` failures to `2` failures.
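The figures in this summary can be reproduced from the pass counts alone. The helper below is a minimal sketch: the function name `lift` and its dictionary keys are illustrative and not part of the released harness.

```python
def lift(reference_pass: int, new_pass: int, total: int) -> dict:
    """Summarize the change between two pass counts on the same task set."""
    gained = new_pass - reference_pass
    reference_failures = total - reference_pass
    new_failures = total - new_pass
    return {
        "absolute_tasks": gained,
        # Difference in pass rate, expressed in percentage points.
        "percentage_points": round(100 * gained / total, 2),
        # Gain relative to the reference pass count.
        "relative_improvement_pct": round(100 * gained / reference_pass, 2),
        # Share of the reference failures that were eliminated.
        "failure_reduction_pct": round(
            100 * (reference_failures - new_failures) / reference_failures, 2
        ),
    }


merged_only = lift(reference_pass=42, new_pass=44, total=50)
full_system = lift(reference_pass=42, new_pass=48, total=50)
print(merged_only)
print(full_system)
```

Percentage points and relative improvement differ because the first divides by the 50 tasks and the second by the 42 tasks the reference already passed.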
Interpretation
This model should be viewed as a compact coding component, not a frontier-model replacement by itself.
The practical artifact is:
- 7B merged model for primary code generation.
- Deterministic verifier/test runner.
- Candidate selection by executable tests.
- Repair pass for failed candidates.
- Optional rescue model for missed tasks.
The strongest result requires the harness.
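The components listed above could be wired together roughly as follows. This is a sketch under stated assumptions, not the harness used for the reported numbers: `attempt`, `solve`, the callable signatures, and the toy `add` task are all hypothetical, and the in-process `exec` verifier stands in for a sandboxed test runner.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces for the harness components.
Generator = Callable[[str, int], List[str]]     # (prompt, k) -> k candidate programs
Verifier = Callable[[str], bool]                # candidate -> passes executable tests?
Repairer = Callable[[str, str], Optional[str]]  # (prompt, failed candidate) -> patch


def attempt(prompt: str, generate: Generator, verify: Verifier,
            repair: Repairer, k: int) -> Tuple[Optional[str], str]:
    """Generate k candidates, select by tests, then try to repair failures."""
    candidates = generate(prompt, k)
    for candidate in candidates:
        if verify(candidate):
            return candidate, "selected"
    for candidate in candidates:
        patched = repair(prompt, candidate)
        if patched is not None and verify(patched):
            return patched, "repaired"
    return None, "failed"


def solve(prompt: str, primary: Generator, verify: Verifier, repair: Repairer,
          rescue: Optional[Generator] = None, k: int = 4) -> Tuple[Optional[str], str]:
    """Run the primary model first; fall back to the rescue model on a miss."""
    solution, stage = attempt(prompt, primary, verify, repair, k)
    if solution is None and rescue is not None:
        solution, stage = attempt(prompt, rescue, verify, repair, k)
        stage = "rescue_" + stage
    return solution, stage


# Toy stand-ins so the sketch runs without a model.
def verify_add(candidate: str) -> bool:
    namespace: dict = {}
    try:
        exec(candidate, namespace)  # unsafe for real model output; sandbox instead
        return namespace["add"](2, 3) == 5
    except Exception:
        return False


def buggy_primary(prompt: str, k: int) -> List[str]:
    return ["def add(a, b):\n    return a - b\n"] * k


def naive_repair(prompt: str, failed: str) -> Optional[str]:
    return failed.replace("a - b", "a + b")


solution, stage = solve("Write add(a, b).", buggy_primary, verify_add, naive_repair)
print(stage)
```

Because every decision is made by the verifier rather than by the model, the same candidates always lead to the same outcome, which is what makes the harness deterministic.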
Limitations
- The current proof gate is small: 50 tasks.
- HumanEval/MBPP-style tasks are not enough to establish broad coding-agent quality.
- No broad SWE-bench claim is made.
- No Claude Sonnet 4.5 win is claimed.
- Contamination risk must be handled carefully on common public coding benchmarks.
Required Next Benchmarks
Future claims should be gated by a broader eval suite:
- LiveCodeBench, using recent slices that are not contaminated by training data.
- BigCodeBench, including realistic library/function behavior.
- SWE-bench Lite, then SWE-bench Verified if the lite run is promising.
- Repo-edit tasks with hidden tests.
- Agentic tool-use tasks: edit, run tests, inspect failures, patch again.
- Cost and latency: total wall-clock, GPU type, tokens per task, repair count, and success per dollar.
- Abstention and invalid-output rates.
- Robustness under strict code-only output constraints.
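The cost, latency, abstention, and invalid-output items above suggest logging one record per task and aggregating afterwards. The sketch below shows one possible shape for that. The `TaskRecord` fields, the `summarize` function, and the sample values are hypothetical and do not come from any existing run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskRecord:
    """Per-task measurements (hypothetical field names)."""
    task_id: str
    passed: bool
    wall_clock_s: float
    tokens: int
    repair_count: int
    cost_usd: float
    abstained: bool = False
    invalid_output: bool = False


def summarize(records: List[TaskRecord]) -> dict:
    """Aggregate per-task records into the metrics listed above."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    passed = sum(r.passed for r in records)
    total_cost = sum(r.cost_usd for r in records)
    success_per_dollar: Optional[float] = passed / total_cost if total_cost > 0 else None
    return {
        "tasks": n,
        "pass_rate": passed / n,
        "total_wall_clock_s": sum(r.wall_clock_s for r in records),
        "tokens_per_task": sum(r.tokens for r in records) / n,
        "repairs_per_task": sum(r.repair_count for r in records) / n,
        "success_per_dollar": success_per_dollar,
        "abstention_rate": sum(r.abstained for r in records) / n,
        "invalid_output_rate": sum(r.invalid_output for r in records) / n,
    }


# Made-up records, only to exercise the function.
records = [
    TaskRecord("t1", True, 10.0, 400, 0, 0.25),
    TaskRecord("t2", True, 20.0, 600, 1, 0.25),
    TaskRecord("t3", True, 30.0, 800, 0, 0.50),
    TaskRecord("t4", False, 40.0, 200, 2, 1.00, abstained=True),
]
summary = summarize(records)
print(summary)
```

GPU type is constant within a run, so it would more naturally live in run-level metadata than in each task record.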
Batch-Based Release Discipline
The next iteration should avoid giant all-in-one notebooks.
Preferred release process:
1. `baseline`: evaluate base model only.
2. `candidate`: evaluate one candidate change only.
3. `failure_forge`: collect failed attempts and verifier observations.
4. `repair_train`: train only on verified minimal repairs.
5. `heldout_eval`: rerun held-out benchmark tasks.
6. `release`: push LoRA, merged model, and GGUF only after the gate passes.
Each batch should have a separate Kaggle notebook, capped runtime, deterministic output files, and explicit pass/fail criteria.
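A batch with explicit pass/fail criteria and deterministic output files might end with a step like the one below. It is a sketch only: the `min_gain` criterion, the function names, and the file name `heldout_eval.json` are assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def evaluate_gate(baseline_pass: int, candidate_pass: int, total: int, min_gain: int) -> dict:
    """Hypothetical criterion: the candidate must beat the baseline by min_gain tasks."""
    gain = candidate_pass - baseline_pass
    return {
        "baseline_pass": baseline_pass,
        "candidate_pass": candidate_pass,
        "total": total,
        "gain": gain,
        "min_gain": min_gain,
        "gate_passed": gain >= min_gain,
    }


def write_result(result: dict, path: Path) -> None:
    """Write JSON with sorted keys so identical results give identical bytes."""
    text = json.dumps(result, sort_keys=True, indent=2) + "\n"
    path.write_text(text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "heldout_eval.json"
    result = evaluate_gate(baseline_pass=42, candidate_pass=44, total=50, min_gain=1)
    write_result(result, out)
    first = out.read_text(encoding="utf-8")
    write_result(result, out)
    second = out.read_text(encoding="utf-8")

print(result["gate_passed"], first == second)
```

Keeping timestamps and other run-specific values out of the result file is what lets two runs of the same batch be compared byte for byte.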
Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID of this merged model on the Hugging Face Hub.
model_id = "josephmayo/Qwen2.5-agentic-7B-SLM"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place weights on the available devices automatically
    trust_remote_code=True,
)
```
For meaningful results, run the model in a verifier harness rather than judging raw single responses.
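As a starting point for such a harness, a single candidate can be judged by running it together with its tests in a fresh Python process. The function below is a minimal sketch that uses only the standard library. The name `passes_tests`, the timeout value, and the toy task are assumptions, and a subprocess with a timeout is not a full sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(candidate_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Run candidate + tests in a separate process; exit code 0 means pass."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_code + "\n\n" + test_code + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # Treat hangs and infinite loops as failures.
            return False
        return proc.returncode == 0


# Toy task: the tests are plain assertions appended after the candidate.
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good_candidate = "def add(a, b):\n    return a + b"
bad_candidate = "def add(a, b):\n    return a - b"

print(passes_tests(good_candidate, tests))
print(passes_tests(bad_candidate, tests))
```

In practice the candidate string would be the decoded model output, and the boolean result would drive candidate selection and repair.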