ume
ume is a GRPO fine-tuned derivative of summerMC/matutake, trained with LoRA on Python code-generation tasks and merged back into the base model for standalone inference.
Model Summary
- Model name: summerMC/ume
- Base model: summerMC/matutake
- Training method: GRPO (Group Relative Policy Optimization)
- Parameter-efficient tuning: LoRA
- Training dataset: Hoglet-33/python-coding-dataset
- Final artifact: merged checkpoint for direct inference
This model is intended to improve Python code generation behavior using lightweight reward functions that favor syntactically valid, code-like outputs.
Training Details
Base model
summerMC/matutake
Dataset
Hoglet-33/python-coding-dataset
Fine-tuning method
- Trainer: TRL GRPOTrainer
- Adapter method: LoRA
- Final export: LoRA weights merged into the base model
Reward functions
Training used simple heuristic reward functions:
1) Syntax reward
Rewards outputs that can be parsed as valid Python:
- 1.0 if ast.parse(output) succeeds
- 0.0 otherwise
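A minimal sketch of such a syntax reward, assuming the TRL convention that a reward function receives a list of completions and returns one float per completion. The names `syntax_reward` and `_completion_text` are hypothetical, and the original training script may differ.

```python
import ast


def _completion_text(completion):
    """Return plain text from a string or a chat-style list of messages."""
    if isinstance(completion, str):
        return completion
    # Chat-style completions are assumed to be [{"role": ..., "content": ...}].
    return completion[0]["content"]


def syntax_reward(completions, **kwargs):
    """Score 1.0 for completions that parse as Python, 0.0 otherwise."""
    rewards = []
    for completion in completions:
        try:
            ast.parse(_completion_text(completion))
            rewards.append(1.0)
        except (SyntaxError, ValueError):
            rewards.append(0.0)
    return rewards
```

Note that an empty string also parses successfully, so a reward of this kind cannot rule out empty outputs on its own.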
2) Code-shape reward
Rewards outputs that look more like actual Python code:
- no Markdown code fences
- contains Python-like tokens such as def, import, return, class
- non-trivially long output
- avoids extremely long generations
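The four criteria above could be combined as in the following sketch. The length thresholds and the equal 0.25 weights are hypothetical, since this card does not state the values used in training, and completions are assumed to be plain strings.

```python
PYTHON_HINTS = ("def ", "import ", "return ", "class ")
MIN_CHARS = 20      # hypothetical lower bound for "non-trivially long"
MAX_CHARS = 4000    # hypothetical upper bound for "extremely long"
FENCE = "`" * 3     # Markdown code-fence marker


def code_shape_reward(completions, **kwargs):
    """Give 0.25 for each code-shape criterion a completion satisfies."""
    rewards = []
    for text in completions:
        score = 0.0
        if FENCE not in text:                            # no Markdown fences
            score += 0.25
        if any(hint in text for hint in PYTHON_HINTS):   # Python-like tokens
            score += 0.25
        if len(text.strip()) >= MIN_CHARS:               # not trivially short
            score += 0.25
        if len(text) <= MAX_CHARS:                       # not extremely long
            score += 0.25
        rewards.append(score)
    return rewards
```

A fenced or very short answer still earns partial credit here; whether the original rewards were graded or all-or-nothing is not documented.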
These rewards are intentionally lightweight and should be treated as a baseline GRPO setup rather than a production-grade evaluation system.
Prompt Format
The training data was converted into a chat-style coding prompt like this:
[
    {
        "role": "user",
        "content": (
            "Write correct Python code for the following task.\n"
            "Return only Python code. Do not use markdown.\n\n"
            "<task text>"
        ),
    }
]
For best results, prompt the model with a direct coding task and explicitly request code only.
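One way to reproduce the training prompt format at inference time is a small helper like the one below. The name `build_messages` is hypothetical; the instruction text is copied from the prompt format shown above.

```python
PROMPT_PREFIX = (
    "Write correct Python code for the following task.\n"
    "Return only Python code. Do not use markdown.\n\n"
)


def build_messages(task_text):
    """Wrap a task description in the chat-style prompt used for training."""
    return [{"role": "user", "content": PROMPT_PREFIX + task_text.strip()}]


messages = build_messages("Write a function that reverses a string.")
```

The resulting list can be passed to the tokenizer's apply_chat_template in place of a hand-written messages list.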
Usage
Transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/ume"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that computes fibonacci numbers with memoization."
    }
]

# return_dict=True yields a mapping with "input_ids" and "attention_mask",
# which the **inputs unpacking and the slicing below both rely on.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
Example Prompt
Input
Write a Python function that returns the longest common prefix of a list of strings.
Return only Python code.
Expected output style
def longest_common_prefix(strs):
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix
Training Configuration
The model was trained with a setup similar to the following:
- LoRA rank (r): 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Learning rate: 5e-6
- Batch size: 1
- Gradient accumulation: 8
- Generation batch size: 2
- Number of generations: 2
- Epochs: 1
LoRA target modules
[
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj",
]
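The settings above can be collected into plain dictionaries, as in this sketch. The key names follow the argument names of peft.LoraConfig and, in recent TRL releases, trl.GRPOConfig; that mapping and the task type are assumptions, since the original script is not shown. The derived values assume a single training process and standard LoRA scaling (alpha divided by rank).

```python
# Values copied from the configuration list; key names are an assumed mapping.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in this card
}

grpo_kwargs = {
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "generation_batch_size": 2,
    "num_generations": 2,
    "num_train_epochs": 1,
}

# Standard LoRA scales the low-rank update by alpha / r.
lora_scale = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Sequences contributing to one optimizer step on a single process.
sequences_per_update = (
    grpo_kwargs["per_device_train_batch_size"]
    * grpo_kwargs["gradient_accumulation_steps"]
)

# Distinct prompts sampled per generation batch.
prompts_per_generation_batch = (
    grpo_kwargs["generation_batch_size"] // grpo_kwargs["num_generations"]
)

print(lora_scale, sequences_per_update, prompts_per_generation_batch)
```

With only two generations per prompt, each group-relative advantage is estimated from a single pair of samples, which fits the card's description of this as a baseline setup.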
Limitations
- Training rewards are heuristic and do not verify functional correctness with unit tests.
- The model may still produce syntactically valid but logically incorrect code.
- Outputs may include hallucinated APIs, inefficient solutions, or incomplete implementations.
- Performance depends heavily on the capabilities and constraints of the base model summerMC/matutake.
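Because the training rewards do not check behavior, generated code is best verified externally. The sketch below runs a generated function against a few test cases; `passes_tests` is a hypothetical helper, and `exec` provides no sandboxing, so untrusted output should only be run in an isolated environment.

```python
import ast


def passes_tests(source, func_name, cases):
    """Return True if `source` defines `func_name` and it passes all cases.

    `cases` is a list of (args_tuple, expected_result) pairs.
    """
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    namespace = {}
    try:
        exec(source, namespace)  # WARNING: not a sandbox
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Missing function, runtime error, or wrong arity all count as failure.
        return False


generated = (
    "def longest_common_prefix(strs):\n"
    "    if not strs:\n"
    "        return ''\n"
    "    prefix = strs[0]\n"
    "    for s in strs[1:]:\n"
    "        while not s.startswith(prefix):\n"
    "            prefix = prefix[:-1]\n"
    "            if not prefix:\n"
    "                return ''\n"
    "    return prefix\n"
)
cases = [
    ((["flower", "flow", "flight"],), "fl"),
    (([],), ""),
    ((["dog", "car"],), ""),
]
result = passes_tests(generated, "longest_common_prefix", cases)
```

A check like this catches the "syntactically valid but logically incorrect" case that the heuristic rewards cannot detect.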
Intended Use
summerMC/ume is intended for:
- Python code generation experiments
- GRPO / RLHF-style fine-tuning experiments
- LoRA + merge workflows
- lightweight coding assistant prototyping
- research and hobbyist use
It is not validated for:
- production-critical software generation
- security-sensitive code
- safety-critical systems
- correctness-sensitive automated coding pipelines without external verification
Reproducibility
The training pipeline used:
- transformers
- datasets
- trl
- peft
- torch
A simplified training flow:
- Load summerMC/matutake
- Convert the dataset into chat prompts
- Train with GRPOTrainer using LoRA adapters
- Save the LoRA adapter
- Merge adapter weights back into the base model
- Save the merged model as summerMC/ume
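The merge and export steps could be sketched with PEFT as follows. The function name and the example paths are hypothetical placeholders, and the heavy imports are deferred into the function so the snippet loads even where those libraries are not installed.

```python
def merge_lora_adapter(base_id, adapter_dir, output_dir):
    """Merge a saved LoRA adapter into its base model and save the result."""
    # Imported lazily: these libraries are only needed when the merge runs.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        trust_remote_code=True,  # mirrors the Usage section of this card
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # folds adapter deltas into base weights
    merged.save_pretrained(output_dir)

    # Save the tokenizer alongside so the output directory is self-contained.
    tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
    tokenizer.save_pretrained(output_dir)
    return output_dir


# Hypothetical usage (paths are placeholders):
# merge_lora_adapter("summerMC/matutake", "path/to/lora-adapter", "path/to/merged")
```

After merging, the checkpoint loads with plain Transformers and no longer needs PEFT at inference time.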
Base Model and Dataset Attribution
- Base model: summerMC/matutake
- Dataset: Hoglet-33/python-coding-dataset
License
Please follow the licenses and usage terms of:
- the original base model summerMC/matutake
- the training dataset Hoglet-33/python-coding-dataset
If you redistribute or publish derivative checkpoints, confirm that your use is compatible with both upstream licenses.
Citation
If you use this model in a project or experiment, please cite the upstream base model and dataset.