Instructions for using summerMC/ume with libraries and local apps.

Libraries

How to use summerMC/ume with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="summerMC/ume")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("summerMC/ume")
model = AutoModelForCausalLM.from_pretrained("summerMC/ume")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use summerMC/ume with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "summerMC/ume"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "summerMC/ume",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
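
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000; the helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, model="summerMC/ume",
                       base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` should return the reply text. For the SGLang server described further down, pass `base_url="http://localhost:30000"`.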

Use Docker

docker model run hf.co/summerMC/ume

SGLang

How to use summerMC/ume with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "summerMC/ume" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "summerMC/ume",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "summerMC/ume" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "summerMC/ume",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

ume

ume is a GRPO fine-tuned derivative of summerMC/matutake, trained with LoRA on Python code-generation tasks and merged back into the base model for standalone inference.

Model Summary

Model name: summerMC/ume
Base model: summerMC/matutake
Training method: GRPO (Group Relative Policy Optimization)
Parameter-efficient tuning: LoRA
Training dataset: Hoglet-33/python-coding-dataset
Final artifact: merged checkpoint for direct inference

This model is intended to improve Python code generation behavior using lightweight reward functions that favor syntactically valid, code-like outputs.

Training Details

Base model

summerMC/matutake

Dataset

Hoglet-33/python-coding-dataset

Fine-tuning method

Trainer: TRL GRPOTrainer
Adapter method: LoRA
Final export: merged LoRA weights into the base model

Reward functions

Training used simple heuristic reward functions:

1) Syntax reward

Rewards outputs that can be parsed as valid Python:

1.0 if ast.parse(output) succeeds
0.0 otherwise
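
The actual training code is not included in this card, so the following is only a sketch of what such a reward could look like. It assumes completions arrive as plain strings and follows the general shape of a TRL reward function (a callable that receives `completions` plus extra keyword arguments and returns one float per completion). With chat-format data, the text would first need to be extracted from each message.

```python
import ast


def syntax_reward(completions, **kwargs):
    """Return 1.0 for each completion that parses as Python, else 0.0."""
    rewards = []
    for text in completions:
        try:
            ast.parse(text)
            rewards.append(1.0)
        except (SyntaxError, ValueError):
            # ValueError covers inputs such as source containing null bytes
            rewards.append(0.0)
    return rewards
```

Note that an empty string also parses successfully, which is one reason a second reward on output shape is useful.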

2) Code-shape reward

Rewards outputs that look more like actual Python code:

no Markdown code fences
contains Python-like tokens such as def, import, return, class
non-trivially long output
avoids extremely long generations

These rewards are intentionally lightweight and should be treated as a baseline GRPO setup rather than a production-grade evaluation system.
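
A code-shape reward along these lines could be sketched as follows. The thresholds, the equal weighting of the four checks, and the function name are assumptions made for illustration; the card does not state the values used in training.

```python
import re

# Assumed thresholds; the real values are not given in this card.
MIN_CHARS = 20
MAX_CHARS = 4000
PYTHON_TOKENS = ("def", "import", "return", "class")


def code_shape_reward(completions, **kwargs):
    """Score each completion in [0, 1] as the fraction of shape checks passed."""
    rewards = []
    for text in completions:
        checks = [
            "```" not in text,                       # no Markdown code fences
            any(re.search(rf"\b{tok}\b", text) for tok in PYTHON_TOKENS),
            len(text.strip()) >= MIN_CHARS,          # non-trivially long
            len(text) <= MAX_CHARS,                  # not extremely long
        ]
        rewards.append(sum(checks) / len(checks))
    return rewards
```

Averaging the checks gives partial credit, so a completion that only violates one criterion still receives a usable training signal.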

Prompt Format

The training data was converted into a chat-style coding prompt like this:

[
    {
        "role": "user",
        "content": (
            "Write correct Python code for the following task.\n"
            "Return only Python code. Do not use markdown.\n\n"
            "<task text>"
        ),
    }
]

For best results, prompt the model with a direct coding task and explicitly request code only.
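
To reproduce this format at inference time, a small helper can wrap any task description in the same instruction text. The instruction string is taken from the prompt above; the helper name `build_messages` is hypothetical.

```python
INSTRUCTION = (
    "Write correct Python code for the following task.\n"
    "Return only Python code. Do not use markdown.\n\n"
)


def build_messages(task_text):
    """Wrap a task description in the chat-style prompt used for training."""
    return [{"role": "user", "content": INSTRUCTION + task_text.strip()}]
```

The returned list can be passed directly to `tokenizer.apply_chat_template` or to a text-generation pipeline.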

Usage

Transformers

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "summerMC/ume"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that computes fibonacci numbers with memoization."
    }
]

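# return_dict=True is needed so that inputs is a mapping (input_ids,
# attention_mask) that can be unpacked into generate() below.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)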

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)

response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)

print(response)

Example Prompt

Input

Write a Python function that returns the longest common prefix of a list of strings.
Return only Python code.

Expected output style

def longest_common_prefix(strs):
    if not strs:
        return ""

    prefix = strs[0]
    for s in strs[1:]:
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

Training Configuration

The model was trained with a setup similar to the following:

LoRA rank (r): 16
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 5e-6
Batch size: 1
Gradient accumulation: 8
Generation batch size: 2
Number of generations: 2
Epochs: 1

LoRA target modules

[
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]
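
As a rough guide, the values above could map onto `peft.LoraConfig` and `trl.GRPOConfig` as sketched below. The mapping of "Batch size" to `per_device_train_batch_size`, the `task_type`, and the output path are assumptions, and `generation_batch_size` is only accepted by TRL releases that include that field, so check the installed version's documentation before relying on it.

```python
# Keyword arguments mirroring the values listed above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

GRPO_KWARGS = {
    "output_dir": "ume-grpo-lora",       # hypothetical path
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 1,    # assumed meaning of "Batch size"
    "gradient_accumulation_steps": 8,
    "generation_batch_size": 2,          # assumption: needs a TRL version with this field
    "num_generations": 2,
    "num_train_epochs": 1,
}


def build_configs():
    """Create the PEFT and TRL config objects (imports deferred until called)."""
    from peft import LoraConfig
    from trl import GRPOConfig

    return LoraConfig(**LORA_KWARGS), GRPOConfig(**GRPO_KWARGS)
```

The two objects returned by `build_configs()` would then be passed to `GRPOTrainer` as `peft_config` and `args` respectively.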

Limitations

Training rewards are heuristic and do not verify functional correctness with unit tests.
The model may still produce syntactically valid but logically incorrect code.
Outputs may include hallucinated APIs, inefficient solutions, or incomplete implementations.
Performance depends heavily on the capabilities and constraints of the base model summerMC/matutake.
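
The gap between syntactic validity and correctness is easy to demonstrate. The sketch below is not part of the training setup; it shows a hypothetical external check in which a candidate parses cleanly yet fails a handful of test cases. The helper names are illustrative.

```python
import ast


def parses(source):
    """Return True if source is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def passes_tests(source, func_name, cases):
    """Execute source and compare func_name(arg) with expected values.

    exec() runs arbitrary code: use it only on trusted or sandboxed output.
    """
    namespace = {}
    exec(source, namespace)
    func = namespace[func_name]
    return all(func(arg) == expected for arg, expected in cases.items())


# Parses fine, but returns n instead of the n-th Fibonacci number.
candidate = "def fib(n):\n    return n\n"
fib_cases = {0: 0, 1: 1, 2: 1, 5: 5, 10: 55}

print(parses(candidate))                          # True
print(passes_tests(candidate, "fib", fib_cases))  # False
```

A syntax-only reward would score this candidate 1.0, which is why generated code should be verified externally before use.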

Intended Use

summerMC/ume is intended for:

Python code generation experiments
GRPO / RLHF-style fine-tuning experiments
LoRA + merge workflows
lightweight coding assistant prototyping
research and hobbyist use

It is not validated for:

production-critical software generation
security-sensitive code
safety-critical systems
correctness-sensitive automated coding pipelines without external verification

Reproducibility

The training pipeline used:

transformers
datasets
trl
peft
torch

A simplified training flow:

Load summerMC/matutake
Convert the dataset into chat prompts
Train with GRPOTrainer using LoRA adapters
Save the LoRA adapter
Merge adapter weights back into the base model
Save the merged model as summerMC/ume
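
The merge step at the end of this flow might look like the sketch below, which uses PEFT's `PeftModel.from_pretrained` and `merge_and_unload`. The function name, its arguments, and the directory layout are assumptions; the script actually used for this model is not shown in the card.

```python
def merge_lora_adapter(base_id, adapter_dir, output_dir, trust_remote_code=False):
    """Merge a saved LoRA adapter into its base model and save the result."""
    # Imports are deferred so the function can be defined without the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id, trust_remote_code=trust_remote_code
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()  # folds the LoRA deltas into the base weights

    # Save weights and tokenizer together so the output loads standalone.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(
        base_id, trust_remote_code=trust_remote_code
    ).save_pretrained(output_dir)
    return output_dir
```

For example, `merge_lora_adapter("summerMC/matutake", "ume-grpo-lora", "ume-merged")` would write a standalone checkpoint to `ume-merged`, where both directory names are placeholders.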

Base Model and Dataset Attribution

Base model

summerMC/matutake

Dataset

Hoglet-33/python-coding-dataset

License

Please follow the licenses and usage terms of:

the original base model summerMC/matutake
the training dataset Hoglet-33/python-coding-dataset

If you redistribute or publish derivative checkpoints, confirm that your use is compatible with both upstream licenses.

Citation

If you use this model in a project or experiment, please cite the upstream base model and dataset.

Model size: 2B params
Tensor type: BF16 (Safetensors)

Model tree for summerMC/ume

Base model: summerMC/Sakura
Finetuned: summerMC/matutake
Adapter: this model