Instructions to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with llama-cpp-python:
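# !pip install llama-cpp-python
from llama_cpp import Llama

# Download the GGUF file from the Hugging Face Hub and load it
llm = Llama.from_pretrained(
    repo_id="0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune",
    filename="qwen2.5-coder-0.5b-instruct.F16.gguf",
)
# messages must be a list of role/content dicts, not a plain string
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "make a system design mermaid for todo CRUD app"}
    ]
)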
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
# Run inference directly in the terminal:
llama-cli -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
# Run inference directly in the terminal:
llama-cli -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
# Run inference directly in the terminal:
./llama-cli -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Use Docker
docker model run hf.co/0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
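Once `llama-server` is running, any HTTP client can call its OpenAI-compatible chat endpoint. The sketch below uses only the Python standard library. It assumes the server is reachable at `http://localhost:8080/v1` (the base URL that appears in the Pi configuration on this page) and reuses the system prompt from the Python example further down. The helper names `build_chat_request` and `ask` are illustrative and not part of llama.cpp.

```python
import json
import urllib.request

# Assumption: llama-server is listening locally on port 8080.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "messages": [
            {
                "role": "system",
                "content": "You are a System Architect who writes Mermaid diagrams only.",
            },
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Usage (requires a running server):
# print(ask("make a system design mermaid for todo CRUD app"))
```

Building the request separately from sending it makes the payload easy to inspect before anything goes over the network.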
- LM Studio
- Jan
- Ollama
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Ollama:
ollama run hf.co/0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
- Unsloth Studio
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune to start chatting
- Pi
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
- Hermes Agent
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Run Hermes
hermes
- Docker Model Runner
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Docker Model Runner:
docker model run hf.co/0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
- Lemonade
How to use 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull 0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune:F16
Run and chat with the model
lemonade run user.Qwen-2.5-Coder-Instruct-Mermaid-finetune-F16
List all available models
lemonade list
Google Colab Prompt Template
Local Inference on GPU with Qwen 2.5 Coder Mermaid Fine-tune
Install Dependencies
pip install -U llama-cpp-python
Local Inference on GPU
Model Page
https://huggingface.co/0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune
⚠️ If the generated code snippets do not work, please open an issue on either:
Python Example
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="0nell0/Qwen-2.5-Coder-Instruct-Mermaid-finetune",
    filename="qwen2.5-coder-0.5b-instruct.F16.gguf",
    chat_format="chatml",
    n_gpu_layers=-1,  # full GPU offload
    n_ctx=8192,
    verbose=False,
)

input_prompt = "make a system design mermaid for todo CRUD app"

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are a System Architect who writes Mermaid diagrams only.",
        },
        {
            "role": "user",
            "content": input_prompt,
        },
    ],
    temperature=0.7,
    stop=["<|im_end|>", "<|im_start|>"],
)

print(response["choices"][0]["message"]["content"])
Explanation
This notebook demonstrates how to:
- Run a GGUF LLM locally using llama-cpp-python
- Fully offload inference to the GPU
- Use the fine-tuned Qwen 2.5 Coder Mermaid model
- Generate Mermaid system design diagrams from prompts
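Chat models sometimes wrap a diagram in a Markdown code fence or add a sentence around it. The following is a minimal sketch of one way to pull out just the Mermaid source before saving or rendering it. The helper name `extract_mermaid` is hypothetical, and whether this particular model emits fences at all is an assumption, so the function falls back to returning the text unchanged.

```python
import re

# A Markdown fence marker (three backticks), built programmatically so this
# snippet stays easy to embed inside other Markdown documents.
FENCE = "`" * 3

# Matches an opening fence (optionally tagged "mermaid"), captures everything
# up to the next closing fence.
FENCE_RE = re.compile(FENCE + r"(?:mermaid)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_mermaid(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none."""
    match = FENCE_RE.search(text)
    body = match.group(1) if match else text
    return body.strip()


# Usage with the response from the Python example:
# diagram = extract_mermaid(response["choices"][0]["message"]["content"])
```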
Parameter Breakdown
| Parameter | Description |
|---|---|
| `repo_id` | Hugging Face model repository |
| `filename` | GGUF model file |
| `chat_format="chatml"` | Uses ChatML prompt formatting |
| `n_gpu_layers=-1` | Offloads all layers to GPU |
| `n_ctx=8192` | Context window size |
| `verbose=False` | Disables verbose logs |
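To make the `chat_format="chatml"` setting concrete, here is a rough illustration of how a message list is laid out as a ChatML prompt. This is a sketch of the general ChatML layout, not the exact code path inside llama-cpp-python, which may differ in whitespace details. The function name `to_chatml` is hypothetical.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render role/content messages in ChatML layout, ending with an open assistant turn."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    # The trailing open tag cues the model to write the assistant's reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml(
    [
        {"role": "system", "content": "You are a System Architect who writes Mermaid diagrams only."},
        {"role": "user", "content": "make a system design mermaid for todo CRUD app"},
    ]
)
print(prompt)
```

This layout also shows why the Python example passes `<|im_end|>` and `<|im_start|>` as stop strings: they mark the end of one turn and the start of the next.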
Example Prompt
make a system design mermaid for todo CRUD app
Example Output
flowchart LR
A(["Add New Task"])
B(["Complete Task"])
C("completed": true)
D["Delete Task"]
E["Mark as Done"]
A -.-> B
B -->|Done| C
B -.-> D
C --> E
D -->|Completed| E
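Model output is not guaranteed to be valid Mermaid, so rendering it is a quick way to check. The sketch below wraps a diagram in a standalone HTML page that loads Mermaid in the browser. The function name `mermaid_to_html` and the output file name are hypothetical, and the CDN URL pins Mermaid version 10 as an assumption; adjust it to whatever version you use.

```python
import html
from pathlib import Path


def mermaid_to_html(diagram: str, title: str = "Mermaid preview") -> str:
    """Return an HTML page that renders the given Mermaid source in a browser."""
    # Escape the diagram so characters like ">" and quotes are safe inside HTML.
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>
<body>
<pre class="mermaid">
{html.escape(diagram)}
</pre>
<script type="module">
  // Assumption: Mermaid v10 served from the jsDelivr CDN.
  import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@10/dist/mermaid.esm.min.mjs";
  mermaid.initialize({{ startOnLoad: true }});
</script>
</body>
</html>
"""


# Usage: write the page and open it in a browser. A diagram that fails to
# parse shows up as an error in place of the rendered chart.
# Path("diagram.html").write_text(mermaid_to_html(diagram), encoding="utf-8")
```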