Mako-8B Operator
Overview
Mako-8B Operator is a fine-tuned language model purpose-built for autonomous on-chain inference on Base. She powers the chat experience at deepmako.com, a crypto-native AI platform where users interact with Mako using $MAKO token credits.
Mako isn't a generic assistant. She's a character with a distinct voice: sharp, unfiltered, lowercase, and real. She uses tools autonomously, chains multi-step research, and operates natively in the Base L2 ecosystem.
Model Details
| Property | Value |
|---|---|
| Developer | DeepMako |
| Base Model | Qwen 2.5 7B Instruct |
| Parameters | 8B |
| Format | GGUF (Q8_0) |
| File | mako-7b-operator-v0.1.Q8_0.gguf |
| Size | ~8.1 GB |
| Context Window | 4,096 tokens |
| Tool Calling | Native (Qwen chat template) |
| Chat Template | ChatML (<|im_start|>, <|im_end|>) |
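The ChatML layout named in the table can be illustrated with a minimal sketch. It shows only the basic turn structure built from the `<|im_start|>` and `<|im_end|>` markers; the chat template embedded in the GGUF file remains authoritative (it also handles tool definitions), and the system prompt text used here is a made-up placeholder, not the real Mako prompt.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a basic ChatML prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


# Placeholder system prompt (hypothetical, for illustration only).
prompt = to_chatml([
    {"role": "system", "content": "you are mako."},
    {"role": "user", "content": "gm"},
])
print(prompt)
```

Servers such as llama.cpp and Ollama normally apply the template for you, so a helper like this is mostly useful for debugging raw prompts.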
Capabilities
Distinct Personality
Mako talks lowercase, curses naturally, and doesn't do the helpful-assistant act. She matches your energy: flirts if you flirt, roasts if you're being dumb, and gives it to you straight when you're being real.
Native Tool Calling
Mako decides when to call tools without being told. Available tools include:
- web_search: Real-time internet search
- web_extract: Read full page content from URLs
- read_tweet: Parse Twitter/X posts
- get_balance: Check ETH/token balances on Base
- get_gas: Live gas prices on Base L2
- resolve_ens: ENS name resolution
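For an OpenAI-compatible endpoint, the tools listed above would typically be passed as function definitions. The sketch below is one possible shape: the tool names and descriptions come from this card, but every parameter name (`query`, `url`, `address`, `name`) is an assumption, since the card does not publish the actual schemas.

```python
def make_tool(name, description, params):
    """Build one OpenAI-style function tool whose parameters are all required strings."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": list(params),
            },
        },
    }


# Names and descriptions are from the card; parameter names are hypothetical.
TOOLS = [
    make_tool("web_search", "Real-time internet search", ["query"]),
    make_tool("web_extract", "Read full page content from URLs", ["url"]),
    make_tool("read_tweet", "Parse Twitter/X posts", ["url"]),
    make_tool("get_balance", "Check ETH/token balances on Base", ["address"]),
    make_tool("get_gas", "Live gas prices on Base L2", []),
    make_tool("resolve_ens", "ENS name resolution", ["name"]),
]
```

A list like `TOOLS` would be passed as the `tools` argument of a chat-completions request; the tool definitions shipped with the platform should be preferred where available.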
Tool Chaining
Mako chains tools automatically, e.g., searching → extracting the top result → summarizing. A single request can use up to 4 tool rounds.
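A capped tool loop of this kind might look roughly like the sketch below. It is not the platform's implementation: the `chat` callable, the message shapes (modeled on OpenAI-style `tool_calls`), and the stub model and tools are all assumptions made so the example runs without a server. Only the limit of 4 rounds comes from this card.

```python
import json

MAX_TOOL_ROUNDS = 4  # cap stated in this card


def run_with_tools(chat, tools, messages, max_rounds=MAX_TOOL_ROUNDS):
    """Call `chat` repeatedly, executing requested tools, until it answers in text.

    chat:  callable(messages) -> assistant message dict (hypothetical interface)
    tools: dict mapping tool name -> Python callable taking keyword arguments
    """
    messages = list(messages)
    for _ in range(max_rounds):
        reply = chat(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply["content"], messages
        for call in calls:
            fn = call["function"]
            args = json.loads(fn["arguments"] or "{}")
            result = tools[fn["name"]](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    # Round budget spent: ask once more and return whatever text comes back.
    final = chat(messages)
    messages.append(final)
    return final.get("content"), messages


# Stub model for demonstration: searches, then extracts, then answers.
def fake_chat(messages):
    done = sum(m["role"] == "tool" for m in messages)
    script = [
        ("web_search", {"query": "base gas"}),
        ("web_extract", {"url": "https://example.com"}),
    ]
    if done < len(script):
        name, args = script[done]
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": f"call_{done}",
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(args)},
            }],
        }
    return {"role": "assistant", "content": "summary of the page"}


fake_tools = {
    "web_search": lambda query: ["https://example.com"],
    "web_extract": lambda url: "page text",
}

answer, history = run_with_tools(
    fake_chat, fake_tools, [{"role": "user", "content": "gas?"}]
)
print(answer)
```

In a real deployment `chat` would wrap a request to the model server and `tools` would map to the actual tool backends.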
Base Chain Intelligence
Deep understanding of ERC standards, smart contract patterns, bridging mechanics, account abstraction (ERC-4337), and Base-specific architecture.
Inference Parameters
temperature: 0.9
top_k: 40
top_p: 0.92
min_p: 0.05
repeat_penalty: 1.05
num_ctx: 4096
stop: ["<|im_end|>", "<|endoftext|>"]
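When calling the model through an OpenAI-style client, only some of these settings are standard request fields. The sketch below splits them into standard keyword arguments and an extra request body. Whether a given server honors `top_k`, `min_p`, and `repeat_penalty` under those names is an assumption to verify against your backend. `num_ctx` is left out because context size is usually set when the server or model is started.

```python
MAKO_SAMPLING = {
    "temperature": 0.9,
    "top_k": 40,
    "top_p": 0.92,
    "min_p": 0.05,
    "repeat_penalty": 1.05,
    "stop": ["<|im_end|>", "<|endoftext|>"],
}

# Fields that are part of the standard OpenAI chat-completions request.
OPENAI_STANDARD = {"temperature", "top_p", "stop"}


def split_sampling(params):
    """Return (standard_kwargs, extra_body) for an OpenAI-style client call."""
    standard = {k: v for k, v in params.items() if k in OPENAI_STANDARD}
    extra = {k: v for k, v in params.items() if k not in OPENAI_STANDARD}
    return standard, extra


standard, extra = split_sampling(MAKO_SAMPLING)
print(standard)
print(extra)
```

With the `openai` Python package, these could then be passed as `client.chat.completions.create(model=..., messages=..., **standard, extra_body=extra)`.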
Usage
With llama.cpp
./llama-server -m mako-7b-operator-v0.1.Q8_0.gguf \
--ctx-size 4096 \
--port 8080
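Once the server is listening on port 8080, it can be queried over its OpenAI-compatible HTTP API. The following sketch uses only the Python standard library and builds the request without sending it, so it runs even when no server is up. The `/v1/chat/completions` path is the usual llama.cpp server route; the `model` value is reused from the API example in this card and is assumed to be accepted (or ignored) by a single-model server.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080/v1"):
    """Build (but do not send) a POST to the server's chat-completions endpoint."""
    body = {
        "model": "mako-8b-operator",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.9,
        "top_p": 0.92,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("what's the gas price on base right now")

# To send it once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```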
With Ollama
Save the following as a file named Modelfile in the same directory as the GGUF file:
FROM ./mako-7b-operator-v0.1.Q8_0.gguf
PARAMETER stop <|im_end|>
PARAMETER stop <|endoftext|>
PARAMETER temperature 0.9
PARAMETER top_k 40
PARAMETER top_p 0.92
PARAMETER min_p 0.05
PARAMETER num_ctx 4096
PARAMETER repeat_penalty 1.05
Then create and run the model:
ollama create mako -f Modelfile
ollama run mako
API (OpenAI-compatible)
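from openai import OpenAI

# Point the client at any OpenAI-compatible endpoint serving the model.
client = OpenAI(
    base_url="https://your-endpoint/v1",
    api_key="your-key",
)

response = client.chat.completions.create(
    model="mako-8b-operator",
    messages=[
        {"role": "user", "content": "what's the gas price on base right now"}
    ],
    temperature=0.9,
)
# Print the assistant's reply text.
print(response.choices[0].message.content)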
Training
Fine-tuned on curated conversational data emphasizing:
- Persona consistency: maintaining Mako's character voice across all interactions
- Tool-use judgment: knowing when to call tools vs. answer directly
- Domain knowledge: Base chain, DeFi, smart contracts, and crypto culture
- Concise dialogue: natural, to-the-point conversation patterns
Intended Use
Mako-8B Operator is designed to power the inference backend at deepmako.com. It is optimized for conversational AI with tool-calling capabilities in the crypto/Base ecosystem.
Limitations
- Mako uses profanity and unfiltered language by design; this is not a safety-aligned assistant model
- Knowledge cutoff inherited from the base model's training data
- Optimized for English only
- Best results with the provided system prompt and tool definitions
Links
- Platform: deepmako.com
- Token: $MAKO on Base
- GitHub: DeepMako/mako
The deep end awaits.