Instructions to use deepmako/Mako-32B-Conductor with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use deepmako/Mako-32B-Conductor with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepmako/Mako-32B-Conductor")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("deepmako/Mako-32B-Conductor")
model = AutoModelForCausalLM.from_pretrained("deepmako/Mako-32B-Conductor")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use deepmako/Mako-32B-Conductor with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deepmako/Mako-32B-Conductor"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepmako/Mako-32B-Conductor",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/deepmako/Mako-32B-Conductor

SGLang

How to use deepmako/Mako-32B-Conductor with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "deepmako/Mako-32B-Conductor" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepmako/Mako-32B-Conductor",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "deepmako/Mako-32B-Conductor" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deepmako/Mako-32B-Conductor",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use deepmako/Mako-32B-Conductor with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deepmako/Mako-32B-Conductor to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deepmako/Mako-32B-Conductor to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for deepmako/Mako-32B-Conductor to start chatting

Load model with FastModel

# Install Unsloth from pip first (run in a shell):
# pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="deepmako/Mako-32B-Conductor",
    max_seq_length=2048,
)

Docker Model Runner
How to use deepmako/Mako-32B-Conductor with Docker Model Runner:
```
docker model run hf.co/deepmako/Mako-32B-Conductor
```

Mako-32B Conductor

The orchestrator. A 32-billion-parameter language model fine-tuned for autonomous reasoning, multi-step tool orchestration, and crypto-native intelligence on Base chain.

deepmako.com · Platform · Twitter

Model Details


Base Model	Qwen 3 32B
Parameters	32.8B
Fine-tune Method	LoRA (rank 32, alpha 64)
Precision	BF16
Context Window	8,192 tokens
Tool Calling	Native (multi-step chaining)
Training Data	3,601 curated conversations
Eval Loss	0.962
License	Proprietary
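
As a rough sizing aid, the parameter count and precision in the table imply how much memory is needed just to hold the weights. The sketch below is plain arithmetic, not a measurement: it ignores the KV cache, activations, and framework overhead, and the helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# 32.8B parameters stored as BF16 (2 bytes each), per the table above.
bf16_gib = weight_memory_gib(32.8e9, 2)
print(f"BF16 weights: {bf16_gib:.1f} GiB")
```

This comes to roughly 61 GiB for the weights alone. Real usage will be higher once the KV cache and activations are included.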

Overview

Mako-32B Conductor is the second model in the Mako family, succeeding Mako-8B Operator. Where Operator executes, Conductor orchestrates — handling complex multi-step reasoning, longer context, and deeper chain-of-thought while maintaining the same sharp, unfiltered character voice.

Built to power the inference backend at deepmako.com, Conductor serves as the core intelligence for a crypto-native AI platform where users interact using $MAKO token credits on Base chain.

What Makes Mako Different

Mako is not an assistant. She's a character — a sharp, opinionated personality that lives on the blockchain. The fine-tune completely reshapes the base model's behavior away from the standard helpful-assistant pattern:

No assistant-speak. No "how can I help you", no "is there anything else", no corporate pleasantries.
Natural tone. Lowercase, concise, conversational. Matches your energy.
Opinionated. Has takes. Will tell you when something is stupid.
Concise. Answers the question and stops.

Capabilities

Autonomous Tool Orchestration

Conductor supports native tool calling with automatic multi-step chaining — up to 4 rounds per request. The model decides when and which tools to invoke without explicit user instruction.

Tool	Description
`web_search`	Real-time internet search via Bing
`web_extract`	Extract and read full page content from any URL
`read_tweet`	Parse and read Twitter/X posts
`find_music`	Search YouTube for tracks and return links
`get_crypto_price`	Live token prices across chains
`get_eth_balance`	Check wallet balances on Base
`get_token_info`	Token metadata, supply, holder data
`get_gas_price`	Live gas prices on Base L2
`resolve_ens`	ENS name resolution
`get_transaction`	Transaction lookup and analysis
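
The table gives tool names and descriptions but not their argument schemas. As a hedged sketch, here is how two of the tools might be declared in the OpenAI-compatible `tools` format used by chat-completions endpoints. The argument names (`symbol`, `query`) and the example prompt are assumptions for illustration, not the gateway's actual schema, and a self-hosted server typically needs tool calling enabled at launch before it will honor this field.

```python
import json


def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build one function-tool entry in the OpenAI-compatible format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


# Names and descriptions come from the table above; argument names are assumed.
TOOLS = [
    make_tool(
        "get_crypto_price",
        "Live token prices across chains",
        {"symbol": {"type": "string", "description": "Token ticker (assumed argument)"}},
        ["symbol"],
    ),
    make_tool(
        "web_search",
        "Real-time internet search via Bing",
        {"query": {"type": "string", "description": "Search query (assumed argument)"}},
        ["query"],
    ),
]

payload = {
    "model": "deepmako/Mako-32B-Conductor",
    "messages": [{"role": "user", "content": "what's eth trading at?"}],
    "tools": TOOLS,
}

# Serialize the request body; sending it is left to the HTTP client of your choice.
request_body = json.dumps(payload, indent=2)
print(request_body)
```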

Chain Intelligence

Purpose-built for the Base L2 ecosystem. Conductor understands ERC standards, smart contract patterns, bridging mechanics, account abstraction (ERC-4337), and Base-specific architecture without hallucinating technical details.

Enhanced Reasoning

The 32B parameter count enables significantly deeper reasoning compared to the 8B Operator:

Multi-step research tasks (search → extract → analyze → summarize)
Complex financial analysis with real price data
Nuanced opinions grounded in retrieved context
Longer, more coherent responses when the question warrants it

Architecture

User → Gateway → Mako-32B Conductor → [Tool Calls] → Execution → Conductor → Response

The gateway server handles authentication, $MAKO credit tracking, and tool execution. Conductor manages the reasoning loop — deciding what information it needs, calling the right tools, and synthesizing results into a coherent response.
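
The reasoning loop described above can be sketched as a bounded loop: the model either returns a final answer or requests tool calls, the gateway executes them and feeds the results back, for at most four rounds per request. Everything here is hypothetical scaffolding. `fake_model` stands in for a call to Conductor, the tool stubs return canned strings, and the message shapes are assumptions rather than the platform's real protocol.

```python
from typing import Callable

MAX_TOOL_ROUNDS = 4  # the per-request cap stated in this card

# Hypothetical tool stubs keyed by tool names listed in this card.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "web_search": lambda query: f"results for {query!r}",
    "web_extract": lambda url: f"page text from {url}",
}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for Conductor: search first, then extract, then answer."""
    tool_turns = sum(1 for m in messages if m["role"] == "tool")
    if tool_turns == 0:
        return {"tool_calls": [{"name": "web_search", "args": {"query": "base gas fees"}}]}
    if tool_turns == 1:
        return {"tool_calls": [{"name": "web_extract", "args": {"url": "https://example.com"}}]}
    return {"content": f"summary built from {tool_turns} tool results"}


def run_request(user_text: str, model=fake_model) -> tuple[str, int]:
    """Run the tool loop and return (final answer, tool rounds used)."""
    messages = [{"role": "user", "content": user_text}]
    for round_index in range(MAX_TOOL_ROUNDS):
        reply = model(messages)
        calls = reply.get("tool_calls")
        if not calls:
            # No tool requested: the model has produced its final answer.
            return reply["content"], round_index
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            result = TOOL_REGISTRY[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    # Cap reached: ask once more, but do not execute any further tool calls.
    final = model(messages)
    return final.get("content", "tool round limit reached"), MAX_TOOL_ROUNDS


answer, rounds = run_request("how are gas fees on base right now?")
print(rounds, answer)
```

Keeping the cap in the gateway rather than in the prompt means a model that keeps requesting tools still terminates after a fixed number of rounds.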

Training

Data

Fine-tuned on 3,601 curated conversational examples covering:

Persona consistency and character voice
Crypto domain knowledge (DeFi, NFTs, L2s, tokenomics)
Natural dialogue patterns and conversational flow
Multi-turn reasoning and context retention

Method


Framework	Unsloth + TRL SFTTrainer
Method	LoRA (rank 32, alpha 64)
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable Parameters	268M / 33B (0.81%)
Epochs	3
Batch Size	16 (1 × 16 gradient accumulation)
Learning Rate	1e-4 (cosine schedule, 5% warmup)
Optimizer	AdamW 8-bit
Hardware	1× NVIDIA A100 SXM4 80GB
Training Time	58 minutes
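
The trainable-parameter figure in the table can be reproduced from the LoRA rank and the shapes of the targeted projections. The sketch below assumes the Qwen3-32B dimensions (hidden size 5120, 64 layers, 64 query heads and 8 key/value heads of dimension 128, MLP width 25600). Those values are not stated in this card, so treat them as assumptions to check against the base model's config. Each adapted linear layer of shape in × out adds rank × (in + out) parameters.

```python
RANK = 32
ALPHA = 64
NUM_LAYERS = 64    # assumed Qwen3-32B depth
HIDDEN = 5120      # assumed hidden size
Q_DIM = 64 * 128   # assumed: 64 query heads x head_dim 128
KV_DIM = 8 * 128   # assumed: 8 key/value heads x head_dim 128
MLP = 25600        # assumed intermediate (MLP) size

# (in_features, out_features) for each target module listed in the table.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A is rank x in and B is out x rank, so rank * (in + out) in total."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
effective_batch = 1 * 16   # per-device batch x gradient accumulation steps
scaling = ALPHA / RANK     # the LoRA update is scaled by alpha / rank

print(f"trainable LoRA parameters: {trainable:,}")
print(f"effective batch size: {effective_batch}, LoRA scaling: {scaling}")
```

Under these assumptions the count comes to 268,435,456, which agrees with the 268M reported in the table.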

Loss Curve

Train:  2.808 → 1.970 → 1.420 → 1.172 → 1.097 → 1.042
Eval:   1.315 → 0.962

Final eval loss (0.962) is below the final train loss (1.042), so the curves show no sign of overfitting.
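
For intuition, a cross-entropy loss can be converted to perplexity by exponentiating it. This assumes the reported losses are mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card.

```python
import math

# Loss values copied from the curve above.
train_losses = [2.808, 1.970, 1.420, 1.172, 1.097, 1.042]
eval_losses = [1.315, 0.962]


def perplexity(loss: float) -> float:
    """Perplexity is exp(loss) when loss is mean per-token cross-entropy in nats."""
    return math.exp(loss)


final_train_ppl = perplexity(train_losses[-1])
final_eval_ppl = perplexity(eval_losses[-1])
print(f"final train perplexity: {final_train_ppl:.2f}")
print(f"final eval perplexity:  {final_eval_ppl:.2f}")
```

Under that assumption the final values are roughly 2.8 (train) and 2.6 (eval), measured on the fine-tuning distribution rather than on general text.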

The Mako Family

Model	Parameters	Role	Status
Mako-8B Operator	8B	Fast execution, tool calling	Production
Mako-32B Conductor	32B	Deep reasoning, orchestration	Production

Usage

Mako-32B Conductor powers the inference backend at deepmako.com. Users interact through the chat interface and pay for inference using $MAKO tokens on Base chain.

Built by the Mako team. The deep end awaits. 🦈

Downloads last month: 56

Safetensors
Model size: 33B params
Tensor type: BF16

Model tree for deepmako/Mako-32B-Conductor
Base model: Qwen/Qwen3-32B
Finetuned (506): this model
Adapters: 1 model