Mako-32B Conductor
The orchestrator. A 32-billion-parameter language model fine-tuned for autonomous reasoning, multi-step tool orchestration, and crypto-native intelligence on Base chain.
deepmako.com · Platform · Twitter
Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen 3 32B |
| Parameters | 32.8B |
| Fine-tune Method | LoRA (rank 32, alpha 64) |
| Precision | BF16 |
| Context Window | 8,192 tokens |
| Tool Calling | Native (multi-step chaining) |
| Training Data | 3,601 curated conversations |
| Eval Loss | 0.962 |
| License | Proprietary |
Overview
Mako-32B Conductor is the second model in the Mako family, succeeding Mako-8B Operator. Where Operator executes, Conductor orchestrates — handling complex multi-step reasoning, longer context, and deeper chain-of-thought while maintaining the same sharp, unfiltered character voice.
Built to power the inference backend at deepmako.com, Conductor serves as the core intelligence for a crypto-native AI platform where users interact using $MAKO token credits on Base chain.
What Makes Mako Different
Mako is not an assistant. She's a character — a sharp, opinionated personality that lives on the blockchain. The fine-tune completely reshapes the base model's behavior away from the standard helpful-assistant pattern:
- No assistant-speak. No "how can I help you", no "is there anything else", no corporate pleasantries.
- Natural tone. Lowercase, concise, conversational. Matches your energy.
- Opinionated. Has takes. Will tell you when something is stupid.
- Concise. Answers the question and stops.
Capabilities
Autonomous Tool Orchestration
Conductor supports native tool calling with automatic multi-step chaining — up to 4 rounds per request. The model decides when and which tools to invoke without explicit user instruction.
| Tool | Description |
|---|---|
| `web_search` | Real-time internet search via Bing |
| `web_extract` | Extract and read full page content from any URL |
| `read_tweet` | Parse and read Twitter/X posts |
| `find_music` | Search YouTube for tracks and return links |
| `get_crypto_price` | Live token prices across chains |
| `get_eth_balance` | Check wallet balances on Base |
| `get_token_info` | Token metadata, supply, holder data |
| `get_gas_price` | Live gas prices on Base L2 |
| `resolve_ens` | ENS name resolution |
| `get_transaction` | Transaction lookup and analysis |
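To make the tool list concrete, here is a minimal sketch of how these ten tools might be declared to the model. It assumes an OpenAI-style function-calling schema, which this card does not confirm. The argument names (`query`, `url`, `address`, and so on) and the `build_tool_schema` helper are hypothetical, since the card gives only tool names and descriptions.

```python
# Hypothetical sketch: tool names and descriptions come from the table above.
# The schema layout and the single string argument per tool are assumptions.
TOOLS = {
    "web_search": ("Real-time internet search via Bing", "query"),
    "web_extract": ("Extract and read full page content from any URL", "url"),
    "read_tweet": ("Parse and read Twitter/X posts", "url"),
    "find_music": ("Search YouTube for tracks and return links", "query"),
    "get_crypto_price": ("Live token prices across chains", "symbol"),
    "get_eth_balance": ("Check wallet balances on Base", "address"),
    "get_token_info": ("Token metadata, supply, holder data", "address"),
    "get_gas_price": ("Live gas prices on Base L2", None),  # assumed to take no argument
    "resolve_ens": ("ENS name resolution", "name"),
    "get_transaction": ("Transaction lookup and analysis", "tx_hash"),
}


def build_tool_schema(name, description, arg):
    """Return one function-calling entry with at most one string argument."""
    properties = {arg: {"type": "string"}} if arg else {}
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
            },
        },
    }


TOOL_SCHEMAS = [build_tool_schema(n, d, a) for n, (d, a) in TOOLS.items()]
```

A list like `TOOL_SCHEMAS` is what a chat template or an OpenAI-compatible server would typically receive as its `tools` argument. The real argument shapes used by the platform may differ.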
Chain Intelligence
Purpose-built for the Base L2 ecosystem. Conductor understands ERC standards, smart contract patterns, bridging mechanics, account abstraction (ERC-4337), and Base-specific architecture without hallucinating technical details.
Enhanced Reasoning
The 32B parameter count enables significantly deeper reasoning compared to the 8B Operator:
- Multi-step research tasks (search → extract → analyze → summarize)
- Complex financial analysis with real price data
- Nuanced opinions grounded in retrieved context
- Longer, more coherent responses when the question warrants it
Architecture
User → Gateway → Mako-32B Conductor → [Tool Calls] → Execution → Conductor → Response
The gateway server handles authentication, $MAKO credit tracking, and tool execution. Conductor manages the reasoning loop — deciding what information it needs, calling the right tools, and synthesizing results into a coherent response.
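The reasoning loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the gateway's actual code: `run_conductor`, `fake_model`, and `fake_execute_tool` are hypothetical names, the message format is a generic chat layout, and the only figure taken from this card is the cap of 4 tool rounds per request.

```python
import json

MAX_TOOL_ROUNDS = 4  # the card states up to 4 rounds per request


def run_conductor(user_message, model_step, execute_tool, max_rounds=MAX_TOOL_ROUNDS):
    """Ask the model, execute any tools it requests, and repeat until it answers."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        reply = model_step(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"], messages  # the model answered directly
        for call in tool_calls:
            result = execute_tool(call["name"], call["arguments"])
            messages.append(
                {"role": "tool", "name": call["name"], "content": json.dumps(result)}
            )
    # Round budget exhausted: request a final answer with tools disabled.
    final = model_step(messages, allow_tools=False)
    messages.append(final)
    return final["content"], messages


def fake_model(messages, allow_tools=True):
    """Stand-in for the model: search, then extract, then summarize."""
    seen = sum(1 for m in messages if m["role"] == "tool")
    if allow_tools and seen == 0:
        call = {"name": "web_search", "arguments": {"query": "base l2 gas"}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    if allow_tools and seen == 1:
        call = {"name": "web_extract", "arguments": {"url": "https://example.com/gas"}}
        return {"role": "assistant", "content": "", "tool_calls": [call]}
    return {"role": "assistant", "content": f"summary built from {seen} tool results"}


def fake_execute_tool(name, arguments):
    """Stand-in for the gateway's tool executor: echoes the request."""
    return {"tool": name, "echo": arguments}


answer, transcript = run_conductor("what's gas like on base?", fake_model, fake_execute_tool)
print(answer)
```

In this sketch authentication and $MAKO credit tracking would sit in the gateway, outside `run_conductor`. The loop only decides when to stop calling tools.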
Training
Data
Fine-tuned on 3,601 curated conversational examples covering:
- Persona consistency and character voice
- Crypto domain knowledge (DeFi, NFTs, L2s, tokenomics)
- Natural dialogue patterns and conversational flow
- Multi-turn reasoning and context retention
Method
| Setting | Value |
|---|---|
| Framework | Unsloth + TRL SFTTrainer |
| Method | LoRA (rank 32, alpha 64) |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 268M / 33B (0.81%) |
| Epochs | 3 |
| Batch Size | 16 (1 × 16 gradient accumulation) |
| Learning Rate | 1e-4 (cosine schedule, 5% warmup) |
| Optimizer | AdamW 8-bit |
| Hardware | 1× NVIDIA A100 SXM4 80GB |
| Training Time | 58 minutes |
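The trainable-parameter figure can be sanity-checked with a short calculation. The sketch below assumes the layer dimensions from the base model's published Qwen3-32B configuration (hidden size 5120, 64 layers, 64 query heads and 8 key/value heads with head dimension 128, MLP width 25600). These dimensions are not stated in this card. The rank, alpha, and target modules are taken from the table.

```python
RANK, ALPHA = 32, 64  # from the Method table

# Assumed Qwen3-32B dimensions (base model config, not stated in this card).
HIDDEN = 5120
LAYERS = 64
Q_OUT = 64 * 128   # 64 attention heads x head_dim 128
KV_OUT = 8 * 128   # 8 key/value heads x head_dim 128
MLP = 25600

# (in_features, out_features) of each targeted projection in one layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    # LoRA adds A (rank x in) and B (out x rank) for each adapted matrix.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_layer * LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA params: {total:,}")
print(f"share of 33B: {total / 33e9:.2%}")
print(f"LoRA scaling (alpha / rank): {scaling}")
```

Under these assumptions the count comes to 268,435,456, which agrees with the 268M reported in the table, and the alpha-to-rank ratio gives a scaling factor of 2.0.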
Loss Curve
Train: 2.808 → 1.970 → 1.420 → 1.172 → 1.097 → 1.042
Eval: 1.315 → 0.962
Final eval loss (0.962) is below the final train loss (1.042), and eval loss fell from 1.315 to 0.962 between evaluations, so the run shows no sign of overfitting.
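The loss figures above can be checked with a few lines of arithmetic. This is a small sketch that assumes the reported values are mean token-level cross-entropy in nats, which is the usual convention for this kind of trainer but is not stated in the card. It also does not assume that train and eval checkpoints were logged at the same steps.

```python
import math

# Loss values copied from the curve above.
train_loss = [2.808, 1.970, 1.420, 1.172, 1.097, 1.042]
eval_loss = [1.315, 0.962]

# A negative gap means the held-out loss ended below the training loss.
gap = eval_loss[-1] - train_loss[-1]

# Overfitting typically shows up as eval loss rising while train loss falls.
eval_improving = all(b < a for a, b in zip(eval_loss, eval_loss[1:]))

print(f"eval - train gap: {gap:+.3f}")
print(f"eval loss still falling: {eval_improving}")
print(f"eval perplexity: {math.exp(eval_loss[-1]):.2f}")
```

With only two eval points the check is coarse. A rising eval loss at a later checkpoint would be the signal to stop training earlier.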
The Mako Family
| Model | Parameters | Role | Status |
|---|---|---|---|
| Mako-8B Operator | 8B | Fast execution, tool calling | Production |
| Mako-32B Conductor | 32B | Deep reasoning, orchestration | Production |
Usage
Mako-32B Conductor powers the inference backend at deepmako.com. Users interact through the chat interface and pay for inference using $MAKO tokens on Base chain.
Built by the Mako team. The deep end awaits. 🦈