markdown-1

VibeThinker-3B fine-tuned with LoRA for tool calling and long agent traces, with the adapter merged into the base weights.

This repo contains the merged fp16 weights plus ready-to-run GGUF quants for llama.cpp / Ollama / LM Studio.

| File | Size | Use |
|---|---|---|
| `markdown-1-Q4_K_M.gguf` | ~1.9 GB | Smaller and faster; good default |
| `markdown-1-Q8_0.gguf` | ~3.3 GB | Higher fidelity |
| `model-*.safetensors` | ~6.2 GB | Merged fp16 weights (vLLM / transformers) |

The LoRA adapter alone is published separately as notshekhar/vibethinker-finetuned-tool.
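
Assuming the standard LoRA formulation (the rank $r$ and scaling $\alpha$ used for this fine-tune are not given in this card), "merged" means the adapter's low-rank update has been folded into each adapted base weight matrix $W_0 \in \mathbb{R}^{d \times k}$. The merged weights and base-plus-adapter should therefore behave the same up to floating-point rounding:

$$
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$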

Run with llama.cpp

```shell
# Pull the Q4_K_M quant straight from the Hub:
llama-cli -hf notshekhar/markdown-1:Q4_K_M -p "Hello"
# or run a local file:
llama-cli -m markdown-1-Q4_K_M.gguf -p "Hello"
```

Run with Ollama

```shell
# Write a minimal Modelfile that points at the local GGUF
printf 'FROM ./markdown-1-Q4_K_M.gguf\n' > Modelfile
ollama create markdown-1 -f Modelfile
ollama run markdown-1
```

The base reasoning model emits `<think>` traces and uses the ChatML format (`<|im_start|>` / `<|im_end|>`), with tool calling expressed through `<tool_call>` / `<tool_response>` blocks. See `chat_template.jinja` for the exact template.
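
A minimal shell sketch of that format follows. It assumes a Qwen2.5-style ChatML layout in which each special tag sits on its own line; `chat_template.jinja` remains the authoritative reference. `chatml_prompt`, `extract_tool_calls`, and the `get_weather` tool are hypothetical names used for illustration only. The first helper assembles a raw prompt ending in an open assistant turn. The second prints the JSON inside each `<tool_call>` block and ignores anything inside the `<think>` trace.

```shell
# Hypothetical helpers; assumes the Qwen2.5-style ChatML layout, with each
# special tag on its own line. chat_template.jinja is the real reference.

# Emit a raw ChatML prompt: system turn, user turn, then an open assistant
# turn for the model to complete.
chatml_prompt() {
  local system="$1" user="$2"
  printf '<|im_start|>system\n%s<|im_end|>\n' "$system"
  printf '<|im_start|>user\n%s<|im_end|>\n' "$user"
  printf '<|im_start|>assistant\n'
}

# Read model output on stdin and print the body of every <tool_call> block,
# skipping anything inside the <think> ... </think> reasoning trace.
extract_tool_calls() {
  awk '
    /<think>/       { in_think = 1 }
    /<\/think>/     { in_think = 0; next }
    in_think        { next }
    /<tool_call>/   { in_call = 1; next }
    /<\/tool_call>/ { in_call = 0; next }
    in_call         { print }
  '
}

# Usage: build a prompt, then parse a made-up sample completion.
chatml_prompt "You are a helpful assistant." "What's the weather in Paris?"
printf '%s\n' '<think>' 'I should call the weather tool.' '</think>' \
  '<tool_call>' '{"name": "get_weather", "arguments": {"city": "Paris"}}' '</tool_call>' \
  | extract_tool_calls
```

The parsing step prints only the JSON line from the sample completion. In a real agent loop, the tool's result would be sent back to the model inside `<tool_response>` tags, as the template defines.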

Safetensors: 3B params, tensor type BF16.

Base model: Qwen/Qwen2.5-3B (in the Hub's model tree, this model is listed among its 53 quantized derivatives).