Use with the llama-cpp-python library
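# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the Q4_K_M quant from the Hub (cached after the first call) and load it.
llm = Llama.from_pretrained(
	repo_id="notshekhar/markdown-1",
	filename="markdown-1-Q4_K_M.gguf",
)
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply text is in the first choice of the OpenAI-style response dict.
print(response["choices"][0]["message"]["content"])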

markdown-1

A fine-tune of VibeThinker-3B (LoRA, merged into the base weights) for tool calling and long agent traces.

This repo contains the merged fp16 weights plus ready-to-run GGUF quants for llama.cpp / Ollama / LM Studio.

| File | Size | Use |
|---|---|---|
| markdown-1-Q4_K_M.gguf | ~1.9 GB | smaller / faster, great default |
| markdown-1-Q8_0.gguf | ~3.3 GB | higher fidelity |
| model-*.safetensors | ~6.2 GB | merged fp16 (vLLM / transformers) |
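
As a rough aid for choosing between the GGUF files above, the sketch below picks the largest one that fits a given memory budget. The sizes are the approximate figures from the table. The `pick_file` helper and the 1.2x headroom factor (for KV cache and runtime overhead) are assumptions for illustration, not measured values.

```python
# Approximate GGUF file sizes in GB, copied from the table above.
FILES_GB = {
    "markdown-1-Q4_K_M.gguf": 1.9,
    "markdown-1-Q8_0.gguf": 3.3,
}


def pick_file(budget_gb, headroom=1.2):
    """Return the largest GGUF whose size * headroom fits in budget_gb, or None.

    `headroom` is an assumed multiplier for KV cache and runtime overhead,
    not a measured figure.
    """
    fitting = [
        (size, name)
        for name, size in FILES_GB.items()
        if size * headroom <= budget_gb
    ]
    # Larger file = higher-fidelity quant, so prefer the biggest that fits.
    return max(fitting)[1] if fitting else None


print(pick_file(4.0))
print(pick_file(3.0))
print(pick_file(1.0))
```

Actual memory use grows with context length, so treat the result as a starting point.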

LoRA adapter only: notshekhar/vibethinker-finetuned-tool.

Run with llama.cpp

llama-cli -hf notshekhar/markdown-1:Q4_K_M -p "Hello"
# or local:
llama-cli -m markdown-1-Q4_K_M.gguf -p "Hello"

Run with Ollama

# Modelfile
printf 'FROM ./markdown-1-Q4_K_M.gguf\n' > Modelfile
ollama create markdown-1 -f Modelfile
ollama run markdown-1
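
Once the model has been created, it can also be queried from Python through Ollama's local HTTP API. This is a minimal sketch, assuming the Ollama server is running on its default address (http://localhost:11434) and the model was created under the name markdown-1 as above. `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt, model="markdown-1"):
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON response instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, model="markdown-1"):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        return json.loads(resp.read())["message"]["content"]


# Usage (requires a running Ollama server):
#   print(chat("Hello"))
```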

The base reasoning model emits <think> traces and uses the ChatML format (<|im_start|>), with tool calling via <tool_call> / <tool_response> blocks (see chat_template.jinja).
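
To make the output format concrete, here is a minimal sketch of post-processing a raw completion: it drops the <think> trace and extracts any <tool_call> blocks. It assumes each <tool_call> body is a JSON object with "name" and "arguments" keys, as in Qwen-style templates. That shape is an assumption, so check chat_template.jinja for the exact format. The `parse_completion` helper and the `get_weather` tool are hypothetical.

```python
import json
import re

THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_completion(text):
    """Split a raw completion into (visible_text, tool_calls)."""
    # Remove the reasoning trace before looking for tool calls.
    without_think = THINK_RE.sub("", text)
    tool_calls = []
    for body in TOOL_CALL_RE.findall(without_think):
        try:
            # Assumption: each block holds one JSON object.
            tool_calls.append(json.loads(body))
        except json.JSONDecodeError:
            pass  # skip malformed blocks rather than failing the whole parse
    visible = TOOL_CALL_RE.sub("", without_think).strip()
    return visible, tool_calls


raw = (
    "<think>The user wants weather, so I should call the tool.</think>\n"
    "Let me check.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
)
visible, calls = parse_completion(raw)
print(visible)
print(calls)
```

The tool's result would then go back to the model inside a <tool_response> block, in whatever form the chat template expects.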

Safetensors: model size 3B params, tensor type BF16

Model tree for notshekhar/markdown-1

Base model: Qwen/Qwen2.5-3B