Instructions for using amitashwini/mumble-cleanup-2stage with libraries and local apps.

Libraries

How to use amitashwini/mumble-cleanup-2stage with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="amitashwini/mumble-cleanup-2stage",
	filename="mumble-cleanup-2stage-f16.gguf",
)

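# messages must be a list of {"role", "content"} dicts, not a plain string
response = llm.create_chat_completion(
	messages=[
		{"role": "user", "content": "um so i i think we should ship this on uh friday"}
	],
	temperature=0,
)
print(response["choices"][0]["message"]["content"])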


llama.cpp

How to use amitashwini/mumble-cleanup-2stage with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf amitashwini/mumble-cleanup-2stage:F16
# Run inference directly in the terminal:
llama cli -hf amitashwini/mumble-cleanup-2stage:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf amitashwini/mumble-cleanup-2stage:F16
# Run inference directly in the terminal:
llama cli -hf amitashwini/mumble-cleanup-2stage:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf amitashwini/mumble-cleanup-2stage:F16
# Run inference directly in the terminal:
./llama-cli -hf amitashwini/mumble-cleanup-2stage:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf amitashwini/mumble-cleanup-2stage:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf amitashwini/mumble-cleanup-2stage:F16

Use Docker

docker model run hf.co/amitashwini/mumble-cleanup-2stage:F16
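
Once the local server from any of the options above is running, it can be queried over HTTP. The following is a minimal Python sketch, assuming the server listens on http://localhost:8080/v1 (the base URL used in the Pi configuration in this card) and exposes the OpenAI-style chat completions route. The helper names `build_request` and `clean` are illustrative and not part of any library.

```python
import json
import urllib.request

# Assumption: the server address from the Pi config in this card;
# adjust it if your server listens on another host or port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "amitashwini/mumble-cleanup-2stage:F16"


def build_request(raw_text: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for one transcript."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": raw_text}],
        "temperature": 0,  # deterministic decoding, matching --temp 0 in this card
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clean(raw_text: str) -> str:
    """Send the request to the running server and return the cleaned text."""
    with urllib.request.urlopen(build_request(raw_text)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `clean("um so i i think we should ship this on uh friday")` requires the server to be up; `build_request` alone can be used to inspect the payload offline.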

Ollama
How to use amitashwini/mumble-cleanup-2stage with Ollama:
```
ollama run hf.co/amitashwini/mumble-cleanup-2stage:F16
```

Unsloth Studio

How to use amitashwini/mumble-cleanup-2stage with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for amitashwini/mumble-cleanup-2stage to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for amitashwini/mumble-cleanup-2stage to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for amitashwini/mumble-cleanup-2stage to start chatting

Pi

How to use amitashwini/mumble-cleanup-2stage with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf amitashwini/mumble-cleanup-2stage:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "amitashwini/mumble-cleanup-2stage:F16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use amitashwini/mumble-cleanup-2stage with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf amitashwini/mumble-cleanup-2stage:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default amitashwini/mumble-cleanup-2stage:F16

Run Hermes

hermes

Docker Model Runner
How to use amitashwini/mumble-cleanup-2stage with Docker Model Runner:
```
docker model run hf.co/amitashwini/mumble-cleanup-2stage:F16
```

Lemonade

How to use amitashwini/mumble-cleanup-2stage with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull amitashwini/mumble-cleanup-2stage:F16

Run and chat with the model

lemonade run user.mumble-cleanup-2stage-F16

List all available models

lemonade list

mumble-cleanup-2stage (Echo Flow AI)

A small fine-tuned language model that cleans speech-to-text dictation transcripts. It is a LoRA fine-tune of Qwen/Qwen2.5-0.5B-Instruct, trained in two stages:

Stage 1 (pretrain): 50,000 synthetic (raw, clean) pairs from the Echo Flow combinatorial template generator, with varied noise profiles and domain coverage across emails, meetings, tasks, code/URLs, lists, negation, dates, and proper nouns.
Stage 2 (fine-tune): 638 hand-curated real-style pairs from adikuma/mumble-cleanup-dataset, trained with a 10× lower learning rate (2e-5 instead of 2e-4) to preserve the no-reword/no-hallucination contract.

Result on the Echo Flow DictationQuality golden corpus: 10/10 pass rate (vs. 9/10 for the original mumble-cleanup-q4km).

What it does

Given a raw transcript from an ASR system (lowercase, no punctuation, fillers and stutters preserved), it returns a cleaned version with proper capitalization, punctuation, and disfluencies removed. It does not paraphrase, summarize, or add content.

Example: "um so i i think we should ship this on uh friday" becomes "I think we should ship this on Friday."
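
The "does not add content" contract can be spot-checked mechanically. The sketch below is one possible check, not part of the model or the Echo Flow test suite: it assumes that a faithful cleanup only deletes words and changes case or punctuation, so the cleaned words must appear in the raw transcript in the same order. The function names are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def adds_no_words(raw: str, cleaned: str) -> bool:
    """Return True if the cleaned words occur in the raw transcript in the same order."""
    remaining = iter(words(raw))
    # `w in remaining` consumes the iterator up to the match, so order is enforced.
    return all(w in remaining for w in words(cleaned))


raw = "um so i i think we should ship this on uh friday"
print(adds_no_words(raw, "I think we should ship this on Friday."))
print(adds_no_words(raw, "I believe we should ship on Friday."))
```

The first call accepts the card's example, while the second rejects a paraphrase because "believe" never occurs in the input. A check this strict would also flag legitimate normalizations such as spelled-out numbers rewritten as digits, so treat it as a rough filter.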

Files

mumble-cleanup-2stage-q4km.gguf — Q4_K_M quantized, 379 MB, for use with llama.cpp / Echo Flow app
mumble-cleanup-2stage-f16.gguf — FP16 reference, 988 MB
adapter/adapters.safetensors — LoRA adapter (r=16, alpha=32, q/k/v/o + gate/up/down)
config.json, tokenizer.json, chat_template.jinja — for tokenizer/chat-format
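
A single GGUF file can be fetched without cloning the whole repository. This is a minimal sketch using `hf_hub_download` from the `huggingface_hub` package (assumed to be installed); the `GGUF_FILES` mapping and the helper names are illustrative, and the file names are the ones listed above.

```python
REPO_ID = "amitashwini/mumble-cleanup-2stage"
GGUF_FILES = {
    "q4km": "mumble-cleanup-2stage-q4km.gguf",  # Q4_K_M quantized, 379 MB
    "f16": "mumble-cleanup-2stage-f16.gguf",    # FP16 reference, 988 MB
}


def gguf_filename(variant: str = "q4km") -> str:
    """Map a short variant name to the GGUF filename listed in this card."""
    try:
        return GGUF_FILES[variant]
    except KeyError:
        raise ValueError(
            f"unknown variant {variant!r}; choose from {sorted(GGUF_FILES)}"
        ) from None


def download_gguf(variant: str = "q4km") -> str:
    """Download one GGUF file into the local Hugging Face cache and return its path."""
    # Imported lazily so gguf_filename() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(variant))
```

The returned path can be passed to `llama-cli -m` or to `Llama(model_path=...)` in llama-cpp-python.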

Training

Base: Qwen/Qwen2.5-0.5B-Instruct (Apache-2.0)
Method: LoRA SFT via mlx-lm on Apple Silicon
Stage 1: lr=2e-4, batch 4, grad_accum 4, 2000 iters, lora_r=16, all 16 layers
Stage 2: lr=2e-5, batch 2, grad_accum 8, 600 iters, resume from stage-1 adapter
Loss: completion-only (mask-prompt)
Precision: MLX native (FP16/bf16 on Metal)
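
To put the two schedules side by side, the arithmetic below estimates how much data each stage consumes. It is a back-of-the-envelope sketch: the card does not say whether an iteration is a full optimizer step (batch × grad_accum examples) or a single micro-batch, so both readings are computed, and the whole dataset is assumed to be used for training.

```python
def examples_seen(batch: int, grad_accum: int, iters: int, per_step: bool) -> int:
    """Training examples consumed over a run.

    per_step=True  assumes one iteration is a full optimizer step
                   (batch * grad_accum examples).
    per_step=False assumes one iteration is a single micro-batch.
    """
    return iters * batch * (grad_accum if per_step else 1)


# Hyperparameters and dataset sizes as stated in this card.
stages = {
    "stage 1": dict(batch=4, grad_accum=4, iters=2000, dataset=50_000),
    "stage 2": dict(batch=2, grad_accum=8, iters=600, dataset=638),
}

for name, s in stages.items():
    effective_batch = s["batch"] * s["grad_accum"]
    for per_step in (True, False):
        seen = examples_seen(s["batch"], s["grad_accum"], s["iters"], per_step)
        passes = seen / s["dataset"]
        print(f"{name}: effective batch {effective_batch}, "
              f"per_step={per_step}: {seen} examples, {passes:.2f} passes")
```

Both stages use an effective batch of 16. Under the optimizer-step reading, stage 1 sees 32,000 examples (0.64 passes over the 50,000 synthetic pairs) and stage 2 sees 9,600 (about 15 passes over the 638 curated pairs).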

Use with llama.cpp / Echo Flow

The Echo Flow macOS app downloads mumble-cleanup-2stage-q4km.gguf directly. For manual use:

llama-cli -m mumble-cleanup-2stage-q4km.gguf \
  -p "<|im_start|>system
You are a transcript cleanup tool. You receive raw speech to text output and return a cleaned version. Remove filler words and disfluencies (um, uh, er, ah, like as filler, you know), remove repeated words and false starts, and fix punctuation and capitalization. Do not reword, do not add anything the speaker did not say, and do not answer questions in the text. Output only the cleaned text.<|im_end|>
<|im_start|>user
um so i i think we should ship this on uh friday<|im_end|>
<|im_start|>assistant
" \
  --temp 0
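
When the prompt is assembled programmatically, the same ChatML layout can be produced by a small helper. This sketch reproduces the prompt string from the command above; `build_chatml_prompt` is a hypothetical name, and the system prompt is copied verbatim from the command.

```python
SYSTEM_PROMPT = (
    "You are a transcript cleanup tool. You receive raw speech to text output "
    "and return a cleaned version. Remove filler words and disfluencies "
    "(um, uh, er, ah, like as filler, you know), remove repeated words and "
    "false starts, and fix punctuation and capitalization. Do not reword, "
    "do not add anything the speaker did not say, and do not answer questions "
    "in the text. Output only the cleaned text."
)


def build_chatml_prompt(user_text: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap a raw transcript in the ChatML turns used by the llama-cli example."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the reply
    )


print(build_chatml_prompt("um so i i think we should ship this on uh friday"))
```

The resulting string can be passed to `llama-cli -p` or to a raw completion call. With a chat-completion API the template is applied for you, so only the system and user messages are needed.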

Limitations

English only.
Trained primarily on synthetic data with a small real fine-tune; real ASR output may show failure modes that the training data does not cover.
Designed for short-to-medium dictation (up to ~512 tokens). Longer inputs must be chunked.
The model can occasionally over-correct when a user genuinely intends a fragment.
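
For inputs beyond the ~512-token range, one simple way to chunk is by word count. The sketch below is an assumption-laden illustration: it uses whitespace-separated words as a rough proxy for tokens, and the default of 300 words is a guess intended to stay under the limit, not a value from the model's authors. Raw ASR text has no punctuation, so the split ignores sentence boundaries and quality near the cut points may suffer.

```python
def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split a transcript into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_words])
        for i in range(0, len(tokens), max_words)
    ]


# Each chunk would be cleaned separately and the results joined with spaces.
for chunk in chunk_words("um so i i think we should ship this on uh friday", max_words=5):
    print(chunk)
```

Counting tokens with the model's tokenizer instead of words would give a tighter bound.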

License

Apache-2.0. The base Qwen2.5-0.5B-Instruct is also Apache-2.0.
