mumble-cleanup-2stage (Echo Flow AI)
A small fine-tuned language model that cleans speech-to-text dictation transcripts. LoRA fine-tune of Qwen/Qwen2.5-0.5B-Instruct trained in two stages:
- Stage 1 (pretrain): 50,000 synthetic (raw, clean) pairs from the Echo Flow combinatorial template generator — varied noise profiles, domain coverage (emails, meetings, tasks, code/URLs, lists, negation, dates, proper nouns).
- Stage 2 (fine-tune): 638 hand-curated real-style pairs from adikuma/mumble-cleanup-dataset, with a 10× lower learning rate (2e-5) to preserve the no-reword/no-hallucination contract.
Result on the Echo Flow DictationQuality golden corpus: 10/10 pass rate (vs. 9/10 for the original mumble-cleanup-q4km).
What it does
Given a raw transcript from an ASR system (lowercase, no punctuation, fillers and stutters preserved), it returns a cleaned version with proper capitalization, punctuation, and disfluencies removed. It does not paraphrase, summarize, or add content.
Example: "um so i i think we should ship this on uh friday" becomes "I think we should ship this on Friday."
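The "does not paraphrase, summarize, or add content" contract can be spot-checked mechanically. The sketch below is a hypothetical check, not the Echo Flow DictationQuality test: it assumes that a faithful cleanup only deletes words and changes case or punctuation, so the cleaned words should form an in-order subsequence of the raw words. The function names `words` and `adds_no_words` are illustrative.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and keep only word-like tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def adds_no_words(raw: str, cleaned: str) -> bool:
    """Return True if every cleaned word appears in the raw transcript, in the same order."""
    remaining = iter(words(raw))
    # `word in remaining` advances the iterator, so matches must occur in order.
    return all(word in remaining for word in words(cleaned))


raw = "um so i i think we should ship this on uh friday"
print(adds_no_words(raw, "I think we should ship this on Friday."))    # faithful cleanup
print(adds_no_words(raw, "I believe we should ship this on Friday."))  # reworded output
```

This check is deliberately conservative: it would also flag harmless normalizations such as spelling out or abbreviating a word, so a failure is a prompt for manual review rather than proof of hallucination.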
Files
- mumble-cleanup-2stage-q4km.gguf: Q4_K_M quantized, 379 MB, for use with llama.cpp / Echo Flow app
- mumble-cleanup-2stage-f16.gguf: FP16 reference, 988 MB
- adapter/adapters.safetensors: LoRA adapter (r=16, alpha=32, q/k/v/o + gate/up/down)
- config.json, tokenizer.json, chat_template.jinja: for tokenizer/chat-format
Training
- Base: Qwen/Qwen2.5-0.5B-Instruct (Apache-2.0)
- Method: LoRA SFT via mlx-lm on Apple Silicon
- Stage 1: lr=2e-4, batch 4, grad_accum 4, 2000 iters, lora_r=16, all 16 layers
- Stage 2: lr=2e-5, batch 2, grad_accum 8, 600 iters, resume from stage-1 adapter
- Loss: completion-only (mask-prompt)
- Precision: MLX native (FP16/bf16 on Metal)
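To put the two stages' hyperparameters in perspective, the arithmetic below estimates how many examples each stage sees. It rests on two assumptions that the card does not state: that one iteration is one optimizer step consuming batch × grad_accum examples, and that all 50,000 and 638 pairs are used for training with no held-out split. If an iteration is instead a single micro-batch, divide the example counts and epoch figures by grad_accum.

```python
def training_exposure(batch_size: int, grad_accum: int, iters: int, dataset_size: int):
    """Return (effective batch, examples seen, approximate epochs) for one stage.

    Assumes each iteration is one optimizer step over batch_size * grad_accum examples.
    """
    effective_batch = batch_size * grad_accum
    examples_seen = effective_batch * iters
    return effective_batch, examples_seen, examples_seen / dataset_size


# Stage 1: batch 4, grad_accum 4, 2000 iters over 50,000 synthetic pairs
stage1 = training_exposure(4, 4, 2000, 50_000)
# Stage 2: batch 2, grad_accum 8, 600 iters over 638 curated pairs
stage2 = training_exposure(2, 8, 600, 638)

for name, (eff, seen, epochs) in (("stage 1", stage1), ("stage 2", stage2)):
    print(f"{name}: effective batch {eff}, {seen} examples, ~{epochs:.2f} epochs")
```

Under these assumptions both stages use the same effective batch of 16, but stage 1 covers less than one pass over the synthetic data while stage 2 revisits the small curated set many times, which is consistent with pairing it with the lower learning rate.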
Use with llama.cpp / Echo Flow
The Echo Flow macOS app downloads mumble-cleanup-2stage-q4km.gguf directly. For manual use:
llama-cli -m mumble-cleanup-2stage-q4km.gguf \
-p "<|im_start|>system
You are a transcript cleanup tool. You receive raw speech to text output and return a cleaned version. Remove filler words and disfluencies (um, uh, er, ah, like as filler, you know), remove repeated words and false starts, and fix punctuation and capitalization. Do not reword, do not add anything the speaker did not say, and do not answer questions in the text. Output only the cleaned text.<|im_end|>
<|im_start|>user
um so i i think we should ship this on uh friday<|im_end|>
<|im_start|>assistant
" \
--temp 0
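When calling the model from a script, the same ChatML-style prompt can be assembled programmatically. The following is a minimal sketch that reproduces the prompt string from the command above; the helper name `build_prompt` is hypothetical and not part of any library.

```python
SYSTEM_PROMPT = (
    "You are a transcript cleanup tool. You receive raw speech to text output "
    "and return a cleaned version. Remove filler words and disfluencies "
    "(um, uh, er, ah, like as filler, you know), remove repeated words and "
    "false starts, and fix punctuation and capitalization. Do not reword, do "
    "not add anything the speaker did not say, and do not answer questions in "
    "the text. Output only the cleaned text."
)


def build_prompt(raw_transcript: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Wrap a raw transcript in the system/user/assistant turns shown above."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{raw_transcript.strip()}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("um so i i think we should ship this on uh friday")
print(prompt)
```

The prompt ends with an open assistant turn so that the model's completion is the cleaned text. Greedy decoding (temperature 0, as in the command above) keeps the output deterministic.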
Limitations
- English only.
- Trained primarily on synthetic data with a small real fine-tune; real ASR output may exhibit failure modes that the training data does not cover.
- Designed for short-to-medium dictation (up to ~512 tokens). Longer inputs must be chunked.
- The model can occasionally over-correct when a user genuinely intends a fragment.
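The chunking requirement for long inputs can be met with a simple word-count splitter. This is a sketch under stated assumptions: raw ASR text has no punctuation to split on, so it cuts on whitespace, and the default of 300 words is a hypothetical, conservative stand-in for the ~512-token budget rather than a measured value. The function name `chunk_transcript` is illustrative.

```python
def chunk_transcript(text: str, max_words: int = 300) -> list[str]:
    """Split a raw transcript into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    tokens = text.split()  # whitespace split; raw ASR output has no punctuation
    return [
        " ".join(tokens[i:i + max_words])
        for i in range(0, len(tokens), max_words)
    ]


chunks = chunk_transcript("one two three four five", max_words=2)
print(chunks)
```

Each chunk would be cleaned separately and the results joined with spaces. Because a cut can land mid-sentence, punctuation and capitalization near chunk boundaries may need a manual pass.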
License
Apache-2.0. The base Qwen2.5-0.5B-Instruct is also Apache-2.0.