runtime_model
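This directory only collects symlinks to the local model entry points on the current machine; it is not committed to git.
If you download models from Baidu Netdisk into this directory, rename them to the "standard filenames" listed below.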
Project code repository:
https://github.com/BlueberryOreo/Wuhan-project.git
Baidu Netdisk share:
- Shared directory: runtime_models_20260605
- Link: https://pan.baidu.com/s/1iLfFaM6d4kwrFHjnvTQuEw
- Extraction code: 7uj8
Current default runtime chain:
- VAD: fsmn-vad
- ASR-ZH: Paraformer-zh
- ASR-EN: Whisper-large-v3-turbo
- ASR-AUTO: SenseVoice-Small
- MT: Torch Qwen2.5-3B-Instruct
- TTS: GGUF Qwen3-TTS
An optional TTS entry point is also kept:
- TTS ONNX: Qwen3-TTS-ONNX/
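Netdisk filenames and local standard filenames
After downloading the files or directories from the netdisk directory /AI/realtime_pipeline/runtime_models_20260605/ into this directory, rename them as follows: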
| Netdisk filename | Local standard filename |
|---|---|
| speech_fsmn_vad_zh-cn-16k-common-pytorch/ | vad_fsmn/ |
| speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/ | asr_paraformer_zh/ |
| models--openai--whisper-large-v3-turbo/ | asr_whisper_large_v3_turbo/ |
| SenseVoiceSmall/ | asr_sensevoice_small/ |
| Qwen2___5-3B-Instruct/ | mt_torch_qwen25_3b_instruct/ |
| model-base/ | tts_gguf_model_base/ |
| Qwen3-TTS-ONNX/ | tts_onnx_root/ |
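The renames in the table above can be scripted. The following is a minimal sketch, not part of the project: the function name `rename_to_standard` is hypothetical, and it assumes every netdisk item was downloaded directly into the directory passed as the first argument. Items that are missing, or whose standard name already exists, are skipped.

```shell
# Hypothetical helper (not part of the project): rename netdisk downloads
# to the local standard names from the table above.
# Usage: rename_to_standard /path/to/Wuhan-project/realtime_pipeline/runtime_model
rename_to_standard() {
  root="${1:-.}"
  # Each here-document line is "<netdisk name> <standard name>".
  while read -r src dst; do
    [ -n "$src" ] || continue
    # Rename only when the source exists and the target name is still free.
    if [ -e "$root/$src" ] && [ ! -e "$root/$dst" ]; then
      mv "$root/$src" "$root/$dst"
      echo "renamed: $src -> $dst"
    fi
  done <<'EOF'
speech_fsmn_vad_zh-cn-16k-common-pytorch vad_fsmn
speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch asr_paraformer_zh
models--openai--whisper-large-v3-turbo asr_whisper_large_v3_turbo
SenseVoiceSmall asr_sensevoice_small
Qwen2___5-3B-Instruct mt_torch_qwen25_3b_instruct
model-base tts_gguf_model_base
Qwen3-TTS-ONNX tts_onnx_root
EOF
}
```

Because existing targets are never overwritten, the function can be run again safely after a partial download.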
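Non-model runtime files
The items below are not netdisk model directories; prepare them from the source repository or from locally built artifacts.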
tts_gguf_root/
tts_gguf_root/ points to the root directory of the Qwen3-TTS-GGUF repository.
Repository URL:
https://github.com/HaujetZhao/Qwen3-TTS-GGUF
Setup:
cd /path/to/Wuhan-project/realtime_pipeline
git clone https://github.com/HaujetZhao/Qwen3-TTS-GGUF runtime_model/tts_gguf_root
If the repository already exists on the machine, you can create a symlink instead:
cd /path/to/Wuhan-project/realtime_pipeline
ln -s /path/to/Qwen3-TTS-GGUF runtime_model/tts_gguf_root
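Files required for the default runtime chain
- tts_gguf_root/ is not a model directory; it points to the root directory of the Qwen3-TTS-GGUF repository.
- To get only the current default runtime chain running, prepare at least:
  - vad_fsmn/
  - asr_paraformer_zh/
  - asr_whisper_large_v3_turbo/
  - asr_sensevoice_small/
  - mt_torch_qwen25_3b_instruct/
  - tts_gguf_root/
  - tts_gguf_model_base/
- If you switch to ONNX TTS, use tts_onnx_root/ in place of the GGUF TTS directories.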
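Before starting the pipeline, it can help to confirm that the minimum set listed above is in place. The sketch below is an illustration only: `check_default_chain` is a hypothetical name, and it assumes all entries live directly under the directory passed as the first argument. It uses `[ -e ]`, which follows symlinks, so a dangling symlink is reported as missing.

```shell
# Hypothetical helper (not part of the project): verify that every entry
# needed by the default runtime chain exists under the given directory.
# Usage: check_default_chain /path/to/Wuhan-project/realtime_pipeline/runtime_model
check_default_chain() {
  root="${1:-.}"
  missing=0
  for name in \
    vad_fsmn \
    asr_paraformer_zh \
    asr_whisper_large_v3_turbo \
    asr_sensevoice_small \
    mt_torch_qwen25_3b_instruct \
    tts_gguf_root \
    tts_gguf_model_base
  do
    # -e follows symlinks, so a broken link counts as missing.
    if [ ! -e "$root/$name" ]; then
      echo "missing: $name"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when nothing is missing.
  [ "$missing" -eq 0 ]
}
```

For the ONNX TTS variant, the same loop would check tts_onnx_root in place of the two GGUF TTS entries.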