MiniCPM Phonetic Evaluator
kingkw1/minicpm-phonetic-evaluator is a small binary evaluator for Read-Along AI. Given a target reading item and an ASR transcript, it answers only True or False: whether the transcript is an acceptable phonetic match for what the child was asked to read.
The model is designed as a second-stage judge after a simple normalized exact-match check. Exact matches are accepted immediately by the app; this model is used for close or ambiguous transcripts, such as plurals, word-boundary splits, or plausible child-speech/ASR substitutions.
Model Details
- Developed by: Kevin King
- Model ID: kingkw1/minicpm-phonetic-evaluator
- Model type: Causal language model fine-tuned for binary text classification through instruction following
- Base model: openbmb/MiniCPM-2B-sft-bf16
- Language: English
- Primary task: Phonetic accept/reject judgment for ASR transcripts in an early-reading app
- Project repository: https://github.com/kingkw1/read-along-ai
- Demo Space: https://huggingface.co/spaces/build-small-hackathon/read-along-ai
- Quantized local artifact: minicpm-phonetic-evaluator-q4_k_m.gguf
Intended Use
Direct Use
Use this model to classify whether an ASR transcript preserves the target word or short target sentence closely enough to count as a valid read-aloud attempt. The expected prompt format is:
### Instruction:
Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.
### Input:
Target: scientist | ASR: scientists
### Output:
The expected output is exactly one boolean token:
True
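A minimal sketch of how this prompt could be assembled in code. The helper name `build_prompt` is hypothetical, and the exact whitespace between sections is an assumption based on the example above, so check it against the project's training script before relying on it.

```python
# Instruction text copied from the prompt format above.
INSTRUCTION = (
    "Determine if the ASR transcript is a valid phonetic match "
    "for the target word. Output only True or False."
)


def build_prompt(target: str, asr: str) -> str:
    """Render one target/transcript pair in the card's prompt format.

    Hypothetical helper: the section order follows the card, but the
    newline layout is an assumption.
    """
    return (
        "### Instruction:\n"
        f"{INSTRUCTION}\n"
        "### Input:\n"
        f"Target: {target} | ASR: {asr}\n"
        "### Output:\n"
    )


print(build_prompt("scientist", "scientists"))
```

Keeping the template in one function helps the inference prompt stay identical to the format the model was tuned on.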
Downstream Use
In Read-Along AI, the evaluator sits behind an exact normalized string match:
- Normalize the target text and ASR transcript.
- Accept exact matches without calling the model.
- Ask the MiniCPM evaluator only when the transcript is close or ambiguous.
- Treat unparseable responses as False.
This keeps the app fast for obvious correct readings and fail-closed for uncertain cases.
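The steps above can be sketched as a small gate around the model call. This is an illustration, not the application's code: the normalization rules, the function names, and the choice to pass normalized text to the judge are all assumptions, and the "close or ambiguous" pre-filter is omitted because the card does not describe it.

```python
import re
from typing import Callable


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Assumed rules; the app's real normalization may differ.
    """
    text = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    return " ".join(text.split())


def accept_reading(target: str, asr: str, judge: Callable[[str, str], str]) -> bool:
    """Exact match first, model second, fail closed on anything unclear.

    `judge` is a hypothetical callable that returns the model's raw text.
    """
    norm_target, norm_asr = normalize(target), normalize(asr)
    if norm_target == norm_asr:
        return True  # obvious correct reading: no model call needed
    raw = judge(norm_target, norm_asr)
    return raw.strip() == "True"  # any other response counts as False


# Stub judge standing in for the MiniCPM evaluator.
def stub_judge(target: str, asr: str) -> str:
    return "True" if (target, asr) == ("scientist", "scientists") else "unsure"


print(accept_reading("Window", "window!", stub_judge))
print(accept_reading("scientist", "scientists", stub_judge))
print(accept_reading("sudden", "seven", stub_judge))
```

Injecting the judge as a callable keeps the gate testable without loading the model.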
Out-of-Scope Use
This model is not a general pronunciation scorer, speech therapist, literacy diagnostic, grading system, or safety-critical educational assessment tool. It does not receive audio, phoneme timings, confidence scores, or child-specific context. It should not be used to make high-stakes decisions about reading ability.
How to Get Started
Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kingkw1/minicpm-phonetic-evaluator"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

prompt = """### Instruction:
Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.
### Input:
Target: scientist | ASR: scientists
### Output:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
generated = output_ids[0, inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())
GGUF / llama.cpp
The Read-Along AI local path uses the Q4 GGUF artifact with llama-cpp-python. The application resolves the file from LOCAL_MINICPM_GGUF_PATH, models/gguf/minicpm-phonetic-evaluator-q4_k_m.gguf, or the Hugging Face model repository cache.
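A hedged sketch of that resolution order with llama-cpp-python. The function names, the `n_ctx=512` setting, and the fallback logic are assumptions for illustration; only the environment variable, the relative path, and the file name come from this card.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REPO_ID = "kingkw1/minicpm-phonetic-evaluator"
GGUF_NAME = "minicpm-phonetic-evaluator-q4_k_m.gguf"


def resolve_local_gguf(
    env: Mapping[str, str] = os.environ, root: Path = Path(".")
) -> Optional[Path]:
    """Return the first existing local GGUF file, or None if neither exists."""
    candidates = []
    if env.get("LOCAL_MINICPM_GGUF_PATH"):
        candidates.append(Path(env["LOCAL_MINICPM_GGUF_PATH"]))
    candidates.append(root / "models" / "gguf" / GGUF_NAME)
    for path in candidates:
        if path.is_file():
            return path
    return None


def load_evaluator():
    """Load the Q4_K_M model; llama_cpp is imported lazily on purpose."""
    from llama_cpp import Llama  # requires: pip install llama-cpp-python

    local = resolve_local_gguf()
    if local is not None:
        return Llama(model_path=str(local), n_ctx=512, verbose=False)
    # Falls back to the Hugging Face cache (needs huggingface_hub installed).
    return Llama.from_pretrained(
        repo_id=REPO_ID, filename=GGUF_NAME, n_ctx=512, verbose=False
    )
```

With a loaded model, a deterministic call could look like `llm(prompt, max_tokens=4, temperature=0.0)["choices"][0]["text"]`, where `prompt` follows the instruction format shown earlier.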
Training Details
Training Data
The fine-tuning dataset is data/train.jsonl in the Read-Along AI project repository. It contains 50 instruction examples derived from a cleaned 50-word child-speech ASR baseline set.
- 38 examples were exact ASR matches and became positive True labels.
- 12 strict-match failures were manually reviewed.
- 5 of those failures were labeled acceptable phonetic/ASR variants.
- 7 were labeled wrong-content or insufficient-evidence negatives.
- Final label balance: 43 True, 7 False.
Example positive variants include:
- scientist -> scientists
- sunflower -> sunny flowers
- window -> wind up
Example negative variants include:
- sudden -> seven
- invisible -> and where's the ball
- pyramid -> apparently
The dataset is intentionally small and task-specific. It was built for a hackathon MVP and should be expanded before broad deployment.
Training Procedure
Training was run with Modal using scripts/finetune_minicpm.py from the project repository.
- Base model: openbmb/MiniCPM-2B-sft-bf16
- Trainer: TRL SFTTrainer
- Fine-tuning method: 4-bit parameter-efficient training with LoRA
- Quantization during training: bitsandbytes NF4, double quantization, bfloat16 compute
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
- Target modules: q_proj, v_proj
- Epochs: 5
- Batch size: 2 per device
- Gradient accumulation: 4
- Learning rate: 2e-4
- Optimizer: paged_adamw_8bit
- Warmup ratio: 0.03
- Max sequence length: 512
- Training hardware: Modal A100
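For reference, the settings listed above could be grouped as keyword arguments for the usual PEFT and Transformers configuration classes. This is a sketch, not the contents of scripts/finetune_minicpm.py: the dictionaries use plain Python so the snippet runs without those libraries, the `task_type` value is an assumption, and the effective batch size assumes a single GPU.

```python
# Keys follow peft.LoraConfig parameter names.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "v_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Keys follow transformers.BitsAndBytesConfig parameter names.
quant_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}

# Keys follow transformers.TrainingArguments parameter names.
training_kwargs = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-4,
    "optim": "paged_adamw_8bit",
    "warmup_ratio": 0.03,
}
max_seq_length = 512  # passed to the SFT trainer rather than TrainingArguments

# Examples contributing to each optimizer step (single-GPU assumption).
effective_batch = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
# Standard LoRA scaling factor applied to the adapter update: alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

print(f"effective batch size: {effective_batch}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
```

With only 50 training examples, an effective batch of 8 means each epoch makes just a handful of optimizer updates, which is worth keeping in mind when reading the evaluation numbers.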
After training, the LoRA adapter was merged into the base model and pushed to the Hugging Face model repository as safetensors. A local conversion script, scripts/convert_to_gguf.py, then converted the merged model to FP16 GGUF and quantized it to Q4_K_M for llama.cpp / llama-cpp-python inference.
Evaluation
The project evaluation is documented in notebooks/02_post_tuning_evaluation.ipynb.
Important caveat: this is a small provenance and smoke evaluation on the same 50 examples used to derive the training data, not a held-out benchmark. Treat these numbers as evidence that the integration works and that the model learned the intended boundary on the project examples, not as a generalization claim.
Results
| Evaluation view | Result |
|---|---|
| Baseline ASR exact-match acceptance | 38/50, 76.0% |
| ASR plus tuned MiniCPM acceptance | 42/50, 84.0% |
| Strict-match failures reviewed by model | 12 |
| Manual-label agreement on strict-match failures | 9/12, 75.0% |
| Overall agreement with manual labels, counting exact matches as true | 47/50, 94.0% |
On the 12 strict-match failures, the tuned evaluator correctly accepted scientist -> scientists, sunflower -> sunny flowers, and window -> wind up. It also accepted guessing -> yeah same, which the manual review had labeled a negative. It correctly rejected the other six negatives, but it missed two manually accepted variants: safari -> so far and compass -> come this.
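The table's figures can be reproduced from the counts in this card. The sketch below assumes a breakdown of the 12 reviewed failures (3 correct accepts, 1 incorrect accept, 2 incorrect rejects, 6 correct rejects) that is inferred from the reported totals rather than listed explicitly; the variable names are illustrative.

```python
total = 50
exact_matches = 38  # accepted by the normalized exact-match check

# Model decisions on the 12 strict-match failures (inferred breakdown).
correct_accepts = 3    # e.g. scientist -> scientists
incorrect_accepts = 1  # guessing -> yeah same
incorrect_rejects = 2  # safari -> so far, compass -> come this
correct_rejects = 6

reviewed = correct_accepts + incorrect_accepts + incorrect_rejects + correct_rejects
accepted = exact_matches + correct_accepts + incorrect_accepts
agree_on_failures = correct_accepts + correct_rejects
agree_overall = exact_matches + agree_on_failures

print(f"Baseline acceptance: {exact_matches}/{total}, {exact_matches / total:.1%}")
print(f"ASR plus MiniCPM acceptance: {accepted}/{total}, {accepted / total:.1%}")
print(f"Agreement on failures: {agree_on_failures}/{reviewed}, {agree_on_failures / reviewed:.1%}")
print(f"Overall agreement: {agree_overall}/{total}, {agree_overall / total:.1%}")
```

Note that acceptance and agreement measure different things: the acceptance rate rises with every accepted transcript, including the incorrect accept, while agreement only counts decisions that match the manual labels.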
Interpretation
The model improved the product acceptance path over strict string matching, but the error analysis shows the current dataset is too small and imbalanced. More labeled child-speech and ASR examples are needed, especially negative examples and sentence-level examples, before relying on this model outside the Read-Along AI prototype.
Bias, Risks, and Limitations
- The model was trained on only 50 examples and may overfit the specific words, ASR system, speaker, and labeling choices.
- Training data is word-level, while the app may also ask the same judge about short sentences. Sentence-level behavior needs more evaluation.
- The model sees text transcripts only. It does not hear the audio and cannot distinguish ASR errors from actual reading errors.
- Dialect, accent, age, articulation patterns, microphone quality, and ASR behavior can all affect transcripts and therefore model decisions.
- False positives can reward an incorrect reading; false negatives can frustrate a child who made a reasonable attempt.
- The base MiniCPM model can be prompt-sensitive. The application should use deterministic generation and parse only True/False.
Recommendations
- Use exact normalized matching before calling the model.
- Use deterministic decoding with a very small max_new_tokens.
- Parse boolean outputs defensively and fail closed when the response is unclear.
- Keep feedback gentle and low-stakes.
- Add a larger held-out test set before using this beyond prototype or demo settings.
- Prefer human review for curriculum decisions or any high-impact educational assessment.
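One way to implement the defensive parsing recommended above is sketched below. The function name and the exact acceptance rule are assumptions; the application's parser may be stricter or looser.

```python
import re
from typing import Optional

# Matches the standalone words "true" or "false", in any letter case.
_VERDICT = re.compile(r"\b(true|false)\b", re.IGNORECASE)


def parse_verdict(raw: Optional[str]) -> bool:
    """Return True only if the text mentions 'true' and never 'false'.

    Empty, missing, or contradictory responses fail closed to False.
    """
    found = {match.lower() for match in _VERDICT.findall(raw or "")}
    return found == {"true"}


for response in ["True", " true.\n", "False", "True or False", "", "Truely"]:
    print(repr(response), "->", parse_verdict(response))
```

Requiring a whole-word match and rejecting mixed answers means a rambling or truncated generation cannot be mistaken for an acceptance.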
Technical Specifications
Architecture and Objective
The model is a MiniCPM causal language model fine-tuned with instruction-formatted examples. The objective is to produce a single boolean answer for a target/transcript pair.
Software
The training script pinned:
- transformers==4.40.2
- peft==0.10.0
- trl==0.8.6
- accelerate==0.29.3
- bitsandbytes
- datasets
- sentencepiece
The Modal inference endpoint loads the model with AutoModelForCausalLM and AutoTokenizer using trust_remote_code=True. The local offline path uses the Q4_K_M GGUF through llama-cpp-python.
License
The Read-Along AI project code is MIT licensed. The fine-tuned model is derived from openbmb/MiniCPM-2B-sft-bf16; use of the model weights is subject to the upstream MiniCPM model license terms. The upstream MiniCPM card states that repository code is Apache-2.0 and that MiniCPM model weights are governed by the General Model License, with academic use allowed and commercial use requiring authorization from ModelBest/OpenBMB.
Review the base model license before redistributing or using this derivative model commercially.
Citation
If you use this model, cite the base MiniCPM work as requested by OpenBMB:
@inproceedings{minicpm2024,
  title={MiniCPM: Unveiling the Potential of End-side Large Language Models},
  booktitle={OpenBMB Blog},
  year={2024}
}
Model Card Authors
Kevin King, with drafting support from OpenAI Codex.
Contact
Hugging Face: https://huggingface.co/kingkw1