Instructions for using kingkw1/minicpm-phonetic-evaluator with Python libraries, inference servers, and local apps.

Libraries

How to use kingkw1/minicpm-phonetic-evaluator with Transformers:

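# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kingkw1/minicpm-phonetic-evaluator", trust_remote_code=True)
# The model was fine-tuned on a plain "### Instruction / ### Input / ### Output" prompt,
# so pass that format as a string rather than a chat message list.
prompt = (
    "### Instruction:\n"
    "Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.\n\n"
    "### Input:\n"
    "Target: scientist | ASR: scientists\n\n"
    "### Output:\n"
)
# Greedy decoding with a tiny budget; return only the generated answer, not the prompt.
result = pipe(prompt, max_new_tokens=8, do_sample=False, return_full_text=False)
print(result[0]["generated_text"].strip())

# Load model and tokenizer directly
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("kingkw1/minicpm-phonetic-evaluator", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("kingkw1/minicpm-phonetic-evaluator", trust_remote_code=True, dtype="auto")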

llama-cpp-python

How to use kingkw1/minicpm-phonetic-evaluator with llama-cpp-python:

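# !pip install llama-cpp-python

from llama_cpp import Llama

# Download (or reuse from the Hugging Face cache) the Q4_K_M GGUF file.
llm = Llama.from_pretrained(
    repo_id="kingkw1/minicpm-phonetic-evaluator",
    filename="minicpm-phonetic-evaluator-q4_k_m.gguf",
)

# Use the same instruction prompt format the model was fine-tuned on.
prompt = (
    "### Instruction:\n"
    "Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.\n\n"
    "### Input:\n"
    "Target: scientist | ASR: scientists\n\n"
    "### Output:\n"
)

# Deterministic decoding (temperature 0) with a small token budget.
output = llm(prompt, max_tokens=8, temperature=0.0)
print(output["choices"][0]["text"].strip())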


llama.cpp

How to use kingkw1/minicpm-phonetic-evaluator with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf kingkw1/minicpm-phonetic-evaluator:Q4_K_M

Use Docker

docker model run hf.co/kingkw1/minicpm-phonetic-evaluator:Q4_K_M


vLLM

How to use kingkw1/minicpm-phonetic-evaluator with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kingkw1/minicpm-phonetic-evaluator"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kingkw1/minicpm-phonetic-evaluator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'


SGLang

How to use kingkw1/minicpm-phonetic-evaluator with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kingkw1/minicpm-phonetic-evaluator" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kingkw1/minicpm-phonetic-evaluator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kingkw1/minicpm-phonetic-evaluator" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kingkw1/minicpm-phonetic-evaluator",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use kingkw1/minicpm-phonetic-evaluator with Ollama:
```
ollama run hf.co/kingkw1/minicpm-phonetic-evaluator:Q4_K_M
```

Unsloth Studio

How to use kingkw1/minicpm-phonetic-evaluator with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kingkw1/minicpm-phonetic-evaluator to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for kingkw1/minicpm-phonetic-evaluator to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for kingkw1/minicpm-phonetic-evaluator to start chatting

Docker Model Runner
How to use kingkw1/minicpm-phonetic-evaluator with Docker Model Runner:
```
docker model run hf.co/kingkw1/minicpm-phonetic-evaluator:Q4_K_M
```

Lemonade

How to use kingkw1/minicpm-phonetic-evaluator with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull kingkw1/minicpm-phonetic-evaluator:Q4_K_M

Run and chat with the model

lemonade run user.minicpm-phonetic-evaluator-Q4_K_M

List all available models

lemonade list

MiniCPM Phonetic Evaluator

kingkw1/minicpm-phonetic-evaluator is a small binary evaluator for Read-Along AI. Given a target reading item and an ASR transcript, it answers only True or False: whether the transcript is an acceptable phonetic match for what the child was asked to read.

The model is designed as a second-stage judge after a simple normalized exact-match check. Exact matches are accepted immediately by the app; this model is used for close or ambiguous transcripts, such as plurals, word-boundary splits, or plausible child-speech/ASR substitutions.

Model Details

Developed by: Kevin King
Model ID: kingkw1/minicpm-phonetic-evaluator
Model type: Causal language model fine-tuned for binary text classification through instruction following
Base model: openbmb/MiniCPM-2B-sft-bf16
Language: English
Primary task: Phonetic accept/reject judgment for ASR transcripts in an early-reading app
Project repository: https://github.com/kingkw1/read-along-ai
Demo Space: https://huggingface.co/spaces/build-small-hackathon/read-along-ai
Quantized local artifact: minicpm-phonetic-evaluator-q4_k_m.gguf

Intended Use

Direct Use

Use this model to classify whether an ASR transcript preserves the target word or short target sentence closely enough to count as a valid read-aloud attempt. The expected prompt format is:

### Instruction:
Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.

### Input:
Target: scientist | ASR: scientists

### Output:

The expected output is a single boolean answer, True or False. For the example above, it is:

True
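A small helper can assemble this prompt from a target/transcript pair so every call uses identical formatting. This is a sketch: the name `build_prompt` is hypothetical, and it assumes the fine-tuning prompts used exactly the instruction text, the `Target: <target> | ASR: <transcript>` layout, and the blank-line separators shown above.

```python
# Instruction text copied from the prompt format above.
INSTRUCTION = (
    "Determine if the ASR transcript is a valid phonetic match for the target word. "
    "Output only True or False."
)


def build_prompt(target: str, asr: str) -> str:
    """Format one target/transcript pair in the card's instruction layout (hypothetical helper)."""
    # Collapse stray whitespace and newlines so the Input line stays on one line.
    target = " ".join(target.split())
    asr = " ".join(asr.split())
    return (
        "### Instruction:\n"
        f"{INSTRUCTION}\n"
        "\n"
        "### Input:\n"
        f"Target: {target} | ASR: {asr}\n"
        "\n"
        "### Output:\n"
    )


print(build_prompt("scientist", "scientists"))
```

The prompt ends right after `### Output:` plus a newline, leaving the model to generate only the boolean.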

Downstream Use

In Read-Along AI, the evaluator sits behind an exact normalized string match:

Normalize the target text and ASR transcript.
Accept exact matches without calling the model.
Ask the MiniCPM evaluator only when the transcript is close or ambiguous.
Treat unparseable responses as False.

This keeps the app fast for obviously correct readings and fail-closed for uncertain cases.
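The gating flow above can be sketched in plain Python. This is an illustration under stated assumptions, not the Read-Along AI implementation: the normalization rules (lowercase, drop punctuation, collapse whitespace), the `difflib` similarity ratio used to decide what counts as "close", its 0.5 threshold, and rejecting transcripts below that threshold are all hypothetical. `judge` stands in for whatever function sends the prompt to MiniCPM and returns its raw text.

```python
import difflib
import re
from typing import Callable, Optional

CLOSE_THRESHOLD = 0.5  # hypothetical cut-off for "close or ambiguous"


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace (assumed rules)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def parse_verdict(raw: str) -> Optional[bool]:
    """Return True/False for a clean one-word answer, or None if it cannot be parsed."""
    lines = raw.strip().splitlines()
    if not lines:
        return None
    answer = lines[0].strip(" .").lower()
    if answer == "true":
        return True
    if answer == "false":
        return False
    return None


def evaluate_attempt(target: str, asr: str, judge: Callable[[str, str], str]) -> bool:
    """Exact match first, model second, fail closed on anything unclear."""
    norm_target, norm_asr = normalize(target), normalize(asr)
    # Steps 1-2: identical normalized strings are accepted without calling the model.
    if norm_target == norm_asr:
        return True
    # Step 3 (assumed heuristic): only consult the model for near misses.
    similarity = difflib.SequenceMatcher(None, norm_target, norm_asr).ratio()
    if similarity < CLOSE_THRESHOLD:
        return False
    # Step 4: an unparseable model response counts as a rejection.
    return parse_verdict(judge(norm_target, norm_asr)) is True
```

With a stub judge, `evaluate_attempt("scientist", "scientists", lambda t, a: "True")` returns True, while a reply such as "Maybe" or "True or False" is rejected because it does not parse as a single boolean.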

Out-of-Scope Use

This model is not a general pronunciation scorer, speech therapist, literacy diagnostic, grading system, or safety-critical educational assessment tool. It does not receive audio, phoneme timings, confidence scores, or child-specific context. It should not be used to make high-stakes decisions about reading ability.

How to Get Started

Transformers

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kingkw1/minicpm-phonetic-evaluator"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision for inference
    device_map="auto",          # requires the accelerate package
    trust_remote_code=True,
)
model.eval()

# Prompt in the instruction format used for fine-tuning.
prompt = """### Instruction:
Determine if the ASR transcript is a valid phonetic match for the target word. Output only True or False.

### Input:
Target: scientist | ASR: scientists

### Output:
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Fall back to EOS for padding if the tokenizer defines no pad token.
pad_id = tokenizer.pad_token_id if tokenizer.pad_token_id is not None else tokenizer.eos_token_id

with torch.inference_mode():
    # Greedy decoding with a small budget: the answer is a single True/False.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=8,
        do_sample=False,
        pad_token_id=pad_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, not the echoed prompt.
generated = output_ids[0, inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(generated, skip_special_tokens=True).strip())

GGUF / llama.cpp

The Read-Along AI local path uses the Q4 GGUF artifact with llama-cpp-python. The application resolves the file from LOCAL_MINICPM_GGUF_PATH, models/gguf/minicpm-phonetic-evaluator-q4_k_m.gguf, or the Hugging Face model repository cache.
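The lookup order described above can be sketched as follows. It is a hypothetical reconstruction, not the app's code: it assumes LOCAL_MINICPM_GGUF_PATH is an environment variable holding a file path, that the relative models/gguf/ path is resolved against the current working directory, and that the Hub fallback uses `huggingface_hub.hf_hub_download`, which returns the cached file path and downloads the file only if it is missing.

```python
import os
from pathlib import Path

REPO_ID = "kingkw1/minicpm-phonetic-evaluator"
GGUF_FILENAME = "minicpm-phonetic-evaluator-q4_k_m.gguf"
LOCAL_DEFAULT = Path("models/gguf") / GGUF_FILENAME


def resolve_gguf_path() -> str:
    """Return a path to the Q4_K_M GGUF file, checking local options before the Hub."""
    # 1. Explicit override (assumed to be an environment variable with a file path).
    override = os.environ.get("LOCAL_MINICPM_GGUF_PATH")
    if override and Path(override).is_file():
        return override
    # 2. Conventional location inside the project checkout.
    if LOCAL_DEFAULT.is_file():
        return str(LOCAL_DEFAULT)
    # 3. Fall back to the Hugging Face cache; imported lazily so steps 1-2 need no extra package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=GGUF_FILENAME)
```

The returned path can then be passed to llama-cpp-python, for example `Llama(model_path=resolve_gguf_path(), n_ctx=512)`; the 512-token context here simply mirrors the training sequence length and is an assumption.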

Training Details

Training Data

The fine-tuning dataset is data/train.jsonl in the Read-Along AI project repository. It contains 50 instruction examples derived from a cleaned 50-word child-speech ASR baseline set.

38 examples were exact ASR matches and became positive True labels.
12 strict-match failures were manually reviewed.
5 of those failures were labeled acceptable phonetic/ASR variants.
7 were labeled wrong-content or insufficient-evidence negatives.
Final label balance: 43 True, 7 False.

Example positive variants include:

scientist -> scientists
sunflower -> sunny flowers
window -> wind up

Example negative variants include:

sudden -> seven
invisible -> and where's the ball
pyramid -> apparently

The dataset is intentionally small and task-specific. It was built for a hackathon MVP and should be expanded before broad deployment.
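To make the label scheme concrete, the sketch below turns (target, transcript, label) triples into instruction-formatted training records, using the positive and negative examples listed above. The single "text" field and the overall layout of data/train.jsonl are assumptions; this card does not show the file's schema.

```python
import json

# Instruction prompt from the "Direct Use" section; the label is appended as the target output.
PROMPT = (
    "### Instruction:\n"
    "Determine if the ASR transcript is a valid phonetic match for the target word. "
    "Output only True or False.\n\n"
    "### Input:\n"
    "Target: {target} | ASR: {asr}\n\n"
    "### Output:\n"
)

# Example pairs taken from this card; labels follow the manual review.
EXAMPLES = [
    ("scientist", "scientists", True),
    ("sunflower", "sunny flowers", True),
    ("window", "wind up", True),
    ("sudden", "seven", False),
    ("invisible", "and where's the ball", False),
    ("pyramid", "apparently", False),
]


def to_jsonl_line(target: str, asr: str, label: bool) -> str:
    """Serialize one example as a JSON line with a single 'text' field (assumed schema)."""
    text = PROMPT.format(target=target, asr=asr) + str(label)
    return json.dumps({"text": text})


lines = [to_jsonl_line(*example) for example in EXAMPLES]
print(lines[0])
```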

Training Procedure

Training was run with Modal using scripts/finetune_minicpm.py from the project repository.

Base model: openbmb/MiniCPM-2B-sft-bf16
Trainer: TRL SFTTrainer
Fine-tuning method: 4-bit parameter-efficient training with LoRA
Quantization during training: bitsandbytes NF4, double quantization, bfloat16 compute
LoRA rank: 16
LoRA alpha: 32
LoRA dropout: 0.05
Target modules: q_proj, v_proj
Epochs: 5
Batch size: 2 per device
Gradient accumulation: 4
Learning rate: 2e-4
Optimizer: paged_adamw_8bit
Warmup ratio: 0.03
Max sequence length: 512
Training hardware: Modal A100
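The listed settings imply two derived quantities. Assuming a single A100 (one device) and the standard LoRA formulation used by PEFT without rank-stabilized scaling, where the frozen weight $W_0$ of each targeted projection (q_proj, v_proj) receives a low-rank update scaled by $\alpha / r$, they work out as follows. These are derived from the values above, not reported by the training run.

```math
\begin{aligned}
B_{\mathrm{eff}} &= 2 \;(\text{per device}) \times 4 \;(\text{accumulation steps}) \times 1 \;(\text{device}) = 8 \\
h &= W_0 x + \frac{\alpha}{r}\, B A x, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k} \\
\frac{\alpha}{r} &= \frac{32}{16} = 2
\end{aligned}
```

With 50 examples and an effective batch of 8, one epoch is roughly six optimizer steps, so five epochs amount to about 30 updates; the exact count depends on how the trainer handles the final partial accumulation window.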

After training, the LoRA adapter was merged into the base model and pushed to the Hugging Face model repository as safetensors. A local conversion script, scripts/convert_to_gguf.py, then converted the merged model to FP16 GGUF and quantized it to Q4_K_M for llama.cpp / llama-cpp-python inference.
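The merge-and-convert step can be sketched as below. This is not the project's scripts/convert_to_gguf.py, whose contents are not shown here; it assumes the adapter was saved in PEFT format, that llama.cpp is checked out and built with CMake as in the build instructions earlier, and that its `convert_hf_to_gguf.py` script and `llama-quantize` tool are used. The function names and output filenames are hypothetical.

```python
import sys
from pathlib import Path


def merge_adapter(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Fold LoRA weights into the base model and save safetensors (hypothetical helper)."""
    # Imported lazily so gguf_commands() below works without these packages installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base in bf16 (not 4-bit) so the merged weights are full precision.
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    )
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(out_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_id, trust_remote_code=True).save_pretrained(out_dir)


def gguf_commands(merged_dir: str, llama_cpp_dir: str, stem: str) -> list:
    """Build argv lists for FP16 GGUF conversion followed by Q4_K_M quantization."""
    root = Path(llama_cpp_dir)
    f16 = f"{stem}-f16.gguf"
    q4 = f"{stem}-q4_k_m.gguf"
    return [
        [sys.executable, str(root / "convert_hf_to_gguf.py"), merged_dir,
         "--outtype", "f16", "--outfile", f16],
        [str(root / "build" / "bin" / "llama-quantize"), f16, q4, "Q4_K_M"],
    ]
```

Each command can be run in order with `subprocess.run(cmd, check=True)`. Building the argument lists separately from running them keeps the two-stage pipeline (FP16 first, then 4-bit quantization) easy to inspect.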

Evaluation

The project evaluation is documented in notebooks/02_post_tuning_evaluation.ipynb.

Important caveat: this is a small provenance and smoke evaluation on the same 50 examples used to derive the training data, not a held-out benchmark. Treat these numbers as evidence that the integration works and that the model learned the intended boundary on the project examples, not as a generalization claim.

Results

| Evaluation view | Result |
|---|---|
| Baseline ASR exact-match acceptance | 38/50 (76.0%) |
| ASR plus tuned MiniCPM acceptance | 42/50 (84.0%) |
| Strict-match failures reviewed by the model | 12 |
| Manual-label agreement on strict-match failures | 9/12 (75.0%) |
| Overall agreement with manual labels (exact matches counted as True) | 47/50 (94.0%) |

On the 12 strict-match failures, the tuned evaluator accepted scientist -> scientists, sunflower -> sunny flowers, window -> wind up, and also accepted guessing -> yeah same. It rejected several intended negatives correctly, but it missed some manually accepted variants such as safari -> so far and compass -> come this.
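The reported counts are enough to recover the full confusion matrix on the 12 strict-match failures, treating the manual labels as ground truth: 5 manual positives, 4 model acceptances (42 minus 38), and 9 agreements. The sketch below solves for the four cells; the resulting cells and the precision and recall figures are derived from the table, not separately reported.

```python
def confusion_from_counts(n: int, manual_pos: int, model_pos: int, agree: int) -> dict:
    """Solve for TP/FP/FN/TN from totals and the number of agreements.

    agree = TP + TN, model_pos = TP + FP, manual_pos = TP + FN, n = TP + FP + FN + TN.
    Substituting FP = model_pos - TP and TN = (n - manual_pos) - FP gives
    agree = 2 * TP + (n - manual_pos - model_pos).
    """
    tp_twice = agree - (n - manual_pos - model_pos)
    if tp_twice % 2:
        raise ValueError("counts are inconsistent")
    tp = tp_twice // 2
    fp = model_pos - tp
    fn = manual_pos - tp
    tn = n - tp - fp - fn
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts are inconsistent")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


cm = confusion_from_counts(n=12, manual_pos=5, model_pos=42 - 38, agree=9)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, precision, recall)  # {'tp': 3, 'fp': 1, 'fn': 2, 'tn': 6} 0.75 0.6
```

This agrees with the error analysis: three accepted variants (scientist, sunflower, window), one false acceptance (guessing -> yeah same), two misses (safari, compass), and six correct rejections. Adding the 38 exact matches to the 9 agreements reproduces the 47/50 overall figure.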

Interpretation

The model improved the product acceptance path over strict string matching, but the error analysis shows the current dataset is too small and imbalanced. More labeled child-speech and ASR examples are needed, especially negative examples and sentence-level examples, before relying on this model outside the Read-Along AI prototype.

Bias, Risks, and Limitations

The model was trained on only 50 examples and may overfit the specific words, ASR system, speaker, and labeling choices.
Training data is word-level, while the app may also ask the same judge about short sentences. Sentence-level behavior needs more evaluation.
The model sees text transcripts only. It does not hear the audio and cannot distinguish ASR errors from actual reading errors.
Dialect, accent, age, articulation patterns, microphone quality, and ASR behavior can all affect transcripts and therefore model decisions.
False positives can reward an incorrect reading; false negatives can frustrate a child who made a reasonable attempt.
The base MiniCPM model can be prompt-sensitive. The application should use deterministic generation and parse only True/False.

Recommendations

Use exact normalized matching before calling the model.
Use deterministic decoding with a very small max_new_tokens.
Parse boolean outputs defensively and fail closed when the response is unclear.
Keep feedback gentle and low-stakes.
Add a larger held-out test set before using this beyond prototype or demo settings.
Prefer human review for curriculum decisions or any high-impact educational assessment.

Technical Specifications

Architecture and Objective

The model is a MiniCPM causal language model fine-tuned with instruction-formatted examples. The objective is to produce a single boolean answer for a target/transcript pair.

Software

The training script pinned:

transformers==4.40.2
peft==0.10.0
trl==0.8.6
accelerate==0.29.3
bitsandbytes
datasets
sentencepiece

The Modal inference endpoint loads the model with AutoModelForCausalLM and AutoTokenizer using trust_remote_code=True. The local offline path uses the Q4_K_M GGUF through llama-cpp-python.

License

The Read-Along AI project code is MIT licensed. The fine-tuned model is derived from openbmb/MiniCPM-2B-sft-bf16; use of the model weights is subject to the upstream MiniCPM model license terms. The upstream MiniCPM card states that repository code is Apache-2.0 and that MiniCPM model weights are governed by the General Model License, with academic use allowed and commercial use requiring authorization from ModelBest/OpenBMB.

Review the base model license before redistributing or using this derivative model commercially:

https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16

Citation

If you use this model, cite the base MiniCPM work as requested by OpenBMB:

@inproceedings{minicpm2024,
  title={MiniCPM: Unveiling the Potential of End-side Large Language Models},
  booktitle={OpenBMB Blog},
  year={2024}
}

Model Card Authors

Kevin King, with drafting support from OpenAI Codex.

Contact

Hugging Face: https://huggingface.co/kingkw1


Safetensors model size: 3B params. Tensor type: BF16.
