punct_cap_seg_en - CoreML (INT8)
CoreML conversion of 1-800-BAD-CODE/punct_cap_seg_en (Apache 2.0), a 52M-parameter BERT-style token classifier that predicts, per subtoken: post-punctuation (period, comma, question mark, acronym dotting), per-character true-casing, and sentence boundaries for English text. All credit for the model itself goes to the original author.
Built for the Babble dictation app, where it provides on-device live punctuation alongside NVIDIA Nemotron streaming ASR.
Contents
- punctuation.mlmodelc/: compiled CoreML model, INT8 weights / FP32 activations (per-block quantization, block size 32)
- tokenizer.model: SentencePiece unigram tokenizer (32k lowercase English vocabulary; bos=1, eos=2, pad=3, unk=0), from the original repo (spe_32k_lc_en.model)
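To make "per-block" concrete, here is a minimal pure-Python sketch of weight quantization with one scale per block of 32 values. It assumes a symmetric scheme (scale = max |w| / 127); the source does not say whether the shipped model uses symmetric or affine quantization, and the function names are illustrative only, not part of coremltools.

```python
def quantize_per_block(weights, block_size=32):
    """Quantize a flat list of floats to INT8 with one scale per block.

    Assumption: symmetric quantization; the real model may differ.
    """
    quantized, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Avoid a zero scale for an all-zero block.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer and clamp to the INT8 range used here.
        quantized.extend(max(-127, min(127, round(w / scale))) for w in block)
    return quantized, scales


def dequantize_per_block(quantized, scales, block_size=32):
    """Recover approximate float weights from INT8 values and block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


# Two blocks with very different magnitudes: each gets its own scale,
# so the small-valued block is not crushed by the large-valued one.
row = [0.01 * i for i in range(32)] + [1.0 * i for i in range(32)]
q, scales = quantize_per_block(row)
restored = dequantize_per_block(q, scales)
errors = [abs(a - b) for a, b in zip(row, restored)]
```

The per-element error is bounded by half of that block's scale, which is why a small block size tends to preserve accuracy better than a single scale for the whole tensor.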
Model details
- Input: input_ids, int32[1, 256], padded with pad id 3; BOS/EOS added. The graph computes its own attention mask from the input ids.
- Outputs (argmax baked into the graph): pre_preds[1,256], post_preds[1,256] (labels: null, acronym, ".", ",", "?"), cap_preds[1,256,16] (per-character capitalization), seg_preds[1,256] (sentence boundaries).
- Input text must be lowercase with whitespace collapsed to single spaces, because the vocabulary contains no uppercase characters.
- Latency: ~6 ms per 256-token window on Apple Silicon (CoreML, all compute units).
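The input contract above can be sketched in plain Python. This is a hedged illustration, not the app's actual code: the helper names are hypothetical, the token ids in the example are made up (real ids come from tokenizer.model), and the mask rule (id != pad) is an assumption about what the graph does internally.

```python
MAX_LEN = 256
BOS_ID, EOS_ID, PAD_ID = 1, 2, 3  # ids stated for the tokenizer


def normalize_text(text):
    """Lowercase and collapse all whitespace runs to single spaces."""
    return " ".join(text.lower().split())


def build_input_ids(token_ids, max_len=MAX_LEN):
    """Wrap token ids in BOS/EOS and right-pad with the pad id to max_len."""
    if len(token_ids) > max_len - 2:
        raise ValueError("window holds at most %d tokens" % (max_len - 2))
    ids = [BOS_ID] + list(token_ids) + [EOS_ID]
    return ids + [PAD_ID] * (max_len - len(ids))


def attention_mask(input_ids):
    """Assumed equivalent of the in-graph mask: 1 for real tokens, 0 for pad."""
    return [int(i != PAD_ID) for i in input_ids]


text = normalize_text("  Hello   World\nHow are YOU ")
# Hypothetical subtoken ids standing in for the SentencePiece output.
example_ids = build_input_ids([412, 57, 9031])
example_mask = attention_mask(example_ids)
```

Before prediction, a caller would wrap the result in a batch dimension and convert it to an int32 array of shape [1, 256]. Text longer than 254 subtokens has to be split across several windows.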
Conversion notes
Converted via ONNX → PyTorch (onnx2torch) → coremltools, validated for
end-to-end text parity against the original ONNX model. INT8 weight
quantization preserves parity except for rare near-tie boundary decisions.
Do not reconvert with FP16 activations: the graph's internal attention-mask constant overflows in half precision and silently degrades output quality. Use FP32 activations (weight-only quantization is fine).
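The overflow is easy to reproduce outside CoreML. The sketch below uses Python's struct module, whose "e" format is IEEE 754 half precision, to check whether a value fits in FP16. The constant -1e9 is an assumption chosen as a typical additive attention-mask value; the source does not state the actual constant in the graph.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value):
    """Return True if value can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", value)
        return True
    except (OverflowError, struct.error):
        # struct refuses values that would round to infinity in FP16.
        return False


# Hypothetical mask constant; the real one in the graph is not documented.
MASK_CONSTANT = -1e9

mask_fits = fits_in_fp16(MASK_CONSTANT)   # far beyond the FP16 range
limit_fits = fits_in_fp16(FP16_MAX)       # exactly at the finite limit
small_fits = fits_in_fp16(-1e4)           # a mask value FP16 could hold
```

A constant that does not fit becomes infinite in half precision, and arithmetic on infinities inside the attention computation can then produce invalid scores, which is consistent with the silent degradation described above.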
License
Apache 2.0, inherited from the original model.