CAM++ Speaker Embedder (voxceleb) · OpenASR
CAM++ speaker embedder powering OpenASR speaker diarization: who said what, fully on-device
Speaker-diarization support pack for the OpenASR runtime: pure-Rust inference, no Python at inference time.
Highlights
- Speaker diarization for every OpenASR model: install this pack and --diarize labels segments with anonymous speakers (SPEAKER_00, SPEAKER_01, ...) for any ASR family
- 512-dim d-vectors: CAM++ (context-aware masking, D-TDNN) speaker embeddings from 3D-Speaker, trained on VoxCeleb
- Diarization, not identification: anonymous session-relative labels; embeddings are discarded after the request and never transmitted
- Bit-exact packaging: single raw-f32 build; the pure-Rust forward pass reproduces the upstream ONNX embeddings (cosine ≈ 1.000000)
- Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
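The parity figure quoted above (cosine ≈ 1.000000) compares two embeddings of the same audio. As a minimal sketch of such a comparison, and not OpenASR's actual parity gate, the function names and the tolerance parameter below are hypothetical:

```rust
/// Cosine similarity between two embeddings of equal length.
/// Returns `None` if the lengths differ, the input is empty,
/// or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Accumulate in f64 so rounding in the reduction does not
    // hide small differences between the two vectors.
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Hypothetical parity check: passes when the cosine is within `tol` of 1.0.
fn embeddings_match(reference: &[f32], candidate: &[f32], tol: f64) -> bool {
    match cosine_similarity(reference, candidate) {
        Some(c) => (1.0 - c).abs() <= tol,
        None => false,
    }
}
```

Cosine similarity ignores vector scale, so a value near 1 is a weaker statement than bit-for-bit equality; it is best read as a sanity check on the forward pass rather than proof of identical outputs.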
Quickstart
```shell
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull the pack
openasr pull campplus-voxceleb:f32
# 3. Diarize any transcription (works with every OpenASR ASR model)
openasr transcribe meeting.wav --model xasr-zh-en --diarize --format srt
```
Pack
| Quant | File (.oasr) | Size |
|---|---|---|
| f32 | campplus-voxceleb-f32.oasr | 29 MB |
Single raw-f32 build: the pure-Rust forward pass consumes f32 directly and the parity gates assert bit-exact outputs vs the upstream weights, so no integer quantization is produced.
About CAM++ Speaker Embedder (voxceleb)
CAM++ is a fast, accurate speaker-verification network from the 3D-Speaker project: a
densely-connected time-delay network (D-TDNN) with context-aware masking that turns a speech
segment into a 512-dimensional speaker embedding. OpenASR uses it as the speaker-embedding stage
of its model-agnostic diarization pipeline: speech regions are embedded, clustered, and the
resulting anonymous speaker turns are attributed back onto whichever ASR model produced the
transcript. Install this pack (and optionally the pyannote segmentation pack for finer
speaker-change boundaries) and every OpenASR transcription gains --diarize, batch and realtime.
Enrollment is optional and local-only: openasr enroll-speaker stores one on-device centroid so
diarized output can relabel your own cluster.
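To make the "embedded, clustered, labelled" flow concrete, here is a minimal sketch of one possible clustering step. The text does not say which clustering algorithm OpenASR uses; the greedy centroid scheme, the function names, and the similarity threshold below are all assumptions chosen for illustration.

```rust
/// Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Greedy online clustering (an illustrative stand-in, not OpenASR's method):
/// each embedding joins the most similar existing cluster if the cosine to
/// that cluster is >= `threshold`, otherwise it starts a new cluster.
/// Returns one cluster index per input embedding.
fn cluster_embeddings(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    // Running sum per cluster. The sum points in the same direction as the
    // mean (centroid), so cosine against it equals cosine against the centroid.
    let mut sums: Vec<Vec<f32>> = Vec::new();
    let mut assignments = Vec::with_capacity(embeddings.len());
    for emb in embeddings {
        let best = sums
            .iter()
            .enumerate()
            .map(|(i, s)| (i, cosine(emb, s)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim >= threshold => {
                for (acc, v) in sums[i].iter_mut().zip(emb) {
                    *acc += *v;
                }
                assignments.push(i);
            }
            _ => {
                sums.push(emb.clone());
                assignments.push(sums.len() - 1);
            }
        }
    }
    assignments
}

/// Format a cluster index as an anonymous, session-relative label.
fn speaker_label(cluster: usize) -> String {
    format!("SPEAKER_{:02}", cluster)
}
```

Labels produced this way are numbered in order of first appearance, so they are only meaningful within a single request, which matches the session-relative labelling described above.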
How this pack was made
Converted from iic/speech_campplus_sv_zh-cn_16k-common with the OpenASR importer:
```shell
openasr model-pack import-campplus-local <src>.safetensors <out>.oasr \
  --package-id campplus-voxceleb
```
The .oasr container is GGUF-backed; every tensor is stored as raw f32 so the
pack round-trips bit-identically against the source weights.
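A bit-identical round trip means that every stored value has the same IEEE 754 bit pattern as the source, which is stricter than float `==`. The sketch below shows one way to check this for raw f32 data. It assumes a little-endian byte layout, and the helper names are hypothetical rather than part of the OpenASR importer.

```rust
/// Serialize f32 values as raw little-endian bytes (4 bytes per value).
fn f32s_to_le_bytes(values: &[f32]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Parse raw little-endian bytes back into f32 values.
/// Returns `None` if the byte length is not a multiple of 4.
fn le_bytes_to_f32s(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Bit-level equality: compares IEEE 754 bit patterns, so it distinguishes
/// 0.0 from -0.0 and treats identical NaN payloads as equal, unlike `==`.
fn bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}
```

Comparing with `to_bits()` rather than `==` matters for edge cases: `0.0 == -0.0` is true and `NaN == NaN` is false, so float equality can both miss a changed value and reject an unchanged one.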
License
This pack inherits the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright; the only modification is format conversion.
Acknowledgements
This pack is a redistribution of the CAM++ speaker-verification model from 3D-Speaker (iic/speech_campplus_sv_zh-cn_16k-common), created and open-sourced by the Speech Lab of Alibaba's Institute for Intelligent Computing. All credit for the architecture, training, and weights belongs to the upstream authors; the license is inherited from and identical to the upstream model (Apache-2.0).
Links
- OpenASR: https://github.com/QuintinShaw/OpenASR
- Website: https://openasr.org
- Upstream model: iic/speech_campplus_sv_zh-cn_16k-common