CAM++ Speaker Embedder (voxceleb) · OpenASR

CAM++ speaker embedder powering OpenASR speaker diarization: who said what, fully on-device

Speaker-diarization support pack for the OpenASR runtime: pure-Rust inference, no Python at inference time.


✨ Highlights

  • 🗣️ Speaker diarization for every OpenASR model: install this pack and --diarize labels segments with anonymous speakers (SPEAKER_00, SPEAKER_01, ...) for any ASR family
  • 🧬 512-dim d-vectors: CAM++ (context-aware masking, D-TDNN) speaker embeddings from 3D-Speaker, trained on VoxCeleb
  • 🔒 Diarization, not identification: anonymous session-relative labels; embeddings are discarded after the request and never transmitted
  • 🎯 Bit-exact packaging: single raw-f32 build; the pure-Rust forward pass reproduces the upstream ONNX embeddings (cosine ≈ 1.000000)
  • 🦀 Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
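
The parity claim above (cosine ≈ 1.000000 against the upstream ONNX embeddings) can be illustrated with a small check. This is a minimal sketch in Python for readability, not the OpenASR parity gate itself: the function names, the 4-dimensional stand-in vectors, and the 1e-6 tolerance are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def embeddings_match(reference, candidate, tolerance=1e-6):
    """True when the candidate points the same way as the reference.

    The tolerance is a hypothetical value chosen for this sketch.
    """
    return abs(1.0 - cosine_similarity(reference, candidate)) <= tolerance

# Hypothetical 4-dim stand-ins for real 512-dim embeddings.
reference = [0.12, -0.40, 0.33, 0.05]
candidate = [x + 1e-7 for x in reference]  # tiny floating-point drift

print(embeddings_match(reference, candidate))
```

A cosine check like this tolerates small floating-point differences between two runtimes, which is why it suits comparing forward passes, while the stored weights themselves can be compared byte for byte.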

🚀 Quickstart

# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull the pack
openasr pull campplus-voxceleb:f32

# 3. Diarize any transcription (works with every OpenASR ASR model)
openasr transcribe meeting.wav --model xasr-zh-en --diarize --format srt

📦 Pack

Quant | File (.oasr)               | Size
f32   | campplus-voxceleb-f32.oasr | 29 MB

Single raw-f32 build: the pure-Rust forward pass consumes f32 directly, and the parity gates assert bit-exact agreement with the upstream weights, so no integer-quantized variant is produced.

🧠 About CAM++ Speaker Embedder (voxceleb)

CAM++ is a fast, accurate speaker-verification network from the 3D-Speaker project: a densely-connected time-delay network (D-TDNN) with context-aware masking that turns a speech segment into a 512-dimensional speaker embedding. OpenASR uses it as the speaker-embedding stage of its model-agnostic diarization pipeline: speech regions are embedded and clustered, and the resulting anonymous speaker turns are attributed back onto the transcript from whichever ASR model produced it. Install this pack (and optionally the pyannote segmentation pack for finer speaker-change boundaries) and every OpenASR transcription gains --diarize, in both batch and realtime modes. Enrollment is optional and local-only: openasr enroll-speaker stores one on-device centroid so diarized output can relabel your own cluster.
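
The embed, cluster, attribute flow described above can be sketched in a few lines. The text does not say which clustering algorithm OpenASR uses, so this example swaps in a simple greedy cosine-threshold clusterer purely for illustration; the function names, the 0.7 threshold, the 3-dimensional toy embeddings, and the tuple layouts are all hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cluster_embeddings(embeddings, threshold=0.7):
    """Greedy clustering: join the closest centroid above the threshold,
    otherwise open a new cluster. Labels are session-relative."""
    centroids, counts, labels = [], [], []
    for emb in embeddings:
        scores = [cosine(emb, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Fold the new embedding into the running mean of the centroid.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        labels.append(f"SPEAKER_{best:02d}")
    return labels

def attribute_speakers(transcript, turns):
    """Give each transcript segment the speaker whose turns overlap it most."""
    result = []
    for start, end, text in transcript:
        overlap = {}
        for t_start, t_end, speaker in turns:
            shared = min(end, t_end) - max(start, t_start)
            if shared > 0:
                overlap[speaker] = overlap.get(speaker, 0.0) + shared
        speaker = max(overlap, key=overlap.get) if overlap else None
        result.append((start, end, speaker, text))
    return result

# Toy speech regions (seconds) and stand-in embeddings for each region.
regions = [(0.0, 2.0), (2.0, 4.0), (4.0, 6.0)]
embeddings = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.9, 0.1, 0.1]]
labels = cluster_embeddings(embeddings)
turns = [(s, e, label) for (s, e), label in zip(regions, labels)]

# Transcript segments as (start, end, text) from any ASR model.
transcript = [(0.0, 1.8, "hello"), (2.1, 3.9, "hi there"), (4.2, 6.0, "thanks")]
for segment in attribute_speakers(transcript, turns):
    print(segment)
```

Because attribution only needs segment timestamps, the same step works regardless of which ASR model produced the transcript, which is the sense in which the pipeline is model-agnostic.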

⚙️ How this pack was made

Converted from iic/speech_campplus_sv_zh-cn_16k-common with the OpenASR importer:

openasr model-pack import-campplus-local <src>.safetensors <out>.oasr \
  --package-id campplus-voxceleb

The .oasr container is GGUF-backed; every tensor is stored as raw f32 so the pack round-trips bit-identically against the source weights.
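
To see why raw f32 storage round-trips bit-identically while integer quantization would not, consider the sketch below. It is illustrative Python using only the standard library, not the OpenASR importer: the helper names, the sample weights, the little-endian layout, and the symmetric int8 scheme used for contrast are assumptions of this example.

```python
import struct

def pack_f32(values):
    """Serialize floats as little-endian IEEE-754 binary32."""
    return struct.pack(f"<{len(values)}f", *values)

def unpack_f32(blob):
    """Read back a buffer of little-endian binary32 values."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization, shown only for contrast."""
    scale = max(abs(v) for v in values) / 127.0
    return [round(v / scale) for v in values], scale

def dequantize_int8(codes, scale):
    return [c * scale for c in codes]

# Hypothetical weights, first snapped to values representable in f32.
weights = unpack_f32(pack_f32([0.1, -0.25, 0.7071, 3.0e-5]))

# Raw f32: storing and reloading reproduces the exact same bytes.
stored = pack_f32(weights)
assert pack_f32(unpack_f32(stored)) == stored

# int8: small weights collapse to zero, so the original is not recoverable.
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(restored == weights)
```

The contrast is the point: a raw-f32 tensor can be compared byte for byte against its source, whereas a quantized tensor can only be compared within a tolerance.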

⚖️ License

This pack inherits the upstream model's license: Apache-2.0 (source). OpenASR packaging retains the upstream copyright; the only modification is format conversion.

🙏 Acknowledgements

This pack is a redistribution of the CAM++ speaker-verification model from 3D-Speaker (iic/speech_campplus_sv_zh-cn_16k-common), created and open-sourced by the Speech Lab of Alibaba's Institute for Intelligent Computing. All credit for the architecture, training, and weights belongs to the upstream authors; the license is inherited from and identical to the upstream model (Apache-2.0).
