parakeet-streaming-ja-fully-open

A 123M-parameter cache-aware streaming ASR model for Japanese, based on FastConformer-Hybrid-Transducer-CTC in NVIDIA NeMo.

Usage

import nemo.collections.asr as nemo_asr

model = nemo_asr.models.EncDecHybridRNNTCTCBPEModel.restore_from("M1-6.nemo")
model.change_decoding_strategy(decoder_type="rnnt")  # or "ctc"

# Streaming look-ahead is set via the attention context [left, right], given in
# subsampled frames (1 frame = 80 ms after 8x subsampling).
# Valid: [70,13] ~1.04 s | [70,6] ~480 ms | [70,1] ~80 ms | [70,0] fully causal
model.encoder.set_default_att_context_size([70, 6])

print(model.transcribe(audio=["sample.wav"]))

Streaming examples: https://github.com/NVIDIA/NeMo/tree/main/examples/asr/asr_cache_aware_streaming
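
The look-ahead figures listed in the usage comments follow from the right-context value multiplied by the 80 ms frame duration. A minimal sketch of that arithmetic is shown here; the helper name lookahead_ms is hypothetical and is not part of NeMo, and it assumes the second element of the context pair is the right (future) context.

```python
# Duration of one encoder frame after 8x subsampling, as stated in the usage comments.
FRAME_MS = 80

def lookahead_ms(att_context_size, frame_ms=FRAME_MS):
    """Return the look-ahead in milliseconds for a [left, right] context pair.

    Hypothetical helper for illustration; assumes the second element is the
    number of future (right-context) frames the encoder may attend to.
    """
    _left, right = att_context_size
    return right * frame_ms

# The four context sizes listed as valid for this model.
for ctx in ([70, 13], [70, 6], [70, 1], [70, 0]):
    print(ctx, lookahead_ms(ctx), "ms")
```

A smaller right context lowers latency at the cost of less future audio per frame; [70, 0] uses no future frames at all.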

License

CC-BY-NC 4.0 (non-commercial). Fine-tuning data includes research-only corpora.

Citation

Citation will be added upon publication of the associated paper.
