vhdm_whisper-large-fa-v1-NVFP4
NVFP4 (W4A4) post-training quantization of vhdm/whisper-large-fa-v1. Architecture: whisper.
- Format: nvfp4-pack-quantized (compressed-tensors). 4-bit FP4 weights, per-block FP8 (E4M3) scales, per-tensor FP32 global scales; activations dynamically quantized to FP4.
- Calibration: 32 Persian clips from Reza2kn/persian-asr-eval-v0 (held out from the WER eval set).
- Hardware target: NVIDIA Blackwell tensor cores (sm_100+). Quantized on RTX 5080 Laptop (sm_120).
- Quantized layers: all Linear modules in the encoder/decoder (CTC lm_head / proj_out left in full precision).
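The weight format described in the list above can be illustrated with a small pure-Python sketch of block-wise FP4 quantization. It is a simplified illustration, not the compressed-tensors implementation: the block size of 16 is an assumption (this card does not state it), the per-block scale is kept as an ordinary Python float instead of being rounded to FP8 (E4M3), and the per-tensor FP32 global scale is omitted. The function names are hypothetical.

```python
# Magnitudes representable by FP4 (E2M1); the sign is stored separately.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0
BLOCK = 16  # assumed block size; not stated in this model card


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed FP4 grid values."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Map the largest magnitude in the block onto the largest FP4 value.
    # The real format would store this scale in FP8 (E4M3); here it stays a float.
    scale = amax / FP4_MAX
    codes = []
    for x in block:
        # Round the scaled magnitude to the nearest representable FP4 value.
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def fake_quantize(values, block=BLOCK):
    """Quantize then dequantize a flat list of weights, block by block."""
    out = []
    for i in range(0, len(values), block):
        scale, codes = quantize_block(values[i:i + block])
        out.extend(c * scale for c in codes)
    return out
```

Calling `fake_quantize` on a row of weights returns the values the layer would effectively see after a quantize/dequantize round trip, which makes the rounding error of the 8-level grid easy to inspect.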
Eval: Reza2kn/persian-asr-eval-v0 (FLEURS-fa)
| Variant | WER ↓ | CER ↓ | Clips | Per-clip latency | Peak VRAM |
|---|---|---|---|---|---|
| NVFP4 (this repo) | 15.25% | 5.99% | 200 | 653 ms | 2731 MiB |
Persian text normalization for WER/CER: NFKC, ZWNJ → space, ي→ی / ك→ک, digit folding, punctuation stripping, whitespace collapse.
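A minimal sketch of the normalization steps listed above, followed by a per-utterance word error rate, using only the Python standard library. The exact rules used for the reported numbers are not published in this card, so several details are assumptions: digits are folded to ASCII, punctuation is taken to mean every Unicode category starting with "P", and the steps run in the order listed. The names `normalize_fa`, `edit_distance`, and `wer` are hypothetical.

```python
import re
import unicodedata

# Character replacements: Arabic yeh/kaf to Persian yeh/kaf, ZWNJ to space.
CHAR_MAP = {
    0x064A: "\u06cc",  # Arabic yeh -> Persian yeh
    0x0643: "\u06a9",  # Arabic kaf -> Persian kaf
    0x200C: " ",       # zero-width non-joiner -> space
}
# Digit folding (assumed direction): Arabic-Indic and Persian digits -> ASCII.
for i in range(10):
    CHAR_MAP[0x0660 + i] = str(i)
    CHAR_MAP[0x06F0 + i] = str(i)


def normalize_fa(text):
    """Apply NFKC, character folding, punctuation stripping, whitespace collapse."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(CHAR_MAP)
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate of one utterance after normalization."""
    ref_words = normalize_fa(reference).split()
    hyp_words = normalize_fa(hypothesis).split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
```

Passing character lists instead of word lists to `edit_distance` gives the character-level counterpart. How per-clip errors were aggregated into the reported corpus figures is not stated here.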
Usage
import torch
import soundfile as sf
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
repo = "Reza2kn/vhdm_whisper-large-fa-v1-NVFP4"
processor = AutoProcessor.from_pretrained(repo)
# Load in bfloat16: NVFP4 weights decompress to bf16 inside CompressedLinear.
# AutoModelForSpeechSeq2Seq keeps the decoder output head that generate() needs.
model = AutoModelForSpeechSeq2Seq.from_pretrained(repo, dtype=torch.bfloat16).to("cuda").eval()
(See the original vhdm/whisper-large-fa-v1 model card for arch-specific decoding boilerplate.)
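For orientation, a hedged sketch of an end-to-end transcription call is given here. It assumes the checkpoint loads through `AutoModelForSpeechSeq2Seq` like other Whisper models, that the input file is already sampled at 16 kHz, and that the default generation settings of the checkpoint are acceptable; none of this is confirmed by the card. The helper name `transcribe` and its parameters are hypothetical, and the heavy imports sit inside the function so the definition itself has no side effects.

```python
def transcribe(path, repo="Reza2kn/vhdm_whisper-large-fa-v1-NVFP4", device="cuda"):
    """Transcribe one audio file; assumes 16 kHz input (no resampling is done)."""
    import soundfile as sf
    import torch
    from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

    audio, sample_rate = sf.read(path, dtype="float32")
    if audio.ndim > 1:
        # Down-mix multi-channel audio to mono.
        audio = audio.mean(axis=1)
    if sample_rate != 16000:
        raise ValueError(f"expected 16 kHz audio, got {sample_rate} Hz")

    processor = AutoProcessor.from_pretrained(repo)
    model = AutoModelForSpeechSeq2Seq.from_pretrained(repo, dtype=torch.bfloat16)
    model = model.to(device).eval()

    # The processor converts the waveform into log-mel input features.
    inputs = processor(audio, sampling_rate=sample_rate, return_tensors="pt")
    features = inputs.input_features.to(device, dtype=torch.bfloat16)

    with torch.no_grad():
        token_ids = model.generate(features)
    return processor.batch_decode(token_ids, skip_special_tokens=True)[0]
```

A call such as `transcribe("clip.wav")` (with a hypothetical file name) would return the decoded text. For repeated use, loading the processor and model once outside the function avoids paying the load cost per clip.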
How it was made
llmcompressor QuantizationModifier(targets=["Linear"], scheme="NVFP4", ignore=...) → compressed-tensors nvfp4-pack-quantized checkpoint.
License
Inherits the base model's license.
Model tree for Reza2kn/vhdm_whisper-large-fa-v1-NVFP4
- Base model: openai/whisper-large-v3
- Finetuned: openai/whisper-large-v3-turbo
- Finetuned: vhdm/whisper-large-fa-v1