Wav2Vec2 Malayalam INT8 (Quantized)
This is a dynamically quantized INT8 version of gvs/wav2vec2-large-xlsr-malayalam.
Dynamic quantization targets CPU inference and reduces the model size to approximately 338.16 MB.
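Most of the saving comes from storing nn.Linear weights as 8-bit integers instead of 32-bit floats, so those layers shrink by roughly 4x. Layers that are not nn.Linear, such as the convolutional feature encoder, stay in float32. The sketch below shows this on a small hypothetical stand-in network, not on the actual Malayalam model. serialized_size_mb is a hypothetical helper that measures the size of a saved state dict.

```python
import io

import torch
import torch.nn as nn


def serialized_size_mb(model: nn.Module) -> float:
    """Size of the model's state dict when written with torch.save, in megabytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Hypothetical Linear-heavy stand-in for a Transformer encoder block.
float_model = nn.Sequential(
    nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)
).eval()

# INT8 weights for every nn.Linear, as in the loading recipe.
# quantize_dynamic returns a quantized copy and leaves float_model unchanged.
int8_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

float_mb = serialized_size_mb(float_model)
int8_mb = serialized_size_mb(int8_model)
print(f"float32: {float_mb:.1f} MB, int8: {int8_mb:.1f} MB")
```

The full model shrinks by less than this toy network does, because its non-Linear parameters are not quantized.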
Despite the much smaller footprint, the quantized model maintains near-original accuracy on CPU.
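You can check the accuracy claim on your own recordings. Transcribe the same clips with the original and the INT8 model, then compare each output's word error rate (WER) against reference transcripts. Below is a minimal sketch that computes WER as word-level edit distance divided by the number of reference words. word_error_rate is a hypothetical helper. Dedicated evaluation packages provide tested implementations with text normalization options.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (initially 0 reference words -> j insertions).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c"))  # 0.5 (one substitution, one deletion)
```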
How to Load and Use This Model
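Because this model uses PyTorch's native dynamic quantization, the standard from_pretrained method cannot load it on its own. The saved Linear layers are stored as packed INT8 parameters, and a float Wav2Vec2ForCTC does not recognize their state-dict keys. Instead, build the base architecture, quantize it the same way (INT8 dynamic quantization of every torch.nn.Linear), and then load the weights.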
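The same mismatch appears on a small toy network. A float model rejects the INT8 state dict with a RuntimeError. A skeleton quantized with the same quantize_dynamic settings accepts it and reproduces the original outputs. make_skeleton and quantize are hypothetical helpers, and make_skeleton stands in for Wav2Vec2ForCTC. The sketch assumes quantized_model_int8.pt holds a state dict of this kind, which is consistent with the load_state_dict call in the recipe. It also assumes a PyTorch build with a quantized CPU engine, either FBGEMM on x86 or QNNPACK on ARM.

```python
import torch
import torch.nn as nn


def make_skeleton() -> nn.Module:
    # Hypothetical toy network standing in for Wav2Vec2ForCTC.
    return nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()


def quantize(model: nn.Module) -> nn.Module:
    # INT8 dynamic quantization of every nn.Linear, as in the loading recipe.
    return torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)


torch.manual_seed(0)

# Publisher side: quantize a float model and keep only its state dict
# (assumed to be the kind of object stored in quantized_model_int8.pt).
published = quantize(make_skeleton())
state_dict = published.state_dict()
print([key for key in state_dict if key.startswith("0.")])

# A plain float model expects "0.weight"/"0.bias" and rejects the packed INT8 keys.
try:
    make_skeleton().load_state_dict(state_dict)
    rejected = False
except RuntimeError:
    rejected = True

# User side: rebuild the skeleton, quantize it the same way, then load the weights.
restored = quantize(make_skeleton())
restored.load_state_dict(state_dict)

x = torch.randn(2, 16)
outputs_match = torch.allclose(published(x), restored(x))
print(rejected, outputs_match)
```

The quantization settings must match, because they determine which submodules become quantized Linear layers and therefore which state-dict keys the skeleton expects.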
```python
import torch
from huggingface_hub import hf_hub_download
from transformers import Wav2Vec2Config, Wav2Vec2ForCTC, Wav2Vec2Processor

repo_id = "trysem/wav2vec2-malayalam-int8"

# 1. Load the processor, and build an untrained float skeleton from this repo's config
#    (no float weights are downloaded; they are replaced in step 3)
processor = Wav2Vec2Processor.from_pretrained(repo_id)
config = Wav2Vec2Config.from_pretrained(repo_id)
base_model = Wav2Vec2ForCTC(config).eval()

# 2. Apply the same dynamic quantization that produced the INT8 weights
quantized_model = torch.ao.quantization.quantize_dynamic(
    base_model, {torch.nn.Linear}, dtype=torch.qint8
)

# 3. Download the INT8 state dict and load it into the quantized skeleton
weight_path = hf_hub_download(repo_id=repo_id, filename="quantized_model_int8.pt")
quantized_model.load_state_dict(torch.load(weight_path, map_location="cpu"))
quantized_model.eval()

print("Model successfully loaded!")
```
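After loading, inference follows the usual wav2vec2 CTC pattern. The following is a minimal sketch. transcribe is a hypothetical helper, and it assumes a mono waveform already resampled to 16 kHz and held in a NumPy array. Loading and resampling the audio file is left out.

```python
import numpy as np
import torch


def transcribe(speech: np.ndarray, processor, model, sampling_rate: int = 16_000) -> str:
    """Greedy CTC transcription of one mono waveform.

    Assumes `speech` is a 1-D float array sampled at 16 kHz, the rate
    XLSR wav2vec2 models expect.
    """
    inputs = processor(speech, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.inference_mode():
        logits = model(inputs.input_values).logits
    # Most likely token per frame; batch_decode then collapses repeated tokens
    # and drops CTC padding to produce text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

With the processor and quantized_model from the loading code, call print(transcribe(speech, processor, quantized_model)).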