Wav2Vec2 Malayalam INT8 (Quantized)

This is a dynamically quantized INT8 version of gvs/wav2vec2-large-xlsr-malayalam, intended for CPU inference. Quantization reduces the model size to approximately 338.16 MB while maintaining near-original accuracy.
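
To give a rough feel for where the size reduction comes from, the sketch below applies the same kind of dynamic quantization to a toy stack of Linear layers and compares the serialized sizes. The toy model and the `state_dict_bytes` helper are illustrative assumptions, not part of this repository, and the ratio for the real model differs because only its Linear layers are quantized.

```python
import io

import torch
from torch import nn


def state_dict_bytes(model: nn.Module) -> int:
    """Return the size in bytes of the model's serialized state_dict."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes


# Toy stand-in for a transformer feed-forward block (NOT the real Wav2Vec2).
float_model = nn.Sequential(
    nn.Linear(1024, 4096),
    nn.ReLU(),
    nn.Linear(4096, 1024),
).eval()

# Same call pattern as the model card: quantize only nn.Linear to qint8.
int8_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

float_size = state_dict_bytes(float_model)
int8_size = state_dict_bytes(int8_model)
print(f"float32: {float_size / 1e6:.2f} MB, int8: {int8_size / 1e6:.2f} MB")
```

Layers that are not `nn.Linear` (for example convolutions) keep their float32 weights, so a full model typically shrinks by less than this toy example does.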

How to Load and Use This Model

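Because this model uses PyTorch's native dynamic quantization, it cannot be loaded with the standard from_pretrained method alone: the saved INT8 weights only fit a model whose Linear layers have already been quantized. You must first build the base architecture, apply the same dynamic quantization to it, and then load the INT8 weights.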
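
As a toy illustration of why the quantization step has to come before loading, the following sketch uses a single small Linear layer in place of the real model. The `make_skeleton` helper and the layer sizes are hypothetical. It suggests that a quantized state_dict uses different keys than a float one, so it only loads into a skeleton that has been quantized the same way.

```python
import torch
from torch import nn


def make_skeleton() -> nn.Module:
    """Build a tiny float model standing in for the base architecture."""
    return nn.Sequential(nn.Linear(8, 4)).eval()


# Pretend this is the INT8 checkpoint that was saved to disk.
quantized = torch.quantization.quantize_dynamic(
    make_skeleton(), {nn.Linear}, dtype=torch.qint8
)
int8_state = quantized.state_dict()

# The float and quantized models expose different state_dict keys.
float_keys = set(make_skeleton().state_dict())
int8_keys = set(int8_state)
print("float keys:", sorted(float_keys))
print("int8 keys: ", sorted(int8_keys))

# Loading the INT8 weights into an unquantized skeleton is rejected.
try:
    make_skeleton().load_state_dict(int8_state)
    loaded_into_float = True
except RuntimeError:
    loaded_into_float = False
print("loads into float skeleton:", loaded_into_float)

# Quantizing the skeleton first makes the keys line up.
restored = torch.quantization.quantize_dynamic(
    make_skeleton(), {nn.Linear}, dtype=torch.qint8
)
restored.load_state_dict(int8_state)
```

The full-size loading code for this repository follows the same three steps.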

import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
from huggingface_hub import hf_hub_download

repo_id = "trysem/wav2vec2-malayalam-int8"

# 1. Load the processor and empty base architecture from this repo
processor = Wav2Vec2Processor.from_pretrained(repo_id)
base_model = Wav2Vec2ForCTC.from_pretrained(repo_id)

# 2. Apply dynamic quantization to the skeleton
quantized_model = torch.quantization.quantize_dynamic(
    base_model, {torch.nn.Linear}, dtype=torch.qint8
)

# 3. Download and load the INT8 weights
weight_path = hf_hub_download(repo_id=repo_id, filename="quantized_model_int8.pt")
quantized_model.load_state_dict(torch.load(weight_path))

print("Model successfully loaded!")
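
Once the model is loaded, transcription could look roughly like the sketch below. The `transcribe` helper is hypothetical and not part of this repository. It assumes `waveform` is a one-dimensional mono float array sampled at 16 kHz, and it uses simple greedy CTC decoding.

```python
import torch


def transcribe(model, processor, waveform, sampling_rate=16_000):
    """Transcribe one mono waveform with greedy CTC decoding (illustrative helper)."""
    # Turn the raw audio into normalized input tensors.
    inputs = processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    # No gradients are needed for inference.
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    # Pick the most likely token per frame; the processor collapses repeats and blanks.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Example call, assuming quantized_model and processor were loaded as shown above:
# text = transcribe(quantized_model, processor, waveform)
```

Audio recorded at another sampling rate would need to be resampled to 16 kHz before it is passed in.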