MADLAD-400 INT8 Dynamic Quantized Model

Overview

This repository contains an INT8 dynamically quantized PyTorch version of the MADLAD-400 model, optimized for efficient multilingual machine translation and inference.

The model has been quantized to reduce memory consumption and improve CPU inference performance while aiming to keep translation quality close to that of the full-precision model. It is intended for deployment in resource-constrained environments and for applications that need fast multilingual text translation.

Features

🚀 Dynamic INT8 quantization for reduced model size and faster inference.
🌍 Supports multilingual translation across hundreds of languages.
💾 Lower memory footprint compared to the original FP32 model.
⚡ Optimized for CPU execution.
🔧 Compatible with PyTorch.

Model Information

Property	Value
Base Model	MADLAD-400
Quantization	Dynamic INT8
Framework	PyTorch
File Format	`.pt`
Intended Task	Multilingual Machine Translation
Inference Device	CPU (Recommended)

Usage

Load the Model

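import torch

# weights_only=False unpickles the full model object, which can execute
# arbitrary code. Only load checkpoint files you trust.
model = torch.load(
    "madlad400_int8_dynamic.pt",
    map_location="cpu",  # place all tensors on the CPU
    weights_only=False,
)

# Switch to inference mode (disables dropout).
model.eval()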

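Because this snippet loads a pickled model object rather than a state dict, the classes the model was built from (for example, those provided by Transformers) must be importable when torch.load runs. Replace the loading logic if your project uses a custom model wrapper or architecture.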
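Once the model is loaded, a translation call might look like the sketch below. It assumes the loaded object exposes the usual Transformers `generate` method and that the tokenizer comes from the base model; neither is confirmed by this repository. The `<2xx>` target-language prefix follows the convention documented for the base MADLAD-400 checkpoints. The helper names `build_prompt` and `translate` are hypothetical.

```python
def build_prompt(text: str, target_lang: str) -> str:
    """Prefix the source text with a MADLAD-400 style target-language tag."""
    return f"<2{target_lang}> {text}"


def translate(model, tokenizer, text: str, target_lang: str,
              max_new_tokens: int = 128) -> str:
    """Translate one string with an already-loaded model and tokenizer.

    Assumption: `model` has a Transformers-style generate() method and
    `tokenizer` is a Transformers tokenizer matching the base model.
    """
    inputs = tokenizer(build_prompt(text, target_lang), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

A tokenizer for this purpose would typically be obtained with `AutoTokenizer.from_pretrained("google/madlad400-3b-mt")`, after which `translate(model, tokenizer, "I love pizza!", "pt")` requests a Portuguese translation.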

Requirements

Python 3.10+
PyTorch
Transformers
SentencePiece (if applicable)
Other dependencies required by the original MADLAD-400 implementation

Example:

pip install torch transformers sentencepiece
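Before loading a multi-gigabyte checkpoint, it can help to confirm that the environment meets the requirements above. The following is a minimal sketch using only the standard library; the function name `missing_requirements` and the module list are illustrative assumptions, not part of this repository.

```python
import sys
from importlib.util import find_spec

# Import names of the packages listed above (assumed, adjust as needed).
REQUIRED_MODULES = ("torch", "transformers", "sentencepiece")


def missing_requirements(modules=REQUIRED_MODULES, min_python=(3, 10)):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```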

Performance

Dynamic quantization provides several benefits:

Reduced disk storage requirements.
Lower RAM usage during inference.
Faster execution on CPU workloads.
Easier deployment in production environments.
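The storage benefit can be illustrated on a toy network. The sketch below applies PyTorch's `quantize_dynamic` to the `nn.Linear` layers of a small stand-in model and compares serialized sizes. It is not necessarily the procedure used to produce this checkpoint, and the toy architecture is an assumption chosen only to keep the example fast.

```python
import io

import torch
from torch import nn
from torch.ao.quantization import quantize_dynamic


def serialized_size_bytes(module: nn.Module) -> int:
    """Size of the module's state_dict when written with torch.save."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.getbuffer().nbytes


# Toy stand-in for a real network (hypothetical; not MADLAD-400).
fp32_model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)
).eval()

# Swap nn.Linear layers for dynamically quantized INT8 equivalents.
# Weights are stored as INT8; activations are quantized at run time.
int8_model = quantize_dynamic(fp32_model, {nn.Linear}, dtype=torch.qint8)

fp32_bytes = serialized_size_bytes(fp32_model)
int8_bytes = serialized_size_bytes(int8_model)
print(f"FP32: {fp32_bytes} bytes, INT8: {int8_bytes} bytes")
```

Only the layer types passed to `quantize_dynamic` are converted, so the savings on a real model depend on how much of its parameter count sits in those layers.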

Actual speed improvements depend on:

CPU architecture
Batch size
Sequence length
Runtime environment
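Because these factors vary between machines, it is worth measuring latency on the target hardware. The following is a small timing harness, assuming the workload can be wrapped in a zero-argument callable; the name `benchmark` and its parameters are illustrative.

```python
import statistics
import time


def benchmark(fn, *, warmup: int = 2, runs: int = 10) -> dict:
    """Time repeated calls to fn() and return latency statistics in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

Pass a callable that runs one translation with the loaded model, and repeat the measurement for each batch size and sequence length of interest.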

Limitations

Designed primarily for CPU inference.
PyTorch's dynamically quantized operators target CPU backends, so moving this model to a GPU is not expected to provide a benefit.
Translation quality may differ slightly from the original full-precision model.
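One rough way to see how much the outputs differ is to translate the same sentences with both the full-precision and the quantized model and compare the resulting strings. The sketch below uses character-level similarity from the standard library. It is a quick agreement check, not a translation quality metric, and the function name is hypothetical.

```python
from difflib import SequenceMatcher


def compare_translations(reference_outputs, quantized_outputs):
    """Summarize sentence-by-sentence agreement between two output lists."""
    if len(reference_outputs) != len(quantized_outputs):
        raise ValueError("both lists must have the same length")
    if not reference_outputs:
        raise ValueError("need at least one pair")
    pairs = list(zip(reference_outputs, quantized_outputs))
    # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
    ratios = [SequenceMatcher(None, ref, quant).ratio() for ref, quant in pairs]
    exact = sum(ref == quant for ref, quant in pairs)
    return {
        "exact_match_rate": exact / len(pairs),
        "mean_similarity": sum(ratios) / len(ratios),
    }
```

For a proper evaluation, an established metric computed against human reference translations would be more appropriate than string agreement between two models.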

Intended Use

This model is suitable for:

Machine translation APIs
Research projects
Edge deployments
Offline translation systems
Resource-constrained environments

Training

This repository does not contain training code or datasets.

It only provides a dynamically quantized version of the pretrained MADLAD-400 model for inference.

License

Please refer to the original MADLAD-400 model license and ensure compliance before using this model in production or redistribution.

Acknowledgements

Google Research for the original MADLAD-400 model.
PyTorch for dynamic quantization support.
Hugging Face for model hosting and distribution.

Model Tree

ShadowMonarch07/madlad400-3b-mt-int8 lists google/madlad400-3b-mt as its base model.