MADLAD-400 INT8 Dynamic Quantized Model
Overview
This repository contains a PyTorch version of the MADLAD-400 machine translation model with INT8 dynamic quantization applied, intended for efficient multilingual translation on CPU.
Quantization reduces memory consumption and is meant to speed up CPU inference, at the cost of possible small differences in translation quality relative to the full-precision model. The model targets resource-constrained deployments and applications that need fast multilingual text translation.
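For context, PyTorch dynamic quantization stores the weights of selected layers (typically `nn.Linear`) as INT8 and quantizes activations on the fly at inference time. The sketch below applies `torch.ao.quantization.quantize_dynamic` to a small stand-in network. The toy model and the helper name `quantize_linear_layers` are illustrative assumptions; the exact recipe used to produce this repository's checkpoint is not documented here.

```python
import torch
import torch.nn as nn


def quantize_linear_layers(model: nn.Module) -> nn.Module:
    """Return a copy of `model` whose nn.Linear layers use INT8 dynamic quantization."""
    model.eval()  # quantization is meant for inference, not training
    return torch.ao.quantization.quantize_dynamic(
        model,
        {nn.Linear},        # layer types to quantize
        dtype=torch.qint8,  # store weights as signed 8-bit integers
    )


# Toy stand-in for a real translation model (assumption, for illustration only).
toy = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))
quantized = quantize_linear_layers(toy)

with torch.no_grad():
    out = quantized(torch.randn(4, 64))
```

The same call could in principle be applied to a full-precision MADLAD-400 model loaded through Transformers, though memory use and runtime for a 3B-parameter model will be far larger than in this toy case.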
Features
- Dynamic INT8 quantization for reduced model size and faster inference.
- Supports multilingual translation across hundreds of languages.
- Lower memory footprint compared to the original FP32 model.
- Optimized for CPU execution.
- Compatible with PyTorch.
Model Information
| Property | Value |
|---|---|
| Base Model | MADLAD-400 (google/madlad400-3b-mt) |
| Quantization | Dynamic INT8 |
| Framework | PyTorch |
| File Format | .pt |
| Intended Task | Multilingual Machine Translation |
| Inference Device | CPU (Recommended) |
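One way to sanity-check the "Dynamic INT8" property on a loaded model is to count how many of its linear layers come from a quantized module namespace. The helper below is a hypothetical sketch, demonstrated on a toy network rather than on this repository's checkpoint; it relies on PyTorch's quantized layers living in a module path that contains `quantized`.

```python
import torch
import torch.nn as nn


def count_quantized_linears(model: nn.Module) -> tuple[int, int]:
    """Return (quantized_linear_count, total_linear_count) for `model`."""
    quantized = total = 0
    for module in model.modules():
        cls = type(module)
        if cls.__name__ == "Linear":
            total += 1
            # Quantized layers are defined under a "...quantized..." module path.
            if "quantized" in cls.__module__:
                quantized += 1
    return quantized, total


# Toy demonstration (assumption: stands in for the real model).
float_model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4)).eval()
int8_model = torch.ao.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

float_counts = count_quantized_linears(float_model)
int8_counts = count_quantized_linears(int8_model)
```

Running `count_quantized_linears(model)` on the object returned by `torch.load` should report a non-zero first value if the checkpoint is dynamically quantized as described.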
Usage
Load the Model
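import torch
# weights_only=False unpickles the full model object, which can execute
# arbitrary code, so only load checkpoint files you trust.
model = torch.load(
    "madlad400_int8_dynamic.pt",
    map_location="cpu",
    weights_only=False,
)
model.eval()  # inference mode: disables dropout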
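Because the file stores the whole pickled model object, the class definitions it was saved with must be importable when loading. Adapt the loading logic if your project uses a custom model wrapper or architecture.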
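Once loaded, the model can be used for translation. The sketch below assumes the loaded object behaves like a Transformers `T5ForConditionalGeneration` (exposing `generate`) and that the tokenizer from the base repository `google/madlad400-3b-mt` is compatible; neither is confirmed by this repository. MADLAD-400 selects the target language with a `<2xx>` prefix token, for example `<2pt>` for Portuguese. The helper names are hypothetical.

```python
import torch


def build_prompt(text: str, target_lang: str) -> str:
    """Prefix the source text with MADLAD-400's target-language token, e.g. <2pt>."""
    return f"<2{target_lang}> {text}"


def translate(model, tokenizer, text: str, target_lang: str,
              max_new_tokens: int = 128) -> str:
    """Translate `text`; assumes `model.generate` follows the Transformers API."""
    inputs = tokenizer(build_prompt(text, target_lang), return_tensors="pt")
    with torch.inference_mode():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_and_translate(checkpoint_path: str, text: str, target_lang: str) -> str:
    """End-to-end helper: load tokenizer and quantized model, then translate."""
    from transformers import T5Tokenizer  # imported lazily; needs sentencepiece

    # Assumption: the base model's tokenizer matches the quantized checkpoint.
    tokenizer = T5Tokenizer.from_pretrained("google/madlad400-3b-mt")
    # Only load checkpoints you trust: weights_only=False unpickles arbitrary objects.
    model = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
    model.eval()
    return translate(model, tokenizer, text, target_lang)
```

A call such as `load_and_translate("madlad400_int8_dynamic.pt", "I love pizza!", "pt")` would then return the Portuguese translation as a string.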
Requirements
- Python 3.10+
- PyTorch
- Transformers
- SentencePiece (needed by the T5-style tokenizer used with MADLAD-400)
- Other dependencies required by the original MADLAD-400 implementation
Example:
pip install torch transformers sentencepiece
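A quick environment check can confirm that the requirements above are met before loading a multi-gigabyte checkpoint. This is a minimal sketch using only the standard library; the helper names and the package list are assumptions based on the requirements listed here.

```python
import importlib.util
import sys

# Import names of the packages listed in the requirements (assumed list).
REQUIRED_PACKAGES = ("torch", "transformers", "sentencepiece")


def missing_packages(names=REQUIRED_PACKAGES) -> list[str]:
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 10)) -> bool:
    """Check that the running interpreter meets the minimum Python version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10 or newer is required.")
    for name in missing_packages():
        print(f"Missing package: {name} (try: pip install {name})")
```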
Performance
Dynamic quantization provides several benefits:
- Reduced disk storage requirements.
- Lower RAM usage during inference.
- Faster execution on CPU workloads.
- Easier deployment in production environments.
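The storage benefit can be illustrated by serializing a model before and after dynamic quantization and comparing byte counts. The sketch below uses a small toy network as a stand-in, so the ratio it reports is not a measurement of this repository's checkpoint; `serialized_size_bytes` is a hypothetical helper.

```python
import io

import torch
import torch.nn as nn


def serialized_size_bytes(model: nn.Module) -> int:
    """Serialize the model's state dict in memory and return its size in bytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getbuffer().nbytes


# Toy stand-in dominated by Linear weights, like a transformer's feed-forward blocks.
fp32_model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

fp32_bytes = serialized_size_bytes(fp32_model)
int8_bytes = serialized_size_bytes(int8_model)
print(f"FP32: {fp32_bytes / 1e6:.2f} MB, INT8: {int8_bytes / 1e6:.2f} MB")
```

Since INT8 weights take one byte instead of four, layers that are quantized shrink by roughly a factor of four; layers left in FP32 (such as embeddings, when only `nn.Linear` is quantized) keep their original size.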
Actual speed improvements depend on:
- CPU architecture
- Batch size
- Sequence length
- Runtime environment
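Because speedups vary with these factors, it is worth measuring latency on the target machine. The following sketch times a toy FP32 network against its dynamically quantized copy for a given batch size and sequence length. The network, sizes, and the `mean_latency_ms` helper are illustrative assumptions, and no particular speedup is implied.

```python
import time

import torch
import torch.nn as nn


def mean_latency_ms(model: nn.Module, example: torch.Tensor,
                    warmup: int = 3, runs: int = 20) -> float:
    """Average forward-pass latency in milliseconds over `runs` timed calls."""
    with torch.inference_mode():
        for _ in range(warmup):  # warm-up iterations are not timed
            model(example)
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
        elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


batch_size, seq_len, hidden = 2, 32, 256  # vary these to match your workload
fp32_model = nn.Sequential(
    nn.Linear(hidden, 4 * hidden), nn.ReLU(), nn.Linear(4 * hidden, hidden)
).eval()
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

example = torch.randn(batch_size, seq_len, hidden)
fp32_ms = mean_latency_ms(fp32_model, example)
int8_ms = mean_latency_ms(int8_model, example)
print(f"FP32: {fp32_ms:.3f} ms, INT8: {int8_ms:.3f} ms")
```

For the real model, the same timing loop can wrap a call to `model.generate` on representative inputs instead of a plain forward pass.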
Limitations
- Designed primarily for CPU inference.
- PyTorch's dynamic quantization kernels target CPU backends, so running the quantized layers on a GPU is generally not supported and offers no benefit.
- Translation quality may differ slightly from the original full-precision model.
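The source of the quality difference is numerical: INT8 weights only approximate the FP32 originals, so layer outputs drift slightly. The sketch below measures that drift as a relative error on a toy network. It is an illustration of the effect, not an evaluation of this model's translation quality, which would require comparing translations on a test set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the toy comparison is repeatable

fp32_model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64)).eval()
int8_model = torch.ao.quantization.quantize_dynamic(
    fp32_model, {nn.Linear}, dtype=torch.qint8
)

inputs = torch.randn(16, 128)
with torch.inference_mode():
    reference = fp32_model(inputs)    # full-precision output
    approximate = int8_model(inputs)  # output with INT8 weights

# Relative L2 error between the quantized and full-precision outputs.
relative_error = ((approximate - reference).norm() / reference.norm()).item()
print(f"Relative output error after quantization: {relative_error:.4f}")
```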
Intended Use
This model is suitable for:
- Machine translation APIs
- Research projects
- Edge deployments
- Offline translation systems
- Resource-constrained environments
Training
This repository does not contain training code or datasets.
It only provides a dynamically quantized version of the pretrained MADLAD-400 model for inference.
License
Please refer to the original MADLAD-400 model license and ensure compliance before using this model in production or redistribution.
Acknowledgements
- Google Research for the original MADLAD-400 model.
- PyTorch for dynamic quantization support.
- Hugging Face for model hosting and distribution.
Model tree
- Repository: ShadowMonarch07/madlad400-3b-mt-int8
- Base model: google/madlad400-3b-mt