MADLAD-400 INT8 Dynamic Quantized Model

Overview

This repository contains a dynamically quantized INT8 PyTorch version of the MADLAD-400 model for multilingual machine translation, intended for efficient inference.

The model has been quantized to reduce memory consumption and improve CPU inference performance while keeping translation quality close to that of the original model. It is intended for deployment in resource-constrained environments and for applications that need fast multilingual text translation.

Features

  • Dynamic INT8 quantization for reduced model size and faster inference.
  • Supports multilingual translation across hundreds of languages.
  • Lower memory footprint compared to the original FP32 model.
  • Optimized for CPU execution.
  • Compatible with PyTorch.
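
To make the first feature more concrete, the sketch below shows the basic arithmetic behind INT8 weight quantization in plain Python: floats are mapped to integers in [-127, 127] using a single scale, then mapped back when used. It is a simplified illustration only. The function names are made up for this example, and PyTorch's actual implementation differs in detail and is not reproduced here.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: floats -> (integer codes, scale)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.003, 1.0]  # toy values, not real model weights
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
print(codes)
print(restored)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is why small weights such as `0.003` can collapse to zero.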

Model Information

Property          Value
Base Model        MADLAD-400
Quantization      Dynamic INT8
Framework         PyTorch
File Format       .pt
Intended Task     Multilingual Machine Translation
Inference Device  CPU (Recommended)

Usage

Load the Model

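import torch

# The file stores the whole pickled model object rather than a state_dict,
# so weights_only=False is required. Unpickling can execute arbitrary code:
# only load files from a source you trust.
model = torch.load(
    "madlad400_int8_dynamic.pt",
    map_location="cpu",
    weights_only=False,
)

# Switch to inference mode (disables dropout).
model.eval()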

Replace the loading logic if your project uses a custom model wrapper or architecture.
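
Once the model is loaded, a translation call might look like the sketch below. It rests on several assumptions not stated in this README: that the loaded object is a Hugging Face T5-style model exposing `generate`, that the matching tokenizer is the one published with the original checkpoint (the ID `google/madlad400-3b-mt` is a guess based on this repository's name), and that the target language is selected with a `<2xx>` prefix as in the original MADLAD-400 model card. The helper names `build_prompt`, `load_tokenizer`, and `translate` are hypothetical.

```python
def build_prompt(text: str, target_lang: str) -> str:
    """Prefix the source text with a MADLAD-style target-language token, e.g. '<2pt>'."""
    return f"<2{target_lang}> {text}"


def load_tokenizer(tokenizer_id: str = "google/madlad400-3b-mt"):
    """Load the tokenizer. The default ID is an assumption, not taken from this README."""
    # Imported lazily so the helpers above can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(tokenizer_id)


def translate(model, tokenizer, text: str, target_lang: str, max_new_tokens: int = 128) -> str:
    """Translate one sentence with an already loaded model and tokenizer."""
    inputs = tokenizer(build_prompt(text, target_lang), return_tensors="pt")
    # generate() already runs without gradient tracking.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the model loaded as shown above, usage would be `tokenizer = load_tokenizer()` followed by `translate(model, tokenizer, "I love pizza!", "pt")`.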

Requirements

  • Python 3.10+
  • PyTorch
  • Transformers
  • SentencePiece (if applicable)
  • Other dependencies required by the original MADLAD-400 implementation

Example:

pip install torch transformers sentencepiece
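
To confirm the environment before loading the model, a small check like the following can help. It is a minimal sketch using only the standard library. The package list mirrors the `pip install` line above, and the helper names are hypothetical.

```python
import sys
from importlib.util import find_spec

REQUIRED_PACKAGES = ["torch", "transformers", "sentencepiece"]


def python_ok(minimum=(3, 10)):
    """True if the running interpreter meets the minimum Python version."""
    return sys.version_info >= minimum


def missing_packages(names):
    """Return the packages from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_packages(REQUIRED_PACKAGES)` returns an empty list when everything is installed.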

Performance

Dynamic quantization provides several benefits:

  • Reduced disk storage requirements.
  • Lower RAM usage during inference.
  • Faster execution on CPU workloads.
  • Easier deployment in production environments.
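
As a rough illustration of the storage benefit, the sketch below compares weights held at 4 bytes per parameter (FP32) with weights held at 1 byte per parameter (INT8). The 3-billion parameter count is an assumption inferred from the repository name, and the result is an upper bound on the savings: dynamic quantization typically converts only some layer types, so parts of the model such as embeddings may stay in floating point.

```python
BYTES_PER_PARAM = {"fp32": 4, "int8": 1}


def weight_size_gib(num_params: int, dtype: str) -> float:
    """Size of `num_params` weights in GiB for the given storage type."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


num_params = 3_000_000_000  # assumed parameter count, not stated in this README
fp32_gib = weight_size_gib(num_params, "fp32")
int8_gib = weight_size_gib(num_params, "int8")
print(f"FP32: {fp32_gib:.1f} GiB, INT8: {int8_gib:.1f} GiB, ratio: {fp32_gib / int8_gib:.0f}x")
```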

Actual speed improvements depend on:

  • CPU architecture
  • Batch size
  • Sequence length
  • Runtime environment
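
Because the speedup varies with these factors, it is worth measuring on the target machine. The helper below is a minimal timing sketch using only the standard library. `benchmark` is a hypothetical name, and the callable passed to it would be whatever function runs one translation in your setup.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, repeats=5):
    """Return the median wall-clock seconds of fn(*args) over `repeats` timed runs."""
    # Warm-up runs are not timed; they absorb one-off costs such as lazy initialization.
    for _ in range(warmup):
        fn(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Running it with the same inputs against both the quantized and the original model, and varying batch size and sequence length, gives a like-for-like comparison.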

Limitations

  • Designed primarily for CPU inference.
  • PyTorch's dynamically quantized operators target CPU backends, so moving the model to a GPU is unlikely to help and may not work.
  • Translation quality may differ slightly from the original full-precision model.
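
One simple way to spot-check the last point is to translate the same sentences with both models and count how often the outputs agree. The sketch below assumes you already have the two lists of output strings. `exact_match_rate` is a hypothetical helper, and exact match is a crude signal rather than a translation quality metric.

```python
def exact_match_rate(reference_outputs, quantized_outputs):
    """Fraction of positions where the two models produced the same string."""
    if len(reference_outputs) != len(quantized_outputs):
        raise ValueError("Both lists must contain the same number of outputs.")
    if not reference_outputs:
        raise ValueError("At least one output pair is required.")
    matches = sum(
        ref.strip() == quant.strip()
        for ref, quant in zip(reference_outputs, quantized_outputs)
    )
    return matches / len(reference_outputs)
```

Sentences where the outputs differ are good candidates for manual review.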

Intended Use

This model is suitable for:

  • Machine translation APIs
  • Research projects
  • Edge deployments
  • Offline translation systems
  • Resource-constrained environments

Training

This repository does not contain training code or datasets.

It only provides a dynamically quantized version of the pretrained MADLAD-400 model for inference.
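
For orientation, a checkpoint like this one could be produced along the following lines. This is a sketch under assumptions, not the procedure actually used for this repository: it assumes the starting point is a full-precision PyTorch model, that only `torch.nn.Linear` layers are quantized, and that the result is saved as a whole pickled module. `quantize_and_save` is a hypothetical helper name.

```python
def quantize_and_save(model, output_path: str):
    """Dynamically quantize the Linear layers of `model` to INT8 and save the result."""
    # Imported lazily so this file can be imported without PyTorch installed.
    import torch
    from torch.ao.quantization import quantize_dynamic

    model.eval()
    quantized = quantize_dynamic(
        model,
        {torch.nn.Linear},  # assumption: only Linear layers are converted
        dtype=torch.qint8,
    )
    # Saving the whole module (not a state_dict) matches loading with weights_only=False.
    torch.save(quantized, output_path)
    return quantized
```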

License

Please refer to the original MADLAD-400 model license and ensure compliance before using this model in production or redistribution.

Acknowledgements

  • Google Research for the original MADLAD-400 model.
  • PyTorch for dynamic quantization support.
  • Hugging Face for model hosting and distribution.