mGPT-quantized

This is an 8-bit quantized version of mGPT-13B, an LLM released by AI-Forever / Sberbank AI in 2022-2023.

By parameter count, it falls between GPT-2 and GPT-3, but a direct comparison is tricky because mGPT was trained on 60+ languages.
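To put the parameter counts and the 8-bit quantization in perspective, here is a rough back-of-the-envelope sketch of weight memory. It assumes the nominal sizes (1.5B for the largest GPT-2, 13B for mGPT-13B, 175B for GPT-3) and counts weights only, ignoring activations and quantization overhead; the helper name weight_memory_gb is illustrative.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumed from the published model names/sizes).
MODEL_SIZES = {
    "GPT-2 (largest)": 1.5e9,
    "mGPT-13B": 13e9,
    "GPT-3": 175e9,
}

for name, n_params in MODEL_SIZES.items():
    fp16 = weight_memory_gb(n_params, 16)
    int8 = weight_memory_gb(n_params, 8)
    print(f"{name:>16}: ~{fp16:.1f} GB at 16-bit, ~{int8:.1f} GB at 8-bit")
```

By this estimate, 8-bit weights take about half the space of 16-bit weights, which is the main reason to quantize a model of this size.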

My goal is to evaluate this on Hindi and Indonesian tasks, where there are fewer autoregressive language models in this size range.

For English: use a GPT model or Llama-2-7B.

For Arabic: as of August 2023, I would recommend the bilingual JAIS model, which also has 13B parameters and can be quantized.

In August 2023 AI-Forever added 1.3B-param models for 20+ languages. If your language is Mongolian, for example, it might be better to use mGPT-1.3B-mongol and not this one.

They also have a 1.3B param model for all languages, which I further quantized here: https://huggingface.co/monsoon-nlp/mGPT-quantized

How was the model created?

mGPT-13B was quantized with the bitsandbytes library on Colab Pro with an A100 GPU, plus a lot of space on Google Drive.

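import torch
from transformers import BitsAndBytesConfig, GPT2LMHeadModel

# 8-bit (LLM.int8) loading through bitsandbytes.
# Note: the compute dtype, double quantization, and "nf4" quant type options
# exist only with the bnb_4bit_ prefix and apply to 4-bit loading, so none
# of them are set here.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

qmodel = GPT2LMHeadModel.from_pretrained(
    "ai-forever/mGPT-13B",
    torch_dtype=torch.bfloat16,  # dtype for the modules that are not quantized
    quantization_config=quantization_config,  # already carries load_in_8bit=True
    device_map="auto",  # place layers on the available GPU(s)
)

# Write the quantized weights to a local directory
qmodel.save_pretrained("model_name")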
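Once the 8-bit weights are saved, they can be loaded back and used for generation. The following is a minimal sketch, not the code used for this model card: it assumes a CUDA GPU with bitsandbytes installed, that "model_name" is the directory written by save_pretrained() above, and that the tokenizer is taken from the original ai-forever/mGPT-13B repo, since only the model was saved. The helper name generate_text and the two sample prompts (Hindi and Indonesian) are illustrative.

```python
# Illustrative prompts: "The capital of India" (Hindi) and
# "The capital of Indonesia is" (Indonesian).
PROMPTS = {
    "hi": "भारत की राजधानी",
    "id": "Ibu kota Indonesia adalah",
}


def generate_text(prompt, model_dir="model_name",
                  tokenizer_id="ai-forever/mGPT-13B", max_new_tokens=40):
    """Load the saved 8-bit checkpoint and continue `prompt`."""
    # Heavy imports stay inside the function so importing this file is cheap.
    from transformers import AutoTokenizer, GPT2LMHeadModel

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    # The saved config is expected to record the 8-bit quantization settings,
    # so no BitsAndBytesConfig is passed again here (an assumption about the
    # installed transformers/bitsandbytes versions).
    model = GPT2LMHeadModel.from_pretrained(model_dir, device_map="auto")

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    for lang, prompt in PROMPTS.items():
        print(lang, "->", generate_text(prompt))
```

Reloading the model for every prompt is wasteful; for a real evaluation the tokenizer and model would be loaded once and reused.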

Future steps

  • mGPT could be quantized further (to 4-bit), but model.save_pretrained() currently raises a NotImplementedError.
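A 4-bit attempt might look like the sketch below. It is an assumption-laden outline rather than a tested recipe: it uses the bnb_4bit_ options of BitsAndBytesConfig, the helper names load_4bit and try_save and the output directory "model_name_4bit" are hypothetical, and whether saving succeeds depends on the installed transformers and bitsandbytes versions.

```python
# Keyword arguments for a 4-bit NF4 BitsAndBytesConfig. The compute dtype is
# given as a string, which BitsAndBytesConfig resolves to the torch dtype.
FOUR_BIT_KWARGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "bfloat16",
}


def load_4bit(model_id="ai-forever/mGPT-13B"):
    """Load the model with 4-bit weights (needs a CUDA GPU and bitsandbytes)."""
    # Heavy imports stay inside the function so importing this file is cheap.
    from transformers import BitsAndBytesConfig, GPT2LMHeadModel

    config = BitsAndBytesConfig(**FOUR_BIT_KWARGS)
    return GPT2LMHeadModel.from_pretrained(
        model_id, quantization_config=config, device_map="auto"
    )


def try_save(model, out_dir="model_name_4bit"):
    """Try to save the model; return False if saving is not implemented."""
    try:
        model.save_pretrained(out_dir)
        return True
    except NotImplementedError as err:
        print(f"Saving 4-bit weights is not supported by this install: {err}")
        return False
```

Calling try_save(load_4bit()) would then report whether the current library versions can serialize the 4-bit weights.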