
MaLLaM 🌙 5B (Malaysia Large Language Model): 5B parameters, pretrained on Malaysian text with a 4096-token context length

Pretrained from scratch with the Mistral architecture: 5B parameters trained on 90B tokens of Malaysian text.
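A quick back-of-the-envelope check of the pretraining scale, using the nominal figures from this card (5B parameters, 90B tokens, 4096-token context). It assumes every training sequence is packed to the full 4096 tokens, which the card does not state, and the real parameter count may differ slightly from "5B".

```python
# Nominal figures from the model card; the exact parameter count may differ from 5B.
params = 5e9            # model parameters
train_tokens = 90e9     # pretraining tokens of Malaysian text
context_length = 4096   # tokens per training sequence

# How many training tokens were seen per parameter.
tokens_per_param = train_tokens / params

# Assumption: every sequence is packed to the full context length.
num_sequences = train_tokens / context_length

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"4096-token sequences: {num_sequences / 1e6:.2f}M")
```

Under these assumptions the run corresponds to about 18 tokens per parameter and roughly 22 million full-length sequences.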

README: https://github.com/mesolitica/malaya/tree/5.1/pretrained-model/mistral

WandB: https://wandb.ai/mesolitica/pretrain-mistral-5b?workspace=user-husein-mesolitica

WandB report: https://wandb.ai/mesolitica/pretrain-mistral-3b/reports/Pretrain-Larger-Malaysian-Mistral--Vmlldzo2MDkyOTgz

Technical report: https://github.com/mesolitica/malaya/wiki/MaLLaM-%F0%9F%8C%99-Malaysia-Large-Language-Model

How to use

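```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

TORCH_DTYPE = 'bfloat16'

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16.
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE)
)

tokenizer = AutoTokenizer.from_pretrained('mesolitica/mallam-5B-4096')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/mallam-5B-4096',
    # FlashAttention-2 needs the flash-attn package and a supported GPU.
    # Older transformers releases used use_flash_attention_2=True instead.
    attn_implementation='flash_attention_2',
    # FlashAttention-2 expects fp16/bf16 for the non-quantized modules.
    torch_dtype=getattr(torch, TORCH_DTYPE),
    quantization_config=nf4_config,
)

# 'nama saya' is Malay for 'my name is'. The BOS token <s> is written into
# the prompt, so the tokenizer must not add special tokens again.
prompt = '<s>nama saya'
inputs = tokenizer([prompt], return_tensors='pt', add_special_tokens=False).to('cuda')

generate_kwargs = dict(
    inputs,                  # input_ids and attention_mask
    max_new_tokens=512,
    top_p=0.95,
    top_k=50,
    temperature=0.9,
    do_sample=True,
    num_beams=1,
    repetition_penalty=1.05,
)
r = model.generate(**generate_kwargs)

# Decode the generated ids back to text.
print(tokenizer.decode(r[0], skip_special_tokens=True))
```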
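The sampling arguments in `generate_kwargs` (`temperature`, `top_k`, `top_p`) can be illustrated with a simplified, pure-Python sketch of a single decoding step. This is not how transformers implements its logits processors; it omits `repetition_penalty`, and the function names `candidate_tokens` and `sample_next_token` are hypothetical.

```python
import math
import random


def candidate_tokens(logits, temperature=0.9, top_k=50, top_p=0.95):
    """Return (token_ids, probs) that survive temperature, top-k and top-p filtering."""
    # Temperature: scale logits before the softmax (<1 sharpens, >1 flattens).
    scaled = [x / temperature for x in logits]

    # Unnormalized softmax weights, subtracting the max for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]

    # Top-k: keep only the k highest-scoring token ids, most probable first.
    ranked = sorted(range(len(exps)), key=lambda i: exps[i], reverse=True)[:top_k]
    total = sum(exps[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose cumulative probability,
    # renormalized over the top-k set, reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for i in ranked:
        p = exps[i] / total
        kept_ids.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept_ids, kept_probs


def sample_next_token(logits, rng=random, **kwargs):
    """Draw one token id from the filtered distribution."""
    ids, probs = candidate_tokens(logits, **kwargs)
    # random.choices accepts unnormalized weights.
    return rng.choices(ids, weights=probs, k=1)[0]


ids, probs = candidate_tokens([2.0, 1.0, 0.0, -1.0], temperature=1.0, top_k=3, top_p=0.8)
print(ids)  # [0, 1]
```

With `top_p=0.8`, token 0 alone covers about 67% of the top-3 probability mass, so token 1 is added to cross the threshold and token 2 is discarded. `num_beams=1` with `do_sample=True` means one such draw is made at every step.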
Model weights: Safetensors format, 5B parameters, BF16 tensors.
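The card lists 5B parameters stored as BF16 tensors, while the how-to loads the weights in 4-bit NF4. A rough, weights-only estimate of the memory each format needs is sketched below. It is an approximation that ignores modules that stay unquantized (such as embeddings and normalization layers), the quantization constants, activations and the KV cache.

```python
params = 5e9  # nominal parameter count from the model card


def weight_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB (weights only, no runtime overhead)."""
    return num_params * bits_per_param / 8 / 2**30


bf16 = weight_gib(params, 16)  # BF16: 16 bits per weight
nf4 = weight_gib(params, 4)    # NF4: about 4 bits per quantized weight

print(f"BF16 ~ {bf16:.1f} GiB, NF4 ~ {nf4:.1f} GiB")
```

This gives roughly 9.3 GiB for BF16 versus about 2.3 GiB for NF4, which is why the 4-bit configuration fits on much smaller GPUs; real usage will be higher once the excluded components are counted.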