Malaysian Llama-3.2 1B-Instruct
Continued finetuning of meta-llama/Llama-3.2-1B-Instruct on a highly curated 1.2B tokens of Malaysian instructions.
Improvements
- 128k context length.
- Supports responding in Mandarin, Tamil, Jawi, Manglish, and the local dialects of Johor, Kedah, Kelantan, Pahang, Perak, Sabah, Sarawak, Selangor, Negeri Sembilan, and Terengganu.
- Able to code while following instructions written in Mandarin, Tamil, Jawi, Manglish, or the local dialects above.
- Multi-turn conversations in Malaysian contexts, such as Malaysian legislation, politics, religions, and languages.
- Standard RAG.
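Standard RAG here means passing retrieved passages alongside the question as ordinary chat input. A minimal sketch of one plausible prompt construction (the passage, question, and `build_rag_messages` helper are illustrative placeholders, not the exact format the model was trained on):

```python
# Sketch of a standard RAG-style prompt: retrieved context is prepended
# to the user question inside a single chat message. The passage and
# question below are placeholders, not taken from the training data.
def build_rag_messages(retrieved_passages, question):
    context = "\n\n".join(retrieved_passages)
    return [
        {
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }
    ]

messages = build_rag_messages(
    ["Kuala Lumpur is the capital of Malaysia."],
    "What is the capital of Malaysia?",
)
# `messages` can then be fed to tokenizer.apply_chat_template(...)
```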
MalayMMLU
Model: Malaysian-Llama-3.2-1B-Instruct, 0-shot (letter-based and first-token scoring give identical results):

| Category | Accuracy (%) | Questions |
| --- | --- | --- |
| STEM | 39.7053 | 2443 |
| Language | 42.2869 | 6288 |
| Social science | 41.1969 | 6918 |
| Others | 44.6150 | 4169 |
| Humanities | 42.6166 | 4395 |

Average accuracy (weighted by question count): 42.1757
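The reported average is the question-count-weighted mean of the per-category accuracies, which can be checked directly from the numbers above:

```python
# Verify that the reported average accuracy is the mean of the
# per-category accuracies weighted by each category's question count.
counts = {"Social science": 6918, "Language": 6288, "Humanities": 4395,
          "Others": 4169, "STEM": 2443}
accuracy = {"STEM": 39.705280, "Language": 42.286896,
            "Social science": 41.196878, "Others": 44.615016,
            "Humanities": 42.616610}

total = sum(counts.values())  # 24213 questions
weighted_avg = sum(accuracy[c] * counts[c] for c in counts) / total
print(round(weighted_avg, 5))  # 42.17569, matching the reported average
```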
Training session
Finetuned on mesolitica/Malaysian-SFT to make the model understand Malaysian context.
How we train
- LoRA on `["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]`.
- Rank 256 with alpha 512, i.e. a LoRA scaling factor (alpha / rank) of 2.0.
- Multipacking with proper SDPA causal masking to prevent cross-document contamination, with correct per-document position IDs.
- Forked CCE (Cut Cross-Entropy) loss for the LoRA `lm_head` to reduce memory consumption.
Source code at https://github.com/malaysia-ai/cooking/tree/main/llama/sft
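The multipacking point can be illustrated with a toy example: when several documents are packed into one sequence, each document restarts its position IDs at 0, and the causal mask is block-diagonal so tokens never attend across document boundaries. A minimal pure-Python sketch (the actual training code builds the equivalent mask for SDPA; `pack` is a hypothetical helper):

```python
# Toy illustration of multipacking: documents of the given lengths are
# packed into one sequence; position ids restart per document, and the
# causal mask is block-diagonal so no token attends to another document.
def pack(doc_lengths):
    position_ids, doc_ids = [], []
    for doc, length in enumerate(doc_lengths):
        position_ids.extend(range(length))  # positions restart at 0
        doc_ids.extend([doc] * length)
    n = len(position_ids)
    # mask[i][j] is True when token i may attend to token j:
    # same document AND j is not in the future.
    mask = [[doc_ids[i] == doc_ids[j] and j <= i for j in range(n)]
            for i in range(n)]
    return position_ids, mask

position_ids, mask = pack([3, 2])
print(position_ids)  # [0, 1, 2, 0, 1]
print(mask[3])       # [False, False, False, True, False]
```

Without the block-diagonal mask, token 3 (the start of the second document) would attend to tokens 0-2 of the first document, which is exactly the contamination the bullet above guards against.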
Example
Load the model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
import torch

tokenizer = AutoTokenizer.from_pretrained('malaysia-ai/Malaysian-Llama-3.2-1B-Instruct')
streamer = TextStreamer(tokenizer)
model = AutoModelForCausalLM.from_pretrained(
    'malaysia-ai/Malaysian-Llama-3.2-1B-Instruct', torch_dtype=torch.bfloat16
).cuda()
```
- All examples use stochastic sampling, so the exact outputs may not be reproducible across machines.
- Some examples may have been truncated because they are too long for this README.
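The non-reproducibility note above comes from stochastic decoding: instead of always picking the highest-scoring token, the model samples from the (temperature-scaled) softmax distribution, so different seeds or hardware produce different outputs. A toy sketch over made-up logits:

```python
import math
import random

# Toy temperature sampling over made-up logits: the softmax distribution
# is sampled rather than argmax'd, so repeated runs pick different tokens.
def sample_token(logits, temperature=0.7, rng=random):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r, cum = rng.random(), 0.0
    for token_id, p in enumerate(probs):
        cum += p
        if r < cum:
            return token_id
    return len(probs) - 1

logits = [2.0, 1.0, 0.5]  # placeholder values, not real model outputs
samples = [sample_token(logits) for _ in range(1000)]
# token 0 is drawn most often, but no single draw is guaranteed
```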