Model Details

This model was originally developed as part of the 1st-place solution for an LLM-as-a-Judge use case at the AI Tinkerers Hackathon in Kuala Lumpur.

We have finetuned mesolitica/malaysian-mistral-7b-32k-instructions-v4 for a Natural Language Inference (NLI) task. In our case, NLI is the task of determining whether a "hypothesis" is true (entailment) or false (contradiction) given a question-statement pair. We selected this model primarily for its:

  • Context length of 32,000 tokens. This is the maximum number of tokens the model can attend to at once when processing an input. A high context length matters because we will be doing NLI on text pairs of widely varying lengths (a token-count sketch follows this list).
  • Number of monthly downloads on Hugging Face. A consistently high monthly download count is a good proxy for model quality.
  • Ability to comprehend both Malay and English text, thanks to prior instruction finetuning.
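
Since context length was a deciding factor, below is a minimal sketch of how a document/statement pair can be checked against the 32,768-token window before inference. This is our own illustration rather than part of the training code, and the helper name fits_in_context is hypothetical:

from transformers import AutoTokenizer

# Sketch only: verify that a passage/statement pair fits within the 32,768-token context window.
tokenizer = AutoTokenizer.from_pretrained("mesolitica/malaysian-mistral-7b-32k-instructions-v4")

def fits_in_context(passage: str, question: str, max_length: int = 32768) -> bool:
    n_tokens = len(tokenizer(passage + "\n" + question)["input_ids"])
    return n_tokens <= max_length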

Training Details

Training solely on the Boolq-Malay dataset (comprising both Malay and English versions of the original BoolQ dataset) with an A100 GPU (40 GB VRAM) on Google Colab, we used the following training parameters and obtained the following results:

  • No. of Epochs: 0.504
  • Per Device Train Batch Size: 4
  • Gradient Accumulation Steps: 1
  • LoRA Rank: 64
  • Learning Rate: 2e-4
  • Learning Rate Scheduler Type: constant
  • Maximum Sequence Length: 32768
  • Load model in 4-bit Precision: True
  • bf16 (Brain Floating Point 16-bit): False
  • Train Loss: 0.0524
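
For orientation, the parameters above map roughly onto a PEFT LoraConfig and transformers TrainingArguments as sketched below. This is a hedged approximation rather than the exact training setup: lora_alpha, lora_dropout, and output_dir are assumptions, and the authoritative configuration is in the training notebook linked below.

from peft import LoraConfig
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above (sketch only).
peft_config = LoraConfig(
    r=64,               # LoRA Rank
    lora_alpha=16,      # assumption: not stated in this card
    lora_dropout=0.05,  # assumption: not stated in this card
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="malaysian-mistral-llmasajudge-v2",  # assumption
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    lr_scheduler_type="constant",
    bf16=False,
)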

The training notebook can be found here: https://github.com/wanadzhar913/aitinkerers-hackathon-supa-team-werecooked/blob/master/notebooks-finetuning-models/02_finetune_v2_malaysian_mistral_7b_32k_instructions_v4.ipynb

The Weights and Biases training run can be found here: https://wandb.ai/adzhar-faiq/finetune-malaysian-mistral-llmasajudge-v2

For NLI benchmarks specifically, the benchmarking notebook can be found here: https://github.com/wanadzhar913/aitinkerers-hackathon-supa-team-werecooked/blob/master/notebooks-benchmarking-exercises/03_benchmark_malaysian_mistral_llmasajudge_v2.ipynb

We achieve the following metrics on the validation dataset:

Language          Accuracy (%)   F1 Score (%)   Precision (%)   Recall (%)
Malay + English   64.8           74.0           65.3            85.3
Malay             64.3           73.6           64.6            85.4
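
The figures above can be reproduced from parsed 0/1 predictions against the gold labels using scikit-learn. The snippet below is a sketch with placeholder lists rather than the actual benchmark code; see the benchmarking notebook linked above for the real evaluation.

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder labels purely for illustration; the real run uses the validation split.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

print(f"Accuracy : {accuracy_score(y_true, y_pred):.3f}")
print(f"F1 Score : {f1_score(y_true, y_pred):.3f}")
print(f"Precision: {precision_score(y_true, y_pred):.3f}")
print(f"Recall   : {recall_score(y_true, y_pred):.3f}")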

In the future, we can do the following to garner better results:

  • Set the bf16 parameter to True to improve compute efficiency without significantly sacrificing model accuracy.
  • Increase gradient_accumulation_steps to work within the memory constraints of a small GPU, or increase the batch_size if we have access to a larger GPU. The main motivation is to avoid out-of-memory (OOM) errors while maintaining a reasonable effective batch size.
  • Given more compute resources, we can also increase our early-stopping patience and train for more than 10 epochs (see the early-stopping sketch after this list).
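
As a sketch of the last point, transformers supports patience-based early stopping via EarlyStoppingCallback; the patience value below is an assumption, not the one we used.

from transformers import EarlyStoppingCallback

# Stop training once the validation metric stops improving for 3 consecutive evaluations.
# Requires an eval dataset, periodic evaluation, and load_best_model_at_end=True on the trainer.
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)  # patience value is an assumption
# trainer = SFTTrainer(..., callbacks=[early_stopping])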

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, \
                         BitsAndBytesConfig, pipeline

TORCH_DTYPE = 'bfloat16'

# 4-bit NF4 quantization config so the 7B model fits comfortably on a single GPU
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=getattr(torch, TORCH_DTYPE)
)

# load the tokenizer and the finetuned model in 4-bit with FlashAttention-2
tokenizer = AutoTokenizer.from_pretrained('wanadzhar913/malaysian-mistral-llmasajudge-v2')
model = AutoModelForCausalLM.from_pretrained(
    'wanadzhar913/malaysian-mistral-llmasajudge-v2',
    use_flash_attention_2 = True,
    quantization_config = nf4_config
)

pipe = pipeline(
    "text-generation",
    tokenizer = tokenizer,
    model=model,
    device=0,
)

# create a prompt template (in Malay): the model is asked to label a question/statement
# as factually consistent (1) or inconsistent (0) with the given document
prompt = """Anda adalah pakar dalam mengesan ketidakkonsistenan fakta dan halusinasi. Anda akan diberi satu dokumen dan satu soalan. Baca
dokumen dan soalan/kenyataan yang diberikan dengan teliti dan kenal pasti Ketidakkonsistenan Fakta (iaitu mana-mana soalan/kenyataan yang
tidak disokong atau bercanggah dengan maklumat dalam dokumen).

### Anda perlu memilih antara dua pilihan berikut:
- Tidak Konsisten dengan Fakta: Jika mana-mana soalan/kenyataan tidak disokong, terjawab atau bercanggah dengan dokumen, labelkannya sebagai 0.
- Konsisten dengan Fakta: Jika semua soalan/kenyataan disokong/terjawab oleh dokumen, labelkannya sebagai 1.

### Sebagai contoh:
Dokumen: "Gajah adalah mamalia besar yang biasanya ditemui di Afrika dan Asia. Mereka hidup dalam kumpulan yang dikenali sebagai kawanan dan terkenal kerana mempunyai ingatan yang baik."

Soalan/Kenyataan: "Gajah adalah mamalia besar yang biasanya ditemui di Eropah."
Jawapan: {{'consistency': 0}}

Soalan/Kenyataan: "Gajah adalah mamalia besar yang biasanya ditemui di Afrika dan Asia."
Jawapan: {{'consistency': 1}}

### Jawab berdasarkan dokumen dan soalan/kenyataan berikut:
Dokumen: {passage}
Soalan/Kenyataan: {question}

Kembalikan pilihan konsistenan dalam format JSON untuk pilihan yang diberikan. Sebagai contoh: {{'consistency': 1}} atau {{'consistency': 0}}"""

# https://www.thestar.com.my/business/business-news/2024/10/23/strong-support-for-chip-sector-under-budget-2025
passage_english = """
KUALA LUMPUR: Budget 2025 has set aside sizeable funds, both fiscal and non-fiscal, to ensure the success of the National Semiconductor Strategy (NSS), which is part of the New Industrial Master Plan 2030 (NIMP 2030), says Investment, Trade and Industry (Miti) Minister Tengku Datuk Seri Zafrul Abdul Aziz.

Among the initiatives announced in the budget, he said were the RM1bil sovereign fund for the electrical and electronics sector and high-value activities as well as training funds allocated for several universities.

Apart from that, he said there are initiatives to support mid-tier companies as well as tax incentives for companies in the industry.

“I think we are on track (to achieve the target set in NIMP 2030). You have seen exports continue to grow in these sectors as well.

“And if you look at the just-announced report card for our NIMP 2030, we should see positive growth by year-end, and growth in the manufacturing sector has contributed close to a 5% increase to our gross domestic product this year,” he said this during an interview with CNBC Asia Squawk Box yesterday.

Tengku Zafrul was commenting on the progress of the NSS and NIMP.

When asked how the new tax would help finance the bigger budget of RM421bil, he said that apart from the tax on dividends as well as the larger scope of sales and service tax, emphasis is given on cost discipline, for instance, via the merging of several agencies under Miti.

“Yes, I am quite confident that we will meet the budget estimate. We have been meeting our deficit target, for example, and I think we will hopefully achieve it (fiscal target) in 2024,” he said.

The ministry will also continue with initiatives to drive trade and investments to spur the country’s growth, added Tengku Zafrul. — Bernama"""

question_english = "Zafrul will not meet the budget deficit."

pipe(
    prompt.format(passage=passage_english, question=question_english),
    max_new_tokens = 8,
    return_full_text=False,
    temperature = 0.1,
    do_sample = True,
    top_p = 0.97,
    top_k = 50,
)[0]['generated_text']

# you'll probably have to use some regex to parse the outputs
>>> {"consistency": 0}
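
For example, a minimal parsing helper could look like the following; parse_consistency is a hypothetical name, and the regex assumes outputs shaped like {'consistency': 0}.

import re

def parse_consistency(generated_text: str):
    """Return 0 or 1 parsed from outputs like {'consistency': 0}, or None if absent."""
    match = re.search(r"""['"]consistency['"]\s*:\s*([01])""", generated_text)
    return int(match.group(1)) if match else None

parse_consistency('{"consistency": 0}')  # -> 0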