
HALU!

This model was fine-tuned on a Kaggle TPU. The training script can be found in this repository by Locutusque.

GGUF: static and imatrix quantizations are made available by mradermacher

Here is the OAS version.

Training Details:

  1. The Llama-3 8B base model was fine-tuned using synthetic stories generated from the KoboldAI/Mistral-7B-Erebus-v3 model for English and the Obrolin/Cerpen-7B-v0.1 model for Indonesian.
  2. The adapter from the fine-tuned Llama-3 8B model was merged with the RLHFlow/LLaMA3-iterative-DPO-final model.
  3. The Llama-3 8B Instruct model was then fine-tuned with 30,000 examples.
  4. The adapter from the fine-tuned Llama-3 8B Instruct model was merged with the model resulting from step 2.
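The adapter merges in steps 2 and 4 fold the low-rank update back into the base weights (in practice this is done with PEFT's `merge_and_unload`). A toy sketch of the underlying arithmetic, using tiny pure-Python matrices rather than real Llama-3 weights:

```python
# Merging a LoRA adapter into a base weight matrix:
#   W_merged = W + (alpha / r) * (B @ A), with B: (d, r) and A: (r, k).
# The toy adapter below has rank 1; the model above used r=64, alpha=16
# in step 1, giving a scale of 16/64 = 0.25.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_lora(W, A, B, alpha):
    """Fold the low-rank update (alpha / r) * B @ A into W."""
    r = len(A)              # adapter rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
B = [[2.0], [0.0]]             # (d, r) = (2, 1)
A = [[1.0, 3.0]]               # (r, k) = (1, 2)
merged = merge_lora(W, A, B, alpha=16)
print(merged)  # [[33.0, 96.0], [0.0, 1.0]]
```

Once merged, the adapter is no longer needed at inference time, which is why only full weights are published.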

First step training

BF16 with QLoRA
Examples used: 5K (4K English and 1K Indonesian)
LoRA rank: 64
LoRA alpha: 16
LoRA dropout: 0.05
Learning rate: 1e-5
Epochs: 3

Second step training

BF16 with QLoRA
Examples used: 30K (25K English and 5K Indonesian)
LoRA rank: 64
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 6e-5
Epochs: 1
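The two steps keep the same rank but double alpha, which doubles the effective LoRA scaling (alpha / rank, as in PEFT's implementation) applied to the adapter update. A quick check of the two factors:

```python
# Effective LoRA scaling factor is alpha / rank.
def lora_scale(alpha, rank):
    return alpha / rank

step1 = lora_scale(alpha=16, rank=64)  # first-step settings
step2 = lora_scale(alpha=32, rank=64)  # second-step settings
print(step1, step2)  # 0.25 0.5
```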

Both adapters were merged on a Kaggle T4, which is why the weights are in FP16.

Datasets Used for the Final Model: 30K examples were taken from the following datasets:

Some of the examples that lack a "system" prompt are combined to fill the 8K context.
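The packing described above can be sketched as a greedy loop: examples are concatenated until the next one would overflow the 8K-token window. `count_tokens` here is a hypothetical stand-in for the real tokenizer's length function, not part of the original pipeline:

```python
CONTEXT_LEN = 8192  # Llama-3 8B context window used here

def count_tokens(example):
    # Hypothetical stand-in: a real implementation would use the
    # Llama-3 tokenizer; here we approximate one token per word.
    return len(example.split())

def pack_examples(examples, max_len=CONTEXT_LEN):
    """Greedily combine examples so each packed sequence stays within max_len."""
    packed, current, current_len = [], [], 0
    for ex in examples:
        n = count_tokens(ex)
        if current and current_len + n > max_len:
            packed.append("\n".join(current))
            current, current_len = [], 0
        current.append(ex)
        current_len += n
    if current:
        packed.append("\n".join(current))
    return packed

# Three short "examples" with a tiny window, to show the grouping behavior.
seqs = pack_examples(["a b c", "d e", "f g h i"], max_len=5)
print(seqs)  # ['a b c\nd e', 'f g h i']
```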

Llama 3 Chat Template:

<|start_header_id|>system<|end_header_id|>

{system}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
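A small helper that renders a single turn of the template above, assuming the standard Llama 3 special tokens (this mirrors what `tokenizer.apply_chat_template` produces for Llama 3 Instruct, minus the leading `<|begin_of_text|>`):

```python
def format_llama3(system, user, response=""):
    """Render one turn of the Llama 3 chat template shown above."""
    prompt = (
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    # During training the assistant response (plus <|eot_id|>) follows;
    # at inference time generation starts after the assistant header.
    return prompt + (f"{response}<|eot_id|>" if response else "")

text = format_llama3("You are a helpful assistant.", "Hi!", "Hello!")
print(text)
```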

Notes:

  • This is an RP model.
  • This model's responses have a strong GPT-ism bias, especially when no 'system' prompt is used. I'm currently working on a DPO version to reduce it.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                                 Value
Avg.                                   70.43
AI2 Reasoning Challenge (25-shot)      64.16
HellaSwag (10-shot)                    83.40
MMLU (5-shot)                          67.68
TruthfulQA (0-shot)                    54.70
Winogrande (5-shot)                    79.95
GSM8k (5-shot)                         72.71
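The reported average is the mean of the six benchmark scores; a quick sanity check:

```python
# Scores from the leaderboard table above.
scores = {
    "ARC (25-shot)": 64.16,
    "HellaSwag (10-shot)": 83.40,
    "MMLU (5-shot)": 67.68,
    "TruthfulQA (0-shot)": 54.70,
    "Winogrande (5-shot)": 79.95,
    "GSM8k (5-shot)": 72.71,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 70.43
```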
Safetensors: 8.03B params, FP16