
HALU!

This model was fine-tuned on a Kaggle TPU. The training script can be found in this repository by Locutusque.

GGUF: static and imatrix quantizations are made available by mradermacher

Here is the OAS version.

Training Details:

  1. The Llama-3 8B base model was fine-tuned using synthetic stories generated from the KoboldAI/Mistral-7B-Erebus-v3 model for English and the Obrolin/Cerpen-7B-v0.1 model for Indonesian.
  2. The adapter from the fine-tuned Llama-3 8B model was merged with the RLHFlow/LLaMA3-iterative-DPO-final model.
  3. The Llama-3 8B Instruct model was then fine-tuned with 30,000 examples.
  4. The adapter from the fine-tuned Llama-3 8B Instruct model was merged with the model resulting from step 2.
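The adapter merges in steps 2 and 4 fold the low-rank update back into the base weights (in practice this is done with PEFT's `merge_and_unload`). A toy sketch of the underlying arithmetic, using tiny pure-Python matrices rather than real Llama-3 weights:

```python
# Merging a LoRA adapter into a base weight matrix:
#   W_merged = W + (alpha / r) * (B @ A), with B: (d, r) and A: (r, k).
# The toy adapter below has rank 1; the model above used r=64, alpha=16
# in step 1, giving a scale of 16/64 = 0.25.

def matmul(X, Y):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]

def merge_lora(W, A, B, alpha):
    """Fold the low-rank update (alpha / r) * B @ A into W."""
    r = len(A)              # adapter rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

W = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 base weight
B = [[2.0], [0.0]]             # (d, r) = (2, 1)
A = [[1.0, 3.0]]               # (r, k) = (1, 2)
merged = merge_lora(W, A, B, alpha=16)
print(merged)  # [[33.0, 96.0], [0.0, 1.0]]
```

Once merged, the adapter is no longer needed at inference time, which is why only full weights are published.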

First step training

BF16 with QLoRA
Examples used: 5K (4K English and 1K Indonesian)
LoRA rank: 64
LoRA alpha: 16
LoRA dropout: 0.05
Learning rate: 1e-5
Epochs: 3

Second step training

BF16 with QLoRA
Examples used: 30K (25K English and 5K Indonesian)
LoRA rank: 64
LoRA alpha: 32
LoRA dropout: 0.05
Learning rate: 6e-5
Epochs: 1
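The two steps keep the same rank but double alpha, which doubles the effective LoRA scaling (alpha / rank, as in PEFT's implementation) applied to the adapter update. A quick check of the two factors:

```python
# Effective LoRA scaling factor is alpha / rank.
def lora_scale(alpha, rank):
    return alpha / rank

step1 = lora_scale(alpha=16, rank=64)  # first-step settings
step2 = lora_scale(alpha=32, rank=64)  # second-step settings
print(step1, step2)  # 0.25 0.5
```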

Both adapters were merged on a Kaggle T4, which is why the weights are in FP16.

Datasets Used for the Final Model: 30K examples were taken from the following datasets:

Some of the examples that lack a "system" prompt are combined to fill the 8K context.
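The packing described above can be sketched as a greedy loop: examples are concatenated until the next one would overflow the 8K-token window. `count_tokens` here is a hypothetical stand-in for the real tokenizer's length function, not part of the original pipeline:

```python
CONTEXT_LEN = 8192  # Llama-3 8B context window used here

def count_tokens(example):
    # Hypothetical stand-in: a real implementation would use the
    # Llama-3 tokenizer; here we approximate one token per word.
    return len(example.split())

def pack_examples(examples, max_len=CONTEXT_LEN):
    """Greedily combine examples so each packed sequence stays within max_len."""
    packed, current, current_len = [], [], 0
    for ex in examples:
        n = count_tokens(ex)
        if current and current_len + n > max_len:
            packed.append("\n".join(current))
            current, current_len = [], 0
        current.append(ex)
        current_len += n
    if current:
        packed.append("\n".join(current))
    return packed

# Three short "examples" with a tiny window, to show the grouping behavior.
seqs = pack_examples(["a b c", "d e", "f g h i"], max_len=5)
print(seqs)  # ['a b c\nd e', 'f g h i']
```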

Llama 3 Chat Template:

<|start_header_id|>system<|end_header_id|>

{system}<|eot_id|><|start_header_id|>user<|end_header_id|>

{input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
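A small helper that renders a single turn of the template above, assuming the standard Llama 3 special tokens (this mirrors what `tokenizer.apply_chat_template` produces for Llama 3 Instruct, minus the leading `<|begin_of_text|>`):

```python
def format_llama3(system, user, response=""):
    """Render one turn of the Llama 3 chat template shown above."""
    prompt = (
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    # During training the assistant response (plus <|eot_id|>) follows;
    # at inference time generation starts after the assistant header.
    return prompt + (f"{response}<|eot_id|>" if response else "")

text = format_llama3("You are a helpful assistant.", "Hi!", "Hello!")
print(text)
```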

Notes:

  • This is an RP model.
  • This model's responses have a strong GPT-ism bias, especially when no 'system' prompt is used. I'm currently working on a DPO version to reduce it.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                                 Value
Avg.                                   70.43
AI2 Reasoning Challenge (25-shot)      64.16
HellaSwag (10-shot)                    83.40
MMLU (5-shot)                          67.68
TruthfulQA (0-shot)                    54.70
Winogrande (5-shot)                    79.95
GSM8k (5-shot)                         72.71
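The reported average is the mean of the six benchmark scores; a quick sanity check:

```python
# Scores from the leaderboard table above.
scores = {
    "ARC (25-shot)": 64.16,
    "HellaSwag (10-shot)": 83.40,
    "MMLU (5-shot)": 67.68,
    "TruthfulQA (0-shot)": 54.70,
    "Winogrande (5-shot)": 79.95,
    "GSM8k (5-shot)": 72.71,
}
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 70.43
```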
Safetensors: 8.03B params, FP16