muellerzr/llama-3-8B-self-instruct-LoRA

PEFT

Safetensors

llama

Generated from Trainer


muellerzr (HF staff) committed on May 24, 2024

Commit 6d1f6b2 (verified) · 1 parent: 96f90aa

Upload axolotl_config.yml with huggingface_hub

Files changed (1)

axolotl_config.yml +107 -2

axolotl_config.yml CHANGED

@@ -1,3 +1,21 @@
 base_model: llama3-8B
 model_type: LlamaForCausalLM
 tokenizer_type: AutoTokenizer
@@ -53,7 +71,7 @@ group_by_length: false
 bf16: auto
 fp16:
 tf32: false
-chat_template: chatml
 gradient_checkpointing: true
 gradient_checkpointing_kwargs:
@@ -94,4 +112,91 @@ tokens:
   - "<|im_end|>"
 lora_modules_to_save:
   - embed_tokens
-  - lm_head

+---
+library_name: peft
+tags:
+- generated_from_trainer
+base_model: llama3-8B
+model-index:
+- name: qlora_decrease_lr_promptfix
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.4.0`
+```yaml
 base_model: llama3-8B
 model_type: LlamaForCausalLM
 tokenizer_type: AutoTokenizer
 bf16: auto
 fp16:
 tf32: false
+chat_template: alpaca
 gradient_checkpointing: true
 gradient_checkpointing_kwargs:
   - "<|im_end|>"
 lora_modules_to_save:
   - embed_tokens
+  - lm_head
+```
+</details><br>
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/muellerzr/llama-3-8b-self-align-axolotl/runs/2q8jhm3e)
+# qlora_decrease_lr_promptfix
+This model is a PEFT adapter fine-tuned from llama3-8B (see `base_model` above); the training dataset is not specified.
+It achieves the following results on the evaluation set:
+- Loss: 0.4121
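For a causal language model, the Trainer's evaluation loss is approximately the mean per-token cross-entropy in nats. That means it can be converted to perplexity with exp(loss). The sketch below assumes the reported 0.4121 is that mean token-level cross-entropy. Packing or loss masking in the Axolotl setup could make the two quantities differ slightly.

```python
import math

# Final evaluation loss reported in this card (assumed: mean per-token cross-entropy, nats).
EVAL_LOSS = 0.4121

def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity."""
    return math.exp(mean_nll)

ppl = perplexity(EVAL_LOSS)
print(f"perplexity ≈ {ppl:.2f}")  # perplexity ≈ 1.51
```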
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 2
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 32
+- total_eval_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 100
+- num_epochs: 4
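The batch-size totals above follow from per-device batch size × number of devices × gradient-accumulation steps. For training that is 2 × 2 × 8 = 32. Evaluation does not accumulate gradients, so its total is 2 × 2 = 4.

The sketch below checks that arithmetic. It also reimplements the learning-rate multiplier of a linear-warmup cosine schedule, which has the same shape as `get_cosine_schedule_with_warmup` in Transformers. `TOTAL_STEPS` is an assumption: the card does not state the total number of optimizer steps, so the last logged step in the results table (651) stands in for it.

```python
import math

# Hyperparameters copied from the list above.
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 2      # per device
EVAL_BATCH_SIZE = 2       # per device
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 100
# Assumption: total optimizer steps are not given in the card; the last
# logged step in the training results table is used as a stand-in.
TOTAL_STEPS = 651

total_train_batch_size = TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch_size = EVAL_BATCH_SIZE * NUM_DEVICES  # no accumulation at eval time

def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(total_train_batch_size, total_eval_batch_size)  # 32 4
for step in (0, 50, 100, 375, 651):
    print(step, f"{lr_at(step):.2e}")
```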
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.6903        | 0.0061 | 1    | 0.6706          |
+| 0.6463        | 0.1285 | 21   | 0.6392          |
+| 0.4944        | 0.2571 | 42   | 0.4806          |
+| 0.4495        | 0.3856 | 63   | 0.4532          |
+| 0.4444        | 0.5142 | 84   | 0.4406          |
+| 0.4185        | 0.6427 | 105  | 0.4334          |
+| 0.4336        | 0.7712 | 126  | 0.4286          |
+| 0.4061        | 0.8998 | 147  | 0.4252          |
+| 0.4002        | 1.0145 | 168  | 0.4221          |
+| 0.4013        | 1.1431 | 189  | 0.4205          |
+| 0.3674        | 1.2716 | 210  | 0.4189          |
+| 0.3942        | 1.4002 | 231  | 0.4175          |
+| 0.3984        | 1.5287 | 252  | 0.4165          |
+| 0.3867        | 1.6572 | 273  | 0.4150          |
+| 0.3872        | 1.7858 | 294  | 0.4137          |
+| 0.4010        | 1.9143 | 315  | 0.4130          |
+| 0.3602        | 2.0275 | 336  | 0.4126          |
+| 0.3817        | 2.1561 | 357  | 0.4131          |
+| 0.3592        | 2.2846 | 378  | 0.4129          |
+| 0.3729        | 2.4132 | 399  | 0.4127          |
+| 0.3720        | 2.5417 | 420  | 0.4121          |
+| 0.3685        | 2.6702 | 441  | 0.4120          |
+| 0.3732        | 2.7988 | 462  | 0.4115          |
+| 0.3800        | 2.9273 | 483  | 0.4112          |
+| 0.3637        | 3.0413 | 504  | 0.4114          |
+| 0.3628        | 3.1699 | 525  | 0.4118          |
+| 0.3550        | 3.2984 | 546  | 0.4122          |
+| 0.3646        | 3.4269 | 567  | 0.4121          |
+| 0.3496        | 3.5555 | 588  | 0.4121          |
+| 0.3573        | 3.6840 | 609  | 0.4121          |
+| 0.3598        | 3.8125 | 630  | 0.4121          |
+| 0.3669        | 3.9411 | 651  | 0.4121          |
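Among the logged evaluations, validation loss reaches its minimum of 0.4112 at step 483 (epoch ≈ 2.93). It then drifts back up to 0.4121 by the last logged step. So the evaluation loss reported at the top of the card is the final value, not the best one observed. The card does not say whether a checkpoint was saved at step 483.

The sketch below finds the best evaluation from the table's (step, validation loss) pairs, copied verbatim.

```python
# (step, validation loss) pairs copied from the training results table above.
VAL_LOSS = [
    (1, 0.6706), (21, 0.6392), (42, 0.4806), (63, 0.4532), (84, 0.4406),
    (105, 0.4334), (126, 0.4286), (147, 0.4252), (168, 0.4221), (189, 0.4205),
    (210, 0.4189), (231, 0.4175), (252, 0.4165), (273, 0.4150), (294, 0.4137),
    (315, 0.4130), (336, 0.4126), (357, 0.4131), (378, 0.4129), (399, 0.4127),
    (420, 0.4121), (441, 0.4120), (462, 0.4115), (483, 0.4112), (504, 0.4114),
    (525, 0.4118), (546, 0.4122), (567, 0.4121), (588, 0.4121), (609, 0.4121),
    (630, 0.4121), (651, 0.4121),
]

def best_eval(records):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(records, key=lambda r: r[1])

best_step, best_loss = best_eval(VAL_LOSS)
final_step, final_loss = VAL_LOSS[-1]
print(f"best: step {best_step}, loss {best_loss}")     # best: step 483, loss 0.4112
print(f"final: step {final_step}, loss {final_loss}")  # final: step 651, loss 0.4121
```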
+### Framework versions
+- PEFT 0.11.1
+- Transformers 4.42.0.dev0
+- Pytorch 2.3.0+cu118
+- Datasets 2.19.1
+- Tokenizers 0.19.1
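To compare a local environment against the versions listed above, the sketch below uses the standard-library `importlib.metadata`. The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names; treat them as an assumption about how the packages were installed. Note that `4.42.0.dev0` denotes a development build of Transformers, which is normally installed from source rather than from a PyPI release.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" in this card.
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.42.0.dev0",
    "torch": "2.3.0+cu118",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}

def compare_versions(expected):
    """Map each package name to (expected, installed or None, exact match)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        report[name] = (want, have, have == want)
    return report

for name, (want, have, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name:12} expected {want:14} installed {have or 'missing':14} {status}")
```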