---
base_model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit
library_name: peft
license: llama3.1
tags:
- trl
- sft
- unsloth
- generated_from_trainer
model-index:
- name: meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/nicola-er-ho/clembench-playpen-sft/runs/vk1p42l3)

# meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001

This model is a fine-tuned version of [unsloth/meta-llama-3.1-8b-instruct-bnb-4bit](https://huggingface.co/unsloth/meta-llama-3.1-8b-instruct-bnb-4bit) on the D20001 dataset.

## Model description

The model is trained only on successful episodes produced by the top 10 models from clembench benchmark versions 0.9 and 1.0. Success was measured as the highest overall number of successful episodes across all games. The entries below are model pairings (player A--player B) at temperature 0.0.

| Place | Model pairing |
|-------|---------------|
| 1 | gpt-4-0613-t0.0--gpt-4-0613-t0.0 |
| 2 | claude-v1.3-t0.0--claude-v1.3-t0.0 |
| 3 | gpt-4-1106-preview-t0.0--gpt-4-1106-preview-t0.0 |
| 4 | gpt-4-t0.0--gpt-4-t0.0 |
| 5 | gpt-4-0314-t0.0--gpt-4-0314-t0.0 |
| 6 | claude-2.1-t0.0--claude-2.1-t0.0 |
| 7 | gpt-4-t0.0--gpt-3.5-turbo-t0.0 |
| 8 | claude-2-t0.0--claude-2-t0.0 |
| 9 | gpt-3.5-turbo-1106-t0.0--gpt-3.5-turbo-1106-t0.0 |
| 10 | gpt-3.5-turbo-0613-t0.0--gpt-3.5-turbo-0613-t0.0 |

## Intended uses & limitations

More information needed

## Training and evaluation data

Training data: D20001

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of how they map onto `TrainingArguments` appears at the end of this card):
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 8
- seed: 7331
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- lr_scheduler_warmup_steps: 5
- num_epochs: 1

### Training results

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
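
### Hyperparameters as `TrainingArguments` (sketch)

For reference, here is a minimal sketch of how the hyperparameters listed above map onto `transformers.TrainingArguments`. This is an assumption about the training setup (the `trl`/`sft` tags suggest TRL's `SFTTrainer` was used), not the exact script; `output_dir` and anything not listed on this card are placeholders. Note that in `transformers`, a nonzero `warmup_steps` takes precedence over `warmup_ratio`.

```python
from transformers import TrainingArguments

# Sketch only: values come from the card above; output_dir is assumed.
args = TrainingArguments(
    output_dir="outputs",            # assumed, not from the card
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=7331,
    adam_beta1=0.9,                  # these three are also the defaults
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    warmup_steps=5,                  # overrides warmup_ratio when > 0
    num_train_epochs=1,
)
```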
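
## Usage (sketch)

The card does not include usage code; the following is a hedged sketch of attaching a PEFT adapter such as this one to the 4-bit base model with the standard `peft`/`transformers` APIs. The `adapter_id` below is a placeholder for the repository hosting this adapter, and `bitsandbytes` must be installed since the base checkpoint is bnb-4bit quantized.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
adapter_id = "<this-repo>"  # placeholder: the repo hosting this adapter

# Load the 4-bit base model, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```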