See axolotl config
axolotl version: 0.4.1
base_model: Dans-DiscountModels/Meta-Llama-3.1-8B-ChatML
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code:
# wandb configuration
wandb_project: l3.1-8b-dans-instruct
wandb_watch:
wandb_run_id: attempt-03
wandb_log_model:
# push checkpoints to hub
hub_model_id: anthracite-core/Dans-L3.1-Test
# how to push checkpoints to hub
# https://huggingface.co/docs/transformers/v4.31.0/en/main_classes/trainer#transformers.TrainingArguments.hub_strategy
hub_strategy: "all_checkpoints"
# Whether to use hf `use_auth_token` for loading datasets. Useful for fetching private datasets
# Required to be true when used in combination with `push_dataset_to_hub`
hf_use_auth_token: true
# where to save the finished model to
output_dir: ./l3.1-8b-dans-instruct
# dataset settings (local or huggingface repo)
datasets:
  - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
    type: dan-chat
  - path: AquaV/Energetic-Materials-Sharegpt
    type: dan-chat
  - path: AquaV/Chemical-Biological-Safety-Applications-Sharegpt
    type: dan-chat
  - path: AquaV/US-Army-Survival-Sharegpt
    type: dan-chat
  - path: AquaV/Resistance-Sharegpt
    type: dan-chat
  - path: AquaV/Interrogation-Sharegpt
    type: dan-chat
  - path: AquaV/Multi-Environment-Operations-Sharegpt
    type: dan-chat
  - path: PocketDoc/Dans-Mathmaxx
    type: dan-chat
  - path: PJMixers/Math-Multiturn-1K-ShareGPT
    type: dan-chat
  - path: PocketDoc/Dans-Benchmaxx
    type: dan-chat
  - path: PocketDoc/Dans-Codemaxx-LeetCode
    type: dan-chat
  - path: PocketDoc/Dans-Codemaxx-CodeFeedback-Conversations
    type: dan-chat
  - path: PocketDoc/Dans-Codemaxx-CodeFeedback-SingleTurn
    type: dan-chat
  - path: PocketDoc/Dans-Taskmaxx
    type: dan-chat
  - path: PocketDoc/Dans-Taskmaxx-DataPrepper
    type: dan-chat
  - path: PocketDoc/Dans-Taskmaxx-ConcurrentQA-Reworked
    type: dan-chat
  - path: PocketDoc/Dans-Toolmaxx-Agent
    type: dan-chat
  - path: PocketDoc/Dans-Toolmaxx-ShellCommands
    type: dan-chat
  - path: PocketDoc/Dans-ASCIIMaxx-Wordart
    type: dan-chat
  - path: PocketDoc/Dans-Prosemaxx-Gutenberg
    type: dan-chat
  - path: PocketDoc/Dans-Prosemaxx-Cowriter-XS
    type: dan-chat
  - path: PocketDoc/Dans-Prosemaxx-Adventure
    type: dan-chat
  - path: PocketDoc/Dans-Prosemaxx-Opus-Writing
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-Sharegpt
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-OpenAssistant2
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-Opus-instruct-1
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-Opus-instruct-2
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-Opus-instruct-3
    type: dan-chat
  - path: PocketDoc/Dans-Assistantmaxx-NoRobots
    type: dan-chat
  - path: PocketDoc/Dans-Personamaxx
    type: dan-chat
  - path: PocketDoc/DansTestYard
    type: completion
chat_template: chatml
plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true
load_in_8bit: false
load_in_4bit: false
strict: false
dataset_prepared_path: ./l3.1-8b-dans-instruct-data
val_set_size: 0.0
sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
gradient_accumulation_steps: 2
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.0000015
cosine_min_lr_ratio:
adam_beta1: 0.9
adam_beta2: 0.95
adam_epsilon: 0.00000001
weight_decay: 0.01
max_grad_norm: 20
train_on_inputs: false
group_by_length: true
bf16: true
fp16: false
tf32: false
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_ratio: 0.1
saves_per_epoch: 2
debug: false
deepspeed: deepspeed_configs/zero2.json
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|finetune_right_pad_id|>
  eos_token: <|im_end|>
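The config above is standard axolotl YAML; with axolotl 0.4.1 it would typically be launched via `accelerate launch -m axolotl.cli.train config.yml`. Because the run uses `chat_template: chatml` with `<|im_end|>` as the eos token, prompts at inference time must follow the ChatML format. Below is a minimal inference sketch with `transformers`, assuming the weights are available under the `hub_model_id` from the config (the released repo name may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo name taken from hub_model_id in the config above; adjust if the
# final weights were published under a different name.
model_id = "anthracite-core/Dans-L3.1-Test"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the run trained in bf16
    device_map="auto",
)

# apply_chat_template renders the ChatML layout:
# <|im_start|>user\n...<|im_end|>\n<|im_start|>assistant\n
messages = [{"role": "user", "content": "Summarize gradient checkpointing in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```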
Dans-L3.1-Test
This model is a fine-tuned version of Dans-DiscountModels/Meta-Llama-3.1-8B-ChatML, trained on the mixture of datasets listed in the axolotl config above.
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1.5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 32 (derived; see the sanity check after this list)
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 230
- num_epochs: 3
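The two total batch sizes are derived quantities rather than independent settings. A quick sanity check of the arithmetic, using only numbers from the list above:

```python
micro_batch_size = 2             # per-device train/eval batch size
gradient_accumulation_steps = 2  # applies to training only
num_devices = 8

# batch size seen by the optimizer per training step
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32

# evaluation does not accumulate gradients
total_eval_batch_size = micro_batch_size * num_devices
assert total_eval_batch_size == 16
```

Similarly, the 230 warmup steps are consistent with `warmup_ratio: 0.1`, implying roughly 2,300 optimizer steps over the three epochs.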
Training results
Framework versions
- Transformers 4.45.0.dev0
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1