Tags: PEFT, Safetensors, gemma2, axolotl, Generated from Trainer, 4-bit precision, bitsandbytes

Built with Axolotl

Axolotl config (axolotl version: 0.8.0.dev0):

# === Start-up Commands ===
# curl -LsSf https://astral.sh/uv/install.sh | sh
# export PATH="$HOME/.local/bin:$PATH"
# git clone https://github.com/axolotl-ai-cloud/axolotl
# cd axolotl
# git checkout d8b4027200de0fe60f4ae0a71272c1a8cb2888f7
# uv venv
# source .venv/bin/activate
# uv pip install packaging ninja setuptools huggingface_hub[cli,hf_transfer]
# uv pip install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c"
# uv pip install apollo-torch
# uv pip install --no-build-isolation -e .[flash-attn,deepspeed]
# uv pip install git+https://github.com/huggingface/transformers.git
# export HF_HUB_ENABLE_HF_TRANSFER=1
# huggingface-cli login --token $hf_key && wandb login $wandb_key
# axolotl preprocess qwen21-pretrain.yml
# axolotl train qwen21-pretrain.yml

# === One-liner equivalent of the setup above (pins torch==2.5.1; strip the leading "# " to run) ===
# curl -LsSf https://astral.sh/uv/install.sh | sh && export PATH="$HOME/.local/bin:$PATH" && \
#   git clone https://github.com/axolotl-ai-cloud/axolotl && cd axolotl && uv venv && source .venv/bin/activate && \
#   uv pip install torch==2.5.1 packaging ninja setuptools huggingface_hub[cli,hf_transfer] && \
#   uv pip install "cut-cross-entropy[transformers] @ git+https://github.com/apple/ml-cross-entropy.git@24fbe4b5dab9a6c250a014573613c1890190536c" && \
#   uv pip install apollo-torch && \
#   uv pip install --no-build-isolation -e .[flash-attn,deepspeed] && \
#   uv pip install git+https://github.com/huggingface/transformers.git && \
#   export HF_HUB_ENABLE_HF_TRANSFER=1 && \
#   cd .. && huggingface-cli login --token $hf_key && wandb login $wandb_key

# === Model Configuration ===
base_model: Columbidae/gemma-2-24b-pruned
load_in_8bit: false
load_in_4bit: true

# === HF Configuration === 
hub_model_id: Columbidae/gemma-2-24b-retrained-base
hub_strategy: "every_save"

# === Training Setup ===
num_epochs: 1
micro_batch_size: 3
gradient_accumulation_steps: 2
sequence_len: 8192
sample_packing: true
pad_to_sequence_len: true

# === Evaluation ===
#val_set_size: 100
#evals_per_epoch: 10
#eval_table_size:
#eval_max_new_tokens: 256
#eval_sample_packing: true
eval_strategy: "no"

# === LoRA Configuration ===
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 64
lora_dropout: 0.5
lora_target_linear: 
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

#lora_mlp_kernel: true
#lora_qkv_kernel: true
#lora_o_kernel: true

# === Hyperparameter Configuration ===
#optimizer: apollo_adamw_layerwise
optimizer: paged_ademamix_8bit
# Apollo-mini configuration:
#optim_args: "proj=random,rank=1,scale=128.0,scale_type=tensor,update_proj_gap=200"
# Regular Apollo configuration:
# optim_args: 
#optim_target_modules: all_linear
learning_rate: 1e-5
lr_scheduler: rex
weight_decay: 0.01
warmup_ratio: 0.05


# === Data Configuration ===
shuffle_merged_datasets: true
datasets:
  - path: allura-org/roselily-furryinflation
    type: completion
    field: text
  - path: allura-org/not_gutenberg_json
    type: completion
    field: text
    split: train[:100]
  - path: ToastyPigeon/roselily-v0-expanded-deduped
    type: completion
    field: text
    split: train[:50%]
    data_files:
      - extra-pony-16k-dedup-small.json
    
dataset_prepared_path: last_run_prepared
# chat_template: tokenizer_default
# Example custom template:
# chat_template: jinja
# chat_template_jinja: |
#   {{- bos_token }}{%- for message in messages %}
#   {%- if message['role'] == 'system' %}
#   {{- '[SYSTEM_PROMPT]' + message['content'] + '[/SYSTEM_PROMPT]' }}
#   {%- elif message['role'] == 'user' %}
#   {{- '[INST]' + message['content'] + '[/INST]' }}
#   {%- elif message['role'] == 'assistant' %}
#   {{- message['content'] + eos_token }}
#   {%- endif %}
#   {%- endfor %}

# === Plugins ===
plugins:
  - axolotl.integrations.liger.LigerPlugin
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

# === Hardware Optimization ===
gradient_checkpointing: offload
#gradient_checkpointing_kwargs:
#  use_reentrant: true
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
#liger_fused_linear_cross_entropy: true
unsloth_cross_entropy_loss: true
cut_cross_entropy: true
# Only if using multiple GPUs:
deepspeed: axolotl/deepspeed_configs/zero3_bf16.json

# === Wandb Tracking ===
wandb_project: Gemma
# wandb_entity: [WANDB_ENTITY]
# wandb_name: [WANDB_RUN_NAME]

# === Checkpointing ===
saves_per_epoch: 20
save_total_limit: 1

# === Advanced Settings ===
output_dir: ./ckpts
bf16: auto
flash_attention: true
train_on_inputs: false
group_by_length: false
save_safetensors: true
max_grad_norm: 10.0
logging_steps: 1
gc_steps: 10
seed: 69

gemma-2-24b-retrained-base

This model is a fine-tuned version of Columbidae/gemma-2-24b-pruned on the allura-org/roselily-furryinflation, allura-org/not_gutenberg_json, and ToastyPigeon/roselily-v0-expanded-deduped datasets.
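
A minimal inference sketch (added as an illustration, not part of the original card): it loads the pruned base in 4-bit, as in the training config above, and attaches this LoRA adapter with PEFT. The repo ids are taken from this card; the quantization settings are illustrative assumptions, since the config only specifies load_in_4bit: true.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "Columbidae/gemma-2-24b-pruned"                       # base model from the config above
adapter_id = "ToastyPigeon/gemma-2-24b-retrained-base-adapter"  # this adapter repo

# Illustrative 4-bit settings; the training config only sets load_in_4bit: true.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)            # attach the LoRA weights

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))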

Model description

More information needed

Intended uses & limitations

More information needed
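
If a standalone checkpoint is preferred over loading the adapter at runtime, the sketch below shows one way to fold the LoRA weights into a bf16 copy of the base with PEFT's merge_and_unload; the output directory name is a placeholder.

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base in bf16 so the LoRA deltas can be merged directly into its weights.
base = AutoModelForCausalLM.from_pretrained(
    "Columbidae/gemma-2-24b-pruned",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ToastyPigeon/gemma-2-24b-retrained-base-adapter")
merged = model.merge_and_unload()  # fold the LoRA deltas into the base weights
merged.save_pretrained("./gemma-2-24b-retrained-base-merged", safe_serialization=True)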

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 3
  • eval_batch_size: 3
  • seed: 69
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 24
  • total_eval_batch_size: 12
  • optimizer: PAGED_ADEMAMIX_8BIT (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 1.0
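
For reference, the effective batch size above follows directly from the config: total_train_batch_size = micro_batch_size (3) × gradient_accumulation_steps (2) × num_devices (4) = 24 packed sequences of up to 8192 tokens per optimizer step.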

Training results

Framework versions

  • PEFT 0.14.0
  • Transformers 4.50.0.dev0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0