metadata

license: other
library_name: peft
tags:
  - axolotl
  - generated_from_trainer
base_model: google/gemma-2b
model-index:
  - name: gemma_odia_2b
    results: []

See axolotl config

axolotl version: 0.4.0

# use google/gemma-7b if you have access
base_model: google/gemma-2b
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

# huggingface repo
datasets:
  - path: OdiaGenAIdata/culturax-odia
    type: completion
val_set_size: 0.1
output_dir: ./gemma-odia-2b-pretrain
hub_model_id: sam2ai/gemma_odia_2b

adapter: qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: gemma-completion-2b-odia
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:


gradient_accumulation_steps: 3
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_ratio: 0.1
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
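
As a small worked example of the adapter settings in the config above: in standard LoRA the low-rank update is scaled by `lora_alpha / lora_r`, so `lora_r: 32` with `lora_alpha: 16` gives a factor of 0.5. The sketch below assumes that standard (non-rank-stabilized) scaling; the function name `lora_scaling` is illustrative and is not part of axolotl or PEFT.

```python
def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update B @ A."""
    if lora_r <= 0:
        raise ValueError("lora_r must be positive")
    return lora_alpha / lora_r


# Values taken from the axolotl config above.
scale = lora_scaling(lora_alpha=16, lora_r=32)
print(scale)
```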

gemma_odia_2b

This model is a fine-tuned version of google/gemma-2b on the OdiaGenAIdata/culturax-odia dataset. It achieves the following results on the evaluation set:

Loss: 13.3986
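
Because the metadata lists `library_name: peft` and the config trains a `qlora` adapter, the published repository presumably holds adapter weights rather than a full model. A minimal loading sketch follows, under these assumptions: the adapter is available under the `hub_model_id` from the config, you have access to the gated `google/gemma-2b` base model, and loading the base in bfloat16 (rather than the 4-bit setup used in training) is acceptable. The helper names `load_adapter_model` and `complete` are hypothetical.

```python
BASE_MODEL_ID = "google/gemma-2b"      # base_model from the metadata
ADAPTER_ID = "sam2ai/gemma_odia_2b"    # hub_model_id from the axolotl config


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (downloads weights)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


def complete(prompt: str, max_new_tokens: int = 128) -> str:
    """Continue a text prompt; the config trains with `type: completion`."""
    tokenizer, model = load_adapter_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("...")` with an Odia text prefix would then return the prefix plus the model's continuation. Given the evaluation loss reported here, output quality should be checked before relying on it.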

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 8
gradient_accumulation_steps: 3
total_train_batch_size: 48
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 87
num_epochs: 10
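
The derived values in the list above follow from simple arithmetic, sketched here as a sanity check. The total train batch size is the per-device batch size times the number of devices times the gradient accumulation steps; the eval batch size has no accumulation. The second helper sketches a linear-warmup-then-cosine learning-rate curve of the kind `lr_scheduler_type: cosine` usually denotes. Both function names are illustrative, and `total_steps=17960` is an estimate (10 epochs at roughly 1796 steps per epoch, read off the results table in this card), not a value reported by the trainer.

```python
import math


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Number of samples contributing to one optimizer (or eval) step."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then cosine decay towards zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Values from the hyperparameter list above.
print(total_batch_size(per_device=2, num_devices=8, grad_accum=3))  # train
print(total_batch_size(per_device=2, num_devices=8))                # eval

# total_steps is an assumed estimate, see the note above.
for step in (0, 87, 9000, 17960):
    print(step, cosine_lr(step, peak_lr=0.0002, warmup_steps=87, total_steps=17960))
```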

Training results

Training Loss	Epoch	Step	Validation Loss
48.3127	0.0	1	48.2905
21.4891	0.25	449	21.4957
25.8116	0.5	898	26.0510
25.3858	0.75	1347	25.6013
16.9215	1.0	1796	16.9936
16.7894	1.24	2245	16.7975
16.8564	1.49	2694	17.0068
16.8912	1.74	3143	17.0482
16.9407	1.99	3592	17.0556
16.7487	2.22	4041	16.8123
17.7797	2.47	4490	18.1220
14.0039	2.72	4939	14.0630
14.7386	2.97	5388	14.7828
14.9965	3.21	5837	15.2212
15.1822	3.46	6286	15.6448
14.1876	3.71	6735	14.5398
16.6416	3.96	7184	16.9006
17.0568	4.19	7633	17.1808
17.4472	4.44	8082	17.5766
17.4219	4.69	8531	17.5393
17.3064	4.94	8980	17.5467
17.2741	5.18	9429	17.5657
16.9905	5.43	9878	17.3912
16.642	5.68	10327	17.1920
16.6345	5.93	10776	17.1085
15.5702	6.16	11225	16.0494
15.3421	6.41	11674	15.9889
13.1025	6.66	12123	13.1419
13.1904	6.91	12572	13.2151
13.261	7.15	13021	13.3119
13.2333	7.4	13470	13.3195
13.2705	7.65	13919	13.3380
13.3417	7.9	14368	13.3804
13.3553	8.13	14817	13.3902
13.4078	8.38	15266	13.4614
13.394	8.63	15715	13.4338
13.3754	8.88	16164	13.4149
13.3487	9.12	16613	13.4044
13.3807	9.37	17062	13.3903
13.3766	9.62	17511	13.3986
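
To pick out the evaluation point with the lowest validation loss, the rows can be parsed as whitespace-separated columns. The sketch below uses only a few rows copied from the table above; the `EvalRow` type and `parse_rows` helper are illustrative names, not part of any training library.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_rows(text: str) -> List[EvalRow]:
    """Parse 'train_loss epoch step val_loss' lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(EvalRow(float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


# A subset of rows copied from the results table.
SAMPLE = """
48.3127 0.0 1 48.2905
16.9215 1.0 1796 16.9936
14.0039 2.72 4939 14.0630
13.1025 6.66 12123 13.1419
13.3766 9.62 17511 13.3986
"""

rows = parse_rows(SAMPLE)
best = min(rows, key=lambda r: r.val_loss)
print(best.step, best.val_loss)
```

Across the full table the lowest validation loss is likewise the step 12123 entry (13.1419), slightly below the final reported 13.3986. Whether a checkpoint was saved at that step is not stated; the config uses `saves_per_epoch: 1`.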

Framework versions

PEFT 0.9.0
Transformers 4.40.0.dev0
Pytorch 2.4.0.dev20240326+rocm6.0
Datasets 2.18.0
Tokenizers 0.15.0