<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
adapter: lora
base_model: fxmarty/really-tiny-falcon-testing
bf16: auto
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - ultrafeedback_binarized_cleaned_train_data.json
  ds_type: json
  path: /workspace/input_data/ultrafeedback_binarized_cleaned_train_data.json
  type:
    field_input: source
    field_instruction: prompt
    field_output: prompt_id
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 1
eval_max_new_tokens: 128
eval_steps: 5
eval_table_size: null
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hours_to_complete: 5
hub_model_id: besimray/miner1_3b9799d7-12b3-4975-ab6f-be3bd7705350_1731029029
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_steps: 500
micro_batch_size: 2
mlflow_experiment_name: /tmp/ultrafeedback_binarized_cleaned_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 4
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 10
save_strategy: steps
sequence_len: 4096
started_at: '2024-11-08T01:23:49.735921'
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: besimray24-rayon
wandb_mode: online
wandb_project: Public_TuningSN
wandb_run: miner_id_24
wandb_runid: 3b9799d7-12b3-4975-ab6f-be3bd7705350
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
```

</details>
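The config above is the complete axolotl input for this run. For reproduction, axolotl 0.4.x configs like this are typically launched with `accelerate launch -m axolotl.cli.train config.yaml`; the exact invocation used for this run is not recorded on the card.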
# miner1_3b9799d7-12b3-4975-ab6f-be3bd7705350_1731029029
This model is a fine-tuned version of [fxmarty/really-tiny-falcon-testing](https://huggingface.co/fxmarty/really-tiny-falcon-testing) on the ultrafeedback_binarized_cleaned_train_data.json dataset (see the axolotl config above). It achieves the following results on the evaluation set:
- Loss: 11.0185
## Model description
More information needed
## Intended uses & limitations
More information needed
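No usage notes were provided. As a minimal, untested sketch, the artifact can presumably be loaded as a PEFT LoRA adapter on top of the base model, which `adapter: lora` in the config and the PEFT framework version below suggest. The repo ids come from this card; the assumption that the final adapter sits at the repo root is not confirmed.

```python
# Minimal sketch (untested): attach the LoRA adapter from this repo to the base model.
# ASSUMPTION: the adapter files are at the repo root. hub_strategy: checkpoint may mean
# only checkpoint subfolders were pushed; in that case pass that subfolder via subfolder=.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "fxmarty/really-tiny-falcon-testing"
adapter_id = "besimray/miner1_3b9799d7-12b3-4975-ab6f-be3bd7705350_1731029029"

# trust_remote_code mirrors the training config (trust_remote_code: true).
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(base_id, trust_remote_code=True)

# PeftModel.from_pretrained downloads the adapter weights and wraps the base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```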
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: AdamW (8-bit, via bitsandbytes) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- training_steps: 500
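The total train batch size above is derived rather than set directly. A quick sanity check against the config values (assuming a single GPU, since 2 × 4 = 8 leaves no room for a device multiplier):

```python
# total_train_batch_size = micro_batch_size * gradient_accumulation_steps
micro_batch_size = 2             # per-device batch size from the config
gradient_accumulation_steps = 4  # from the config
total_train_batch_size = micro_batch_size * gradient_accumulation_steps
print(total_train_batch_size)    # -> 8, matching the value listed above
```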
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
44.3865 | 0.0000 | 1 | 11.0934 |
44.389 | 0.0002 | 5 | 11.0931 |
44.3678 | 0.0005 | 10 | 11.0919 |
44.3563 | 0.0007 | 15 | 11.0899 |
44.3507 | 0.0009 | 20 | 11.0876 |
44.3368 | 0.0011 | 25 | 11.0852 |
44.3137 | 0.0014 | 30 | 11.0826 |
44.2998 | 0.0016 | 35 | 11.0797 |
44.3223 | 0.0018 | 40 | 11.0766 |
44.2864 | 0.0020 | 45 | 11.0724 |
44.3044 | 0.0023 | 50 | 11.0679 |
44.2155 | 0.0025 | 55 | 11.0627 |
44.2277 | 0.0027 | 60 | 11.0574 |
44.1989 | 0.0029 | 65 | 11.0519 |
44.1718 | 0.0032 | 70 | 11.0467 |
44.0926 | 0.0034 | 75 | 11.0422 |
44.1955 | 0.0036 | 80 | 11.0389 |
44.1445 | 0.0038 | 85 | 11.0364 |
44.173 | 0.0041 | 90 | 11.0346 |
44.119 | 0.0043 | 95 | 11.0335 |
44.1072 | 0.0045 | 100 | 11.0327 |
44.0926 | 0.0047 | 105 | 11.0319 |
44.1488 | 0.0050 | 110 | 11.0308 |
44.1395 | 0.0052 | 115 | 11.0298 |
44.1485 | 0.0054 | 120 | 11.0291 |
44.1329 | 0.0056 | 125 | 11.0284 |
44.1311 | 0.0059 | 130 | 11.0280 |
44.1152 | 0.0061 | 135 | 11.0274 |
44.1053 | 0.0063 | 140 | 11.0267 |
44.0868 | 0.0066 | 145 | 11.0261 |
44.1361 | 0.0068 | 150 | 11.0255 |
44.1322 | 0.0070 | 155 | 11.0251 |
44.1098 | 0.0072 | 160 | 11.0246 |
44.1007 | 0.0075 | 165 | 11.0242 |
44.0995 | 0.0077 | 170 | 11.0239 |
44.114 | 0.0079 | 175 | 11.0235 |
44.0501 | 0.0081 | 180 | 11.0231 |
44.118 | 0.0084 | 185 | 11.0229 |
44.0967 | 0.0086 | 190 | 11.0228 |
44.1646 | 0.0088 | 195 | 11.0226 |
44.0501 | 0.0090 | 200 | 11.0225 |
44.0946 | 0.0093 | 205 | 11.0222 |
44.047 | 0.0095 | 210 | 11.0219 |
44.0339 | 0.0097 | 215 | 11.0216 |
44.1334 | 0.0099 | 220 | 11.0214 |
44.1315 | 0.0102 | 225 | 11.0212 |
44.0785 | 0.0104 | 230 | 11.0210 |
44.0846 | 0.0106 | 235 | 11.0209 |
44.0812 | 0.0108 | 240 | 11.0208 |
44.0548 | 0.0111 | 245 | 11.0205 |
44.0861 | 0.0113 | 250 | 11.0203 |
44.1121 | 0.0115 | 255 | 11.0202 |
44.0327 | 0.0118 | 260 | 11.0201 |
44.1155 | 0.0120 | 265 | 11.0199 |
44.0547 | 0.0122 | 270 | 11.0198 |
44.0969 | 0.0124 | 275 | 11.0198 |
44.0746 | 0.0127 | 280 | 11.0197 |
44.0535 | 0.0129 | 285 | 11.0195 |
44.1303 | 0.0131 | 290 | 11.0194 |
44.1142 | 0.0133 | 295 | 11.0193 |
44.0685 | 0.0136 | 300 | 11.0192 |
44.1292 | 0.0138 | 305 | 11.0191 |
44.0637 | 0.0140 | 310 | 11.0190 |
44.1035 | 0.0142 | 315 | 11.0189 |
44.0725 | 0.0145 | 320 | 11.0189 |
44.1142 | 0.0147 | 325 | 11.0189 |
44.005 | 0.0149 | 330 | 11.0189 |
44.0686 | 0.0151 | 335 | 11.0188 |
44.059 | 0.0154 | 340 | 11.0187 |
44.0642 | 0.0156 | 345 | 11.0187 |
44.0906 | 0.0158 | 350 | 11.0187 |
44.0386 | 0.0160 | 355 | 11.0186 |
44.0478 | 0.0163 | 360 | 11.0186 |
44.144 | 0.0165 | 365 | 11.0185 |
44.0354 | 0.0167 | 370 | 11.0185 |
44.0702 | 0.0169 | 375 | 11.0185 |
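Note: evaluation logs end at step 375 rather than the configured `max_steps: 500`. With `eval_steps: 5` and `early_stopping_patience: 1`, this is consistent with early stopping once the validation loss stopped improving (it plateaus at 11.0185 from step 365 onward).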
### Framework versions
- PEFT 0.13.2
- Transformers 4.46.1
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.3