Built with Axolotl

See axolotl config

axolotl version: 0.4.1

adapter: lora
base_model: fxmarty/really-tiny-falcon-testing
bf16: auto
chat_template: llama3
dataset_prepared_path: null
datasets:
- data_files:
  - ultrafeedback_binarized_cleaned_train_data.json
  ds_type: json
  path: /workspace/input_data/ultrafeedback_binarized_cleaned_train_data.json
  type:
    field_input: source
    field_instruction: prompt
    field_output: prompt_id
    system_format: '{system}'
    system_prompt: ''
debug: null
deepspeed: null
early_stopping_patience: 1
eval_max_new_tokens: 128
eval_steps: 5
eval_table_size: null
flash_attention: false
fp16: null
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 4
gradient_checkpointing: true
group_by_length: false
hours_to_complete: 5
hub_model_id: besimray/miner1_3b9799d7-12b3-4975-ab6f-be3bd7705350_1731029029
hub_strategy: checkpoint
hub_token: null
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: true
local_rank: null
logging_steps: 1
lora_alpha: 32
lora_dropout: 0.05
lora_fan_in_fan_out: null
lora_model_dir: null
lora_r: 32
lora_target_linear: true
lr_scheduler: cosine
max_steps: 500
micro_batch_size: 2
mlflow_experiment_name: /tmp/ultrafeedback_binarized_cleaned_train_data.json
model_type: AutoModelForCausalLM
num_epochs: 4
optimizer: adamw_bnb_8bit
output_dir: miner_id_24
pad_to_sequence_len: true
resume_from_checkpoint: null
s2_attention: null
sample_packing: false
save_steps: 10
save_strategy: steps
sequence_len: 4096
started_at: '2024-11-08T01:23:49.735921'
strict: false
tf32: false
tokenizer_type: AutoTokenizer
train_on_inputs: false
trust_remote_code: true
val_set_size: 0.05
wandb_entity: besimray24-rayon
wandb_mode: online
wandb_project: Public_TuningSN
wandb_run: miner_id_24
wandb_runid: 3b9799d7-12b3-4975-ab6f-be3bd7705350
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
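
Two quantities that follow from the config above are the effective batch size per optimizer step and the LoRA scaling factor. The lines below are a minimal sketch, not part of the training code. They assume a single training device (`num_devices` is a hypothetical name; the config does not state the device count) and the standard LoRA scaling of `lora_alpha / lora_r`.

```python
# Values copied from the axolotl config above.
micro_batch_size = 2
gradient_accumulation_steps = 4
lora_alpha = 32
lora_r = 32
max_steps = 500

# Assumption: one training device; the config does not state the device count.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Standard (non rank-stabilized) LoRA scaling applied to the low-rank update.
lora_scaling = lora_alpha / lora_r

# Upper bound on training sequences processed if all max_steps were run.
sequences_seen = effective_batch_size * max_steps

print(effective_batch_size, lora_scaling, sequences_seen)
```

Under the single-device assumption the effective batch size agrees with the `total_train_batch_size` of 8 reported in the training hyperparameters.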

miner1_3b9799d7-12b3-4975-ab6f-be3bd7705350_1731029029

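This model is a LoRA adapter fine-tuned from fxmarty/really-tiny-falcon-testing. The auto-generated card lists the dataset as None; per the config above, training used a local JSON file, ultrafeedback_binarized_cleaned_train_data.json. It achieves the following results on the evaluation set: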

  • Loss: 11.0185

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: OptimizerNames.ADAMW_BNB (adamw_bnb_8bit in the config) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • training_steps: 500
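
The cosine schedule with linear warmup listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the trainer's code: `cosine_lr` is a hypothetical helper, and it assumes the common form of the schedule (linear ramp from 0 to the base rate over the warmup steps, then a half-cosine decay to 0 at the final step).

```python
import math

def cosine_lr(step, base_lr=2e-4, warmup_steps=10, total_steps=500):
    """Learning rate at `step` for linear warmup followed by half-cosine decay.

    Defaults are taken from the hyperparameters listed above.
    """
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

for s in (0, 5, 10, 255, 375, 500):
    print(s, f"{cosine_lr(s):.3e}")
```

With these settings the rate peaks at 0.0002 at step 10 and is back to half of that at step 255, the midpoint of the decay phase.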

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 44.3865 | 0.0000 | 1 | 11.0934 |
| 44.389 | 0.0002 | 5 | 11.0931 |
| 44.3678 | 0.0005 | 10 | 11.0919 |
| 44.3563 | 0.0007 | 15 | 11.0899 |
| 44.3507 | 0.0009 | 20 | 11.0876 |
| 44.3368 | 0.0011 | 25 | 11.0852 |
| 44.3137 | 0.0014 | 30 | 11.0826 |
| 44.2998 | 0.0016 | 35 | 11.0797 |
| 44.3223 | 0.0018 | 40 | 11.0766 |
| 44.2864 | 0.0020 | 45 | 11.0724 |
| 44.3044 | 0.0023 | 50 | 11.0679 |
| 44.2155 | 0.0025 | 55 | 11.0627 |
| 44.2277 | 0.0027 | 60 | 11.0574 |
| 44.1989 | 0.0029 | 65 | 11.0519 |
| 44.1718 | 0.0032 | 70 | 11.0467 |
| 44.0926 | 0.0034 | 75 | 11.0422 |
| 44.1955 | 0.0036 | 80 | 11.0389 |
| 44.1445 | 0.0038 | 85 | 11.0364 |
| 44.173 | 0.0041 | 90 | 11.0346 |
| 44.119 | 0.0043 | 95 | 11.0335 |
| 44.1072 | 0.0045 | 100 | 11.0327 |
| 44.0926 | 0.0047 | 105 | 11.0319 |
| 44.1488 | 0.0050 | 110 | 11.0308 |
| 44.1395 | 0.0052 | 115 | 11.0298 |
| 44.1485 | 0.0054 | 120 | 11.0291 |
| 44.1329 | 0.0056 | 125 | 11.0284 |
| 44.1311 | 0.0059 | 130 | 11.0280 |
| 44.1152 | 0.0061 | 135 | 11.0274 |
| 44.1053 | 0.0063 | 140 | 11.0267 |
| 44.0868 | 0.0066 | 145 | 11.0261 |
| 44.1361 | 0.0068 | 150 | 11.0255 |
| 44.1322 | 0.0070 | 155 | 11.0251 |
| 44.1098 | 0.0072 | 160 | 11.0246 |
| 44.1007 | 0.0075 | 165 | 11.0242 |
| 44.0995 | 0.0077 | 170 | 11.0239 |
| 44.114 | 0.0079 | 175 | 11.0235 |
| 44.0501 | 0.0081 | 180 | 11.0231 |
| 44.118 | 0.0084 | 185 | 11.0229 |
| 44.0967 | 0.0086 | 190 | 11.0228 |
| 44.1646 | 0.0088 | 195 | 11.0226 |
| 44.0501 | 0.0090 | 200 | 11.0225 |
| 44.0946 | 0.0093 | 205 | 11.0222 |
| 44.047 | 0.0095 | 210 | 11.0219 |
| 44.0339 | 0.0097 | 215 | 11.0216 |
| 44.1334 | 0.0099 | 220 | 11.0214 |
| 44.1315 | 0.0102 | 225 | 11.0212 |
| 44.0785 | 0.0104 | 230 | 11.0210 |
| 44.0846 | 0.0106 | 235 | 11.0209 |
| 44.0812 | 0.0108 | 240 | 11.0208 |
| 44.0548 | 0.0111 | 245 | 11.0205 |
| 44.0861 | 0.0113 | 250 | 11.0203 |
| 44.1121 | 0.0115 | 255 | 11.0202 |
| 44.0327 | 0.0118 | 260 | 11.0201 |
| 44.1155 | 0.0120 | 265 | 11.0199 |
| 44.0547 | 0.0122 | 270 | 11.0198 |
| 44.0969 | 0.0124 | 275 | 11.0198 |
| 44.0746 | 0.0127 | 280 | 11.0197 |
| 44.0535 | 0.0129 | 285 | 11.0195 |
| 44.1303 | 0.0131 | 290 | 11.0194 |
| 44.1142 | 0.0133 | 295 | 11.0193 |
| 44.0685 | 0.0136 | 300 | 11.0192 |
| 44.1292 | 0.0138 | 305 | 11.0191 |
| 44.0637 | 0.0140 | 310 | 11.0190 |
| 44.1035 | 0.0142 | 315 | 11.0189 |
| 44.0725 | 0.0145 | 320 | 11.0189 |
| 44.1142 | 0.0147 | 325 | 11.0189 |
| 44.005 | 0.0149 | 330 | 11.0189 |
| 44.0686 | 0.0151 | 335 | 11.0188 |
| 44.059 | 0.0154 | 340 | 11.0187 |
| 44.0642 | 0.0156 | 345 | 11.0187 |
| 44.0906 | 0.0158 | 350 | 11.0187 |
| 44.0386 | 0.0160 | 355 | 11.0186 |
| 44.0478 | 0.0163 | 360 | 11.0186 |
| 44.144 | 0.0165 | 365 | 11.0185 |
| 44.0354 | 0.0167 | 370 | 11.0185 |
| 44.0702 | 0.0169 | 375 | 11.0185 |
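
To put the logged losses in perspective, the sketch below converts the first and last validation losses to perplexity and compares training loss to validation loss. It assumes the reported losses are mean token-level cross-entropy in nats, which the card does not state. The training/validation ratio comes out close to 4, the value of gradient_accumulation_steps; one possible reading, not confirmed by this card, is that the logged training loss is summed rather than averaged over the accumulated micro-batches.

```python
import math

# (step, training loss, validation loss) copied from the first and last logged rows.
rows = [(1, 44.3865, 11.0934), (375, 44.0702, 11.0185)]

for step, train_loss, val_loss in rows:
    # Assumption: loss is mean cross-entropy in nats, so perplexity = exp(loss).
    perplexity = math.exp(val_loss)
    # Ratio of logged training loss to validation loss at the same step.
    ratio = train_loss / val_loss
    print(f"step {step}: perplexity ~ {perplexity:.0f}, train/val ratio = {ratio:.3f}")

# Total improvement in validation loss over the logged run.
val_drop = rows[0][2] - rows[-1][2]
print(f"validation loss decrease: {val_drop:.4f}")
```

The validation loss falls by less than 0.1 over the 375 logged steps, so the adapter changes the model's predictions only slightly relative to the starting point.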

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.1
  • Pytorch 2.3.1+cu121
  • Datasets 3.0.1
  • Tokenizers 0.20.3