This is the Instruction Fine Tuned version of Tiny Llama on @Teknium1's openhermes dataset.

"The TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens. With some proper optimization, we can achieve this within a span of "just" 90 days using 16 A100-40G GPUs ๐Ÿš€๐Ÿš€. The training has started on 2023-09-01."

See axolotl config

axolotl version: 0.3.0

base_model: ./TinyLlama-1.1B-intermediate-step-1195k-token-2.5T

model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: ./openhermes
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./out

sequence_len: 4096
sample_packing: false

adapter: 
lora_model_dir:
lora_r: 
lora_alpha: 
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:

wandb_project: tinyllama-openhermes
wandb_entity: tensoic
wandb_watch:
wandb_name: 
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 8
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:

warmup_steps: 100
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"

image/png

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 128
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
  • mixed_precision_training: Native AMP

Training results

Training Loss Epoch Step Validation Loss
2.9371 0.0 1 1.6734
0.7951 0.25 451 1.4276
0.6622 0.5 902 1.3929
0.6669 0.75 1353 1.3425

Framework versions

  • Transformers 4.36.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.15.0
  • Tokenizers 0.15.0
Downloads last month
22
Safetensors
Model size
1.1B params
Tensor type
FP16
ยท
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train Tensoic/TinyLlama-1.1B-2.5T-openhermes