---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
  - generated_from_trainer
model-index:
  - name: data/llama-1B-20240502-0131
    results: []
---

Built with Axolotl

See axolotl config

axolotl version: `0.4.0`

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: /data/data/final_set_cleaned/train/
    type: sharegpt
    conversation: chatml
  - path: /data/data/map_coig_cqia.jsonl
    type: sharegpt
    conversation: chatml
  - path: /data/data/ruozhiba.jsonl
    type: sharegpt
    conversation: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: FFT
wandb_entity:
wandb_watch:
wandb_name: llama3-fft
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 0
eval_table_size:
saves_per_epoch: 4
save_total_limit: 8
debug:
deepspeed: deepspeed/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
default_system_message: "You are a helpful assistant."
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
```

# llama-1.1B-fft

This model is a full-parameter fine-tuned version of TinyLlama/TinyLlama-1.1B-Chat-v1.0, trained on the SFT dataset for two epochs.

It shows a substantial improvement over the base model on several benchmarks:

| Groups | Version | Filter | n-shot | Metric | TinyLlama-1.1B-Chat-v1.0 | Ours |
|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.2463 | 0.3547 |
| - humanities | N/A | none | 0 | acc | 0.2493 | 0.3375 |
| - other | N/A | none | 0 | acc | 0.2588 | 0.3939 |
| - social_sciences | N/A | none | 0 | acc | 0.2304 | 0.3939 |
| - stem | N/A | none | 0 | acc | 0.2448 | 0.3035 |
| gsm8k | 3 | strict-match | 5 | exact_match | 0.0243 | 0.1615 |
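
The column layout matches the table printed by EleutherAI's lm-evaluation-harness, so the numbers were presumably produced with it. A hedged sketch of how a comparable run could look with harness v0.4.x, assuming the model is published under the repo id `lu-vae/llama-1.1B-fft` (unverified):

```python
# Sketch only: assumes lm-evaluation-harness >= 0.4.0 is installed
# and that "lu-vae/llama-1.1B-fft" is the published repo id (unverified).
from lm_eval import simple_evaluate

# 0-shot MMLU, matching the acc rows in the table above
mmlu = simple_evaluate(
    model="hf",
    model_args="pretrained=lu-vae/llama-1.1B-fft",
    tasks=["mmlu"],
    num_fewshot=0,
)

# 5-shot GSM8K; the strict-match filter is part of the task definition
gsm8k = simple_evaluate(
    model="hf",
    model_args="pretrained=lu-vae/llama-1.1B-fft",
    tasks=["gsm8k"],
    num_fewshot=5,
)

print(mmlu["results"])
print(gsm8k["results"])
```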

The model has strong English chat ability, while remaining weak at Chinese conversation (it may suffer from repetition), as the samples below show; the third sample, a Chinese request for marketing copy, illustrates the repetition issue:

{"instruction": "What are the names of some famous actors that started their careers on Broadway?", "output": "Some famous actors who started their careers on Broadway include:\n\n1. Richard Rodgers\n2. George Abbott\n3. Jerome Robbins\n4. Lin-Manuel Miranda\n5. Stephen Sondheim\n6. Harold Prince\n7. Alfred Uhry\n8. Stephen Schwartz\n9. Joe Mantello\n10. Mike Nichols\n11. Robert Duvall\n12. Susan Sarandon\n13. Martin Katz\n14. Gwyneth Paltrow\n15. Audra McDonald."} 
{"instruction": "How do I wrap a present neatly?", "output": "There are several ways to wrap a present neatly, depending on the size and type of the present. Here are a few options:\n\n1. Wrap the present in a piece of wrapping paper. This will help to keep the wrapping clean and neat.\n2. Use a piece of tissue paper or a small piece of gift wrap to wrap the present. This will help to keep the wrapping clean and neat.\n3. Use a piece of string or ribbon to tie the present. This will help to keep the wrapping clean and neat.\n4. Use a piece of bubble wrap or packing paper to wrap the present. This will help to keep the wrapping clean and neat.\n5. Use a small piece of gift wrap or a piece of tissue paper to wrap the present. This will help to keep the wrapping clean and neat.\n\nIt's important to keep in mind that the wrapping should be tight and secure, so that the gift is not damaged during transportation."}
{"instruction": "我正在设计一款专为老年人设计的人工智能手机,你能帮我写个宣传文案吗?", "output": "设计了一款专为老年人设计的人工智能手机,现在正在积极推广!\n它的功能齐全,智能操控能力极强,无论是拍照、观看视频,还是拍摄照片,都无所疑。\n此前,我们在欧洲设计的一款人工智能手机,很好的满足了老年人的需求,使用起来非常方便。\n此前,我们在美国设计的一款人工智能手机,也很好的满足了老年人的需求,使用起来非常方便。\n现在,我们在世界各地设计的一款人工智能手机,都很好的满足了老年人的需求,使用起来非常方便。\n谁都不愿意老退化,老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n我们欢迎您使用我们发布的人工智能手机,不受孤僻和孤立的痛苦,您的生活将更加美好!"}

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
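
The effective batch size follows directly from the settings above; a quick check of the arithmetic:

```python
micro_batch_size = 4             # per-device batch (train_batch_size above)
gradient_accumulation_steps = 8
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 64, matching the value reported above
```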

### Training results

### Framework versions

- Transformers 4.40.1
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.19.1