---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- generated_from_trainer
model-index:
- name: data/llama-1B-20240502-0131
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config

axolotl version: `0.4.0`

```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: /data/data/final_set_cleaned/train/
    type: sharegpt
    conversation: chatml
  - path: /data/data/map_coig_cqia.jsonl
    type: sharegpt
    conversation: chatml
  - path: /data/data/ruozhiba.jsonl
    type: sharegpt
    conversation: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./out

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

wandb_project: FFT
wandb_entity:
wandb_watch:
wandb_name: llama3-fft
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 100
evals_per_epoch: 0
eval_table_size:
saves_per_epoch: 4
save_total_limit: 8
debug:
deepspeed: deepspeed/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
default_system_message: "You are a helpful assistant."
special_tokens:
  eos_token: "<|im_end|>"
  pad_token: "<|end_of_text|>"
tokens:
  - "<|im_start|>"
  - "<|im_end|>"
```
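For reference, the `type: sharegpt` / `conversation: chatml` settings above mean each training sample is rendered in ChatML with the configured `<|im_start|>` / `<|im_end|>` tokens and the default system message. A minimal sketch of that layout (illustrative only; Axolotl performs the actual templating during dataset preparation):

```python
# Illustrative sketch of the ChatML layout implied by the config above.
# The `render_chatml` helper is hypothetical; Axolotl handles templating itself.
def render_chatml(messages, system="You are a helpful assistant."):
    """Render a list of {'role': ..., 'content': ...} dicts into ChatML text."""
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>")
    # Leave the assistant turn open so the model generates the reply.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

print(render_chatml([{"role": "user", "content": "How do I wrap a present neatly?"}]))
```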

# llama-1.1B-fft

This model is a full-parameter fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), trained on the SFT dataset for two epochs. It shows a substantial improvement over the base model on several benchmarks:

| Groups            | Version | Filter       | n-shot | Metric      | TinyLlama-1.1B-Chat-v1.0 | Ours   |
|-------------------|---------|--------------|-------:|-------------|-------------------------:|-------:|
| mmlu              | N/A     | none         |      0 | acc         |                   0.2463 | 0.3547 |
| - humanities      | N/A     | none         |      0 | acc         |                   0.2493 | 0.3375 |
| - other           | N/A     | none         |      0 | acc         |                   0.2588 | 0.3939 |
| - social_sciences | N/A     | none         |      0 | acc         |                   0.2304 | 0.3939 |
| - stem            | N/A     | none         |      0 | acc         |                   0.2448 | 0.3035 |
| gsm8k             | 3       | strict-match |      5 | exact_match |                   0.0243 | 0.1615 |

The model has strong English chat ability, but is still weak at Chinese conversation (it may suffer from repetition), as the example generations below illustrate:

```
{"instruction": "What are the names of some famous actors that started their careers on Broadway?", "output": "Some famous actors who started their careers on Broadway include:\n\n1. Richard Rodgers\n2. George Abbott\n3. Jerome Robbins\n4. Lin-Manuel Miranda\n5. Stephen Sondheim\n6. Harold Prince\n7. Alfred Uhry\n8. Stephen Schwartz\n9. Joe Mantello\n10. Mike Nichols\n11. Robert Duvall\n12. Susan Sarandon\n13. Martin Katz\n14. Gwyneth Paltrow\n15. Audra McDonald."}
{"instruction": "How do I wrap a present neatly?", "output": "There are several ways to wrap a present neatly, depending on the size and type of the present. Here are a few options:\n\n1. Wrap the present in a piece of wrapping paper. This will help to keep the wrapping clean and neat.\n2. Use a piece of tissue paper or a small piece of gift wrap to wrap the present. This will help to keep the wrapping clean and neat.\n3. Use a piece of string or ribbon to tie the present. This will help to keep the wrapping clean and neat.\n4. Use a piece of bubble wrap or packing paper to wrap the present. This will help to keep the wrapping clean and neat.\n5. Use a small piece of gift wrap or a piece of tissue paper to wrap the present. This will help to keep the wrapping clean and neat.\n\nIt's important to keep in mind that the wrapping should be tight and secure, so that the gift is not damaged during transportation."}
{"instruction": "我正在设计一款专为老年人设计的人工智能手机,你能帮我写个宣传文案吗?", "output": "设计了一款专为老年人设计的人工智能手机,现在正在积极推广!\n它的功能齐全,智能操控能力极强,无论是拍照、观看视频,还是拍摄照片,都无所疑。\n此前,我们在欧洲设计的一款人工智能手机,很好的满足了老年人的需求,使用起来非常方便。\n此前,我们在美国设计的一款人工智能手机,也很好的满足了老年人的需求,使用起来非常方便。\n现在,我们在世界各地设计的一款人工智能手机,都很好的满足了老年人的需求,使用起来非常方便。\n谁都不愿意老退化,老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n我们欢迎您使用我们发布的人工智能手机,不受孤僻和孤立的痛苦,您的生活将更加美好!"}
```

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2

### Training results

### Framework versions

- Transformers 4.40.1
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.19.1
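Finally, a minimal inference sketch, assuming the fine-tuned weights are loaded from the local `./out` directory produced by the run above (substitute the published repository id when loading from the Hub) and that the saved tokenizer carries a ChatML chat template matching the configured special tokens:

```python
# Hypothetical usage sketch; the model path is an assumption (the Axolotl
# output_dir), not a published repository id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./out"  # assumption: output_dir from the Axolotl config above
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How do I wrap a present neatly?"},
]
# Assumes the tokenizer ships a ChatML chat template matching the tokens above.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```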