---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- generated_from_trainer
model-index:
- name: data/llama-1B-20240502-0131
results: []
---
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
<details><summary>See axolotl config</summary>
axolotl version: `0.4.0`
```yaml
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: /data/data/final_set_cleaned/train/
type: sharegpt
conversation: chatml
- path: /data/data/map_coig_cqia.jsonl
type: sharegpt
conversation: chatml
- path: /data/data/ruozhiba.jsonl
type: sharegpt
conversation: chatml
dataset_prepared_path: last_run_prepared
val_set_size: 0
output_dir: ./out
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
wandb_project: FFT
wandb_entity:
wandb_watch:
wandb_name: llama3-fft
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
gradient_checkpointing_kwargs:
use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 0
eval_table_size:
saves_per_epoch: 4
save_total_limit: 8
debug:
deepspeed: deepspeed/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
default_system_message: "You are a helpful assistant."
special_tokens:
eos_token: "<|im_end|>"
pad_token: "<|end_of_text|>"
tokens:
- "<|im_start|>"
- "<|im_end|>"
```
</details><br>
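
For reference, a run with the config above is typically launched through axolotl's CLI. Below is a minimal sketch, assuming axolotl 0.4.0 and accelerate are installed and the YAML is saved as `config.yml` (a placeholder filename):

```python
# Hedged sketch: launch the full fine-tune by shelling out to axolotl's CLI from Python.
# Assumes axolotl 0.4.0 and accelerate are installed and the config above is saved as config.yml.
import subprocess

subprocess.run(
    ["accelerate", "launch", "-m", "axolotl.cli.train", "config.yml"],
    check=True,
)
```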
# llama-1.1B-fft

This model is a full-parameter fine-tuned version of [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0), trained for two epochs on the SFT data listed in the config above.

It shows substantial improvements over the base model on several benchmarks:

| Groups            | Version | Filter       | n-shot | Metric      | TinyLlama-1.1B-Chat-v1.0 | Ours   |
|-------------------|---------|--------------|-------:|-------------|-------------------------:|-------:|
| mmlu              | N/A     | none         |      0 | acc         |                   0.2463 | 0.3547 |
| - humanities      | N/A     | none         |      0 | acc         |                   0.2493 | 0.3375 |
| - other           | N/A     | none         |      0 | acc         |                   0.2588 | 0.3939 |
| - social_sciences | N/A     | none         |      0 | acc         |                   0.2304 | 0.3939 |
| - stem            | N/A     | none         |      0 | acc         |                   0.2448 | 0.3035 |
| gsm8k             | 3       | strict-match |      5 | exact_match |                   0.0243 | 0.1615 |
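
The table follows lm-evaluation-harness conventions (0-shot MMLU accuracy, 5-shot GSM8K with the strict-match filter). Below is a reproduction sketch using the harness' Python API, assuming `lm_eval` >= 0.4 is installed; the exact harness version and settings behind these numbers are not documented in this card:

```python
# Hedged reproduction sketch using lm-evaluation-harness' Python API (lm_eval >= 0.4 assumed).
import lm_eval

repo_id = "lu-vae/llama-1.1B-fft"  # repo id assumed from this card; adjust if different

# 0-shot MMLU accuracy, as reported in the table above.
mmlu = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={repo_id},dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(mmlu["results"]["mmlu"])

# 5-shot GSM8K exact match (strict-match filter), as reported in the table above.
gsm8k = lm_eval.simple_evaluate(
    model="hf",
    model_args=f"pretrained={repo_id},dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(gsm8k["results"]["gsm8k"])
```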

The model shows strong English chat ability but is still weak at Chinese conversation and may suffer from repetition, as the samples below illustrate; a minimal inference sketch follows the samples.

```json
{"instruction": "What are the names of some famous actors that started their careers on Broadway?", "output": "Some famous actors who started their careers on Broadway include:\n\n1. Richard Rodgers\n2. George Abbott\n3. Jerome Robbins\n4. Lin-Manuel Miranda\n5. Stephen Sondheim\n6. Harold Prince\n7. Alfred Uhry\n8. Stephen Schwartz\n9. Joe Mantello\n10. Mike Nichols\n11. Robert Duvall\n12. Susan Sarandon\n13. Martin Katz\n14. Gwyneth Paltrow\n15. Audra McDonald."}
{"instruction": "How do I wrap a present neatly?", "output": "There are several ways to wrap a present neatly, depending on the size and type of the present. Here are a few options:\n\n1. Wrap the present in a piece of wrapping paper. This will help to keep the wrapping clean and neat.\n2. Use a piece of tissue paper or a small piece of gift wrap to wrap the present. This will help to keep the wrapping clean and neat.\n3. Use a piece of string or ribbon to tie the present. This will help to keep the wrapping clean and neat.\n4. Use a piece of bubble wrap or packing paper to wrap the present. This will help to keep the wrapping clean and neat.\n5. Use a small piece of gift wrap or a piece of tissue paper to wrap the present. This will help to keep the wrapping clean and neat.\n\nIt's important to keep in mind that the wrapping should be tight and secure, so that the gift is not damaged during transportation."}
{"instruction": "我正在设计一款专为老年人设计的人工智能手机,你能帮我写个宣传文案吗?", "output": "设计了一款专为老年人设计的人工智能手机,现在正在积极推广!\n它的功能齐全,智能操控能力极强,无论是拍照、观看视频,还是拍摄照片,都无所疑。\n此前,我们在欧洲设计的一款人工智能手机,很好的满足了老年人的需求,使用起来非常方便。\n此前,我们在美国设计的一款人工智能手机,也很好的满足了老年人的需求,使用起来非常方便。\n现在,我们在世界各地设计的一款人工智能手机,都很好的满足了老年人的需求,使用起来非常方便。\n谁都不愿意老退化,老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n老年人不受孤僻和孤立的痛苦,他们只想要生活在一个人情味的世界里。\n我们欢迎您使用我们发布的人工智能手机,不受孤僻和孤立的痛苦,您的生活将更加美好!"}
```
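
Below is a minimal inference sketch with transformers, building the ChatML prompt implied by the training config (default system message plus the `<|im_start|>` / `<|im_end|>` special tokens); the repo id and generation settings shown are assumptions to adjust as needed:

```python
# Minimal inference sketch: load the model and prompt it in the ChatML format
# defined by the training config above. Requires transformers, torch, accelerate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lu-vae/llama-1.1B-fft"  # repo id assumed from this card; adjust if different

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def chatml_prompt(user_message, system="You are a helpful assistant."):
    # ChatML layout used during fine-tuning: system, user, then an open assistant turn.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

inputs = tokenizer(chatml_prompt("How do I wrap a present neatly?"), return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|im_end|>"),
)
# Decode only the newly generated assistant turn.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```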

## Model description

A 1.1B-parameter Llama-architecture chat model produced by full-parameter supervised fine-tuning of TinyLlama-1.1B-Chat-v1.0. Conversations were rendered with the ChatML template (`<|im_start|>` / `<|im_end|>` special tokens) and the default system message "You are a helpful assistant.", with a 4096-token context and sample packing.

## Intended uses & limitations

Intended for general-purpose English chat and instruction following using the ChatML prompt format (see the inference sketch above). Chinese conversation is still weak and responses may degenerate into repetition, as shown in the third sample above. No safety or factuality evaluation is reported.

## Training and evaluation data

Training used three ShareGPT-format datasets rendered as ChatML (see the axolotl config above): `final_set_cleaned`, `map_coig_cqia.jsonl`, and `ruozhiba.jsonl`. No validation split was held out (`val_set_size: 0`) and no evaluation was run during training (`evals_per_epoch: 0`).
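
The dataset files themselves are not published. The sketch below illustrates the conventional ShareGPT record layout that axolotl's `sharegpt` loader expects and how such a record is rendered into the ChatML text seen during training (with `train_on_inputs: false`, only assistant turns contribute to the loss). The example record and field names are illustrative assumptions, not excerpts from the actual data:

```python
# Illustrative ShareGPT-style record (assumed layout; the actual training files are not published).
record = {
    "conversations": [
        {"from": "human", "value": "How do I wrap a present neatly?"},
        {"from": "gpt", "value": "Cut the wrapping paper slightly larger than the box..."},
    ]
}

def render_chatml(rec, system="You are a helpful assistant."):
    """Render a ShareGPT-style record into the ChatML format used for fine-tuning."""
    role = {"human": "user", "gpt": "assistant"}
    text = f"<|im_start|>system\n{system}<|im_end|>\n"
    for turn in rec["conversations"]:
        text += f"<|im_start|>{role[turn['from']]}\n{turn['value']}<|im_end|>\n"
    return text

print(render_chatml(record))
```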
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: paged 8-bit AdamW (`paged_adamw_8bit`) with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
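
The total train batch size above follows from the per-device batch size, gradient accumulation, and device count; a quick sanity check with those values:

```python
# Effective batch size per optimizer step, from the hyperparameters listed above.
micro_batch_size = 4               # per-device train batch size
gradient_accumulation_steps = 8
num_devices = 2

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 64

# With sample_packing enabled and sequence_len 4096, each sample is a packed
# 4096-token block, so one optimizer step covers roughly 64 * 4096 = 262,144 tokens.
print(total_train_batch_size)
```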

### Training results

No in-training evaluation was run; see the benchmark comparison table above for post-training results.
### Framework versions
- Transformers 4.40.1
- Pytorch 2.0.1+cu118
- Datasets 2.15.0
- Tokenizers 0.19.1