
	---
	tags:
	- generated_from_trainer
	model-index:
	- name: workspace/output/cosmosage_qa
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.4.0`
	```yaml
	base_model: /workspace/output/cosmosage_base/
	model_type: MistralForCausalLM
	tokenizer_type: LlamaTokenizer
	is_mistral_derived_model: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	  - path: /workspace/input/datasets/qa_tune/arxiv_metadata_qa3.jsonl
	    type: sharegpt
	  - path: /workspace/input/datasets/qa_tune/arxiv_refined_qa.jsonl
	    type: sharegpt
	  - path: /workspace/input/datasets/qa_tune/arxiv_summary3.jsonl
	    type: sharegpt
	  - path: /workspace/input/datasets/qa_tune/cosmology_qa.jsonl
	    type: alpaca_chat.load_qa
	  - path: /workspace/input/datasets/qa_tune/openhermes2_5.jsonl
	    type: sharegpt
	  - path: /workspace/input/datasets/qa_tune/cosmology_textbooks_qa.jsonl
	    type: alpaca_chat.load_qa
	  - path: /workspace/input/datasets/qa_tune/physics_astro_qa.jsonl
	    type: alpaca_chat.load_qa

	dataset_prepared_path: /workspace/output/qa_tune_prepared
	val_set_size: 0.001
	output_dir: /workspace/output/cosmosage_qa

	chat_template: inst

	adapter:
	lora_model_dir:

	sequence_len: 4096
	sample_packing: true
	pad_to_sequence_len: true

	lora_r:
	lora_alpha:
	lora_dropout:
	lora_target_modules:
	lora_target_linear:
	lora_fan_in_fan_out:

	seed: 702

	wandb_project:
	wandb_entity:
	wandb_watch:
	wandb_name:
	wandb_log_model:

	gradient_accumulation_steps: 1
	micro_batch_size: 4
	num_epochs: 2.0
	optimizer: adamw_torch
	lr_scheduler: linear
	learning_rate: 0.000002
	max_grad_norm: 3.0

	train_on_inputs: false
	group_by_length: false
	bf16: true
	fp16: false
	tf32: false

	gradient_checkpointing: true
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention: true

	warmup_steps: 100
	eval_steps: 0.05
	eval_table_size:
	eval_table_max_new_tokens: 128
	saves_per_epoch: 1
	save_total_limit: 2
	debug:
	deepspeed: /workspace/axolotl/deepspeed_configs/zero1.json
	weight_decay:
	fsdp:
	fsdp_config:
	special_tokens:
	  bos_token: "<s>"
	  eos_token: "</s>"
	  unk_token: "<unk>"

	ddp_timeout: 7200000

	```

	</details><br>

	# workspace/output/cosmosage_qa

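	This model is a full-parameter fine-tune (no LoRA adapter) of the local checkpoint `/workspace/output/cosmosage_base/` on the seven datasets listed in the axolotl config above.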
	It achieves the following results on the evaluation set:
	- Loss: 0.5673

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-06
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 702
	- distributed_type: multi-GPU
	- num_devices: 4
	- total_train_batch_size: 16
	- total_eval_batch_size: 16
	- optimizer: AdamW (`adamw_torch`) with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 2.0
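
The relationship between these values can be sketched in a few lines of Python. This is a minimal illustration, not the trainer's own code. It assumes the usual convention that the total batch size is the per-device batch size times the gradient accumulation steps times the number of devices, and that the `linear` scheduler warms up linearly from zero and then decays linearly to zero. The function names are hypothetical, and the total of 18180 optimizer steps is an assumption inferred from the evaluation interval (909 steps at `eval_steps: 0.05`), not a value stated in the card.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int, num_devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device * grad_accum_steps * num_devices


def linear_schedule_lr(step: int,
                       base_lr: float = 2e-6,
                       warmup_steps: int = 100,
                       total_steps: int = 18180) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0.

    total_steps is an assumed value (909 / 0.05), not taken from the card.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


# micro_batch_size=4, gradient_accumulation_steps=1, num_devices=4
total_train = effective_batch_size(per_device=4, grad_accum_steps=1, num_devices=4)
print(total_train)  # 16

# Learning rate at a few points of the run under the assumptions above.
for step in (0, 50, 100, 9090, 18180):
    print(step, f"{linear_schedule_lr(step):.2e}")
```

With `gradient_accumulation_steps: 1`, the total batch size of 16 comes entirely from the four devices each processing a micro-batch of 4. Because `sample_packing` is enabled, each of those samples is a packed sequence of up to 4096 tokens rather than a single conversation.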

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 1.1004        | 0.0   | 1     | 1.1450          |
	| 0.7343        | 0.1   | 909   | 0.7093          |
	| 0.697         | 0.2   | 1818  | 0.6630          |
	| 0.6386        | 0.3   | 2727  | 0.6380          |
	| 0.5687        | 0.4   | 3636  | 0.6212          |
	| 0.5857        | 0.5   | 4545  | 0.6083          |
	| 0.6161        | 0.6   | 5454  | 0.5986          |
	| 0.522         | 0.7   | 6363  | 0.5894          |
	| 0.5563        | 0.8   | 7272  | 0.5825          |
	| 0.6176        | 0.9   | 8181  | 0.5766          |
	| 0.5948        | 1.0   | 9090  | 0.5719          |
	| 0.4269        | 1.08  | 9999  | 0.5817          |
	| 0.4858        | 1.18  | 10908 | 0.5796          |
	| 0.4909        | 1.28  | 11817 | 0.5765          |
	| 0.4325        | 1.38  | 12726 | 0.5746          |
	| 0.4037        | 1.48  | 13635 | 0.5720          |
	| 0.507         | 1.58  | 14544 | 0.5706          |
	| 0.4778        | 1.68  | 15453 | 0.5697          |
	| 0.4599        | 1.78  | 16362 | 0.5683          |
	| 0.4515        | 1.88  | 17271 | 0.5673          |
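
One way to read the table is to look for the evaluation with the lowest validation loss and for any point where validation loss went up between consecutive evaluations. The sketch below does this with the rows copied from the table. The variable names are illustrative only.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
rows = [
    (1.1004, 0.0, 1, 1.1450),
    (0.7343, 0.1, 909, 0.7093),
    (0.697, 0.2, 1818, 0.6630),
    (0.6386, 0.3, 2727, 0.6380),
    (0.5687, 0.4, 3636, 0.6212),
    (0.5857, 0.5, 4545, 0.6083),
    (0.6161, 0.6, 5454, 0.5986),
    (0.522, 0.7, 6363, 0.5894),
    (0.5563, 0.8, 7272, 0.5825),
    (0.6176, 0.9, 8181, 0.5766),
    (0.5948, 1.0, 9090, 0.5719),
    (0.4269, 1.08, 9999, 0.5817),
    (0.4858, 1.18, 10908, 0.5796),
    (0.4909, 1.28, 11817, 0.5765),
    (0.4325, 1.38, 12726, 0.5746),
    (0.4037, 1.48, 13635, 0.5720),
    (0.507, 1.58, 14544, 0.5706),
    (0.4778, 1.68, 15453, 0.5697),
    (0.4599, 1.78, 16362, 0.5683),
    (0.4515, 1.88, 17271, 0.5673),
]

# Evaluation with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
print("best step:", best[2], "validation loss:", best[3])  # best step: 17271 validation loss: 0.5673

# Steps at which validation loss rose compared with the previous evaluation.
increases = [cur[2] for prev, cur in zip(rows, rows[1:]) if cur[3] > prev[3]]
print("validation loss increased at steps:", increases)  # [9999]
```

The only increase is at step 9999, the first evaluation after the end of the first epoch. Training loss drops sharply at the same point, and validation loss then declines slowly to the reported final value of 0.5673.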


	### Framework versions

	- Transformers 4.38.0.dev0
	- Pytorch 2.0.1+cu118
	- Datasets 2.17.0
	- Tokenizers 0.15.0