	---
	inference: false
	license: mit
	base_model: microsoft/phi-2
	tags:
	- axolotl
	- generated_from_trainer
	model-index:
	- name: Phasmid-2_v2
	  results: []
	datasets:
	- PygmalionAI/PIPPA
	- HuggingFaceH4/no_robots
	---


	```
	_ (`-. ('-. .-. ('-. .-') _ .-') _ .-') _
	( (OO )( OO ) / ( OO ).-. ( OO ).( '.( OO )_ ( ( OO) )
	_.` \,--. ,--. / . --. /(_)---\_),--. ,--.) ,-.-') \ .'_
	(__...--''\| \| \| \| \| \-. \ / _ \| \| `.' \| \| \|OO),`'--..._)
	\| / \| \|\| .\| \|.-'-' \| \|\ :` `. \| \| \| \| \\| \| \ '
	\| \|_.' \|\| \| \\| \|_.' \| '..`''.)\| \|'.'\| \| \| \|(_/\| \| ' \|
	\| .___.'\| .-. \| \| .-. \|.-._) \\| \| \| \| ,\| \|_.'\| \| / :
	\| \| \| \| \| \| \| \| \| \|\ /\| \| \| \|(_\| \| \| '--' /
	`--' `--' `--' `--' `--' `-----' `--' `--' `--' `-------'
	```

	[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
	<details><summary>See axolotl config</summary>

	axolotl version: `0.3.0`
	```yaml
	base_model: microsoft/phi-2
	model_type: PhiForCausalLM
	tokenizer_type: AutoTokenizer
	is_llama_derived_model: false
	trust_remote_code: true

	load_in_8bit: false
	load_in_4bit: false
	strict: false

	datasets:
	  - path: SE6446/SE6446_phasmid_ds
	    type: completion

	hub_model_id: SE6446/Phasmid-2_v2
	hub_strategy: every_save
	use_auth_token: true
	dataset_prepared_path: /phasmid-2-ds-path
	val_set_size: 0.05
	output_dir: ./phasmid-sft-out

	sequence_len: 2048
	sample_packing: true
	pad_to_sequence_len:

	adapter:
	lora_model_dir:
	lora_r:
	lora_alpha:
	lora_dropout:
	lora_target_linear:
	lora_fan_in_fan_out:

	wandb_project:
	wandb_entity:
	wandb_watch:
	wandb_name:
	wandb_log_model:

	gradient_accumulation_steps: 1
	micro_batch_size: 1
	num_epochs: 4
	optimizer: adamw_torch
	adam_beta2: 0.95
	adam_epsilon: 0.00001
	max_grad_norm: 1.0
	lr_scheduler: cosine
	learning_rate: 0.0003

	train_on_inputs: false
	group_by_length: true
	bf16: true
	fp16: false
	tf32: true

	gradient_checkpointing:
	early_stopping_patience:
	resume_from_checkpoint:
	local_rank:
	logging_steps: 1
	xformers_attention:
	flash_attention:

	warmup_steps: 100
	evals_per_epoch: 4
	saves_per_epoch: 1
	debug:
	deepspeed:
	weight_decay: 0.1
	fsdp:
	fsdp_config:
	resize_token_embeddings_to_32x: true
	special_tokens:
	  bos_token: "<|endoftext|>"
	  eos_token: "<|endoftext|>"
	  unk_token: "<|endoftext|>"
	  pad_token: "<|endoftext|>"

	```

	</details><br>


	# Phasmid-2_v2

	This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on a mix of the no_robots and PIPPA datasets.
	It achieves the following results on the evaluation set:
	- Loss: 2.2924

	## Model description
	Phasmid-2 has been trained on instructional data and thus can follow instructions far better than phi-2. However, I have not extensively tested the model.
	## Intended uses & limitations
	This model is little more than a side project and I shall treat it as such.
	Phasmid-2 (due to its size) can still suffer from problematic hallucinations and poor information. No effort was made to reduce potentially toxic responses, so you should train this model further if you need it to avoid them.
	## Inference
	Ensure that einops is installed:
	```
	pip install einops
	```

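	Phi doesn't like `device_map="auto"`, so specify the device explicitly, as in one of the options below. The snippets assume `import torch` and `from transformers import AutoModelForCausalLM, AutoTokenizer`.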

	1. FP16 / Flash-Attention / CUDA:
	```python
	model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True)
	```
	2. FP16 / CUDA:
	```python
	model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
	```
	3. FP32 / CUDA:
	```python
	model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True)
	```
	4. FP32 / CPU:
	```python
	model = AutoModelForCausalLM.from_pretrained("SE6446/Phasmid-2_v2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True)
	```

	Then run the following snippet, replacing the placeholders in braces with your own text:
	```python
	tokenizer = AutoTokenizer.from_pretrained("SE6446/Phasmid-2_v2", trust_remote_code=True)
	inputs = tokenizer('''SYSTEM: You are a helpful assistant. Please answer truthfully and politely. {custom_prompt}\n
	USER: {{userinput}}\n
	ASSISTANT: {{character name if applicable}}:''', return_tensors="pt", return_attention_mask=False)
	# Move the input tensors to the same device as the model.
	inputs = inputs.to(model.device)
	outputs = model.generate(**inputs, max_length=200)
	text = tokenizer.batch_decode(outputs)[0]
	print(text)
	```
	The model should generate its response after "ASSISTANT:".
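
If you build prompts in more than one place, a small helper can keep the layout consistent. The sketch below is only an illustration: `build_prompt` and `extract_reply` are hypothetical names, not part of the model or of transformers, and it assumes turns are separated by a blank line, as in the snippet above.

```python
def build_prompt(user_input: str, custom_prompt: str = "", character: str = "") -> str:
    """Assemble a prompt in the SYSTEM / USER / ASSISTANT layout shown above."""
    system = "SYSTEM: You are a helpful assistant. Please answer truthfully and politely."
    if custom_prompt:
        system += " " + custom_prompt
    assistant = "ASSISTANT:"
    if character:
        # Optional character name, for role-play style prompts.
        assistant += f" {character}:"
    # Assumption: turns are separated by a blank line.
    return f"{system}\n\nUSER: {user_input}\n\n{assistant}"


def extract_reply(decoded: str) -> str:
    """Return the text after the last "ASSISTANT:" marker, without the end-of-text token."""
    reply = decoded.rsplit("ASSISTANT:", 1)[-1]
    return reply.replace("<|endoftext|>", "").strip()


prompt = build_prompt("What is a stick insect?")
print(prompt)
print(extract_reply(prompt + " A stick insect is a phasmid.<|endoftext|>"))
```

Pass the result of `build_prompt(...)` to the tokenizer in place of the literal string, and run `extract_reply` on the decoded output to drop the echoed prompt.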

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0003
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-05
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 100
	- num_epochs: 4
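
To get a feel for how the learning rate moves under these settings, here is a minimal sketch of a linear warmup followed by cosine decay. It is the usual shape for a `cosine` scheduler with warmup, not necessarily the trainer's exact implementation. `TOTAL_STEPS` is an estimate, assuming 4 epochs of 5276 steps each (the training results table reports step 5276 at epoch 1.0), and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 3e-4        # learning_rate from the list above
WARMUP_STEPS = 100    # lr_scheduler_warmup_steps from the list above
TOTAL_STEPS = 21_104  # assumption: 4 epochs x 5276 steps per epoch


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 50, 100, 5276, 10552, 21104):
    print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```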

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss |
	|:-------------:|:-----:|:-----:|:---------------:|
	| 2.3313 | 0.0 | 1 | 2.1374 |
	| 2.5755 | 0.25 | 1319 | 2.5281 |
	| 2.4864 | 0.5 | 2638 | 2.5314 |
	| 2.0961 | 0.75 | 3957 | 2.4697 |
	| 2.6547 | 1.0 | 5276 | 2.4213 |
	| 2.1235 | 1.24 | 6595 | 2.3926 |
	| 1.8875 | 1.49 | 7914 | 2.3233 |
	| 0.9059 | 1.74 | 9233 | 2.2590 |
	| 2.2046 | 1.99 | 10552 | 2.1985 |
	| 1.1938 | 2.23 | 11871 | 2.2555 |
	| 1.1425 | 2.48 | 13190 | 2.2393 |
	| 0.6688 | 2.73 | 14509 | 2.2237 |
	| 1.1111 | 2.98 | 15828 | 2.2126 |
	| 0.651 | 3.21 | 17147 | 2.2859 |
	| 0.8669 | 3.46 | 18466 | 2.2914 |
	| 0.4149 | 3.71 | 19785 | 2.2924 |
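
One way to read the table is to look for the evaluation point with the lowest validation loss. The sketch below copies the rows from the table and reports that point alongside the final one. It skips the step-1 row on the assumption that it reflects the model before any meaningful training; this is just an illustration of reading the numbers, not an analysis the author ran.

```python
# (step, training loss, validation loss), copied from the table above
rows = [
    (1, 2.3313, 2.1374),
    (1319, 2.5755, 2.5281),
    (2638, 2.4864, 2.5314),
    (3957, 2.0961, 2.4697),
    (5276, 2.6547, 2.4213),
    (6595, 2.1235, 2.3926),
    (7914, 1.8875, 2.3233),
    (9233, 0.9059, 2.2590),
    (10552, 2.2046, 2.1985),
    (11871, 1.1938, 2.2555),
    (13190, 1.1425, 2.2393),
    (14509, 0.6688, 2.2237),
    (15828, 1.1111, 2.2126),
    (17147, 0.651, 2.2859),
    (18466, 0.8669, 2.2914),
    (19785, 0.4149, 2.2924),
]

# Assumption: the step-1 row is the starting point, so leave it out of the search.
trained = [row for row in rows if row[0] > 1]
best = min(trained, key=lambda row: row[2])
final = rows[-1]

print(f"lowest validation loss after step 1: {best[2]} at step {best[0]}")
print(f"final validation loss: {final[2]} at step {final[0]}")
```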


	### Framework versions

	- Transformers 4.37.0.dev0
	- Pytorch 2.0.1+cu118
	- Datasets 2.16.1
	- Tokenizers 0.15.0