Text Generation
Transformers
PyTorch
English
llama
Inference Endpoints
text-generation-inference
winglian committed
Commit d0579e3
1 Parent(s): eb518d8

update README and add config file

Files changed (2):
  1. README.md +5 -1
  2. configs/wizard-mega-13b.yml +66 -0
README.md CHANGED
@@ -9,6 +9,10 @@ library_name: transformers
pipeline_tag: text-generation
---

- # Wizard Mega 13B
+ # Wizard Mega 13B - Pre-Release (Epoch One)

Wizard Mega is a Llama 13B model fine-tuned on the ShareGPT, WizardLM, and Wizard-Vicuna datasets. These particular datasets have all been filtered to remove responses where the model responds with "As an AI language model...", etc., or where it refuses to respond.
+
+ # Build
+
+ Wizard Mega was built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on 8xA100 80GB for 15 hours. The configuration to duplicate this build is provided in this repo's [/config folder](https://huggingface.co/openaccess-ai-collective/wizard-mega-13b/tree/main/configs).
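
Since the card stops at the build details, here is a minimal inference sketch using the standard Hugging Face `transformers` text-generation pipeline. The model id `openaccess-ai-collective/wizard-mega-13b` is taken from the repo URL above; the prompt text and generation parameters are illustrative assumptions, as the card does not specify a prompt format.

```python
# Minimal inference sketch (not part of this commit): load the released
# checkpoint with the transformers text-generation pipeline.
# Assumes a GPU with enough memory for a 13B model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "openaccess-ai-collective/wizard-mega-13b"  # repo id from the URL above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 13B model on a single large GPU
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Illustrative prompt; the card does not document a required prompt template.
prompt = "### Instruction: Explain what LoRA fine-tuning is.\n### Assistant:"
print(generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
```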
configs/wizard-mega-13b.yml ADDED
@@ -0,0 +1,66 @@
+ base_model: huggyllama/llama-13b
+ base_model_config: huggyllama/llama-13b
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizer
+ load_in_8bit: false
+ datasets:
+   - path: anon8231489123/ShareGPT_Vicuna_unfiltered
+     data_files: ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json
+     type: sharegpt
+   - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
+     type: alpaca
+   - path: ehartford/wizard_vicuna_70k_unfiltered
+     type: sharegpt
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.02
+ adapter:
+ lora_model_dir:
+ sequence_len: 2048
+ max_packed_sequence_len: 2048
+ lora_r: 8
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - q_proj
+   - v_proj
+ lora_fan_in_fan_out: false
+ wandb_project: wizard-mega-13b
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
+ output_dir: ./wizard-mega-13b
+ batch_size: 512
+ micro_batch_size: 8
+ num_epochs: 3
+ optimizer:
+ torchdistx_path:
+ lr_scheduler:
+ learning_rate: 0.00006
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ tf32: true
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention: true
+ flash_attention:
+ gptq_groupsize:
+ gptq_model_v1:
+ warmup_steps: 20
+ eval_steps: 10
+ save_steps:
+ debug:
+ deepspeed:
+ weight_decay: 0
+ fsdp:
+   - full_shard
+   - auto_wrap
+ fsdp_config:
+   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
+ tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
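
As a quick sanity check on the hyperparameters above, the sketch below parses the config with PyYAML and prints the headline settings, plus the implied tokens per optimizer step under the assumption (labeled in the code, not stated in the config itself) that `batch_size` is the effective batch in sequences. The file path is the one added in this commit; PyYAML is an extra dependency not mentioned in the repo.

```python
# Sketch only: inspect the training config added in this commit.
# Assumes PyYAML is installed and the file lives at configs/wizard-mega-13b.yml.
import yaml

with open("configs/wizard-mega-13b.yml") as f:
    cfg = yaml.safe_load(f)

print(f"base model      : {cfg['base_model']}")
print(f"sequence length : {cfg['sequence_len']}")
print(f"learning rate   : {cfg['learning_rate']}")
print(f"epochs          : {cfg['num_epochs']}")
print(f"micro batch size: {cfg['micro_batch_size']}")

# Assumption: treating batch_size as the effective batch in sequences, one
# optimizer step covers batch_size * sequence_len tokens (512 * 2048 = 1,048,576).
tokens_per_step = cfg["batch_size"] * cfg["sequence_len"]
print(f"~{tokens_per_step:,} tokens per optimizer step "
      f"({cfg['batch_size']} seqs x {cfg['sequence_len']} tokens)")
```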