Text Generation
Transformers
PyTorch
English
llama
Inference Endpoints
text-generation-inference
winglian committed
Commit d0579e3
1 Parent(s): eb518d8

update README and add config file

Files changed (2):
  1. README.md +5 -1
  2. configs/wizard-mega-13b.yml +66 -0
README.md CHANGED
@@ -9,6 +9,10 @@ library_name: transformers
pipeline_tag: text-generation
---

- # Wizard Mega 13B
+ # Wizard Mega 13B - Pre-Release (Epoch One)

Wizard Mega is a Llama 13B model fine-tuned on the ShareGPT, WizardLM, and Wizard-Vicuna datasets. These particular datasets have all been filtered to remove responses where the model responds with "As an AI language model...", etc., or where it refuses to respond.
+
+ # Build
+
+ Wizard Mega was built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on 8xA100 80GB for 15 hours. The configuration to duplicate this build is provided in this repo's [/config folder](https://huggingface.co/openaccess-ai-collective/wizard-mega-13b/tree/main/configs).
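
Since the card stops at the build details, here is a minimal inference sketch using the standard Hugging Face `transformers` text-generation pipeline. The model id `openaccess-ai-collective/wizard-mega-13b` is taken from the repo URL above; the prompt text and generation parameters are illustrative assumptions, as the card does not specify a prompt format.

```python
# Minimal inference sketch (not part of this commit): load the released
# checkpoint with the transformers text-generation pipeline.
# Assumes a GPU with enough memory for a 13B model in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "openaccess-ai-collective/wizard-mega-13b"  # repo id from the URL above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit a 13B model on a single large GPU
    device_map="auto",
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Illustrative prompt; the card does not document a required prompt template.
prompt = "### Instruction: Explain what LoRA fine-tuning is.\n### Assistant:"
print(generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
```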
configs/wizard-mega-13b.yml ADDED
@@ -0,0 +1,66 @@
+ base_model: huggyllama/llama-13b
+ base_model_config: huggyllama/llama-13b
+ model_type: LlamaForCausalLM
+ tokenizer_type: LlamaTokenizer
+ load_in_8bit: false
+ datasets:
+   - path: anon8231489123/ShareGPT_Vicuna_unfiltered
+     data_files: ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json
+     type: sharegpt
+   - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered
+     type: alpaca
+   - path: ehartford/wizard_vicuna_70k_unfiltered
+     type: sharegpt
+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.02
+ adapter:
+ lora_model_dir:
+ sequence_len: 2048
+ max_packed_sequence_len: 2048
+ lora_r: 8
+ lora_alpha: 16
+ lora_dropout: 0.05
+ lora_target_modules:
+   - q_proj
+   - v_proj
+ lora_fan_in_fan_out: false
+ wandb_project: wizard-mega-13b
+ wandb_watch:
+ wandb_run_id:
+ wandb_log_model:
+ output_dir: ./wizard-mega-13b
+ batch_size: 512
+ micro_batch_size: 8
+ num_epochs: 3
+ optimizer:
+ torchdistx_path:
+ lr_scheduler:
+ learning_rate: 0.00006
+ train_on_inputs: false
+ group_by_length: false
+ bf16: true
+ tf32: true
+ gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
+ local_rank:
+ logging_steps: 1
+ xformers_attention: true
+ flash_attention:
+ gptq_groupsize:
+ gptq_model_v1:
+ warmup_steps: 20
+ eval_steps: 10
+ save_steps:
+ debug:
+ deepspeed:
+ weight_decay: 0
+ fsdp:
+   - full_shard
+   - auto_wrap
+ fsdp_config:
+   fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
+ tokens:
+   bos_token: "<s>"
+   eos_token: "</s>"
+   unk_token: "<unk>"
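
As a quick sanity check on the hyperparameters above, the sketch below parses the config with PyYAML and prints the headline settings, plus the implied tokens per optimizer step under the assumption (labeled in the code, not stated in the config itself) that `batch_size` is the effective batch in sequences. The file path is the one added in this commit; PyYAML is an extra dependency not mentioned in the repo.

```python
# Sketch only: inspect the training config added in this commit.
# Assumes PyYAML is installed and the file lives at configs/wizard-mega-13b.yml.
import yaml

with open("configs/wizard-mega-13b.yml") as f:
    cfg = yaml.safe_load(f)

print(f"base model      : {cfg['base_model']}")
print(f"sequence length : {cfg['sequence_len']}")
print(f"learning rate   : {cfg['learning_rate']}")
print(f"epochs          : {cfg['num_epochs']}")
print(f"micro batch size: {cfg['micro_batch_size']}")

# Assumption: treating batch_size as the effective batch in sequences, one
# optimizer step covers batch_size * sequence_len tokens (512 * 2048 = 1,048,576).
tokens_per_step = cfg["batch_size"] * cfg["sequence_len"]
print(f"~{tokens_per_step:,} tokens per optimizer step "
      f"({cfg['batch_size']} seqs x {cfg['sequence_len']} tokens)")
```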