charlesdedampierre committed
Commit 6eaddc4 · verified · 1 parent: 38106d1

Update README.md

Files changed (1): README.md (+23 −23)
README.md CHANGED
```diff
@@ -95,26 +95,26 @@ print(sequences[0]['generated_text'])
 
 
 ## Training hyperparameters
-LoRA:
-
-r=16
-lora_alpha=16
-lora_dropout=0.05
-bias="none"
-task_type="CAUSAL_LM"
-target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
-Training arguments:
-
-per_device_train_batch_size=4
-gradient_accumulation_steps=4
-gradient_checkpointing=True
-learning_rate=5e-5
-lr_scheduler_type="cosine"
-max_steps=200
-optim="paged_adamw_32bit"
-warmup_steps=100
-DPOTrainer:
-
-beta=0.1
-max_prompt_length=1024
-max_length=1536
+
+**LoRA**:
+* r=16
+* lora_alpha=16
+* lora_dropout=0.05
+* bias="none"
+* task_type="CAUSAL_LM"
+* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
+
+**Training arguments**:
+* per_device_train_batch_size=4
+* gradient_accumulation_steps=4
+* gradient_checkpointing=True
+* learning_rate=5e-5
+* lr_scheduler_type="cosine"
+* max_steps=200
+* optim="paged_adamw_32bit"
+* warmup_steps=100
+
+**DPOTrainer**:
+* beta=0.1
+* max_prompt_length=1024
+* max_length=1536
```
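
The hyperparameters listed in the diff map onto a `peft` + `trl` setup roughly as sketched below. This is a hedged sketch, not the author's actual training script: names like `base_model`, `tokenizer`, and `dpo_dataset` are placeholders, and the exact `DPOTrainer` signature varies across `trl` versions (older releases pass `beta`/`max_prompt_length`/`max_length` directly to `DPOTrainer` rather than via `DPOConfig`).

```python
# Sketch only: placeholder model/dataset names; trl API details vary by version.
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# LoRA hyperparameters from the README
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj',
                    'q_proj', 'o_proj', 'down_proj'],
)

# Training arguments + DPO-specific settings from the README
training_args = DPOConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=200,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    beta=0.1,                  # DPO temperature
    max_prompt_length=1024,
    max_length=1536,
    output_dir="dpo-output",   # placeholder, not from the README
)

trainer = DPOTrainer(
    model=base_model,          # placeholder: the causal LM being fine-tuned
    args=training_args,
    train_dataset=dpo_dataset, # placeholder: preference pairs (chosen/rejected)
    tokenizer=tokenizer,       # placeholder
    peft_config=peft_config,
)
trainer.train()
```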