lsmille/lora_evo_ta_all_layers_18_attention_layers

Generated from Trainer

lsmille committed on Jun 5

Commit fe208d9, 1 parent: ad2b85d

Update README.md

Files changed (1)

README.md +21 -1

@@ -20,7 +20,27 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
+Trained on single ID token "5K dataset" filtered to 4k sequences (20% for test data)
+lora_alpha = 64 <--------------
+lora_dropout = 0.1
+lora_r = 64 <---------
+epochs = 3
+learning rate = 3e-4
+warmup_steps=500
+gradient_accumulation_steps = 1
+train_batch = 1
+eval_batch = 1
+ONLY ATTENTION LAYER <---------------------
 ## Intended uses & limitations
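
The hyperparameters added in this commit imply a few derived quantities that the README does not state. The sketch below collects the listed values and computes them. It is an illustration, not the author's training script: the `RunConfig` and `summarize` names are hypothetical, "4k sequences" is assumed to mean exactly 4000, the 20% test split is assumed to be taken out of those 4000, training is assumed to run on a single device, and the LoRA scaling is assumed to follow the standard `lora_alpha / lora_r` rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Values copied from the model description added in this commit.
    lora_r: int = 64
    lora_alpha: int = 64
    lora_dropout: float = 0.1
    epochs: int = 3
    learning_rate: float = 3e-4
    warmup_steps: int = 500
    gradient_accumulation_steps: int = 1
    train_batch: int = 1
    eval_batch: int = 1
    # Assumptions (not stated exactly in the README): "4k" is taken as
    # 4000 sequences, with 20% of them held out as test data.
    num_sequences: int = 4000
    test_fraction: float = 0.2


def summarize(cfg: RunConfig) -> dict:
    """Derive split sizes, step counts, and LoRA scaling from the config."""
    n_test = round(cfg.num_sequences * cfg.test_fraction)
    n_train = cfg.num_sequences - n_test
    # Single-device assumption: no multiplication by a device count.
    effective_batch = cfg.train_batch * cfg.gradient_accumulation_steps
    steps_per_epoch = -(-n_train // effective_batch)  # ceiling division
    total_steps = steps_per_epoch * cfg.epochs
    return {
        "lora_scaling": cfg.lora_alpha / cfg.lora_r,
        "n_train": n_train,
        "n_test": n_test,
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_share": cfg.warmup_steps / total_steps,
    }


summary = summarize(RunConfig())
print(summary)
```

Under these assumptions, `lora_alpha = lora_r` gives a LoRA scaling factor of 1.0, and the 500 warmup steps cover roughly 5% of the optimizer steps. If the real split or sequence count differs, the step counts change accordingly.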