TroyDoesAI committed
Commit aa704ca
1 Parent(s): 1247e82

Update README.md

Files changed (1)
README.md +9 -9
README.md CHANGED
@@ -65,39 +65,39 @@ Exciting times ahead as we delve into the MermaidLLama revolution! 🚀
 
  LoRA Rank
  Also called dimension count. Higher values = larger file, more content control. Smaller values = smaller file, less control. Use 4 or 8 for style, 128 or 256 to teach, 1024+ for fine-detail on big data. More VRAM is needed for higher ranks.
- 2048
+ - 2048
 
  LoRA Alpha
  This divided by the rank becomes the scaling of the LoRA. Higher means stronger. A good standard value is twice your Rank.
- 4096
+ - 4096
 
  Batch Size
  Global batch size. The two batch sizes together determine gradient accumulation (gradientAccum = batch / microBatch). Higher gradient accum values lead to better quality training.
- 1
+ - 1
 
  Micro Batch Size
  Per-device batch size (NOTE: multiple devices not yet implemented). Increasing this will increase VRAM usage.
- 1
+ - 1
 
  Cutoff Length
  Cutoff length for text input. Essentially, how long of a line of text to feed in at a time. Higher values require drastically more VRAM.
- 4096
+ - 4096
 
  Save every n steps
  If above 0, a checkpoint of the LoRA will be saved every time this many steps pass.
- 1000
+ - 1000
 
  Epochs
  Number of times every entry in the dataset should be fed into training. So 1 means feed each item in once, 5 means feed it in five times, etc.
- 3
+ - 3
 
  Learning Rate
  In scientific notation.
- 1e-6
+ - 1e-6
 
  LR Scheduler
  Learning rate scheduler - defines how the learning rate changes over time. "Constant" means never change, "linear" means to go in a straight line from the learning rate down to 0, cosine follows a curve, etc.
- cosine
+ - cosine
 
 
  Target Modules
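
The changed hunk documents LoRA training hyperparameters. As a rough illustration of how these values relate to one another, the sketch below maps them onto the Hugging Face peft and transformers APIs; the framework choice, the target_modules list, the dropout value, and the output path are assumptions for illustration only and are not stated in the README. Note that alpha / rank = 4096 / 2048 = 2, matching the "twice your Rank" guideline, and gradientAccum = batch / microBatch = 1 / 1 = 1.

```python
# Minimal sketch, assuming the Hugging Face `peft` and `transformers` libraries;
# the README does not name a training framework, so treat this as an
# illustration of the listed hyperparameters, not the project's actual script.
from peft import LoraConfig
from transformers import TrainingArguments

lora_rank = 2048          # "LoRA Rank" (dimension count)
lora_alpha = 4096         # "LoRA Alpha"; effective scaling = alpha / rank = 2.0

batch_size = 1            # "Batch Size" (global)
micro_batch_size = 1      # "Micro Batch Size" (per device)
grad_accum = batch_size // micro_batch_size   # gradientAccum = batch / microBatch -> 1

cutoff_len = 4096         # "Cutoff Length": max tokens per sample, applied at tokenization time

lora_config = LoraConfig(
    r=lora_rank,
    lora_alpha=lora_alpha,
    lora_dropout=0.0,                     # not specified in the README; placeholder
    target_modules=["q_proj", "v_proj"],  # "Target Modules" values are outside this hunk; placeholder
)

training_args = TrainingArguments(
    output_dir="lora-checkpoints",        # placeholder path
    per_device_train_batch_size=micro_batch_size,
    gradient_accumulation_steps=grad_accum,
    num_train_epochs=3,                   # "Epochs"
    learning_rate=1e-6,                   # "Learning Rate"
    lr_scheduler_type="cosine",           # "LR Scheduler"
    save_steps=1000,                      # "Save every n steps"
)
```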