---
license: unknown
---

# Mistral-Adastra-IA3
Training parameters:

- Batch size: 1
- Gradient accumulation steps: 8
- Cutoff length: 512
- Epochs: 1
- Learning rate: 0.01
- Optimizer: Adafactor
- LR scheduler: Linear
- Warmup steps: 64
- Projections: Q, K, V, O, up, gate, down
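For reference, here is a minimal sketch of how these settings might map onto Hugging Face PEFT and transformers. The module names (q_proj, k_proj, etc.) assume Mistral's architecture, and the base model ID is an assumption; the actual training was done inside text-generation-webui's Training_PRO extension, not with this script.

```python
# Hypothetical PEFT/transformers equivalent of the settings above.
# Base model ID is assumed; training was really done in Training_PRO.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import IA3Config, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # assumed base model
    torch_dtype=torch.bfloat16,   # trained unquantized in BF16
)

# Projections Q, K, V, O, up, gate, down (Mistral module names assumed)
ia3_config = IA3Config(
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "up_proj", "gate_proj", "down_proj"],
    feedforward_modules=["up_proj", "gate_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, ia3_config)

training_args = TrainingArguments(
    output_dir="ia3-out",
    per_device_train_batch_size=1,   # Batch size: 1
    gradient_accumulation_steps=8,   # Gradient accumulation steps: 8
    num_train_epochs=1,              # Epochs: 1
    learning_rate=0.01,              # Learning rate: 0.01
    optim="adafactor",               # Optimizer: Adafactor
    lr_scheduler_type="linear",      # LR scheduler: Linear
    warmup_steps=64,                 # Warmup steps: 64
    bf16=True,
)
# The cutoff length (512) would apply at tokenization time.
```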

This is more of a proof of concept. Temper your expectations.

Warning: may generate 18+ content.

Trained on the dialogue from Adastra (the furry visual novel) in text-generation-webui (by oobabooga), using a modified Training_PRO extension. The model was loaded unquantized (BF16) for training. Currently, IA3 adapters load in unmodified text-generation-webui (I think; load one as you would a LoRA) only if the base model is itself loaded unquantized, not in 4-bit or 8-bit.
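Outside of text-generation-webui, the adapter can presumably be applied with PEFT directly. A minimal sketch, assuming the base model is mistralai/Mistral-7B-v0.1 and that this repo's ID is used as the adapter path (both are assumptions, not confirmed by this card):

```python
# Hedged sketch: load the base model unquantized (BF16) and apply the
# IA3 adapter via PEFT. Base model ID and adapter path are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"      # assumed base model
adapter_id = "Astris/Mistral-Adastra-IA3"  # assumed repo ID for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # unquantized, as noted above
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

inputs = tokenizer("Hello,", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```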

Other training parameters:

- Add overlapping blocks: On
- DEMENTOR (long-form learning by FP): On. This might be the secret sauce that makes this IA3 so effective with just one epoch of training.

Extra:

Training took 9 minutes on an RTX 3090.

If you are the creator of Adastra and would like this taken down, please contact me.

I do not claim to have produced the training data that went into this finetune.