jdpressman committed 8d2472d (1 parent: fc73b33)

Add training details

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -98,6 +98,17 @@ autoregressive language models and be useful to alignment and interpretability r
 
 ## Training procedure
 
+This model was trained on [a 1 billion token sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample) of RedPajama
+on 8x H100 GPUs for roughly 24 hours.
+
+Using the scripts in the MiniHF repo as they exist now, the training commands were:
+
+    accelerate launch train_vae_overlap.py --model "mistralai/Mistral-7B-v0.1" \
+        --preprocessed preprocessed_mistral --context 64 --output vae_64_overlap_mistral --batch-size 24
+
+    accelerate launch train_vae_router.py --model "mistralai/Mistral-7B-v0.1" \
+        --preprocessed preprocessed_mistral --vae-context 64 --start-from vae_64_overlap_mistral \
+        --output vae_64_overlap_router_mistral --lr 1e-4 --batch-size 1
 
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes
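
The hunk ends just as the quantization block begins, so only the `quant_method: bitsandbytes` field is visible here. As a rough illustration of what a bitsandbytes quantization config looks like in code, the sketch below builds a `transformers.BitsAndBytesConfig` and loads the base model named in the training commands with it. The specific settings (4-bit weights, NF4, bfloat16 compute) are assumptions chosen to make the example self-contained, not values taken from this commit.

```python
# Sketch only: the commit shows just "quant_method: bitsandbytes"; the
# 4-bit / NF4 / bfloat16 settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # assumed: 4-bit weight quantization
    bnb_4bit_quant_type="nf4",              # assumed: NF4 quantization type
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed: bf16 compute dtype
)

# Base model referenced by the training commands in the diff above.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```

A config object like this, serialized when the trained weights are saved, is the kind of thing that ends up recorded as the `quant_method: bitsandbytes` entry in the model card's quantization section.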