jdpressman committed 8d2472d (1 parent: fc73b33)

Add training details

Files changed (1)
  1. README.md +11 -0
README.md CHANGED
@@ -98,6 +98,17 @@ autoregressive language models and be useful to alignment and interpretability r
 
 ## Training procedure
 
+This model was trained on [a 1 billion token sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample) of RedPajama
+on 8x H100 GPUs for roughly 24 hours.
+
+Using the scripts in the MiniHF repo as they exist now, the training commands were:
+
+    accelerate launch train_vae_overlap.py --model "mistralai/Mistral-7B-v0.1" \
+        --preprocessed preprocessed_mistral --context 64 --output vae_64_overlap_mistral --batch-size 24
+
+    accelerate launch train_vae_router.py --model "mistralai/Mistral-7B-v0.1" \
+        --preprocessed preprocessed_mistral --vae-context 64 --start-from vae_64_overlap_mistral \
+        --output vae_64_overlap_router_mistral --lr 1e-4 --batch-size 1
 
 The following `bitsandbytes` quantization config was used during training:
 - quant_method: bitsandbytes
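
The hunk ends just as the quantization block begins, so only the `quant_method: bitsandbytes` field is visible here. As a rough illustration of what a bitsandbytes quantization config looks like in code, the sketch below builds a `transformers.BitsAndBytesConfig` and loads the base model named in the training commands with it. The specific settings (4-bit weights, NF4, bfloat16 compute) are assumptions chosen to make the example self-contained, not values taken from this commit.

```python
# Sketch only: the commit shows just "quant_method: bitsandbytes"; the
# 4-bit / NF4 / bfloat16 settings below are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # assumed: 4-bit weight quantization
    bnb_4bit_quant_type="nf4",              # assumed: NF4 quantization type
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed: bf16 compute dtype
)

# Base model referenced by the training commands in the diff above.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)
```

A config object like this, serialized when the trained weights are saved, is the kind of thing that ends up recorded as the `quant_method: bitsandbytes` entry in the model card's quantization section.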