great work!

#1
by MaziyarPanahi - opened

This is great work! Doing it with a limited number of A100/80G GPUs and running it for only 100 minutes makes it very interesting! Just out of curiosity, did you use accelerate to launch axolotl and load the model on each GPU, or did you launch it with python and shard the model across all GPUs? (I can't find a way to use your config without getting OOM on my 4 A100/80G.)
