great work!
#1 by MaziyarPanahi - opened
This is great work! Getting this result on a limited number of A100/80G GPUs, in a run of only 100 minutes, makes it very interesting. Just out of curiosity, did you use `accelerate` to launch axolotl and load the model on each GPU, or did you use `python` to launch it and shard the model across all GPUs? (I can't find a way to use your config without getting OOM on my 4 A100/80G GPUs.)