Tips on loading model with low memory

#3 opened by ryanramos

Was just wondering if anyone's been able to load this model in something akin to a free Colab runtime, i.e. ~12 GB RAM and a Tesla T4. I've tried the code snippet for loading the model in 8-bit precision (with `device_map` set to `"auto"`) and had no luck. Luckily for me the model is already sharded (I normally can't load an 11B T5 without sharding), but I'm guessing the current shard size is still too large for this setup.
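For reference, here's a minimal sketch of the 8-bit loading path I mean (the checkpoint name is an assumption; it needs `bitsandbytes` and `accelerate` installed):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint name -- substitute the actual repo id for this model.
checkpoint = "bigscience/mt0-xxl"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# load_in_8bit quantizes the weights via bitsandbytes; device_map="auto"
# lets accelerate spread layers across the T4 and CPU RAM, which is where
# an 11B model tends to run out of room on a free Colab runtime.
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",
    load_in_8bit=True,
)
```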

BigScience Workshop org

If it's just inference, something like https://huggingface.co/bigscience/bloomz/discussions/28 may work!
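That thread is about running inference over Petals, where only a small client runs locally and the heavy transformer blocks are executed by a public swarm. Roughly, the client side looks like this (a sketch based on the Petals README; the exact class name may differ between Petals versions, so treat it as an assumption):

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# bloomz is the model discussed in the linked thread.
model_name = "bigscience/bloomz"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```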

Thanks! I actually completely forgot about Petals. Might even use this for a different research project; thanks again!

BigScience Workshop org

👍 cc @borzunov
