
It took me all day to figure this out. It turns out that while HQQ will go ahead and fill 180 GB of memory to do this - there's absolutely no reason for it! I did this from a slow**, 200 GB swap partition. On the off chance someone at Mobius sees this - please don't ask transformers to load a 45B param model onto the CPU if you're not actually going to... call the model at all? It took ten minutes at SATA 2 speeds - and that was because it was padded to FP32 (CPU mode, right?).

45 Gigaweights * 2 bytes per weight (bf16) * 2 (the fp32/bf16 width ratio) = 180 GB of system memory allocated.
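For anyone who wants to run the same sum on a different model, here's a minimal sketch of that arithmetic. The function name and its parameters are made up for illustration, and it only counts the weights (no activations, no allocator overhead), so treat it as a lower bound rather than what any loader will actually report.

```python
def load_footprint_gb(n_params, stored_bytes=2, loaded_bytes=4):
    """Rough weight-only memory estimate, in GB (1 GB = 1e9 bytes).

    n_params     -- number of weights (45e9 for the model discussed here)
    stored_bytes -- bytes per weight on disk (2 for bf16/fp16)
    loaded_bytes -- bytes per weight once loaded (4 if upcast to fp32)
    """
    widening = loaded_bytes / stored_bytes  # fp32/bf16 = 2
    return n_params * stored_bytes * widening / 1e9


# Upcast to fp32, as happened here.
print(load_footprint_gb(45e9))
# Same weights kept at 2 bytes each, for comparison.
print(load_footprint_gb(45e9, loaded_bytes=2))
```

The second call is the number you'd hope for if the weights stayed at two bytes each: half as much.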

I wish I had that.

**May have been zswap's fault. I'm pretty sure 200 MB/s and an idle CPU isn't the best you can hope for when you're doing sequential reads from a PCIe 4.0 x4 NVMe device? My GPU fell asleep between optimization passes. It even has a Gamer LED on it. I'll fix my sysctl next time.

  • Try $ python -i untitled.py

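having saved that script from the Mobius HF repo as untitled.py. The -i flag drops you into the interactive prompt once the script finishes, which matters because you'll be spending a while in there figuring out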

  • >>> model.save_quantized("/absolute/path/noromaid")

at the end. Trust me, quantizing something chunky and then watching Python shred it because the save directory is somehow a recursive lambda function and not a string is heartbreaking. I don't know if it was supposed to emit more than the model.pt and the config.json, but I'm taking what I can get.
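If you'd rather find out about a bad save directory before the quantization run than after it, something like the following could be pasted in up front. This is a sketch under assumptions: `checked_save_dir` is a hypothetical helper of mine, not part of HQQ, and all it does is confirm the argument is a plain absolute string and that the directory can be created. It says nothing about what `save_quantized` does with the path afterwards.

```python
import os
import tempfile


def checked_save_dir(path):
    """Return `path` unchanged if it is an absolute str, creating the directory.

    Hypothetical helper, not an HQQ function: it only guards the argument.
    """
    if not isinstance(path, str):
        raise TypeError(f"save directory must be a str, got {type(path).__name__}")
    if not os.path.isabs(path):
        raise ValueError(f"save directory must be an absolute path, got {path!r}")
    os.makedirs(path, exist_ok=True)  # fails now, not after hours of quantizing
    return path


if __name__ == "__main__":
    # Throwaway demo in a temp directory; in the real session it would be
    # model.save_quantized(checked_save_dir("/absolute/path/noromaid"))
    with tempfile.TemporaryDirectory() as tmp:
        print(checked_save_dir(os.path.join(tmp, "noromaid")))
```

Run the check before kicking off the quantization, not at the end, so a typo costs you seconds.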

If anyone's looking to donate I could do with an Epyc Rome and perhaps another pair of H100s? I've embedded my XMR address in attention tensors with help from a really horny embedding, so when it starts generating gibberish right before the good stuff just paste that into Feather and send me all your money. Thanks! :)

i'm joking. that's a joke. I didn't do that.