GPT-OSS-120B running on phone with 8GB of RAM

#216
by InfiniteVoid - opened

I just squeezed a 120B-parameter MoE model into a Poco X4 GT. Yes, an Android phone with only 8GB of RAM.
Model: GPT-OSS-120B (117B total parameters)
Speed: ~0.17 tok/s
It took 32 minutes to generate the answer, but the system remained perfectly stable.
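The speed and time figures can be cross-checked with a quick Python sketch. It assumes that the whole 32 minutes was spent decoding at the average ~0.17 tok/s. Any time spent on prompt processing would make the real answer shorter, so the result is an upper bound. `estimated_tokens` is a hypothetical helper, not part of any tool.

```python
def estimated_tokens(rate_tok_per_s: float, minutes: float) -> float:
    """Upper-bound answer length: average decode rate times wall-clock time."""
    return rate_tok_per_s * minutes * 60

# Figures from the post: ~0.17 tok/s over 32 minutes.
tokens = estimated_tokens(0.17, 32)
print(f"~{tokens:.0f} tokens")
```

This works out to roughly 326 tokens, which is a plausible length for a single chat answer.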

Ran GPT-OSS-120B locally on my Intel Core Ultra 7 258V laptop with 32GB RAM.

Latest real session:
prompt processing: 3.09–6.77 tok/s
generation: 2.88–3.06 tok/s
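The laptop generation range can be compared with the phone's ~0.17 tok/s in a short Python sketch. The two runs used different hardware and probably different prompts, so this ratio is only a rough comparison.

```python
PHONE_RATE = 0.17            # tok/s, phone generation speed from the first post
LAPTOP_RATES = (2.88, 3.06)  # tok/s, laptop generation range from this post

# Ratio of laptop to phone decode speed at each end of the laptop's range.
speedups = [rate / PHONE_RATE for rate in LAPTOP_RATES]
for rate, s in zip(LAPTOP_RATES, speedups):
    print(f"{rate} tok/s is about {s:.1f}x the phone's {PHONE_RATE} tok/s")
```

On these numbers the laptop generates roughly 17 to 18 times faster than the phone.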

extreme quantization?

MXFP4, ~60.86 GiB. Both the phone and the laptop use the same MXFP4 model.
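The following back-of-the-envelope Python sketch puts that size in context. MXFP4 (the OCP Microscaling format) stores blocks of 32 four-bit E2M1 values with one shared 8-bit E8M0 scale, which works out to about 4.25 bits per weight. The sketch also assumes that the 8GB and 32GB RAM figures mean GiB. `mxfp4_bytes` is a hypothetical helper.

```python
GIB = 2**30  # bytes per GiB

def mxfp4_bytes(n_params: float, block_size: int = 32) -> float:
    """Approximate MXFP4 storage in bytes (hypothetical helper).

    Each block of `block_size` weights stores 4-bit E2M1 values plus one
    shared 8-bit E8M0 scale: 4 + 8/32 = 4.25 bits per weight.
    """
    bits_per_weight = 4 + 8 / block_size
    return n_params * bits_per_weight / 8

# Estimate if every one of the 117B parameters were stored in MXFP4.
all_mxfp4 = mxfp4_bytes(117e9) / GIB
print(f"117B params in MXFP4: ~{all_mxfp4:.1f} GiB")

# Size quoted in the thread vs. each device's RAM (assumed to be GiB).
reported = 60.86
for device, ram_gib in [("phone", 8), ("laptop", 32)]:
    print(f"{device}: model file is {reported / ram_gib:.1f}x its RAM")
```

Storing all weights this way gives about 57.9 GiB. The quoted 60.86 GiB is somewhat larger, presumably because some tensors are kept at higher precision. That explanation is an assumption; the thread does not say. Either way, the file is about 7.6x the phone's RAM and 1.9x the laptop's, so neither device can hold all the weights in memory at once.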