Tried with MPS?

by kronosprime - opened

I adjusted the code to replace the CUDA references with MPS, but after 20 minutes on the fastest M2 with 96GB, generation still hadn't finished. Has anyone else seen the same result, or did it work for you?
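For anyone trying the same swap, here is a rough sketch of the kind of change I mean, assuming the code is PyTorch-based. The helper names `pick_device` and `detect_device` are made up for illustration and are not from the model's repo; the idea is just to choose the device in one place instead of hard-coding "cuda" throughout.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device string, preferring CUDA, then MPS, then CPU.

    Hypothetical helper, not part of the model's own code.
    """
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query PyTorch for available backends; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch installed there is nothing to accelerate.
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


if __name__ == "__main__":
    # Prints whichever device string was selected on this machine.
    print(detect_device())
```

The resulting string would then be passed wherever the original code used "cuda", for example in `.to(device)` calls. This only covers device selection; it says nothing about whether the model runs at a usable speed on MPS.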

You might want to run the GGUF version then, no? Not sure whether @TheBloke has quantized this.
