To the few who can run it: post your speeds, setup, and results quality.

#1
by alphaprime90 - opened

To the few who can run it: post your speeds, setup, and results quality.

I use the 4-bit version on a 128 GB MacBook M3 Max. All the samples in the model card come from that.
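
For anyone curious about reproducing this kind of setup, here is a minimal sketch of loading a 4-bit GGUF quant on Apple Silicon with llama-cpp-python (Metal offload). This is not necessarily the exact tooling used for the model card samples; the file name and parameters are placeholders.

```python
# Minimal sketch, not the exact setup used for the model card samples.
# Loads a 4-bit GGUF quant with llama-cpp-python and offloads all layers
# to the GPU (Metal on Apple Silicon). The file name is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="model-q4_k_m.gguf",  # placeholder path to the 4-bit quant
    n_gpu_layers=-1,                 # offload every layer to the GPU
    n_ctx=4096,                      # context window; adjust as needed
)

out = llm("Write a short poem about large models.", max_tokens=128)
print(out["choices"][0]["text"])
```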

How well does the q_4 version compare to Miqu 120b?

How many tokens per second could you reach with your MacBook in this way? @ehartford

This is a very different model. Not really comparable.

Hi Eric, how is your t/s? Dare I say sub 1 t/s?

I quantized the OG model myself, so it's q4_k_m, but here is what I am getting on a Mac Studio (M1, 64-core GPU, 128 GB RAM):

total duration: 2m30.232853459s
load duration: 1.101417ms
prompt eval count: 23 token(s)
prompt eval duration: 3.85487s
prompt eval rate: 5.97 tokens/s
eval count: 720 token(s)
eval duration: 2m26.375722s
eval rate: 4.92 tokens/s
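
The reported rates are just the token counts divided by the durations; a quick sanity check in Python:

```python
# Sanity-check the reported rates: tokens divided by seconds.
prompt_eval_rate = 23 / 3.85487          # ~5.97 tokens/s
eval_rate = 720 / (2 * 60 + 26.375722)   # ~4.92 tokens/s
print(f"prompt eval rate: {prompt_eval_rate:.2f} tokens/s")
print(f"eval rate: {eval_rate:.2f} tokens/s")
```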

Hi, how can I merge the aa and ab files?
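
If those are parts produced by the Unix split command (I'm assuming that is the case here, a common way to ship a GGUF that exceeds per-file upload limits), they can be rejoined by simple byte concatenation, e.g. `cat model.gguf.aa model.gguf.ab > model.gguf`, or in Python:

```python
# Hypothetical sketch: rejoin split parts back into a single file by byte
# concatenation. The part names below are placeholders; list them in order.
import shutil

parts = ["model.gguf.aa", "model.gguf.ab"]
with open("model.gguf", "wb") as merged:
    for part in parts:
        with open(part, "rb") as f:
            shutil.copyfileobj(f, merged)
```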
