rtuuuuuuuur

urtuuuu

AI & ML interests

None yet

Recent Activity

Organizations

None yet

urtuuuu's activity

Using the Model

1
#14 opened 8 days ago by
joel1610-hon
New activity in Qwen/Qwen2.5-14B-Instruct-1M 7 days ago
New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-7B 17 days ago

System Prompt

13
#2 opened 19 days ago by
Wanfq
New activity in deepseek-ai/DeepSeek-R1-Distill-Qwen-14B 18 days ago

System Prompt

7
#2 opened 19 days ago by
Wanfq

Not impressed?

2
#2 opened 19 days ago by
urtuuuu
New activity in deepseek-ai/DeepSeek-V3 about 1 month ago
New activity in huihui-ai/Falcon3-10B-Instruct-abliterated about 1 month ago

GGUFs eventually ?

6
#1 opened about 2 months ago by
HMasaki
replied to bartowski's post about 2 months ago

A bit annoying, isn't it? Some time ago I asked you for an ARM version of gemma-2-9b-it-abliterated. So now it won't work again. I guess there is no Q4_0?

reacted to bartowski's post with 👍 about 2 months ago
Post
Looks like Q4_0_N_M file types are going away

Before you panic, there's a new "preferred" method: online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).

You can see the reference PR here:

https://github.com/ggerganov/llama.cpp/pull/10446

So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speed (though it may currently be bugged on some platforms).

As such, I'll stop making those newer model formats soon, probably at the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!

Also, IQ4_NL supports repacking, though not in as many shapes yet, and should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541

Remember, these are not meant for Apple silicon, since those chips use the GPU and don't benefit from the repacking of weights.
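
For readers unfamiliar with what "repacking the weights into interleaved rows" means, here is a toy sketch of the general idea. It is an illustration only: the function name `interleave_rows`, the group size of 4, and the string placeholders standing in for quantized blocks are all assumptions made for this example, and the real layout used by llama.cpp is more involved and is not reproduced here.

```python
def interleave_rows(rows, group=4):
    """Repack `group` consecutive rows so that their blocks alternate in memory.

    Toy model only: each row is a list of opaque "blocks". This is not the
    actual llama.cpp repacking code or memory layout.
    """
    if len(rows) % group != 0:
        raise ValueError("row count must be a multiple of the group size")
    packed = []
    for start in range(0, len(rows), group):
        chunk = rows[start:start + group]
        # zip(*chunk) yields block 0 of every row in the group,
        # then block 1 of every row, and so on.
        for blocks in zip(*chunk):
            packed.extend(blocks)
    return packed


# Four rows of three placeholder blocks each.
rows = [[f"r{r}b{b}" for b in range(3)] for r in range(4)]
print(interleave_rows(rows))
```

The point of a layout like this is that a kernel working on several rows at once can read the matching blocks of those rows from adjacent memory, which is roughly why repacking can help on CPUs with suitable SIMD instructions.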
New activity in matteogeniaccio/phi-4 about 2 months ago
New activity in bartowski/EXAONE-3.5-7.8B-Instruct-GGUF 2 months ago

llama.cpp...

1
#1 opened 2 months ago by
urtuuuu