What's up with TheBloke?

#2
by Languido - opened

Hello, and sorry for the question: does anyone know what happened to TheBloke? Has he stopped working for some reason? Should we expect new awesome stuff?

Been wondering too. My suspicion is that Huggingface may have put limitations on his account.

He was my only source of quantized LLMs, so not sure where to find them now.

deleted

He was my only source of quantized LLMs, so not sure where to find them now.

While he does provide a great service to the community, it's not hard to do it yourself, and it doesn't take a lot of resources or time.

I think he's just taking a break, as nothing big is happening at the moment.

Hugging Face should consider letting page owners post a message for the community, like "I'm on holiday" or "I'm giving up after an incredible mountain of work" or something. I think a lot of people are waiting for his return, or at least to know what has happened to TheBloke.
I hope it's just a temporary break.

His pipeline should already be automated. I couldn't imagine a single person monitoring all these quantization procedures and uploads every single day without going mad by the end of the day.

His pipeline should already be automated. I couldn't imagine a single person monitoring all these quantization procedures and uploads every single day without going mad by the end of the day.
@Yhyu13

He actually starts all quants himself, although the quantization and upload process is automated.

deleted

Yeah, he searches out new ones (not sure what his criteria are), but once he does that, it's automated.

deleted

Hugging Face should consider letting page owners post a message for the community, like "I'm on holiday" or "I'm giving up after an incredible mountain of work" or something.

They do offer a blog.

He was my only source of quantized LLMs, so not sure where to find them now.

While he does provide a great service to the community, it's not hard to do it yourself, and it doesn't take a lot of resources or time.

Not for converting safetensors models, but pytorch models are a pain in the arse. Most model publishers aren't noting the version of python they are using, nor the versions of the libraries they used. Not using the same versions of everything usually causes unpickling to fail.

I deeply appreciate that TheBloke takes care of all that crap for us, so I don't have to.

He made an update on GitHub a few days ago and a comment about the same topic on another site in mid-February, so he's alive. Hopefully back here too soon.

This should be a wake-up call for people leeching quantized LLMs from him: some people do extraordinary work for the community, and we should all support them however we can.

deleted

This should be a wake-up call for people leeching quantized LLMs from him: some people do extraordinary work for the community, and we should all support them however we can.

Even though I have been doing my own for a while now, I have been saying this since the beginning. Even if you can do it yourself, he is still a valuable member of the community, relied on by countless people. (Actually, his documentation and such is how I started doing it myself. I thought it needed way more hardware than I had access to, but his docs "showed me the way".)

@Nurb432 Would you mind linking to the documentation mentioned?
I have had some success quantizing models, but I may have failed somewhere. Any information about the best way to go about quantizing would be greatly appreciated.

deleted
edited Mar 3

If you download llama.cpp, the instructions are there, but here's the TL;DR version (which I mostly stole, as I'm too lazy to type):

All from the llama.cpp README:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Then you can run the quantize binary, located at llama.cpp/build/bin:

cd llama.cpp/build/bin &&
./quantize /path/to/models/WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q6_k.bin 18

Or if you don't like Q6_K, just replace 18 with whatever format you want (like 7 for Q8_0).

The only time I have failures is if I'm behind on the build of llama.cpp. You have to be sure you don't need a branch instead of main.
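Side note, a minimal sketch (paths and filenames are placeholders): in the llama.cpp builds I've used, the quantize binary also accepts the type by name instead of the number, which is easier to remember.

cd llama.cpp/build/bin &&
./quantize /path/to/models/WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q4_k_m.gguf Q4_K_M

If in doubt, running ./quantize with no arguments should print the usage text with the list of supported types.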

deleted
edited Mar 4

Just ran across this too (not tested): https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script

EDIT: (I guess it's only for Windows... blech :) )

Great, thank you for the information.

deleted

I left out the first part though, sorry: convert the HF model to "raw" GGUF, then use the quantize binary noted above. This uses the Python script included in the llama.cpp repository.

example: python convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

Then use the above binary to make it "smaller" if you want.

(I'm sure my terms are off, but it's how I think of things.)
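Putting both steps together, here's a rough end-to-end sketch, assuming a llama.cpp checkout from around the time of this thread (newer versions have renamed things, e.g. convert_hf_to_gguf.py and llama-quantize; the model path and names below are placeholders):

# build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
cd ../..

# step 1: HF model -> "raw" (unquantized) GGUF, using the convert script shipped in the repo
python llama.cpp/convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

# step 2: "raw" GGUF -> quantized GGUF (18 = Q6_K, 7 = Q8_0)
./llama.cpp/build/bin/quantize WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q6_k.gguf 18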

I left out the first part though, sorry: convert the HF model to "raw" GGUF, then use the quantize binary noted above. This uses the Python script included in the llama.cpp repository.

example: python convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

Then use the above binary to make it "smaller" if you want.

(I'm sure my terms are off, but it's how I think of things.)

How long does a 70B bin model take to quantize to 4-bit, and on what machine?

deleted

Well, it would all depend on your hardware. My old non-GPU Xeon doing 13Bs took perhaps 5 minutes for the entire two-step process doing Q6_Ks. I never really timed it, so that's an estimate.

OK, thanks.
So not hours ;)
I'll give it a shot with some smaller 20GB raw torch model.

I miss @TheBloke too, hope he can come back soon.

And many thanks to @Nurb432 for sharing the tools :)

Yes, @TheBloke has been the reference point for checking out new models.
Now Hugging Face is not the same. It is more difficult to see what is new, and which published quantizations are truly useful.

TheBloke, in his README for each model, said his work was backed by a grant from a16z (Andreessen Horowitz, a venture capital firm). Could it be the case that the funding was withdrawn? I couldn't tell.

Agree, @Languido. I've found @bartowski useful for GGUF and EXL2 quants of some of the major models. He was quick with Llama 3. Not as industrious as @TheBloke was, though. Others?

Wow! @bartowski is doing a very good job! Good to know, thanks! And now that Llama 3 is published, a lot of great new models are arriving!
Note for @TheBloke: we will always appreciate the work you have done.

deleted

Agreed

Note for @TheBloke: we will always appreciate the work you have done.

It's easy to quantize on the free tier of Colab with Maxime Labonne's notebooks. I'm using AutoGGUF, but there is also a newer script that also does EXL2, AWQ, etc.: https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu

Requesting @TheBloke to return and quantize Llama3 for us :)

Well, I think TheBloke's great work has been split across a lot of accounts, but the one clearly following in his footsteps is Bartowski.
@bartowski
