What's up with TheBloke?

#2
by Languido - opened

Hello, and sorry for the question: does anyone know what happened to TheBloke? Has he stopped working for some reason? Should we expect new awesome stuff?

Been wondering too. My suspicion is that Huggingface may have put limitations on his account.

He was my only source of quantized LLMs, so not sure where to find them now.

deleted

He was my only source of quantized LLMs, so not sure where to find them now.

While he does provide a great service to the community, it's not hard to do it yourself, and it doesn't take a lot of resources or time.

I think he's just taking a break, as nothing big is happening at the moment.

Hugging Face should consider letting page owners post a message for the community, like "I'm on holiday" or "I'm giving up after an incredible mountain of work" or something. I think a lot of people are waiting for his return, or at least to know what has happened to TheBloke.
I hope it's just a temporary break.

His pipeline should already be automated. I couldn't imagine a single person monitoring all these quantization procedures and uploads every single day without going mad by the end of the day.

His pipeline should already be automated. I couldn't imagine a single person monitoring all these quantization procedures and uploads every single day without going mad by the end of the day.
@Yhyu13

He actually starts all quants himself, although the quantization and upload process is automated.

deleted

Yeah, he searches out new ones (not sure what his criteria are), but once he does that, it's automated.

deleted

Hugging Face should consider letting page owners post a message for the community, like "I'm on holiday" or "I'm giving up after an incredible mountain of work" or something.

They do offer a blog.

He was my only source of quantized LLMs, so not sure where to find them now.

While he does provide a great service to the community, it's not hard to do it yourself, and it doesn't take a lot of resources or time.

Not for converting safetensors models, but pytorch models are a pain in the arse. Most model publishers aren't noting the version of python they are using, nor the versions of the libraries they used. Not using the same versions of everything usually causes unpickling to fail.

I deeply appreciate that TheBloke takes care of all that crap for us, so I don't have to.

He made an update on GitHub a few days ago and a comment about the same topic on another site in mid-February, so he's alive. Hopefully back here too soon.

This should be a wake-up call for people leeching quantized LLMs from him: some people do extraordinary work for the community, and we should all support them however we can.

deleted

This should be a wake-up call for people leeching quantized LLMs from him: some people do extraordinary work for the community, and we should all support them however we can.

Even though I have been doing my own for a while now, I have been saying this since the beginning. Even if you can do it yourself, he is still a valuable member of the community, relied on by countless people. (Actually, his documentation and such is how I started doing it myself. I thought it needed way more hardware than I had access to, but his docs "showed me the way".)

@Nurb432 Would you mind linking to the documentation mentioned?
I have had some success quantizing models, but I may have failed somewhere. Any information about the best way to go about quantizing would be greatly appreciated.

deleted
edited Mar 3

If you download llama.cpp, the instructions are there, but here's the TL;DR version (which I mostly stole, as I'm too lazy to type):

All from the llama.cpp README:

mkdir build
cd build
cmake ..
cmake --build . --config Release

Then you can run the quantize binary, located at llama.cpp/build/bin:

cd llama.cpp/build/bin &&
./quantize /path/to/models/WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q6_k.bin 18

Or if you don't like Q6_K, just replace 18 with whatever format you want (like 7 for Q8_0).

The only time I have failures is if I'm behind on the build of llama.cpp. You have to be sure you don't need a branch instead of main.
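Side note, a minimal sketch (paths and filenames are placeholders): in the llama.cpp builds I've used, the quantize binary also accepts the type by name instead of the number, which is easier to remember.

cd llama.cpp/build/bin &&
./quantize /path/to/models/WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q4_k_m.gguf Q4_K_M

If in doubt, running ./quantize with no arguments should print the usage text with the list of supported types.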

deleted
edited Mar 4

Just ran across this too (not tested): https://huggingface.co/FantasiaFoundry/GGUF-Quantization-Script

EDIT: (I guess it's only for Windows... blech :) )

Great, thank you for the information.

deleted

I left out the first part though, sorry: convert the HF model to "raw" GGUF, then use the quantize binary noted above. This uses the Python script included in the llama.cpp repository.

example: python convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

Then use the above binary to make it "smaller" if you want.

(I'm sure my terms are off, but it's how I think of things.)
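Putting both steps together, here's a rough end-to-end sketch, assuming a llama.cpp checkout from around the time of this thread (newer versions have renamed things, e.g. convert_hf_to_gguf.py and llama-quantize; the model path and names below are placeholders):

# build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --config Release
cd ../..

# step 1: HF model -> "raw" (unquantized) GGUF, using the convert script shipped in the repo
python llama.cpp/convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

# step 2: "raw" GGUF -> quantized GGUF (18 = Q6_K, 7 = Q8_0)
./llama.cpp/build/bin/quantize WizardCoder-Python-13B-V1.0.gguf WizardCoder-Python-13B-V1.0.q6_k.gguf 18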

I left out the first part though, sorry: convert the HF model to "raw" GGUF, then use the quantize binary noted above. This uses the Python script included in the llama.cpp repository.

example: python convert.py /path/to/models/WizardCoder-Python-13B-V1.0 --outfile WizardCoder-Python-13B-V1.0.gguf

Then use the above binary to make it "smaller" if you want.

(I'm sure my terms are off, but it's how I think of things.)

How long does a 70B bin model take to quantize to 4-bit, and on what machine?

deleted

Well, it would all depend on your hardware. My old non-GPU Xeon doing 13Bs took perhaps 5 minutes for the entire two-step process doing Q6_Ks. I never really timed it, so that's an estimate.

OK, thanks.
So not hours ;)
I'll give it a shot with some smaller 20GB raw torch model.

I miss @TheBloke too, hope he can come back soon.

And many thanks to @Nurb432 for sharing the tools :)

Yes, @TheBloke has been the reference point for checking out new models.
Now Hugging Face is not the same. It is more difficult to see what is new, and which published quantizations are truly useful.

TheBloke, in his README for each model, said his work was backed by a grant from a16z (Andreessen Horowitz, a venture capital firm). Could it be the case that the funding was withdrawn? I couldn't tell.

Agree, @Languido. I've found @bartowski useful for GGUF and EXL2 quants of some of the major models. He was quick with Llama 3. Not as industrious as @TheBloke was, though. Others?

Wow! @bartowski is doing a very good job! Good to know, thanks! And now that Llama 3 is published, a lot of great new models are arriving!
Note for @TheBloke: we will always appreciate the work you have done.

deleted

Agreed

Note for @TheBloke: we will always appreciate the work you have done.

It's easy to quantize on the free tier of Colab with Maxime Labonne's notebooks. I'm using AutoGGUF, but there is also a newer script that also does EXL2, AWQ, etc.: https://colab.research.google.com/drive/1P646NEg33BZy4BfLDNpTz0V0lwIU3CHu

Requesting @TheBloke to return and quantize Llama3 for us :)

Well, I think TheBloke's great work has been split across a lot of accounts, but the one clearly following in his footsteps is Bartowski.
@bartowski
