oobabooga model loading help? Thank you!

#12
by waynekenney

I'm surprised no one has asked this, but there's no documentation anywhere for this model. I've tried asking Open Assistant and ChatGPT (GPT-4) with no guidance. My question is: why are there two bin files with random names, and why, when I attempt to load the model into oobabooga, does it try to load "pytorch_model-00001-of-00003.bin"? What's with the bin file names, and why did the trainer of this model not put any instructions anywhere for using it and loading it into a WebUI?
So, I generated a model-merging script with ChatGPT to merge the model into one file. I've done that and renamed the result pytorch-model, but it still fails to load. I've also renamed the largest file 00001 and made the smallest file the 00002 model of the 00003 model set; it still doesn't load. And I've tried to load it with this command:
python server.py --model vicuna-13b-free-4bit-128g --wbits 4 --groupsize 128

Can someone please explain to me how to use this software? It's incredible that someone trained a model on that dataset, and I'd love to experience what reeducator has contributed to the community. I just need at least a hint or something:

vicuna-13b-free-f16.bin
vicuna-13b-free-q4_0.bin

These two bin files, which I manually downloaded via my browser because my git client constantly crashes, aren't exactly a recognizable pattern... I was really hoping someone else would have asked this question before me, so I wouldn't be the idiot missing whatever obvious instruction or concept I've overlooked while experimenting with AI (note: I'm definitely a noob with AI chatbots). But I wanted to test out this model, since I've seen the dataset.

Any help is super greatly appreciated. Thanks guys! <3 I also wanted to say I'm very sorry if my question is really dumb because I'm missing something obvious. Thanks a lot for reading my discussion post. I commend anyone willing to help me. I wish I was as smart as everyone else.

I haven't yet tried ooba myself, so someone else might have a better idea. But the two bin files in this repository are the weights in 16-bit and 4-bit ggml format, compatible with llama.cpp. I read here https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md that the .bin filenames should explicitly contain "ggml", so you could rename the 4-bit quant to "ggml-vicuna-13b-free-q4_0.bin" and it might recognize it. By default I guess it tries to load the PyTorch model files, which I haven't (yet) included in this repository due to how large they are. If, on the other hand, you want to use the .safetensors file, which is GPTQ, make sure you've installed GPTQ-for-LLaMa as indicated in step 1 here: https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md.
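
If it helps, the rename can also be scripted. Here's a minimal sketch, assuming the file already sits in the webui's models/ folder:

from pathlib import Path

models_dir = Path('models')  # text-generation-webui's default model folder
src = models_dir / 'vicuna-13b-free-q4_0.bin'
# a filename containing "ggml" is what tells the webui to use the llama.cpp loader
src.rename(models_dir / 'ggml-vicuna-13b-free-q4_0.bin')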

But yeah, this is only based on what I found, someone experienced with ooba might be able to point out further issues!

(Edit: I can't make it work myself either.)

Some files are missing, and I don't know where to get them:

For example, for me, it asks for a config.json.
If I add the ones coming from another Vicuna 13B, I get the same error as the OP.

Yes, it does not appear to work; it fails with the following error:

OSError: models/reeducator_vicuna-13b-free does not appear to have a file named config.json. Checkout 'https://huggingface.co/models/reeducator_vicuna-13b-free/None' for available files.

To make this work, I copied over the config.json, tokenizer.model and tokenizer_config.json from a previous 1.0 revision of the anon8231489123_vicuna-13b-GPTQ-4bit-128g repo.

This now appears to be working fine for me.

@waynekenney

Any help is super greatly appreciated. Thanks guys! <3 I also wanted to say I'm very sorry if my question is really dumb because I'm missing something obvious. Thanks a lot for reading my discussion post. I commend anyone willing to help me. I wish I was as smart as everyone else.

Hey bud, just words, I know, but as a guy who used to think I wasn't as smart as everyone else either, let me tell you that for the most part it's more a matter of how much you've learned, and less about higher or lower intelligence.
There are certainly people out there who have a natural propensity for understanding this stuff, but if you've gotten as far as you have with it, then you can certainly learn how to do much more.

When I was in my early 20s I genuinely thought I was a buffoon (mostly due to my father constantly telling me I was stupid growing up), until a (grey-hat) hacker friend of mine tried teaching me how to mod consoles in the mid-2000s.
I would watch him looking at all these pages of code and numbers that he just seemed to know all about, and I'd think, 'This guy's so smart, I wish I was smart like that.'
One day I vocalized these thoughts, and he assured me I wasn't stupid - that I just needed to learn how to do it, and that all of us learn differently and at a different pace.
He told me all about tutorials and how he himself had started at the basics, working through multiple tutorials (and eventually a free course or two) until his knowledge grew into the 'genius' I thought I had standing before me.

That guy instilling his confidence in me changed my entire self-concept, and after that I swore to myself that if I had trouble understanding something I really wanted to learn, I would never let the belief that it was too complicated or hard stand in the way.

Since then I've gone on to learn and workably dabble in a number of programming languages, taught myself 3D graphics (using Autodesk and then 'graduating' to Blender, haha), and done a little chemistry and a few other things.

If I can do it so can you my friend. I hope this helps you in the same way he helped me.
Salud. :)

Hi guys, I'm back with an update. Here's what I've done with this problem; the same approach has been helpful with other AI-related problems in the past. I asked ChatGPT to help me modify the webui's code, and we figured out how to fix the bin file errors by changing this line within "models.py" to:

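# workaround: hardcode the checkpoint path to <model_name>-4bit-128g.safetensors in the model directory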
checkpoint = Path(f'{shared.args.model_dir}/{shared.model_name}-4bit-128g.safetensors')

If you're curious which definition of the checkpoint variable I edited, it's the one found in 'models.py', basically.

This essentially loads the safetensors file, but now, upon manually loading the model, I get the config.json error. However, I've pulled the config file from the previous version this model is based on.
Are there any changes that must be made to the config file to load this model? Thanks, guys!

I can confirm that renaming the bin to 'ggml-vicuna-13b-free-q4_0.bin' works, without any changes to configuration or scripts. No additional files are needed for the ggml format.

Hey Squish. Thanks a million for suggesting the ultimate fix to my problem! This actually resolved my issue. Thank you for enabling me to load my model finally!

What do I need to download with ooba, and what do I need to download then?
For example, do I need the big vicuna-13b-free-f16.bin file?

Yeah, I downloaded both, but I only renamed the one bin file, adding "ggml" to the beginning to form "ggml-vicuna-13b-free-q4_0.bin", and it loads perfectly fine for me now.

The method given above worked for me.
The bot's outputs made sense, but I have only asked three questions so far.

Just to clarify, only the vicuna-13b-free-q4_0.bin file is needed for GGML. Rename that one to ggml-vicuna-13b-free-q4_0.bin; that works for CPU. The "ggml" prefix just tells the webui to load it as a GGML model.
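
As far as I can tell, the check is purely filename-based. Roughly something like this (just a sketch of the idea, not the webui's actual code):

from pathlib import Path

def looks_like_ggml(model_path: Path) -> bool:
    # hypothetical check: a .bin file with "ggml" in its name gets routed
    # to the llama.cpp loader instead of the usual Transformers loader
    return model_path.suffix == '.bin' and 'ggml' in model_path.name.lower()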

I haven't been able to get the .safetensors format to load. I don't recall everything I tried, but adjusting the config from a similar model didn't work; I'm sure I missed something there. If we can sort out what is missing, maybe it can be included. Ideally it should install from the webui's model tab like other models. I also vaguely recall some way of converting to GPTQ not working in the webui currently, but I don't know any details about that.
Depending on how it was installed, you should have the GPTQ-for-LLaMa extension by default.

The .safetensors / GPTQ format seems to load okay once the missing files are added. I think I overlooked the filename issue last time. The missing files can be found at anon8231489123/vicuna-13b-GPTQ-4bit-128g.

Create the folder models/reeducator_vicuna-13b-free-4bit-128g (note the 4bit-128g suffix).
Place the .safetensors file in it; don't rename it.

From the above repository, include these files:
config.json
generation_config.json
pytorch_model.bin.index.json
special_tokens_map.json
tokenizer.model
tokenizer_config.json

Then it should just work (see the scripted sketch below). If you have a capable GPU, this is the way to go. Seems like a great model so far. Thanks for everyone's hard work putting this together.
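
For what it's worth, the file setup above can also be scripted. This is just a sketch; it assumes huggingface_hub is installed and that those filenames are still present in that repository (you still need to put the .safetensors file into the folder yourself):

import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

model_dir = Path('models/reeducator_vicuna-13b-free-4bit-128g')
model_dir.mkdir(parents=True, exist_ok=True)

# config and tokenizer files borrowed from the other Vicuna repository
needed = [
    'config.json',
    'generation_config.json',
    'pytorch_model.bin.index.json',
    'special_tokens_map.json',
    'tokenizer.model',
    'tokenizer_config.json',
]
for name in needed:
    cached = hf_hub_download(repo_id='anon8231489123/vicuna-13b-GPTQ-4bit-128g', filename=name)
    shutil.copy(cached, model_dir / name)  # copy out of the HF cache into the model folder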
