Great, triton-4bit-128g.safetensors is giving me gibberish?

#6
by Goldenblood56 - opened

I tried it in both chat and instruct mode and double-checked that my other models still work fine. I also updated "Ooba".

Here is how I installed it.
I copied my "reeducator_vicuna-13b-cocktail" folder and renamed the copy to "reeducator_vicuna-16b-cocktail_Tri", then put the Triton model in there and removed the former model so only the new one is there.

Ran it with the command line "call python server.py --auto-devices --chat --model reeducator_vicuna-16b-cocktail_Tri --wbits 4 --groupsize 128", same as before, and it's giving me gibberish. If anyone has any ideas, please let me know.
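
For readability, here is the same command split across lines (the "call" prefix is Windows batch syntax and is only needed inside a .bat script; the flags are exactly as above):

```sh
python server.py --auto-devices --chat \
    --model reeducator_vicuna-16b-cocktail_Tri \
    --wbits 4 --groupsize 128
```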

Here is what I suspect is the issue.

  1. I need to install or update some sort of add-on for Ooba, something that does not install or update with git pull.
  2. The tokenizer.model and other files in "Files and versions" are not for the Triton model? Or there is some missing step?
  3. Something is off with my arguments?

If no one here knows, I will likely seek assistance via Ooba's discussion link. Thank you.

Have you actually switched to the Triton branch in your repositories/GPTQ-for-LLaMa?

This is the Triton branch: https://github.com/qwopqwop200/GPTQ-for-LLaMa as per https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md
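
For reference, a minimal sketch of making that switch, assuming the standard text-generation-webui layout. Whether the Triton code is on the repository's default branch is an assumption on my part, so check the linked doc for the current branch name:

```sh
# Minimal sketch, assuming the standard text-generation-webui layout.
# Replace the old (CUDA) checkout with the Triton one. Whether Triton code
# is on the default branch is an assumption -- see the linked doc.
cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt   # if the repo ships one; Triton itself needs Linux/WSL
```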

Thank you mancub. That must be my issue. Since I'm a Windows user it seems I also need WSL, whatever that is. I may or may not try to set this up. It seems a little complicated, and I can never find a step-by-step guide on these pages; I usually don't understand more than half of what I'm actually doing, so I will read through all of this another time. I like finding a YouTube video of someone just walking us through it. That's how I figured out Stable Diffusion and most of this AI stuff, just following YouTube guides. But I don't think any of the AI channels have covered this setup yet. Thanks again; I'm used to figuring things out as I go, so I will get around to it eventually.

I'm the same way, figuring it out as I go. I can't be bothered watching instructional YouTube videos, though, because they are too long and I only need a tl;dr most of the time.

I am running WSL and have no issues. As a matter of fact, I switch between the CUDA and Triton branches for oobabooga depending on the model I use.

Setting up WSL is not hard and definitely worth the effort because, like you, I tried running natively on Windows at first and it was such a pain to get everything compiled and working. I gave up after spending a couple of days trying to compile BitsAndBytes and make it work, and went ahead and installed Ubuntu in WSL. I already had Debian running for other things, so the basic setup was there.

If you start from https://learn.microsoft.com/en-us/windows/wsl/install or https://learn.microsoft.com/en-us/windows/wsl/install-manual you will be up and running with WSL in no time.
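
On a recent Windows 10/11 build, the first-time setup from those docs is essentially one command in an elevated PowerShell (a sketch; older builds need the manual steps from the second link):

```sh
# Run from an elevated PowerShell or cmd on a recent Windows 10/11 build;
# installs WSL2 plus an Ubuntu distribution, then reboot when prompted.
wsl --install -d Ubuntu
```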
