Seems to generate gibberish for me

#3 opened by anon8231489123

Tried running on oobabooga with the following command:
python server.py --model llama-storytelling-13b-4bit-32g --wbits 4 --listen --chat --groupsize 32 --settings settings-chatbot.json --model_type llama
Output was this after prompting slightly:
king us throughigskotoS,kc piecesLI Nk obkUS+AcLN Englishitu IT UnW piecestryoAc"]A\Hwo top± otA almostoma click antkie figuresbose shemen N orilakh = terboseThong+ ONUscomm rad Usku live"] nationsuboseposaUs­ritetrk ONais for agthey MoreTwo thr Alicevik &itthusakhs Wulog minimum amountsOnebisAnt click equ & nearly it now UnderAnt nom< app objectskils Achao absolutely +Longu organizationship San translatedong+aireaisoma standk almost intenosRerov kigsakh # Stand Alice almostsk oruSdr+ pieces &old Englishus ALLPEgs plussk millionIais lsilkth objects PDF+ logbose envi within flash Overk "wkomd lit Clilla Sees Strak schiskoRe+ Billybosakhт SUookbosemenC tot soughttrill + treat positionsSu Kasus hard Stand Twologatel loadsigsOhousblog Washington +at palesucorr occup AntiskoakhgoOus+ Ant Over Alice Vil Englishrieaiscboseyoflashboseaks magn nigs EnglishItboseTigsthomo for CharlieskbbeusiennstandyUsclbosesls + ls Chinaplus MLaireominbosewaiskmons Rat youigsIt Mongo coldcbosek now­ThComm+ then- lean theoretical Ast Inigs It Smithking Checkomsils ONbbe

Settings? Seems like a tokenizer problem.

Which GPTQ commit did you use?

I think it was 4b7c8bd, but I'm using the latest version on the cuda branch and it's working on my end.
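For anyone comparing versions, you can check which GPTQ-for-LLaMa commit and branch you are actually running with plain git. The path below is illustrative, assuming it was cloned into text-generation-webui/repositories:

cd text-generation-webui/repositories/GPTQ-for-LLaMa
git log -1 --oneline
git branch --show-current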

Yeah, if a GPTQ model is created with the latest GPTQ-for-LLaMa code, you then have to use that same latest GPTQ-for-LLaMa code in text-generation-webui/repositories, otherwise you get gibberish.

I include instructions on how to update text-generation-webui with the latest GPTQ code in my GPTQ models. Example: https://huggingface.co/TheBloke/vicuna-7B-GPTQ-4bit-128g
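Roughly, the update amounts to replacing the GPTQ-for-LLaMa checkout under repositories and rebuilding the CUDA kernel. A sketch, assuming qwopqwop200's cuda branch and the standard text-generation-webui layout (see the model card linked above for the exact, tested steps):

cd text-generation-webui/repositories
rm -rf GPTQ-for-LLaMa
git clone -b cuda https://github.com/qwopqwop200/GPTQ-for-LLaMa.git
cd GPTQ-for-LLaMa
python setup_cuda.py install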

Alternatively, if the GPTQ is done with the GPTQ-for-LLaMa fork provided by oobabooga (https://github.com/oobabooga/GPTQ-for-LLaMa), then it will work immediately in text-generation-webui. However, I found that with this older code you cannot use --act-order in the GPTQ settings, as that again produces gibberish. And without --act-order the inference results may be slightly lower quality (I don't know by how much).
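For context, this is roughly what the quantization command looks like on the newer GPTQ-for-LLaMa code; the model path and output filename are just placeholders, and on oobabooga's older fork you would simply omit --act-order:

python llama.py /path/to/llama-13b-hf c4 --wbits 4 --groupsize 128 --act-order --save_safetensors llama-13b-4bit-128g.safetensors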

On my Koala repos, e.g. https://huggingface.co/TheBloke/koala-13B-GPTQ-4bit-128g, I also provided an older GPTQ made with oobabooga's fork. But it takes a lot of time to produce three GPTQ files for every repo!
