TheBloke/Vicuna-33B-1-3-SuperHOT-8K-GGML

Issues in Text-Generation-WebUI

by rombodawg - opened Jun 30, 2023

Jun 30, 2023

So I'm running this model in text-generation-webui, and it's partially working with llama.cpp, but the model glitches out after about 6 tokens and starts repeating the same words, and if you increase the repetition penalty it starts spewing out random words. Any idea how to fix this?

rombodawg

Jun 30, 2023

I'm seeing now that support in text-generation-webui is still slowly being worked on. In the meantime, what command do we use for this on Windows with koboldcpp?

TheBloke

Owner Jun 30, 2023

The same as I show in the README, except you run koboldcpp.exe instead of python koboldcpp.py. The rest of the arguments should be the same.

rombodawg

Jun 30, 2023

OK, I was able to get it to run, but I still have the issue where the model glitches out after about 6 tokens and starts repeating the same words. Here is what I'm running on Windows:
koboldcpp.exe --stream --unbantokens --threads 8 --noblas vicuna-33b-1.3-superhot-8k.ggmlv3.q6_K.bin
I'm running on CPU exclusively because only my system RAM is large enough to hold the model. Is there something I'm doing wrong that's causing the glitch? What settings do you use in koboldcpp so the model behaves normally?

TheBloke

Owner Jun 30, 2023

You need to set --contextsize, e.g. --contextsize 4096. These SuperHOT models seem to perform very poorly at the default 2048 context, but are OK at higher context sizes.

This should be resolved in future as the context-increasing algorithm improves.
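
A minimal sketch of the launch command suggested above, assembled in Python. It only uses the flags that already appear in this thread (--stream, --unbantokens, --threads, --noblas, --contextsize); the helper name build_koboldcpp_command and its parameters are hypothetical, chosen for illustration, and the default of 4096 is just the example value given above.

```python
import sys


def build_koboldcpp_command(model_path, context_size=4096, threads=8, windows=None):
    """Return the argument list for launching koboldcpp with a larger context.

    Hypothetical helper: it only arranges flags quoted in this thread.
    """
    if windows is None:
        windows = sys.platform.startswith("win")
    # On Windows the packaged koboldcpp.exe is run directly; otherwise the
    # README form `python koboldcpp.py` is used with the same arguments.
    launcher = ["koboldcpp.exe"] if windows else ["python", "koboldcpp.py"]
    return launcher + [
        "--stream",
        "--unbantokens",
        "--threads", str(threads),
        "--noblas",
        "--contextsize", str(context_size),  # default context is 2048
        model_path,  # model file goes last, as in the command posted above
    ]


if __name__ == "__main__":
    cmd = build_koboldcpp_command(
        "vicuna-33b-1.3-superhot-8k.ggmlv3.q6_K.bin", windows=True
    )
    print(" ".join(cmd))
```

The list can be passed to subprocess.run as-is, or joined with spaces and pasted into a terminal.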

rombodawg

Jul 1, 2023

Honestly, I don't know if I'm using the wrong version of koboldcpp.exe, but the program only lets you generate up to 500 tokens, even with the --contextsize 4096 flag set. The Windows version I'm using is from here:
https://github.com/LostRuins/koboldcpp

rombodawg

Jul 2, 2023 • edited Jul 2, 2023

Hey, I finally got it working. Changing the context size on the command line in koboldcpp doesn't work for generation (maybe it helps with loading the model); you have to go into Settings and manually set the actual numbers for Max_tokens and amount_to_generate to 8k and 4k respectively. That's the only way it works. Even though the sliders don't go that far, you can edit the numbers yourself and the model will work with them.
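
For anyone driving koboldcpp from a script rather than the UI, the two settings described above can be sketched as a request payload. This is an assumption-heavy illustration: the field names max_context_length and max_length are taken from the KoboldAI-style generate API that koboldcpp is understood to emulate, the mapping from the UI's Max_tokens and amount_to_generate to those fields is assumed, and build_generate_payload is a hypothetical helper. Nothing is sent over the network here.

```python
import json


def build_generate_payload(prompt, max_context_length=8192, max_length=4096):
    """Build a KoboldAI-style generate payload (field names are assumptions).

    max_context_length: assumed counterpart of the UI's Max_tokens (8k above).
    max_length: assumed counterpart of amount_to_generate (4k above).
    """
    if max_length >= max_context_length:
        # The generated text has to fit inside the context window together
        # with the prompt, so it cannot be as large as the window itself.
        raise ValueError("max_length must be smaller than max_context_length")
    return {
        "prompt": prompt,
        "max_context_length": max_context_length,
        "max_length": max_length,
    }


if __name__ == "__main__":
    print(json.dumps(build_generate_payload("Hello"), indent=2))
```

The values mirror the 8k and 4k settings reported above; check the field names against the API documentation of the koboldcpp build in use before relying on them.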

TheBloke

Owner Jul 8, 2023

Ah yeah, I used to have a note about that in my READMEs but it's got lost somewhere along the way. I'll make sure to add it in future!
