Get only empty string '' response back - Anyone used them successfully?

by appliedstuff

I just want to know if anyone has been able to use them successfully? I only get empty responses.

I am using llama-cpp-python (latest version). The English llama-2-7b-chat.ggmlv3.q4_0.bin works fine. Any suggestions or recommendations on what I should test, or what the reason could be?

I tried original quant and new quant models.
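
For reference, this is roughly how I call it (a minimal sketch; the model filename and prompt below are placeholders, not the exact template from the model card):

```python
# Rough repro sketch, assuming the standard llama-cpp-python completion API.
# The model filename and the prompt string are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q4_0.bin")

out = llm("### User: Hallo!\n### Assistant:", max_tokens=128)
# With this model I get '' back; the English 7B chat file answers normally.
print(repr(out["choices"][0]["text"]))
```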

Hmm, yeah, I saw the same with the q4_0 model. But then I tried q5_1 and it works... but it won't shut up:

[screenshot of the q5_1 model's rambling output]

Maybe this model doesn't work very well in GGML :(

Yes, I tried the q4_K and q6_K variants, and both produce gibberish output in English and German, maybe depending a little on the temperature settings. I tried the Open Assistant instruction template in chat-instruct mode, but without success (apart from getting some output instead of empty results). Maybe the model is not really compatible with oobabooga, or it needs its own HF settings entirely.

Same here, doesn't work. Used llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin and got only the beginning of a response, then it started outputting only blanks.

When you get blanks, the prompt template is empty or wrong. See the model card for the correct prompt. Turn down the temperature to get better results. As explained before, the results were not good in my tests either.
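
In llama-cpp-python terms (since that's what the original question uses), I mean something like this; the template string below is only an illustration, take the real one from the model card:

```python
# Illustration: wrap the input in the model's prompt format and lower the
# temperature. The template string here is an assumption, not the official one.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin")

PROMPT_TEMPLATE = "### User: {instruction}\n### Assistant:"  # placeholder template

out = llm(
    PROMPT_TEMPLATE.format(instruction="Tell me a short story."),
    max_tokens=256,
    temperature=0.3,  # lower temperature gave me fewer blank/gibberish results
)
print(out["choices"][0]["text"])
```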

Unfortunately, the results are not good for me either, even though the q5_1 at least works. So what makes that model so much worse?
Is it the conversion to GGML?
Or the model itself?

That model is completely overfitted, I think. I trained a blank Llama 2 13B model and got better results. I haven't used those instruction sets, which are not bad at all, but I simply don't need them. I just apply my LoRA file to an Orca Llama 2 after training against the blank model. That works well for me. I am using the text generation UI for training, and it seems there are some compatibility issues, so training the Orca Llama 2 directly doesn't start…
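
Outside the UI, the "apply the LoRA afterwards" step looks roughly like this (a sketch with PEFT; the model and adapter paths are placeholders):

```python
# Sketch: load a different base model (an Orca-style Llama 2) and attach the
# LoRA adapter that was trained against the blank Llama 2 13B.
# All paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/orca-llama2-13b")
tokenizer = AutoTokenizer.from_pretrained("path/to/orca-llama2-13b")

model = PeftModel.from_pretrained(base, "path/to/my-german-lora")
```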

My results so far: use a LoRA rank around 32, a learning rate of 2e-5 and two epochs. I have meanwhile trained on 10 MB of mixed books and 10 MB of specialized stories; the model stays in context, but its German is still poor, although it does build up a bias towards the specialized stories. I think that is an awesome result for private hobby research with a single RTX 4090 GPU and just five GPU hours. The hardest work is cleaning the German books. Or getting them in the first place. I spent more than ten hours cleaning up text. With my own hands. :-D
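
As a config, those settings correspond to roughly this (sketch with PEFT/transformers; the rank, learning rate and epochs are the values above, everything else is an illustrative assumption):

```python
# Hyperparameter sketch matching the numbers above: LoRA rank 32, learning
# rate 2e-5, two epochs. target_modules and the remaining fields are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                                  # LoRA rank
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="german-books-lora",
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
```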

Sounds really interesting! Do you mind sharing your dataset? I understand if you cannot share it, since there might be issues 'redistributing' those books.

Yes, I have meanwhile trained on 50 MB of books, with a LoRA rank of 64, but this time used the Llama 2 model from John P. (Jphme). The German is getting better, but it is still poor. The data is a raw text collection from books that I am not allowed to share… sorry for that… I think I will need another 300-500 MB and 200 GPU hours… this is really hard. I am switching now to Hugging Face datasets, but scripting that costs some time, too. Then I will be able to use a filtered OASST dataset for instruct training, which could be a bit more efficient. I also figured out that I can train some stories twice, once in German and once in English. The loss curve was incredible. But the results are hard to evaluate with such a small amount of translations (200 KB of text, done with Google Translate). It does seem like the English-pretrained model can profit from that, though.
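
The OASST filtering part is simple enough, something like this (assuming the OpenAssistant/oasst1 dataset on the Hub and its lang column):

```python
# Sketch: pull the OpenAssistant dataset from the Hugging Face Hub and keep
# only the German rows. Assumes the oasst1 layout with a "lang" column.
from datasets import load_dataset

oasst = load_dataset("OpenAssistant/oasst1", split="train")
german = oasst.filter(lambda row: row["lang"] == "de")
print(f"{len(german)} German messages")
```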

Here is a sample output from my model:

Erzähl mir das Märchen Hänsel und Gretel der Gebrüder grimm ("Tell me the fairy tale of Hansel and Gretel by the Brothers Grimm")

Chiharu Yamada
Die beiden Kinder werden von ihrem Vater in eine Wildnis geschickt; dort geraten sie in große Gefahr, weil sie den Wolf nicht finden können. Am Ende wird sie beide fast fressen, aber durch List entkommen sie dem Tod.
(Roughly: "The two children are sent into a wilderness by their father; there they get into great danger because they cannot find the wolf. In the end it almost devours them both, but they escape death through cunning.")

For what it's worth, I've done a lot of experimentation with this and what I found is the following:

  • if you implement retry logic, it'll generally output something non-blank eventually (see the sketch after this list)
  • the rate at which it outputs blanks is directly correlated with the token length of the prompt. It seems to fully hang after about 400 tokens on my machine (M1 MacBook Pro). I'm modifying the prompts in my project to be broken up at as fine a level as possible.
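
Here is roughly what the retry loop looks like (a minimal sketch assuming llama-cpp-python; the model path and prompt are placeholders):

```python
# Minimal retry sketch: re-run the completion until a non-blank response comes
# back. Assumes llama-cpp-python; the model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin")

def generate_with_retry(prompt: str, max_retries: int = 5) -> str:
    for _ in range(max_retries):
        out = llm(prompt, max_tokens=256)
        text = out["choices"][0]["text"].strip()
        if text:       # got something non-blank
            return text
    return ""          # still blank after all retries

print(generate_with_retry("### User: Hallo!\n### Assistant:"))
```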
