Get only empty string '' response back - Anyone used them successfully?

by appliedstuff

I just want to know if anyone has been able to use them successfully? I only get empty responses.

I am using llama-cpp-python (latest version). The English llama-2-7b-chat.ggmlv3.q4_0.bin works fine. Any suggestions or recommendations on what I should test, or what the reason could be?

I tried original quant and new quant models.
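
For reference, this is roughly how I call it (a minimal sketch; the model filename and prompt below are placeholders, not the exact template from the model card):

```python
# Rough repro sketch, assuming the standard llama-cpp-python completion API.
# The model filename and the prompt string are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q4_0.bin")

out = llm("### User: Hallo!\n### Assistant:", max_tokens=128)
# With this model I get '' back; the English 7B chat file answers normally.
print(repr(out["choices"][0]["text"]))
```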

Hmm, yeah, I saw the same with the q4_0 model. But then I tried q5_1 and it works... but it won't shut up:

[screenshot of the q5_1 model's rambling output]

Maybe this model doesn't work very well in GGML :(

Yes, I tried the q4_K and q6_K variants, and both produce gibberish output in English and German, maybe depending a little on the temperature settings. I tried the Open Assistant instruction template in chat-instruct mode, but without success (apart from getting some output instead of empty results). Maybe the model is not really compatible with oobabooga, or it needs its own HF settings entirely.

Same here, doesn't work. Used llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin and got only the beginning of a response, then it started outputting only blanks.

When you get blanks, the prompt template is empty or wrong. See the model card for the correct prompt. Turn down the temperature to get better results. As explained before, the results were not good in my tests either.
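
In llama-cpp-python terms (since that's what the original question uses), I mean something like this; the template string below is only an illustration, take the real one from the model card:

```python
# Illustration: wrap the input in the model's prompt format and lower the
# temperature. The template string here is an assumption, not the official one.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin")

PROMPT_TEMPLATE = "### User: {instruction}\n### Assistant:"  # placeholder template

out = llm(
    PROMPT_TEMPLATE.format(instruction="Tell me a short story."),
    max_tokens=256,
    temperature=0.3,  # lower temperature gave me fewer blank/gibberish results
)
print(out["choices"][0]["text"])
```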

Unfortunately, the results are not good for me either, even though the q5_1 at least works. So what makes that model so much worse?
Is it the conversion to GGML?
Or the model itself?

That model is completely overfitted, I think. I trained a blank Llama 2 13B model and got better results. I haven't used those instruction sets, which are not bad at all, but I simply don't need them. I just apply my LoRA file to an Orca Llama 2 after training against the blank model. That works well for me. I am using the text generation UI for training, and it seems there are some compatibility issues, so training the Orca Llama 2 directly doesn't start…
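
Outside the UI, the "apply the LoRA afterwards" step looks roughly like this (a sketch with PEFT; the model and adapter paths are placeholders):

```python
# Sketch: load a different base model (an Orca-style Llama 2) and attach the
# LoRA adapter that was trained against the blank Llama 2 13B.
# All paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/orca-llama2-13b")
tokenizer = AutoTokenizer.from_pretrained("path/to/orca-llama2-13b")

model = PeftModel.from_pretrained(base, "path/to/my-german-lora")
```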

My results so far: use a LoRA rank around 32, a learning rate of 2e-5 and two epochs. I have meanwhile trained on 10 MB of mixed books and 10 MB of specialized stories; the model stays in context, but its German is still poor, although it does build up a bias towards the specialized stories. I think that is an awesome result for private hobby research with a single RTX 4090 GPU and just five GPU hours. The hardest work is cleaning the German books. Or getting them in the first place. I spent more than ten hours cleaning up text. With my own hands. :-D
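
As a config, those settings correspond to roughly this (sketch with PEFT/transformers; the rank, learning rate and epochs are the values above, everything else is an illustrative assumption):

```python
# Hyperparameter sketch matching the numbers above: LoRA rank 32, learning
# rate 2e-5, two epochs. target_modules and the remaining fields are assumptions.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                                  # LoRA rank
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="german-books-lora",
    learning_rate=2e-5,
    num_train_epochs=2,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
)
```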

Sounds really interesting! Do you mind sharing your dataset? I understand if you cannot share it, since there might be issues 'redistributing' those books.

Yes, I have meanwhile trained on 50 MB of books, with a LoRA rank of 64, but this time used the Llama 2 model from John P. (Jphme). The German is getting better, but it is still poor. The data is a raw text collection from books that I am not allowed to share… sorry for that… I think I will need another 300-500 MB and 200 GPU hours… this is really hard. I am switching now to Hugging Face datasets, but scripting that costs some time, too. Then I will be able to use a filtered OASST dataset for instruct training, which could be a bit more efficient. I also figured out that I can train some stories twice, once in German and once in English. The loss curve was incredible. But the results are hard to evaluate with such a small amount of translations (200 KB of text, done with Google Translate). It does seem like the English-pretrained model can profit from that, though.
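
The OASST filtering part is simple enough, something like this (assuming the OpenAssistant/oasst1 dataset on the Hub and its lang column):

```python
# Sketch: pull the OpenAssistant dataset from the Hugging Face Hub and keep
# only the German rows. Assumes the oasst1 layout with a "lang" column.
from datasets import load_dataset

oasst = load_dataset("OpenAssistant/oasst1", split="train")
german = oasst.filter(lambda row: row["lang"] == "de")
print(f"{len(german)} German messages")
```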

Here is a sample output from my model:

Erzähl mir das Märchen Hänsel und Gretel der Gebrüder grimm ("Tell me the fairy tale of Hansel and Gretel by the Brothers Grimm")

Chiharu Yamada
Die beiden Kinder werden von ihrem Vater in eine Wildnis geschickt; dort geraten sie in große Gefahr, weil sie den Wolf nicht finden können. Am Ende wird sie beide fast fressen, aber durch List entkommen sie dem Tod.
(Roughly: "The two children are sent into a wilderness by their father; there they get into great danger because they cannot find the wolf. In the end it almost devours them both, but they escape death through cunning.")

For what it's worth, I've done a lot of experimentation with this and what I found is the following:

  • if you implement retry logic, it'll generally output something non-blank eventually (see the sketch after this list)
  • the rate at which it outputs blanks is directly correlated with the token length of the prompt. It seems to fully hang after about 400 tokens on my machine (M1 MacBook Pro). I'm modifying the prompts in my project to be broken up at as fine a level as possible.
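
Here is roughly what the retry loop looks like (a minimal sketch assuming llama-cpp-python; the model path and prompt are placeholders):

```python
# Minimal retry sketch: re-run the completion until a non-blank response comes
# back. Assumes llama-cpp-python; the model path and prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./llama-2-13b-german-assistant-v2.ggmlv3.q5_K_M.bin")

def generate_with_retry(prompt: str, max_retries: int = 5) -> str:
    for _ in range(max_retries):
        out = llm(prompt, max_tokens=256)
        text = out["choices"][0]["text"].strip()
        if text:       # got something non-blank
            return text
    return ""          # still blank after all retries

print(generate_with_retry("### User: Hallo!\n### Assistant:"))
```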
