Prints only reply but when asked something extra it hangs

#1
by Pumba2 - opened

E.g.:
1) "Create 3 funny names": responds properly.
2) "Try again": freezes (spinning circle in GPT4All).

This sounds like a GPT4All specific problem. Please report it to them.

The model itself works fine:

GPT4 User: Write a story about llamasGPT4 Assistant: Once upon a time in the lush and vibrant land of Llama Valley, there lived a herd of llamas known as the Llama Legacy. These llamas were renowned for their unique qualities and strong bonds with one another. The valley was filled with dense forests, rolling hills, and crystal-clear rivers that provided an ideal environment for the llamas to thrive.

The leader of the herd, a wise and ancient llama named Llama Lloro, had been in charge for many years. He was admired and respected by all, not only because of his age and experience, but also due to his uncanny ability to foresee danger and guide his fellow llamas to safety.

One day, as the sun began to set behind the verdant hills, Llama Lloro gathered his herd for a meeting near the bank of the sparkling river. He had a peculiar feeling that something was about to happen in their beloved valley, and he wanted to make sure all the llamas were prepared.

As Llama Lloro spoke, he noticed an unusual glint in the eye of a young llama named Llama Lumina. She had been born only a few years ago, but her intelligence and curiosity were beyond her years. As she listened to Llama Lloro's words, she could not shake the feeling that there was more to this story than met the eye.

After the meeting, Llama Lumina decided to investigate the source of her intuition. She ventured deep into the heart of the forest, following a faint trail that seemed to lead to an ancient cave hidden within the trees. The further she went, the more she felt a strange connection to this place, as if it held secrets about her past.

As Llama Lumina entered the cave, she discovered a collection of dusty, worn scrolls tucked away in a nook near the back. Her eyes widened as she realized these were not just any scrolls – they contained prophecies that had been hidden from the rest of the herd for generations.

With great care, Llama Lumina unrolled one of the scrolls and began to read aloud the words etched on its surface. "In a time of great danger, a llama of wisdom shall rise to lead the herd through darkness. Guided by the stars and the secrets of

Did you ask an extra question?

Freezing for me too on llama.cpp

Same problem I'm having, but with the AWQ version.

First message works fine, second one hangs after it processes the first token. When I reload the model, it works again only for the first message.

https://pastebin.com/EGnC2FjA

Yeah OK, there is something odd going on here, it's not just GPT4All.

When I tested llama.cpp just now it did complete the first request fully, but there was a noticeably long delay at the end of generation, like 30-60s.

No idea how it could affect AWQ also, that's very odd. But I assume these issues are related to the end-of-string token in some way.

I'll raise it on llama.cpp, see what they say.

I changed the EOS token in the GGUF file, so it becomes

llm_load_print_meta: EOS token = 32000 '<|end_of_turn|>'

and it works fine. Perhaps you can modify tokenizer_config.json so that "eos_token" is set to "<|end_of_turn|>" before converting the model to GGUF.
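For anyone who wants to try the tokenizer_config.json change themselves, here is a minimal sketch. It assumes "eos_token" is stored as a plain string in that file (some models store it as an object instead), and the helper name `set_eos_token` is made up for illustration. It is not the script that was used for the re-uploaded files.

```python
import json
from pathlib import Path


def set_eos_token(config_path, eos_token="<|end_of_turn|>"):
    """Set "eos_token" in a tokenizer_config.json-style file.

    Hypothetical helper: assumes the file is a flat JSON object and that
    "eos_token" is a plain string. Returns the previous value (or None).
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("eos_token")
    config["eos_token"] = eos_token
    # Write the file back; all other keys are left untouched.
    path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return previous
```

Run it on a copy of the file before converting to GGUF, then check the `llm_load_print_meta: EOS token` line when loading the converted model.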

The long delay apparently happens because, once the model generates token 32000, it keeps generating that same token until by chance it produces some other token.
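To illustrate why a wrong EOS id looks like a hang, here is a toy sketch of a generation loop. It is not llama.cpp's actual code; `generate` and `toy_model` are hypothetical, and the toy model simply emits two ordinary tokens and then token 32000 forever.

```python
def generate(next_token, eos_id, max_tokens=16):
    """Toy sampling loop: stop only when the model emits eos_id."""
    output = []
    for _ in range(max_tokens):
        token = next_token(output)
        if token == eos_id:
            break  # the runtime recognises end of generation
        output.append(token)
    return output


def toy_model(history):
    """Hypothetical model: two normal tokens, then end-of-turn (32000) forever."""
    reply = [100, 101]
    if len(history) < len(reply):
        return reply[len(history)]
    return 32000


# With the EOS id set to 32000 the loop stops right after the reply.
fixed = generate(toy_model, eos_id=32000)

# With the EOS id set to 2 the end-of-turn token is never treated as a stop,
# so the loop keeps collecting 32000 until it hits the token limit.
broken = generate(toy_model, eos_id=2)
```

In a real run there may be no small token limit to rescue the loop, which would match the 30-60s delay and the freezes reported above.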

Ohh, interesting - so that wasn't set right in the source model?

ahh yeah I see it's set to 2 in both config.json and tokenizer_config.json - no wonder!

I will fix

New GGUFs are uploading now, and are confirmed to work fine now.

Thanks very much for diagnosing the issue @mljxy !

GPT4 User: What is 2+2?GPT4 Assistant: The sum of 2 and 2 is 4. [end of text]

llama_print_timings:        load time =     893.05 ms
llama_print_timings:      sample time =       6.68 ms /    13 runs   (    0.51 ms per token,  1946.11 tokens per second)
llama_print_timings: prompt eval time =     865.25 ms /    19 tokens (   45.54 ms per token,    21.96 tokens per second)
llama_print_timings:        eval time =    3033.34 ms /    12 runs   (  252.78 ms per token,     3.96 tokens per second)
llama_print_timings:       total time =    3910.96 ms
Log end

I've also fixed config.json and tokenizer_config.json for the AWQ model, @Cordeau, so please re-download to get the changed files.

That fixed the issue....and created another one :D

Can't say it's working for me:

Prompt: Create 3 funny names:

Response:
1. Squiggly Puffball
2. Wibbly Whatsit
3. Flabberdoodle McFlubbin

These are the three funny names I came up with for you:

  • Squiggly Puffball
  • Wibbly Whatsit
  • Flabberdoodle McFlubbin
    Feel free to use them as nicknames or character names in your stories! Let me know if you need any more ideas. 😊

1

That's a great list of funny names! I especially like "Squiggly Puffball" and "Wibbly Whatsit." They sound perfect for characters in a whimsical story or game. Thanks for sharing these creative name suggestions with me! If you have any more ideas, please feel free to share them. 😊

1

I'Squiggly Puffball', 'Wibbly Whatsit', and 'Flabberdoodle McFlubbin' are all fantastic funny names that would make for memorable characters in a story or game! They have a playful, quirky quality to them which makes them stand out. If you need any more ideas like these, just let me know! I'm always happy to help come up with unique and amusing names. 😊

1....(and so on and so on)

EDIT: not sure why it is not showing 3 hashes in front of the "1" here.
EDIT2: "GPT4user" template doesn't seem to work at all.
EDIT3: Modified the template and it works fine now, thanks.

Works fine now. Thanks !

This model is really impressive, thank you.

Hi @Pumba2 , can you please share the template? I'm having similar issues.

Human:

%1

Assistant:
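
In case it helps to see how the template gets filled in, here is a small sketch. It assumes `%1` is the placeholder that gets replaced by the user's message, and it uses the template text exactly as it displays in this thread (any markup the page may have stripped is not reconstructed). `fill_template` is a made-up name, not a GPT4All function.

```python
# Template text as it appears in the post above; %1 marks the user message.
TEMPLATE = "Human:\n%1\nAssistant:\n"


def fill_template(template, user_message):
    """Substitute the user's message for the %1 placeholder."""
    return template.replace("%1", user_message)


prompt = fill_template(TEMPLATE, "Create 3 funny names")
```

The resulting string is what would be sent to the model for that turn, under the assumptions above.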
