Phantom Talking

#1
by deetungsten - opened

When I try loading any of your GGUF models (7B, 13B, or 34B, regardless of instruct or text completion) in interactive mode, it starts generating random code like the output below. I am running this with a vanilla command: -m codellama-13b-instruct.Q8_0.gguf --color -i. Has anyone else seen this?

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 using System;
using System.Collections.Generic;
using System.Lin

Yes, I see similar behavior here with codellama-13b-instruct.Q5_K_S.gguf and a variety of settings. For instance, with

./main -t 8 -ngl 32 -m models/13B/codellama-13b-instruct.Q5_K_S.gguf --color -c 8192 --temp 0.3 --rope-freq-base 10000 --rope-freq-scale 0.5 -i

I get:

== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

                                 if ( ! isset( $settings['icon'] ) || empty( $settings['icon'] ) ) {
                         $settings['icon'] = 'fa fa-star';
                     }
    
                     if ( ! isset( $settings['color'] ) || empty( $settings['color'] ) ) {
                         $settings['color'] = '#f1d204';
                     }
    
                     if ( ! isset( $settings['size'] ) || empty( $settings['size'] ) ) {
                         $settings['size'] = '35px';
                     }
    
                     if ( ! isset( $settings['margin_top'] ) || empty( $settings['margin_top'] ) ) {
    

For the prompt method, using -f codellama.prompt with the following codellama.prompt content:

[INST]
<<SYS>>
You are a helpful, respectful and honest assistant.

If you are unsure about an answer, truthfully say 'I don't know'.
<</SYS>>

Write a story about llamas
[/INST]
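A full invocation along these lines (a sketch reusing the model and settings from the interactive run above, without the rope flags; adjust the model path to your setup):

```shell
./main -t 8 -ngl 32 -m models/13B/codellama-13b-instruct.Q5_K_S.gguf --color -c 8192 --temp 0.3 -f codellama.prompt
```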

I get a reasonable output:

Once upon a time, in the Andes mountains of South America, there lived a group of llamas. These llamas were known for their soft, woolly coats and their gentle nature. They spent their days grazing on the lush grasses of the Andes, and at night they would gather together to rest and sleep.

One day, a young llama named Luna decided that she wanted to explore the world beyond her home in the Andes. She packed a small bag with some food and water, and set off on her journey.

Luna traveled for many days, through mountains and valleys, until she finally reached a new land. This land was filled with strange creatures and plants that Luna had never seen before. But despite the challenges of this new world, Luna was determined to make it her home.

And so, Luna settled down in her new land, surrounded by the strange and wondrous creatures that she had encountered on her journey. She lived a long and happy life, and her story was passed down from generation to generation as a reminder of the power of determination and perseverance. [end of text]

Don't use --rope-freq-base with this model - the correct value is already included in the GGUF, and it is not 10000 for CodeLlama models.

Using the wrong value will affect the quality of the output.

I'll remove mention of that flag from the CodeLlama GGUF READMEs.
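Concretely, the earlier 13B command with the rope flags dropped would look like this (the rope frequency base is then taken from the GGUF metadata automatically):

```shell
./main -t 8 -ngl 32 -m models/13B/codellama-13b-instruct.Q5_K_S.gguf --color -c 8192 --temp 0.3 -i
```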

This still happens without that flag, though. I tried this on an MBP and on another computer with an Nvidia GPU, and the same thing happens. Is there a specific command I am supposed to use with main?
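For anyone reproducing this: main also has --interactive-first (don't generate anything until the first user input) and -r to set a reverse prompt that hands control back to you. Assuming those flags behave as documented, a run that should simply wait for input would be something like:

```shell
./main -m codellama-13b-instruct.Q8_0.gguf --color --interactive-first -r "[INST]"
```

This is not a confirmed fix for the phantom output, just a way to rule out the model generating before any prompt has been entered.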

Just out of curiosity, what client backend were you using to load the model when the phantom talking happened? I'm not familiar with all of the backends that can load GGUF models, and I don't recognize it from the text @deetungsten and @md2 have shared in this thread. Does this happen across different client backends?
