Tokenizer issue?

#1
by sleepyjoecheated - opened

The model types <0x0A> when it tries to create a new line
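For context: `<0x0A>` is how SentencePiece-style vocabularies spell the byte-fallback token for byte 0x0A, which is the newline character. Until the model file itself is fixed, the literal markers can be turned back into real bytes after generation. The sketch below is only a post-processing workaround; the function name `restore_byte_tokens` is made up for illustration and is not part of llama.cpp or any other library.

```python
import re

# Matches the literal spelling of a byte-fallback token, e.g. "<0x0A>".
BYTE_TOKEN = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def restore_byte_tokens(text: str) -> str:
    """Replace literal <0xNN> markers with the bytes they stand for.

    Hypothetical helper for cleaning up output after generation.
    """
    out = bytearray()
    pos = 0
    for match in BYTE_TOKEN.finditer(text):
        # Copy the ordinary text before the marker, then the raw byte.
        out += text[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    # Several byte tokens can form one multi-byte character,
    # so decoding happens once at the end.
    return out.decode("utf-8", errors="replace")


print(repr(restore_byte_tokens("Hello<0x0A>World")))  # prints 'Hello\nWorld'
```

This only hides the symptom. A prompt that legitimately contains the text `<0x0A>` would be altered as well.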

Sounds like it. Trying it now to see if it does it for me.

Oh my goodness yes it has a major issue

I have the same problem

I am using the following to initialize it. Same issue. It is an impressive model for 7b, but hardly close to GPT-4.
main -m starling-lm-7b-alpha.Q8_0.gguf --color --threads 12 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.0 --repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:"
I do not use hyper-threads, just cores.

I have cleaned up the output a bit on this, but you get the general idea from a set of reasoning questions:

A woman named Barbara has 8 brothers who love her very much. Those brothers have 2 sisters. How many sisters does Barbara have?
8 brothers have 2 sisters, and Barbara is one of those sisters. So Barbara has no other sisters besides herself.

So, she has one sister?
The question states that Barbara's brothers have 2 sisters. Since Barbara is one of those sisters, she does not have any other sisters besides herself. Answer: 0

Shea pondered for a moment, weighing the pros and cons. The moral depravity of it did not bother him in the least, it was the economical despatching of his plan which bothered him the most. The faithful execution of it depended on two things. One was to gain her trust implicitly and he smirked when he thought of how easily he had managed that. What are Shea's true feelings in this example?
He is confident in his plan and is excited to execute it.### Explanation: The passage describes Shea as smirking, which indicates a sense of satisfaction or amusement. This suggests that he is confident in his plan and excited to execute it.### Answer: He is confident in his plan and is excited to execute it.

Does he seem morally depraved?
Yes, he does.### Explanation: The passage describes Shea as pondering the moral deprivation of his plan, which indicates that he is aware of its immoral nature. However, this does not bother him in the least, suggesting that he is indeed morally depraved.### Answer: Yes, he does.

The second was to convince her boyfriend that he was harmless enough to be with her during the intervals. If the second did not work, the entire plan would have fallen to bits. Shea twisted his mouth in a snarl which rarely accompanied his surreal moments with the unsuspecting Adeline. What do you think about this second piece of information about Shea?
He is confident in his ability to manipulate people.### Explanation: The passage describes Shea as twisting his mouth in a snarl, which indicates a sense of determination or focus. This suggests that he is confident in his ability to manipulate people and ensure the success of his plan.### Answer: He is confident in his ability to manipulate people.

The death of Theodisus 11 in 450 AD led to serious repercussions for Attila. Theodisus’s successor, Marcian, refused to continue paying the tribute, which in turn aroused Attila’s fury. As always, his prudence always foreshadowed him even in his fury and he decided to vent his anger on the West instead of the East, as it was weaker. Eventually Attila’s troops razed through some of the best cities of civilised Europe. They were erased to the ground in Attila’s usual custom and Attila would have done so even with Rome itself if not for the intervention of the Pope himself who, it was said, went personally to meet Attila in his tent and so prevented major devastation. What effect did Attila's actions have on Europe?
Attila's actions led to the destruction of some of the best cities in Europe.### Explanation: The passage states that "Attila’s troops razed through some of the best cities of civilised Europe" and that "Attila would have done so even with Rome itself if not for the intervening of the Pope." This indicates that Attila's actions had a significant destructive impact on Europe.### Answer: Attila's actions led to the destruction of some of the best cities in Europe.

Looking at his credit card bills, Mr Johnson sighed. He decided to place a cap on the household spending. Explain the use of the word "cap" in this context.
In this context, "cap" refers to a limit or restriction on spending.### Explanation: The passage describes Mr Johnson as placing a cap on the household spending, which means he is setting a limit or restriction on how much money can be spent. This suggests that he is trying to control and reduce his spending.### Answer: In this context, "cap" refers to a limit or restriction on spending.

The impact of her parents’ death hit her hard. Holding back her tears, she cast him a sorrowful look. What is the word "cast" mean in this context?
In this context, "cast" means to direct one's gaze or look at someone or something.### Explanation: The passage describes her as "cast[ing] him a sorrowful look," which means she directed her gaze or looked at him with a sad expression. This suggests that she is emotionally affected by her parents' death and is trying to express her feelings through her eyes.### Answer: In this context, "cast" means to direct one's gaze or look at someone or something.

Depression is a curse of the modern world. More people suffer from it than known statistics indicate as it is rarely detected from the outside. Even a sufferer of depression may not be aware of having it until he/she is diagnosed by a medical practitioner. According to statistics, it is not only adults who suffer from it but children as well. Why does the author say that depression is a "curse of the modern world"?
The author believes that depression is a widespread and significant problem in today's society.### Explanation: The passage describes depression as a "curse" because it is a widespread and significant problem that affects many people, including both adults and children. The author suggests that it is not always detected from the outside, and even those who suffer from it may not be aware of their condition until they are diagnosed by a medical practitioner. This implies that depression is a hidden and pervasive issue in modern society.### Answer: The author believes that depression is a widespread and significant problem in today's society.

He raised his voice slightly and did not give Pedro a direct answer. Instead, he slanted his body sideways by placing more weight on his right leg which he had placed on the stool in his usual cavalier way. Then he droned on while his eyes glinted, “After all, you have stayed here for 10 years due to the goodwill of my mother. Well, she’s gone now and I’m not that likely to grant you that same goodwill that she had shown you. You see,….” He paused and spread out his fingers placed on that same right leg. “You see…I am a man of different intentions.” And here Sebastian lifted up his head and looked directly at Pedro who stood there unflinchingly. What does Sebastian's tone indicate about his character?
Sebastian's tone indicates that he is arrogant and self-assured.### Explanation: The passage describes Sebastian as raising his voice slightly, slanting his body in a cavalier manner, and looking directly at Pedro with gleaming eyes. These actions suggest that he is confident and assertive in expressing his thoughts and opinions. Additionally, his statement about being "a man of different intentions" implies that he is self-aware and possibly manipulative.### Answer: Sebastian's tone indicates that he is arrogant and self-assured.

You are given the following premises: Someone who lives in Dreadbury Mansion killed Aunt Agatha. The only people who live in Dreadbury Mansion are Aunt Agatha, the butler, and Charles. A killer always hates his victims, and is never richer than his victims. Charles hates no one that Aunt Agatha hates. Aunt Agatha hates everyone except the butler. The butler hates everyone not richer than Aunt Agatha. The butler hates everyone Aunt Agatha hates. No one hates everyone. Aunt Agatha is not the butler. On the basis of this information, determine who killed Aunt Agatha and give a detailed proof that your conclusion follows from the premise.
The butler killed Aunt Agatha.### Explanation: Here's a step-by-step breakdown of the reasoning: 1. Aunt Agatha hates everyone except the butler, and the butler hates everyone not richer than Aunt Agatha. Since Aunt Agatha is not the butler, this means that the butler is richer than Aunt Agatha. 2. The killer always hates his victims and is never richer than his victims. Since the butler is richer than Aunt Agatha, this means that the butler cannot be the killer. 3. This leaves Charles as the only remaining suspect. Charles hates no one that Aunt Agatha hates, which means that he does not hate the butler. Since Aunt Agatha hates everyone except the butler, and Charles hates no one that Aunt Agatha hates, this means that Charles must hate the butler. 4. The killer always hates his victims and is never richer than his victims. Since Charles hates the butler and is not richer than Aunt Agatha (who is richer than the butler), this means that Charles cannot be the killer. 5. This leaves only one possibility: the butler killed Aunt Agatha. The butler hates everyone not richer than Aunt Agatha, and since Aunt Agatha is richer than the butler, this means that the butler must hate Aunt Agatha. Since the butler is the only person who lives in Drearbury Mansion and hates Aunt Agatha, this means that the butler killed Aunt Agatha.### Answer: The butler killed Aunt Agatha.

The token types are somehow different from openchat.

With openchat,

llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32002]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32002]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32002]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...

but with this starling-lm,

llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32002]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32002]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32002]   = [3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

The integer token_type values come from this enum in llama.h:

    enum llama_token_type {
        LLAMA_TOKEN_TYPE_UNDEFINED    = 0,
        LLAMA_TOKEN_TYPE_NORMAL       = 1,
        LLAMA_TOKEN_TYPE_UNKNOWN      = 2,
        LLAMA_TOKEN_TYPE_CONTROL      = 3,
        LLAMA_TOKEN_TYPE_USER_DEFINED = 4,
        LLAMA_TOKEN_TYPE_UNUSED       = 5,
        LLAMA_TOKEN_TYPE_BYTE         = 6,
    };

So starling-lm's token_type array is apparently wrong.
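To make the difference easier to see, here is a small sketch that maps the integers from the two loader logs to the enum names. Only the first 12 values are used, because the logs truncate the rest. The Python class `TokenType` and the helper names are mine and only mirror the C enum above.

```python
from enum import IntEnum


class TokenType(IntEnum):
    """Python mirror of llama_token_type from llama.h (values copied from the enum)."""
    UNDEFINED = 0
    NORMAL = 1
    UNKNOWN = 2
    CONTROL = 3
    USER_DEFINED = 4
    UNUSED = 5
    BYTE = 6


# First 12 token_type values printed by llama_model_loader; the logs cut off the rest.
OPENCHAT = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6]
STARLING = [3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# First four token strings shown in the same logs.
TOKENS = ["<unk>", "<s>", "</s>", "<0x00>"]


def names(values):
    """Translate raw integers into enum names."""
    return [TokenType(v).name for v in values]


def mismatches(a, b):
    """Return (index, name_in_a, name_in_b) for every position where the types differ."""
    return [
        (i, TokenType(x).name, TokenType(y).name)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for i, tok in enumerate(TOKENS):
    print(f"{tok!r}: openchat={names(OPENCHAT)[i]}, starling={names(STARLING)[i]}")

print(len(mismatches(OPENCHAT, STARLING)), "of", len(OPENCHAT), "visible entries differ")
```

In the starling-lm file, `<0x00>` and the entries after it are typed NORMAL instead of BYTE. That is consistent with byte tokens such as `<0x0A>` being printed as literal text, although I have not checked how llama.cpp handles this internally.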

It also causes single letters to be tokenized incorrectly. Below is how "GPT" gets tokenized.

With openchat

   420 -> ' G'
  6316 -> 'PT'

with starling-lm

 19995 -> ' GP'
 28738 -> 'T'

I converted it by myself, after getting two files, tokenizer.model and added_tokens.json, from openchat/openchat_3.5. And it works fine now.

Oh my apologies. It seems that we only uploaded the weights but forgot to include the original files from openchat 3.5. I just updated the original model repo as well. Thanks for figuring this out!

> I converted it by myself, after getting two files, tokenizer.model and added_tokens.json, from openchat/openchat_3.5. And it works fine now.

I've never converted one of these before -- would you mind sharing yours?

Thanks for the reports. As there was no tokenizer.model, I used a new llama.cpp PR to make the GGUF, which uses only tokenizer.json. Apparently this hasn't worked correctly. I'm now re-creating the GGUFs using the updated files in the base repo.

OK all GGUFs are re-uploaded, using the updated source model files including tokenizer.model.

Output is much improved, but it still does endless generation with apparently no working stop tokens:

 GPT4 User: write a story about llamasGPT4 Assistant: Title: The Llama Legacy

Once upon a time, in the lush green Andean mountains, there was a quaint little village named Pampas. The villagers of Pampas were well-known for their extraordinary llama herds, which they raised and cared for with great love and dedication. They believed that these gentle creatures possessed magical qualities that brought good fortune to the village.

In the heart of Pampas lived an old woman named Abuela Rosa, who was revered as a wise healer and a keeper of ancient knowledge. She had a special bond with a herd of llamas that she tended to every day, bestowing upon them her love and care. Among all the llamas in the village, Abuela Rosa's herd was the most cherished and respected.

One fateful day, while Abuela Rosa was busy preparing a remedy for a sick villager, she noticed an odd-looking llama wandering into her enclosure. The creature had a unique coat of multi-colored fur, unlike anything she'd ever seen before. Intrigued by the strange llama, she decided to adopt it and named him Llama Luna, after the mesmerizing lunar eclipse that graced the sky that very night.

Llama Luna quickly became a sensation in Pampas. The villagers marveled at his vibrant fur and unique beauty. But more than just an oddity, they soon discovered that Llama Luna possessed extraordinary abilities. When the villagers were in distress or faced with difficult decisions, they would consult with Abuela Rosa, and she, in turn, sought the wisdom of Llama Luna. The gentle creature would then offer guidance through a subtle tilt of his head, a soft hum, or even a gentle nudge with his nose.

Over time, Llama Luna's fame spread beyond Pampas. People from distant villages traveled great distances to seek the counsel of Abuela Rosa and her magical llama. The village prospered, and so too did the bond between Abuela Rosa and Llama Luna. As the years passed, their connection grew even stronger, and it seemed as if they shared a single soul.

One day, an elderly man named Don Pedro arrived at Pampas from a distant land. He carried with him a heavy heart, as he had lost his only daughter to a terrible illness many years ago. Having heard tales of Llama Luna's wisdom, Don Pedro sought solace and guidance in the company of Abuela Rosa and her magical companion.

As he shared his story with them, Llama Luna gazed deeply into Don Pedro's eyes, sensing the depth of his grief. Suddenly, he let out a piercing cry that echoed throughout the village. Abuela Rosa interpreted Llama Luna's message and explained to Don Pedro that his daughter had not truly left this world, but rather was reborn as a beautiful bird that would visit him each night.

Don Pedro, overcome with joy, wept tears of gratitude. From that day forward, he would eagerly await the arrival of his daughter's spirit each evening. And so, Llama Luna not only offered wisdom and guidance to the villagers but also brought comfort and solace to those in need of healing.

As time continued to pass, Abuela Rosa grew frail with age. One morning, as she awoke to find Llama Luna lying beside her lifeless, tears filled her eyes. Overcome with sorrow, she realized that her faithful companion was no longer there to bring her comfort. Yet, as the sun began to rise and bathe the world in its golden light, a magnificent bird perched on her window sill. The beautiful creature gazed at Abuela Rosa with love and understanding.

And thus, the legend of Llama Luna lives on, reminding all who hear it that even in the face of loss, there is always hope and comfort to be found, for life is an ever-revolving cycle of beginnings and endings.

The End

For more stories check out:

More stories by James E. Gould:

- The Llama Luna Collection

More short stories by Christian W. Muller:

- The Last Light
- The Raven's Eye View
- A Chance Encounter

... and keeps going on forever
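One client-side way to limit runaway output like the above is to cut the generated text at the first stop string. This is a minimal sketch under the assumption that the front end returns the raw text and that a usable marker actually appears in it. `truncate_at_stop` is a hypothetical helper, not a llama.cpp or oobabooga function, and the stop strings are only examples.

```python
def truncate_at_stop(text, stop_sequences):
    """Return text cut just before the earliest occurrence of any stop string.

    Hypothetical post-processing helper; empty stop strings are ignored.
    """
    cut = len(text)
    for stop in stop_sequences:
        if not stop:
            continue
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# Example stop strings: the end-of-turn marker and the next user prefix.
STOPS = ["<|end_of_turn|>", "GPT4 User:"]

sample = "The End<|end_of_turn|>GPT4 User: another question"
print(truncate_at_stop(sample, STOPS))  # prints: The End
```

It does nothing for the sample above if the model never emits one of the markers as text, which seems to be the case there.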

My apologies. The chat template may need to match the Openchat 3.5 format (https://huggingface.co/openchat/openchat_3.5) exactly, which is "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:". Otherwise the performance will drop greatly. I'll update the model card on my end as well.
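For anyone assembling the prompt by hand, the format quoted above can be built like this. It is a minimal sketch: `build_prompt` is an illustrative name, and the multi-turn layout (each earlier turn closed with `<|end_of_turn|>`) is my reading of the OpenChat 3.5 model card, so please check it against that card.

```python
END_OF_TURN = "<|end_of_turn|>"

# Role prefixes taken from the single-turn example quoted above.
PREFIXES = {
    "user": "GPT4 Correct User: ",
    "assistant": "GPT4 Correct Assistant: ",
}


def build_prompt(turns):
    """Build a prompt string from (role, text) pairs.

    Every finished turn ends with the end-of-turn marker, and the prompt
    ends with an open assistant prefix for the model to continue.
    """
    parts = [PREFIXES[role] + text + END_OF_TURN for role, text in turns]
    parts.append("GPT4 Correct Assistant:")
    return "".join(parts)


print(build_prompt([("user", "Hello")]))
# prints: GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:
```

The string alone is not enough if the runtime splits `<|end_of_turn|>` into ordinary text pieces. It presumably has to be tokenized as the single special token for the model to recognize it and to stop on it.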

I also tried this question on chat.lmsys.org. It seems that the model is very serious about writing a story and generated a long one for me as well. But at least overall the generation looks good.

A lower temperature might also help the model terminate. Our Alpaca Eval score with temperature 0 seems to be slightly better than with temperature 0.7.

Can someone share the chat template in JSON format? I want to use LM-Studio as well. I don't have the Openchat preset.

This works fine now with llama.cpp, other than never stopping as reported. I use this to initialize it:
main -m starling-lm-7b-alpha.Q8_0.gguf --color --threads 6 --batch_size 256 --n_predict -1 --top_k 12 --top_p 1 --temp 0.0 --repeat_penalty 1.05 --ctx_size 2048 --instruct --reverse-prompt "### Human:"

@ahmetab06 Sorry, I don't have much experience with LM-Studio. Would this commit be relevant? https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/commit/d240e2a80c2e6e0ee569dedc4b98e9008d64701d

@jeffwadsworth Is this "### Human" the chat template you use? The default chat template might be necessary: "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:". You can test it on chat.lmsys.org to see if there is any difference between the output there and in llama.cpp. Thx!

@banghua, the model never stops. I think there is an issue with it.

@eramax do you have a prompt to reproduce this phenomenon? I tried the prompt "write a story about llamas" at https://chat.lmsys.org/ and it seems to stop normally. Not sure if that's related to the chat template or to quantization of the model.

I've seen the never-stopping issue with this model as well.

@banghua I am having the same no-stop issue in oobabooga with most prompts. Your "write a story about llamas" prompt did indeed reach an ending and stop, but maybe that had something to do with it having generated "The End."?

Need a preset for LM-Studio

Yes - just not sure where to find this correct preset.

@banghua I don't have any specific template, I just enter that line after grabbing the newest llama.cpp from https://github.com/ggerganov/llama.cpp

Here is a prompt that caused it to stop writing incessantly. No idea why it chose to do so.

Would it be normal behavior for Sally to put the apple on a necklace and wear it around her neck as a symbol of pride?

It would not be considered normal behavior for Sally to put the apple on a necklace and wear it around her neck as a symbol of pride. While some people may choose to wear unique or unconventional accessories, wearing an apple necklace is not a common practice in most cultures.

Analysis:

In this scenario, Sally is considering wearing an apple around her neck as a symbol of pride. While it is not a typical fashion choice, people have varying preferences and styles when it comes to personal expression. However, it is important to note that wearing an apple necklace may not be practical or comfortable, as apples can rot or attract insects if left exposed for an extended period.

Instruction:

@jeffwadsworth because you are not using the recommended prompt format.
