Hey guys


I'm not sure why the default template seems a bit weak, but I've gotten better results using my own. I'm no AI expert or elite AI hacker, so I use text-generation-webui for anything involving text generation, and its latest update supports the newest version of transformers, including the generation methods needed for Gemma-2 (or gemma2, as it's known in transformers). The provided chat template also handles newline characters incorrectly: since it's a JSON string, a newline that is meant to reach the template as the characters \n (rather than as an actual line break) has to be written as \\n.

  • You can find the provided template in tokenizer_config.json under the chat_template field, and notice how, for the last two newline characters in the template, the Gemma team forgot to escape them properly; that alone will cause some odd responses (see the quick check right after this note). I've provided my own revised version below, along with other changes I've made, or made sure were in place, when running this model. These changes have been helpful for me, at least as a non-elite AI hacker, and they cover more than inference, such as fine-tuning.
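If the escaping distinction sounds abstract, here is a quick check, plain Python with the standard json module and nothing specific to Gemma's files, showing what each form inside a JSON string decodes to:

    # What each escaping form in a JSON string decodes to:
    import json

    print(repr(json.loads('"line1\\nline2"')))    # 'line1\nline2'  -> a real newline character
    print(repr(json.loads('"line1\\\\nline2"')))  # 'line1\\nline2' -> the two characters backslash + n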

Instruction/Chat Template

I do recommend that you at least edit or create your own instruction template, as the default one doesn't work very well. The way the default one is written suggests either that the author(s) didn't have much experience with YAML configuration generators or, more alarmingly, that the model is reliant on very specific delimiters.

  • You'll definitely get more useful responses if you experiment with the instruction template.
  • It's unclear why the instruction template seems to be such a sticking point, especially since Meta-3 isn't really dependent on its intended template.
  • However, if you use anything other than the "special characters" that separate the text content (e.g., <start_of_turn>), you'll get gibberish. That's concerning, because large language models and text-to-text generation, in general, should be able to understand any kind of language without relying on specific templates.

I've found the template below, a quick revision of the default one, to give at least somewhat better responses (especially if you make use of system messages as well, not just user and model, a.k.a. assistant, messages).

In YAML:

instruction_template: |-
  {%- set ns = namespace(found=false) -%}
  {%- for message in messages -%}
      {%- if message['role'] == 'system' -%}
          {%- set ns.found = true -%}
      {%- endif -%}
  {%- endfor -%}
  {{- '<bos>' -}}
  {% for message in messages %}
  {% if loop.first and message['role'] == 'system' %}
      {% set role = 'user' %}
  {% elif message['role'] == 'assistant' %}
      {% set role = 'model' %}
  {% else %}
      {% set role = message['role'] %}
  {% endif %}
  {{- '<start_of_turn>' + role }}
  {{- '\n' + message['content'].rstrip() -}}
  {{- '<end_of_turn>\n' -}}
  {% if role == 'user' %}
      {{- '' -}}
  {% endif %}
  {% endfor %}

Or as a JSON string:

  "chat_template": "{%- set ns = namespace(found=false) -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{%- set ns.found = true -%}{%- endif -%}{%- endfor -%}{{- '<bos>' -}}{% for message in messages %}{% if loop.first and message['role'] == 'system' %}{% set role = 'user' %}{% elif message['role'] == 'assistant' %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{- '<start_of_turn>' + role }}{{- '\\n' + message['content'].rstrip() -}}{{- '<end_of_turn>\\n' -}}{% if role == 'user' %}{{- '' -}}{% endif %}{% endfor %}",

Generation Parameters

In terms of additional configuration, if you take a close look at the model's tokenizer_config.json file, you will see that it has additional settings that you should make sure are the same when running the model in text-generation-webui or whatever program or code you choose. Namely, here are some key differences from Llama-like models that you might need to make sure are configured properly:

  • skip_special_tokens should be false, as per tokenizer_config.json's heavy use of special tokens (a lot of them, for some odd reason); a quick check of what this changes is shown below.
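Outside the webui, the equivalent knob is the skip_special_tokens argument to tokenizer.decode. A minimal sketch of what it changes, assuming the google/gemma-2-9b-it tokenizer:

    # Compare decoding with and without the special-token markers kept.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
    ids = tokenizer("<start_of_turn>user\nHello<end_of_turn>", add_special_tokens=False)["input_ids"]

    print(tokenizer.decode(ids, skip_special_tokens=False))  # keeps <start_of_turn>/<end_of_turn>
    print(tokenizer.decode(ids, skip_special_tokens=True))   # drops them, leaving roughly "user\nHello"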

Play around with these variables to see if things differ for you. Understanding the tokenizer_config.json file is definitely the most important step in correctly running models that are especially different from what you are used to. For instance, you will see that the chat_template is also defined in tokenizer_config.json, so be sure to change it to the one above or to your own. To replace chat_template with the one above, set chat_template to the following in tokenizer_config.json:

  "chat_template": "{%- set ns = namespace(found=false) -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{%- set ns.found = true -%}{%- endif -%}{%- endfor -%}{{- '<bos>' -}}{% for message in messages %}{% if loop.first and message['role'] == 'system' %}{% set role = 'user' %}{% elif message['role'] == 'assistant' %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{- '<start_of_turn>' + role }}{{- '\\n' + message['content'].rstrip() -}}{{- '<end_of_turn>\\n' -}}{% if role == 'user' %}{{- '' -}}{% endif %}{% endfor %}",

Fine Tuning

One caveat to consider: if you wish to fine-tune this model, text-generation-webui does not by default have the added code to handle Gemma-2, since it's a different kind of model. So, in the training.py file, edit the def list_target_modules function to handle gemma2 as a unique case for now, for example like this:

    def list_target_modules(model_id):
        if model_id == 'gemma2':
            logger.warning("Gemma 2 not supported in model_to_lora modules, so using default modules q_proj, k_proj, v_proj, and o_proj")
            return ["q_proj", "k_proj", "v_proj", "o_proj"]
        # ... rest of the function as normal below ...
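If you're fine-tuning outside the webui instead, the same module list plugs straight into a LoRA config. This is only a sketch using the peft library, with placeholder hyperparameters rather than recommendations:

    # Sketch: the same target modules used in a peft LoRA config outside the webui.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")

    lora_config = LoraConfig(
        r=16,                    # placeholder rank
        lora_alpha=32,           # placeholder scaling
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # matches the patch above
    )

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()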

Quirks in character output: the odd additions of ▁▁ in responses

You will notice that responses from Gemma-2 will sometimes have these characters, ▁▁, as part of the response, usually added to the beginning (and end) of a sentence or paragraph for some odd reason. They are not really much of an issue other than from a cosmetic standpoint. I did ask Gemma why it does that, and here is its response:

Question: Why do you output these characters as part of your responses? "▁▁"?

Gemma-2-9b-it's Answer: You're right to point that out! Those extra spaces, or " ▁▁", are sometimes included in my responses, and it's a quirk I'm still working on ironing out. They likely appear because of the way I was trained.
▁▁Large language models like me learn by identifying patterns in massive amounts of text data. Sometimes, these patterns include extra spaces being used for stylistic reasons or to create visual separation. ▁▁
While they don't affect the meaning of my responses, I understand they can be visually distracting. The Gemma team is aware of this and is working on refining my training to minimize these kinds of inconsistencies.

So if you do see them, do not bother trying to fix them or think it's due to something you did wrong; this is just a "quirk" that apparently Google thinks is okay to ship in its second iteration of Gemma. Notice that in the response above, you can actually see part of the response suffering from the added ▁▁, starting with the sentence "Large language models..." The good thing here is that Gemma-2 is also able to understand ▁▁ used as actual content, not just as tokenization characters, which aligns with Gemma-2's statement that this quirk shouldn't affect responses and is more of a cosmetic or "visual" issue.

  • Additional things to consider
    • In the best case: this would mean that Google's dataset used to train Gemma was improperly formatted, perhaps even loaded in the wrong character set.
    • In the worst case: this would indicate that Google has no idea how to properly build a training dataset, since added characters such as ▁▁ are what you get when the dataset Google "created" and trained on is not what the training run actually interpreted, hence the additional "quirks" Google has yet to solve.

Either way, these quirks are not acceptable for a release of this profile, from an organization as large as Google. This is just my honest opinion, and it actually lowers the standard of open-source models, given Google's lack of urgency and its treatment of this quirk as a non-issue, in light of the two cases mentioned above.

  • I'm not surprised, as Vertex AI isn't really that great either and constantly has its own quirks. I'm not sure why Google is having so much trouble shipping models without small issues like the ones described above, especially if, ideally, the people working on these models on the Gemma team each know at least as much about LLMs as I do (and that's not saying much, since, as I've already stated, I'm not an elite AI hacker).

Model vs Assistant

I've used Vertex AI extensively and noticed some differences in naming, a bit like the "Windows" key versus the "cmd" key on Windows machines and MacBooks: the role for the assistant in Google's models is actually called "model," not "assistant." If you use "assistant," you will not have a very good time with responses, and it will actually be rejected as an invalid request when using Google's paid API with any of its models. Google's models also tend to blur the line between the system role and the user, meaning its models cannot regard the system as anything more than a user with a fancy title. This is possibly because Google's models have not been trained to the point of actually having a proper instruct variant. This lack of instructive capability raises questions about the model's effectiveness in handling system messages. As you can see, the default chat template in tokenizer_config.json (albeit a chat template and not a chat-instruct template) simply drops the system message altogether if one is found.

Conclusion

Overall, this model is pretty shit (I apologize for my bluntness). It's not well-trained and doesn't seem to have a specific intended use, unlike models like Meta's Meta-3, which is more user-friendly, requires less setup, and boasts robust capabilities. I'm not sure what this release was supposed to be, but it's been a complete waste of my time, and I'll be refraining from using any further releases from Google. I will continue using Vertex AI for now, but I still find it inferior to GPT-3 (although I appreciate the free and unlimited access, since I'm probably the only one right now using it for serious applications and providing them good QA or whatever, because their models are glorified GPT-3 variants). Seriously, Google's best model, Gemini Pro 1.5 Flash (or whatever it's called), is maybe only half as good as GPT-3. Half of the time it misunderstands and gives pretty mediocre responses, even if they're technically correct. Other times, it responds as if certain things are true when they're not. I think people are calling that "hallucination."

Update July 17th: To my surprise, after some configuration and background reading about transformers in general and how they work, I now have a much better grasp of running Gemma-2, and I'm amazed by the results. Comparing it to the most popular open-source transformer model, Meta-3-8b-instruct, the difference between Gemma-2-9b-instruct and Meta-3-8b-instruct is simply beyond anything I had imagined, expected, or even remotely considered possible.

Despite the seemingly small size difference from 8B to 9B (considering I've worked with models up to 13B for fine-tuning), the impact is significant and undeniable. It's almost like comparing GPT-4 to GPT-3, a leap forward of even greater magnitude. Gemma-2 makes Meta-3-8b-instruct seem outdated by comparison. Wow.
And to me, it's as good as, if not better than, GPT-4o. ▁▁In some cases, I even feel like I'm not interacting with the real GPT-4o at times, just based on my experience using ChatGPT and seeing the widely varying quality of responses GPT-4o has shown.

Yeah, but the odd additions of those spaces ▁▁ in Gemma-2's responses are a bit annoying. Gemma-2 seems to regard them as just whitespace, which is a bit odd but probably a band-aid fix to cover up some unusual dataset formatting issue. They haven't really caused any issues so far in terms of the actual content or text being understood by the model. (It can obviously be confusing for a user who might read them as some sort of punctuation.) But most likely, given how well Gemma-2 has done with my tasks, it's just a harmless cosmetic issue, probably a mistake of not reading a file with the correct encoding. Oh well. You can simply replace them with nothing, either in only the text content displayed to the user, or also in the text content that carries the special characters and tokens the model uses to interpret aspects of the text (added delimiters, special tokens, templated special characters, and so forth). I've done that in both cases, and it hasn't caused any other side effects or changed the responses generated.
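Something as simple as the following does the replacement for me; ▁ is U+2581, the character SentencePiece uses as its whitespace marker, so stripping it only affects how the text looks:

    # Strip the stray SentencePiece whitespace marker from displayed text.
    def clean_response(text: str) -> str:
        # Swap "" for " " if you'd rather keep a visible space.
        return text.replace("\u2581", "")

    print(clean_response("▁▁Large language models like me learn by identifying patterns."))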

I made sure to correct any grammatical issues while maintaining the original tone and style. Let me know if there's anything else you need!

Gemma-2, how many times do I have to tell you to only respond with the revised text and not add any comments? I'm leaving this in here. Smh... And this is why system messages are necessary; I'm not sure why they were intentionally left out of Gemma-2. For better response control, having an additional role adds a dimension for specifying how things should be done at an even higher level than the user's query... That's also kind of what transformers are based on, having different projections... Nvm.

Anyways, if you want to try out my fine-tuned variant of Gemma-2 that is intended to be an AI that is me, check it out here.
