arXiv:2403.08295

Gemma: Open Models Based on Gemini Research and Technology

Published on Mar 13, 2024
Featured in Daily Papers on Mar 14, 2024
Abstract

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Community

I was curious when reading the section on the formatting used for instruction tuning (pg. 4), where the developers used special tokens such as 'user' and 'model' to denote who is speaking. I haven't read much on instruction tuning, so I was wondering: is this standard practice or a novel idea?

Paper author

Hi, Surya from the Gemma team -- there are different standards for how people demarcate turns and assign roles in fine-tuning data, but some template is almost always used!
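
To make that concrete, here is a minimal sketch of what such a turn-based format can look like, using the `<start_of_turn>`/`<end_of_turn>` control tokens and the `user`/`model` roles described on pg. 4 of the paper. The helper function below is illustrative only, not the exact implementation:

```python
# Sketch of a turn-based dialogue format: each turn is wrapped in
# <start_of_turn>/<end_of_turn> control tokens and tagged with the role
# ('user' or 'model') of whoever is speaking.
def format_gemma_prompt(messages):
    """messages: list of {'role': 'user' | 'model', 'content': str} dicts."""
    prompt = ""
    for msg in messages:
        prompt += f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n"
    # Open a model turn so generation continues as the assistant.
    prompt += "<start_of_turn>model\n"
    return prompt

print(format_gemma_prompt([
    {"role": "user", "content": "What is instruction tuning?"},
]))
```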

For instance, you may have seen the ChatML format (https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/ai-services/openai/includes/chat-markup-language.md) and other chat templates: https://huggingface.co/docs/transformers/main/en/chat_templating.
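
In practice you usually don't build these strings by hand; the chat template ships with the tokenizer and is applied for you. A minimal sketch with the transformers API (this assumes you have access to the gated google/gemma-7b-it checkpoint):

```python
from transformers import AutoTokenizer

# The tokenizer carries the model's chat template, so the turn markers
# are inserted automatically from a plain list of role/content messages.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "What is instruction tuning?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # shows the <start_of_turn>... formatting applied automatically
```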

