arXiv:2403.08295

Gemma: Open Models Based on Gemini Research and Technology

Published on Mar 13, 2024
Featured in Daily Papers on Mar 14, 2024
Abstract

This work introduces Gemma, a family of lightweight, state-of-the-art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Gemma outperforms similarly sized open models on 11 out of 18 text-based tasks, and we present comprehensive evaluations of safety and responsibility aspects of the models, alongside a detailed description of model development. We believe the responsible release of LLMs is critical for improving the safety of frontier models, and for enabling the next wave of LLM innovations.

Community

I was curious when reading the section on the formatting used for instruction tuning (pg. 4), where the developers used special tokens such as 'user' and 'model' to denote who is speaking. I haven't read much on instruction tuning, so I was wondering: is this standard practice or a novel idea?

Paper author

Hi, Surya from the Gemma team -- there are different standards for how people demarcate turns and assign roles in fine-tuning data, but some template is almost always used!
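
To make that concrete, here is a minimal sketch of what such a turn-based format can look like, using the `<start_of_turn>`/`<end_of_turn>` control tokens and the `user`/`model` roles described on pg. 4 of the paper. The helper function below is illustrative only, not the exact implementation:

```python
# Sketch of a turn-based dialogue format: each turn is wrapped in
# <start_of_turn>/<end_of_turn> control tokens and tagged with the role
# ('user' or 'model') of whoever is speaking.
def format_gemma_prompt(messages):
    """messages: list of {'role': 'user' | 'model', 'content': str} dicts."""
    prompt = ""
    for msg in messages:
        prompt += f"<start_of_turn>{msg['role']}\n{msg['content']}<end_of_turn>\n"
    # Open a model turn so generation continues as the assistant.
    prompt += "<start_of_turn>model\n"
    return prompt

print(format_gemma_prompt([
    {"role": "user", "content": "What is instruction tuning?"},
]))
```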

For instance, you may have seen the ChatML format (https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/ai-services/openai/includes/chat-markup-language.md) and other chat templates: https://huggingface.co/docs/transformers/main/en/chat_templating.
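
In practice you usually don't build these strings by hand; the chat template ships with the tokenizer and is applied for you. A minimal sketch with the transformers API (this assumes you have access to the gated google/gemma-7b-it checkpoint):

```python
from transformers import AutoTokenizer

# The tokenizer carries the model's chat template, so the turn markers
# are inserted automatically from a plain list of role/content messages.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
messages = [{"role": "user", "content": "What is instruction tuning?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # shows the <start_of_turn>... formatting applied automatically
```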

