Model Card for Geerath/google-gemma-7b-it-finetuned-web-questions

This model card describes a fine-tuned version of the 7B instruction-tuned Gemma model (google/gemma-7b-it).

Model Details

This is a general question-answering model fine-tuned on the web_questions dataset.

Model Description

This is a general question-answering LLM obtained by fine-tuning Gemma on the web_questions dataset. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants. Gemma models are well-suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources, such as a laptop, a desktop, or your own cloud infrastructure, which widens access to state-of-the-art AI models.

Developed by: Geerath Bhat
Model type: Fine-tuned Instruct LLM.
Language(s) (NLP): English
License: None specified
Finetuned from model: google/gemma-7b-it

Usage

Google has shared code snippets for getting started quickly with Gemma models; the snippet below adapts them to this model. First run pip install -U transformers. The snippet also passes device_map="auto" and a quantization_config, which rely on the accelerate and bitsandbytes packages, so install those as well.

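import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

hf_model_repo = "Geerath/google-gemma-7b-it-finetuned-web-questions"

# Get the tokenizer
tokenizer = AutoTokenizer.from_pretrained(hf_model_repo)

# Quantization config. ASSUMPTION: the original bnb_config was not shown;
# 4-bit NF4 is used here only as a common choice.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.float16)

# Load the model
model = AutoModelForCausalLM.from_pretrained(hf_model_repo,
                                             quantization_config=bnb_config,
                                             device_map="auto")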

prompt = ["Question: Tell me something about IISc\n\nAnswer:\n"]

# Generate response (move the inputs to the same device as the model)
input_ids = tokenizer(prompt, return_tensors="pt", truncation=True).input_ids.to(model.device)
outputs = model.generate(input_ids=input_ids,
                         max_new_tokens=200,
                         do_sample=True,
                         temperature=0.2)

result = tokenizer.batch_decode(outputs, skip_special_tokens=True)[0]

result = "Question:"+result.split("Question:")[1]

# Print the result

print(f"Generated response:\n{result}")
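The prompt template and the post-processing step in the snippet above can be factored into two small helpers. This is only a minimal sketch: build_prompt and extract_answer are hypothetical names that do not appear in the original card, and the template is simply the one used in the example prompt.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the 'Question: ... Answer:' template used above."""
    return f"Question: {question.strip()}\n\nAnswer:\n"


def extract_answer(generated: str) -> str:
    """Return the text after the first 'Answer:' marker in the decoded output.

    Falls back to the whole string if the marker is missing.
    """
    _, marker, answer = generated.partition("Answer:")
    return answer.strip() if marker else generated.strip()


prompt = build_prompt("Tell me something about IISc")
# Placeholder decoded output, used only to exercise the helper.
decoded = prompt + "IISc is a research institute in Bengaluru."
print(extract_answer(decoded))
```

In practice, decoded would be the string returned by tokenizer.batch_decode in the snippet above.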

Fine-tuning the model

Fine-tuning scripts and a notebook are available under the examples/ directory of the google/gemma-7b repository. To adapt them to the base model used here, change the model ID to google/gemma-7b-it. That repository provides:

A script to perform Supervised Fine-Tuning (SFT) on the UltraChat dataset using QLoRA
A script to perform SFT using FSDP on TPU devices
A notebook that you can run on a free-tier Google Colab instance to perform SFT on the English quotes dataset

How to Get Started with the Model

Use the code provided by google/gemma-7b-it to get started with this finetuned model.

Training Details

Training Data

web_questions
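The card does not show how dataset records were turned into training text. The sketch below is one plausible way to do it, assuming each record has a "question" string and an "answers" list of strings (the field names used by the Hugging Face Hub version of web_questions) and assuming the training template matches the inference prompt shown under Usage. format_example is a hypothetical helper, not the author's code.

```python
def format_example(record: dict) -> str:
    """Turn one web_questions-style record into a single training string.

    Assumes a "question" string and an "answers" list of strings;
    multiple answers are joined with commas.
    """
    answer = ", ".join(record["answers"])
    return f"Question: {record['question']}\n\nAnswer:\n{answer}"


# Illustrative record, not taken from the dataset.
sample = {"question": "what is the capital of france?", "answers": ["Paris"]}
text = format_example(sample)
print(text)
```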

Training Procedure

The model was trained with SFTTrainer (from the TRL library) using the following TrainingArguments:

num_train_epochs=1, # adjust based on the data size
per_device_train_batch_size=4, # use 2 or 4 if you have less GPU RAM
per_device_eval_batch_size=4,
optim="paged_adamw_32bit",
#gradient_accumulation_steps=2,
save_strategy="epoch", 
evaluation_strategy="epoch",
learning_rate=2e-4,
logging_steps=1,
fp16=True,
weight_decay=0.01,
lr_scheduler_type="cosine",
seed=42,
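To make the effect of lr_scheduler_type="cosine" with learning_rate=2e-4 concrete, the following sketch computes a half-cosine decay by hand. It assumes no warmup (none is listed in the arguments above) and a hypothetical total step count; cosine_lr is an illustrative function, not a library API.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 2e-4) -> float:
    """Learning rate at `step` for a half-cosine decay without warmup."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; depends on dataset size and batch size
for step in (0, 250, 500, 750, 1000):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

The rate starts at the configured 2e-4, passes through half that value at the midpoint, and reaches zero at the final step.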

Evaluation

The model was evaluated on the test set of the web_questions dataset.

Testing Data

The model has so far been tested only on the test set of the web_questions dataset. Results on other datasets will be added soon.

Metrics

Perplexity, accuracy, and F1 score.
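The card does not state how accuracy and F1 were computed. For question answering, a common convention is exact-match accuracy and token-overlap F1 between the predicted and reference answers; the sketch below follows that convention under a deliberately simple normalization (lowercasing and whitespace splitting). All function names are illustrative.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately simple normalizer)."""
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized token sequences are identical."""
    return normalize(prediction) == normalize(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris", "paris"), token_f1("Paris France", "paris"))
```

When a question has several reference answers, one option is to score the prediction against each and keep the maximum.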

Results

After 2 epochs, the training loss was 1.114500 and the validation loss was 1.592121.

Perplexity on test data from web_questions dataset: 5.13
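Perplexity is conventionally the exponential of the mean per-token cross-entropy loss (natural log). The card does not say exactly how the figure above was computed, so the sketch below only illustrates that relationship, treating the reported validation loss as a mean per-token loss.

```python
import math


def perplexity(token_losses: list[float]) -> float:
    """Perplexity as exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(sum(token_losses) / len(token_losses))


# The reported validation loss, treated here as a mean per-token loss.
val_loss = 1.592121
print(f"validation perplexity: {perplexity([val_loss]):.2f}")

# Mean loss implied by the reported test perplexity of 5.13.
print(f"implied test loss: {math.log(5.13):.3f}")
```

Under this convention the validation loss corresponds to a perplexity of roughly 4.9, in the same range as the 5.13 reported on the test set.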
