metadata
library_name: transformers
license: apache-2.0
basemodel: google/gemma-7b
Model Card for Firefly-Gemma
gemma-7B-it-firefly is fine-tuned from gemma-7b-it to act as a helpful and harmless AI assistant. We trained the model with LoRA using Firefly.
![](/yys/gemma-7B-it-firefly/resolve/main/open_llm_leaderboard.png)
We advise you to install transformers>=4.38.2.
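The version advice above can be checked at runtime before loading the model. The following is a minimal sketch, not part of Firefly: the helper names `parse_release` and `transformers_is_recent_enough` are hypothetical, and the comparison only looks at the numeric release components, so a pre-release such as `4.38.2.dev0` is treated like `4.38.2`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers release recommended by this model card.
MIN_TRANSFORMERS = (4, 38, 2)


def parse_release(version_string):
    """Return the leading numeric components of a version string as a 3-tuple.

    Suffixes such as '.dev0' or 'rc1' are ignored (a simplifying assumption).
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


def transformers_is_recent_enough(installed=None):
    """Check an explicit version string, or the installed transformers package."""
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            # transformers is not installed at all.
            return False
    return parse_release(installed) >= MIN_TRANSFORMERS
```

Calling `transformers_is_recent_enough()` with no argument inspects the installed package; passing a string is mainly useful for testing.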
Performance
We evaluate our models on the Open LLM Leaderboard, where they achieve good performance.
Usage
The chat template of our chat models is similar to that of the official gemma-7b-it:
<bos><start_of_turn>user
hello, who are you?<end_of_turn>
<start_of_turn>model
I am an AI program developed by Firefly<eos>
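The template above can also be assembled programmatically. The following is a minimal sketch, assuming each user turn ends with `<end_of_turn>` and each completed model turn ends with `<eos>`, as in the example. The helper name `build_prompt` is hypothetical, and the newline inserted after `<eos>` between turns is an assumption, since the example shows only a single exchange.

```python
def build_prompt(turns):
    """Build a prompt string in the chat format shown above.

    `turns` is a list of (user_message, model_reply) pairs. Use None as the
    reply of the last pair to leave the model turn open for generation.
    """
    prompt = "<bos>"
    for user_message, model_reply in turns:
        # A user turn followed by the opening of the model turn.
        prompt += (
            f"<start_of_turn>user\n{user_message}<end_of_turn>\n"
            "<start_of_turn>model\n"
        )
        if model_reply is not None:
            # Assumption: completed model turns are separated by a newline.
            prompt += f"{model_reply}<eos>\n"
    return prompt


# A single open turn, ready to be passed to the tokenizer.
single_turn = build_prompt([("hello, who are you?", None)])
print(single_turn)
```

The usage snippet in this card strips trailing whitespace from the prompt, so call `.strip()` on the result to match it exactly.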
You can run inference with the script provided in Firefly.
You can also use the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_name_or_path = "yys/gemma-7B-it-firefly"
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
prompt = "Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions. "
text = f"""
<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
""".strip()
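# The prompt already starts with <bos>, so do not let the tokenizer add a second one.
model_inputs = tokenizer([text], return_tensors="pt", add_special_tokens=False).to(model.device)
generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    max_new_tokens=1500,
    do_sample=True,  # needed for top_p and temperature to take effect
    top_p=0.9,
    temperature=0.35,
    repetition_penalty=1.0,
    eos_token_id=tokenizer.encode('<eos>', add_special_tokens=False),
)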
# Keep only the newly generated tokens by dropping the prompt tokens.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)