---
library_name: transformers
license: apache-2.0
base_model: google/gemma-7b
---

## Model Card for Firefly-Gemma

[gemma-7B-it-firefly](https://huggingface.co/yys/gemma-7B-it-firefly) is trained on top of [gemma-7b-it](https://huggingface.co/google/gemma-7b-it) to act as a helpful and harmless AI assistant. We use [Firefly](https://github.com/yangjianxin1/Firefly) to train the model with LoRA.

We advise you to install `transformers>=4.38.2`.

## Performance

We evaluate our model on the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), where it achieves good performance.

## Usage

The chat template of our model is the same as that of the official gemma-7b-it:

```text
<start_of_turn>user
hello, who are you?<end_of_turn>
<start_of_turn>model
I am an AI program developed by Firefly<end_of_turn>
```

You can chat with the model using the [chat script](https://github.com/yangjianxin1/Firefly/blob/master/script/chat/chat.py) in Firefly, or use the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name_or_path = "yys/gemma-7B-it-firefly"

# Load the model in float16 and let transformers place it on the available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions."

# Wrap the prompt in the gemma chat format, ending with the model turn header
# so the assistant's reply comes next.
text = f"""<start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
""".strip()

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=1500,
    do_sample=True,  # required for top_p/temperature to take effect
    top_p=0.9,
    temperature=0.35,
    repetition_penalty=1.0,
    # Stop generating when the model closes its turn.
    eos_token_id=tokenizer.encode('<end_of_turn>', add_special_tokens=False),
)

# Drop the prompt tokens so only the newly generated reply is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
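
Since the tokenizer ships with this chat template, you can also build the prompt with `tokenizer.apply_chat_template` instead of formatting the string by hand. A minimal sketch, assuming the released tokenizer carries Gemma's template (the message content is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yys/gemma-7B-it-firefly")

messages = [{"role": "user", "content": "hello, who are you?"}]

# add_generation_prompt=True appends the "<start_of_turn>model" header so the
# model answers as the assistant.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)
# Expected shape (modulo a leading <bos>):
# <start_of_turn>user
# hello, who are you?<end_of_turn>
# <start_of_turn>model
```

The resulting string can be passed to `tokenizer(...)` and `model.generate(...)` exactly as in the snippet above.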
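
As noted above, the model was trained with LoRA via Firefly. If you train your own adapter this way, merging it back into the base model for standalone inference might look like the following sketch with `peft` (the adapter path is hypothetical, not an artifact released with this model):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model the LoRA adapter was trained against.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.float16
)
# Attach the LoRA adapter (path is hypothetical).
model = PeftModel.from_pretrained(base, "path/to/firefly-lora-adapter")
# Fold the low-rank updates into the base weights for standalone inference.
model = model.merge_and_unload()
model.save_pretrained("gemma-7B-it-firefly-merged")
```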