Sandiago21 committed on
Commit 3fd3337
2 Parent(s): ab187da 1c07f34

Merge branch 'main' of https://huggingface.co/Sandiago21/falcon-7b-prompt-answering into main

Files changed (1):
  1. README.md +42 -41
README.md CHANGED
@@ -13,9 +13,9 @@ tags:
 
 ## Model Card for Model ID
 
- This repository contains a LLaMA-7B further fine-tuned model on conversations and question answering prompts.
+ This repository contains a Falcon-7B model further fine-tuned on conversations and question answering prompts.
 
- **I used falcon-7b (https://huggingface.co/tiiuae/falcon-7b) as a base model, so this model is for Research purpose only (See the [license](https://huggingface.co/tiiuae/falcon-7b/blob/main/LICENSE))**
+ **I used falcon-7b (https://huggingface.co/tiiuae/falcon-7b) as the base model, so this model has the same license as the Falcon-7B model (Apache-2.0)**
 
 
 ## Model Details
@@ -34,7 +34,7 @@ The tiiuae/falcon-7b model was finetuned on conversations and question answering
 
 **Language(s) (NLP):** English, multilingual
 
- **License:** Research
+ **License:** Apache-2.0
 
 **Finetuned from model:** tiiuae/falcon-7b
 
@@ -72,24 +72,11 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 The model was trained on the following kind of prompt:
 
 ```python
- def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
-     if input_ctxt:
-         return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
- 
- ### Instruction:
- {instruction}
- 
- ### Input:
- {input_ctxt}
- 
- ### Response:"""
-     else:
-         return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
- 
- ### Instruction:
- {instruction}
- 
- ### Response:"""
+ def generate_prompt(prompt: str) -> str:
+     return f"""
+ <human>: {prompt}
+ <assistant>:
+ """.strip()
 ```
 
 ## How to Get Started with the Model
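
The new template collapses the old instruction/input/response format into a single `<human>/<assistant>` turn. As a quick illustration of what `generate_prompt` now produces (the question string here is only an example):

```python
def generate_prompt(prompt: str) -> str:
    return f"""
<human>: {prompt}
<assistant>:
""".strip()

# Illustrative question, not taken from the README:
print(generate_prompt("What is the capital city of Greece?"))
# Output:
# <human>: What is the capital city of Greece?
# <assistant>:
```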
@@ -101,20 +88,27 @@ Use the code below to get started with the model.
 ```python
 import torch
 from peft import PeftConfig, PeftModel
- from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
+ from transformers import BitsAndBytesConfig, GenerationConfig, AutoTokenizer, AutoModelForCausalLM
 
- MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"
+ MODEL_NAME = "Sandiago21/falcon-7b-prompt-answering"
 
 config = PeftConfig.from_pretrained(MODEL_NAME)
 
+ compute_dtype = getattr(torch, "float16")
+ 
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=compute_dtype,
+     bnb_4bit_use_double_quant=True,
+ )
+ 
- model = LlamaForCausalLM.from_pretrained(
+ model = AutoModelForCausalLM.from_pretrained(
     config.base_model_name_or_path,
-     load_in_8bit=True,
-     torch_dtype=torch.float16,
+     quantization_config=bnb_config,
     device_map="auto",
+     trust_remote_code=True,
 )
 
- tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
 model = PeftModel.from_pretrained(model, MODEL_NAME)
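
The README follows both loading snippets with a PyTorch 2.x guard that surfaces in this diff only as hunk context (`if torch.__version__ >= "2":`). The guarded body is not shown; a minimal sketch, assuming it wraps `torch.compile`:

```python
import torch

# Assumption: the guard seen in the hunk context compiles the model for
# faster inference on PyTorch 2.x. Its exact body is not part of this diff.
if torch.__version__ >= "2":
    model = torch.compile(model)
```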
@@ -133,10 +127,9 @@ if torch.__version__ >= "2":
 
 ### Example of Usage
 ```python
- instruction = "What is the capital city of Greece and with which countries does Greece border?"
- input_ctxt = None # For some tasks, you can provide an input context to help the model generate a better response.
+ prompt = "What is the capital city of Greece and with which countries does Greece border?"
 
- prompt = generate_prompt(instruction, input_ctxt)
+ prompt = generate_prompt(prompt)
 input_ids = tokenizer(prompt, return_tensors="pt").input_ids
 input_ids = input_ids.to(model.device)
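
The generation step itself is unchanged by this commit and appears only in the next hunk's context marker (`print(response)`). A minimal sketch of how the prepared `input_ids` would typically be consumed; the `GenerationConfig` values below are illustrative assumptions, not the README's actual settings:

```python
# Illustrative sampling parameters (assumed, not taken from the README).
generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.2,
    top_p=0.75,
    max_new_tokens=256,
)

with torch.no_grad():
    outputs = model.generate(input_ids=input_ids, generation_config=generation_config)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```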
@@ -159,21 +152,30 @@ print(response)
 ```python
 import torch
 from peft import PeftConfig, PeftModel
- from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM
+ from transformers import BitsAndBytesConfig, GenerationConfig, AutoTokenizer, AutoModelForCausalLM
 
- MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"
+ MODEL_NAME = "Sandiago21/falcon-7b-prompt-answering"
 BASE_MODEL = "tiiuae/falcon-7b"
 
- config = PeftConfig.from_pretrained(MODEL_NAME)
+ compute_dtype = getattr(torch, "float16")
+ 
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=compute_dtype,
+     bnb_4bit_use_double_quant=True,
+ )
 
- model = LlamaForCausalLM.from_pretrained(
+ model = AutoModelForCausalLM.from_pretrained(
     BASE_MODEL,
-     load_in_8bit=True,
-     torch_dtype=torch.float16,
+     quantization_config=bnb_config,
     device_map="auto",
+     trust_remote_code=True,
 )
 
- tokenizer = LlamaTokenizer.from_pretrained(MODEL_NAME)
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
 model = PeftModel.from_pretrained(model, MODEL_NAME)
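
Unlike the first snippet, this variant hard-codes `BASE_MODEL` instead of reading the base checkpoint from the adapter config; both paths should resolve to the same weights. A hypothetical sanity check:

```python
from peft import PeftConfig

# Hypothetical check: the adapter's recorded base model should match the
# hard-coded BASE_MODEL used above.
config = PeftConfig.from_pretrained("Sandiago21/falcon-7b-prompt-answering")
assert config.base_model_name_or_path == "tiiuae/falcon-7b"
```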
@@ -193,10 +195,9 @@ if torch.__version__ >= "2":
 
 ### Example of Usage
 
 ```python
- instruction = "What is the capital city of Greece and with which countries does Greece border?"
- input_ctxt = None # For some tasks, you can provide an input context to help the model generate a better response.
+ prompt = "What is the capital city of Greece and with which countries does Greece border?"
 
- prompt = generate_prompt(instruction, input_ctxt)
+ prompt = generate_prompt(prompt)
 input_ids = tokenizer(prompt, return_tensors="pt").input_ids
 input_ids = input_ids.to(model.device)
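
Read end to end, the pieces this commit touches compose as follows. This is a consolidated sketch of the post-merge README code, with illustrative generation parameters rather than the README's exact values:

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    GenerationConfig,
)

MODEL_NAME = "Sandiago21/falcon-7b-prompt-answering"

# 4-bit NF4 quantization, as configured in the updated README.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

# Load the Falcon-7B base model recorded in the adapter config, then apply
# the PEFT adapter from this repository.
config = PeftConfig.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, MODEL_NAME)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Prompt template from the updated README.
def generate_prompt(prompt: str) -> str:
    return f"""
<human>: {prompt}
<assistant>:
""".strip()

prompt = generate_prompt("What is the capital city of Greece and with which countries does Greece border?")
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Illustrative generation parameters (assumed).
outputs = model.generate(
    input_ids=input_ids,
    generation_config=GenerationConfig(max_new_tokens=256),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```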