---
license: other
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- decapoda-research-7b-hf
- prompt answering
- peft
---

## Model Card for Sandiago21/llama-7b-hf-prompt-answering

This repository contains a falcon-7b model further fine-tuned on conversations and question-answering prompts.

⚠️ **I used [falcon-7b](https://huggingface.co/tiiuae/falcon-7b) as the base model, so this model is for research purposes only (see the [license](https://huggingface.co/tiiuae/falcon-7b/blob/main/LICENSE)).**

## Model Details

Anyone can prompt and experiment with the model using the pre-existing Jupyter Notebook in the **notebooks** folder. The notebook contains example code for loading the model and sending it prompts, as well as example prompts to get you started.

### Model Description

The tiiuae/falcon-7b model was fine-tuned on conversation and question-answering prompts.

**Developed by:** [More Information Needed]

**Shared by:** [More Information Needed]

**Model type:** Causal LM

**Language(s) (NLP):** English, multilingual

**License:** Research

**Fine-tuned from model:** tiiuae/falcon-7b

## Model Sources

**Repository:** [More Information Needed]

**Paper:** [More Information Needed]

**Demo:** [More Information Needed]

## Uses

The model can be used for prompt answering.

### Direct Use

The model can be used directly for prompt answering.

### Downstream Use

Generating text and prompt answering.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

# Usage

## Creating a Prompt

The model was trained on the following kind of prompt:

```python
def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
    if input_ctxt:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_ctxt}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""
```
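
For illustration, calling the helper with an instruction and no input context (the instruction string below is just an example) renders the instruction-only template:

```python
print(generate_prompt("Name the capital city of Greece."))
```

This prints:

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Name the capital city of Greece.

### Response:
```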

## How to Get Started with the Model

Use the code below to get started with the model.

1. You can git clone the repo, which for simplicity and completeness also contains the artifacts for the base model, and run the following code snippet to load the model:

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"

config = PeftConfig.from_pretrained(MODEL_NAME)

# Load the base model in 8-bit so it fits on a single GPU; the falcon base
# ships custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Attach the fine-tuned PEFT adapters on top of the base model
model = PeftModel.from_pretrained(model, MODEL_NAME)

generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    num_beams=4,      # beam search; the sampling parameters above only take effect if do_sample=True
    max_new_tokens=32,
)

model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)
```

### Example of Usage

```python
instruction = "What is the capital city of Greece and with which countries does Greece border?"
input_ctxt = None  # For some tasks, you can provide an input context to help the model generate a better response.

prompt = generate_prompt(instruction, input_ctxt)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
    )

response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
print(response)

>>> The capital city of Greece is Athens and it borders Turkey, Bulgaria, Macedonia, Albania, and the Aegean Sea.
```

2. You can also load the model directly from the Hugging Face Hub using the following code snippet:

```python
import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

MODEL_NAME = "Sandiago21/llama-7b-hf-prompt-answering"
BASE_MODEL = "tiiuae/falcon-7b"

config = PeftConfig.from_pretrained(MODEL_NAME)

# Load the base model directly from the Hub (8-bit, automatic device placement)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Attach the fine-tuned PEFT adapters on top of the base model
model = PeftModel.from_pretrained(model, MODEL_NAME)

generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    max_new_tokens=32,
)

model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)
```

### Example of Usage

Once loaded, the model can be prompted exactly as in the usage example under option 1 above.

## Training Details

### Training Hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 2
- mixed_precision_training: Native AMP
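
These values follow the Hugging Face `Trainer` reporting conventions. As a rough, hypothetical sketch of how they could be expressed with `transformers.TrainingArguments` (the actual training script is not included in this repository, and the output directory below is illustrative):

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the reported hyperparameters;
# the actual training script is not part of this model card.
training_args = TrainingArguments(
    output_dir="./outputs",          # illustrative path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # 4 x 2 = total train batch size of 8
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=2,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```

The reported Adam settings (betas=(0.9, 0.999), epsilon=1e-08) match the Trainer's default optimizer, so no explicit optimizer argument is needed.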

### Framework Versions

- Transformers 4.28.1
- PyTorch 2.0.0+cu117
- Datasets 2.12.0
- Tokenizers 0.12.1

### Training Data

The tiiuae/falcon-7b model was fine-tuned on conversation and question-answering data.

### Training Procedure

The tiiuae/falcon-7b model was further trained and fine-tuned on question-answering and prompt data for 1 epoch (approximately 10 hours of training on a single GPU).

## Model Architecture and Objective

The model is based on tiiuae/falcon-7b, with PEFT adapters fine-tuned on top of the base model on conversation and question-answering data.
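
As a rough illustration of this setup, PEFT adapters can be attached to the base model along the following lines. This is a minimal, hypothetical sketch assuming LoRA adapters; the card does not state the adapter type, and the rank, alpha, and target module names below are illustrative:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical adapter setup; the exact adapter configuration used for this
# repository is not documented in the card.
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=8,                                 # illustrative adapter rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # falcon's fused attention projection
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights remain trainable
```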