---
library_name: peft
license: apache-2.0
datasets:
- Yasbok/Alpaca_arabic_instruct
language:
- ar
pipeline_tag: text-generation
---

# 🚀 Falcon-7b-QLoRA-alpaca-arabic

This repo contains a low-rank adapter for Falcon-7b, fine-tuned on the Arabic version of the Stanford Alpaca dataset, [Yasbok/Alpaca_arabic_instruct](https://huggingface.co/datasets/Yasbok/Alpaca_arabic_instruct).

## Model Summary

- **Model Type:** Causal decoder-only
- **Language(s):** Arabic
- **Base Model:** [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) (License: [Apache 2.0](https://huggingface.co/tiiuae/falcon-7b#license))
- **Dataset:** [Yasbok/Alpaca_arabic_instruct](https://huggingface.co/datasets/Yasbok/Alpaca_arabic_instruct)
- **License(s):** Apache 2.0, inherited from the base model

## Model Details

The model was fine-tuned in 8-bit precision using 🤗 `peft` adapters, `transformers`, and `bitsandbytes`. Training followed the QLoRA method introduced in this [paper](https://arxiv.org/abs/2305.14314). The run took approximately 3 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory. A minimal reproduction sketch is provided at the end of this card.

### Model Date

June 10, 2023

### Recommendations

We recommend that users of this model develop guardrails and take appropriate precautions for any production use.

## How to Get Started with the Model

### Setup

```python
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```

### GPU Inference in 8-bit

This requires a GPU with at least 12 GB of memory.

### First, Load the Model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in 8-bit precision, then attach the fine-tuned adapter
peft_model_id = "Ali-C137/falcon-7b-chat-alpaca-arabic"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    device_map={"": 0},
    trust_remote_code=True,
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token
model = PeftModel.from_pretrained(model, peft_model_id)
```

An example generation call is shown in the Example Generation section at the end of this card.

### CUDA Info

- CUDA Version: 12.0
- Hardware: 1 A100-SXM
- Max Memory: `{0: "37GB"}`
- Device Map: `{"": 0}`

### Package Versions Employed

- `torch`: 2.0.1+cu118
- `transformers`: 4.30.0.dev0
- `peft`: 0.4.0.dev0
- `accelerate`: 0.19.0
- `bitsandbytes`: 0.39.0
- `einops`: 0.6.1

### Acknowledgements

This work is heavily inspired by [Daniel Furman](https://huggingface.co/dfurman)'s work. Thanks a lot, Daniel!
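
### Example Generation

Continuing from the loading snippet above, the sketch below runs a single generation. It assumes an Alpaca-style `### Instruction:` / `### Response:` prompt template, since the exact template used during fine-tuning is not documented here, and the sampling parameters are illustrative defaults rather than recommended values.

```python
# Alpaca-style prompt (assumed template); the instruction asks,
# in Arabic: "What is the capital of Morocco?"
prompt = """### Instruction:
ما هي عاصمة المغرب؟

### Response:
"""

# Tokenize and move the inputs to the GPU the model was loaded on
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

# Generate a completion; these sampling parameters are illustrative
with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```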
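
### Reproducing a Similar Fine-Tune (Sketch)

For readers who want to reproduce a similar 8-bit LoRA fine-tune with the libraries listed above, here is a minimal setup sketch. The LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`) are assumptions for illustration, not the values used to train this adapter; `query_key_value` is Falcon-7B's fused attention projection module.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the base model in 8-bit precision
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    load_in_8bit=True,
    device_map={"": 0},
    trust_remote_code=True,
)

# Cast layer norms to fp32 and enable gradient checkpointing
# for stable 8-bit training
base_model = prepare_model_for_int8_training(base_model)

# Attach low-rank adapters; these hyperparameters are assumed,
# not the documented values for this adapter
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()

# From here, train with a standard transformers Trainer on
# Yasbok/Alpaca_arabic_instruct, then push the adapter with
# peft_model.push_to_hub(...)
```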