avnishkr/falcon-7b-QueAns

	---
	library_name: peft
	datasets:
	- squad
	language:
	- en
	tags:
	- llms
	- falcon-7b
	- open source llms
	- fine tuning llms
	- QLoRA
	- PEFT
	- LoRA
	---

	An open-source Falcon-7B large language model fine-tuned on the SQuAD dataset for question answering.

	The model was fine-tuned with the QLoRA technique on a consumer-grade GPU, using `SFTTrainer`.

	- Dataset used: SQuAD
	- Dataset size: 87278
	- Training steps: 500
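
The card gives the dataset size and the number of training steps but not the batch size, so how much of SQuAD the run covered is not stated. The sketch below shows the arithmetic under assumed values: `per_device_batch_size=4` and `grad_accum_steps=4` are hypothetical, not taken from the card, and the calculation assumes one dataset example per sequence (no packing).

```python
DATASET_SIZE = 87278   # from the card
TRAINING_STEPS = 500   # from the card


def examples_seen(steps, per_device_batch_size, grad_accum_steps=1, num_devices=1):
    """Number of training examples consumed after `steps` optimizer steps."""
    return steps * per_device_batch_size * grad_accum_steps * num_devices


def epoch_fraction(steps, dataset_size, per_device_batch_size,
                   grad_accum_steps=1, num_devices=1):
    """Fraction of one pass over the dataset completed after `steps` steps."""
    seen = examples_seen(steps, per_device_batch_size, grad_accum_steps, num_devices)
    return seen / dataset_size


if __name__ == "__main__":
    # Hypothetical batch settings, NOT stated in the card.
    batch, accum = 4, 4
    seen = examples_seen(TRAINING_STEPS, batch, accum)
    frac = epoch_fraction(TRAINING_STEPS, DATASET_SIZE, batch, accum)
    print(f"examples seen: {seen}, fraction of one epoch: {frac:.3f}")
```

With any plausible single-GPU batch size, 500 steps cover well under one full pass over the 87278 examples.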



	# 🚀 Falcon-7b-chat-oasst1

	Falcon-7b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset. This repo only includes the LoRA adapters from fine-tuning with 🤗's [peft](https://github.com/huggingface/peft) package.
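
Because the repo ships only adapter weights, using it means loading the base model first and attaching the adapters on top. The following is a minimal sketch of that pattern with `peft` and `transformers`, not the author's own inference code. The function name `load_adapter_model`, the `float16` dtype, and the `device_map` default are assumptions; the imports are deferred so that defining the function does not require the libraries to be installed.

```python
def load_adapter_model(adapter_id, device_map="auto"):
    """Load the base model named in the adapter config and attach the LoRA adapters.

    `adapter_id` is a Hub repo id or local path containing the adapter files.
    """
    # Deferred imports: these are heavy, optional dependencies.
    import torch
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The adapter config records which base model it was trained on.
    peft_config = PeftConfig.from_pretrained(adapter_id)
    base_id = peft_config.base_model_name_or_path

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,   # assumption: half precision for inference
        device_map=device_map,
        trust_remote_code=True,      # may be needed on older transformers versions
    )

    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

A call would look like `model, tokenizer = load_adapter_model("avnishkr/falcon-7b-QueAns")`, assuming that repo id (taken from the page header) hosts the adapter files. The prompt format expected by the fine-tuned model is not documented in the card.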

	## Model Summary

	- Model Type: Causal decoder-only
	- Language(s): English
	- Base Model: [Falcon-7B](https://huggingface.co/tiiuae/falcon-7b) (License: [Apache 2.0](https://huggingface.co/tiiuae/falcon-7b#license))
	- Dataset: [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) (License: [Apache 2.0](https://huggingface.co/datasets/OpenAssistant/oasst1/blob/main/LICENSE))
	- License(s): Apache 2.0 inherited from "Base Model" and "Dataset"

	## Model Details

	The model was fine-tuned in 8-bit precision using 🤗 `peft` adapters, `transformers`, and `bitsandbytes`. Training relied on Low-Rank Adaptation ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. The run took approximately 6.25 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory. See the attached [Colab Notebook](https://huggingface.co/dfurman/falcon-7b-chat-oasst1/blob/main/finetune_falcon7b_oasst1_with_bnb_peft.ipynb) for the code and hyperparameters used to train the model.
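
To make the low-rank idea concrete, here is a small pure-Python sketch of a LoRA layer: the frozen weight `W` is left untouched and a trainable update `B @ A` of rank `r`, scaled by `alpha / r`, is added to its output. The rank, `alpha`, and matrix sizes used here are illustrative assumptions; the card does not state the values used in training (they are in the linked notebook).

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x).

    W: d_out x d_in frozen weight
    A: rank x d_in  trainable down-projection
    B: d_out x rank trainable up-projection
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def full_params(d_out, d_in):
    """Parameters in the full weight matrix."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters in the two low-rank factors A and B."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Illustrative sizes only: a square 4544 x 4544 projection with rank 16.
    d, r = 4544, 16
    ratio = lora_params(d, d, r) / full_params(d, d)
    print(f"trainable fraction for this matrix: {ratio:.4%}")
```

When `B` is all zeros, which is how LoRA adapters are commonly initialized, the layer reproduces the base model's output exactly, so training starts from the pretrained behavior.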

	### Model Date

	May 30, 2023


	## Training procedure


	The following `bitsandbytes` quantization config was used during training:
	- load_in_8bit: True
	- load_in_4bit: False
	- llm_int8_threshold: 6.0
	- llm_int8_skip_modules: None
	- llm_int8_enable_fp32_cpu_offload: False
	- llm_int8_has_fp16_weight: False
	- bnb_4bit_quant_type: nf4
	- bnb_4bit_use_double_quant: False
	- bnb_4bit_compute_dtype: float16
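
The listed values map directly onto the keyword arguments of `transformers.BitsAndBytesConfig`. The sketch below mirrors them in a plain dictionary and builds the config object on demand; `QUANT_CONFIG` and `build_bnb_config` are names introduced here for illustration, and the import is deferred so the dictionary can be inspected without `transformers` installed. Note that with `load_in_4bit` set to `False`, the `bnb_4bit_*` entries should have no effect: as listed, the weights are loaded in 8-bit.

```python
# Values copied from the quantization config listed in the card.
QUANT_CONFIG = {
    "load_in_8bit": True,
    "load_in_4bit": False,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": False,
    "bnb_4bit_compute_dtype": "float16",  # passed as a string dtype name
}


def build_bnb_config():
    """Build a BitsAndBytesConfig from the values above.

    Assumes a transformers version whose BitsAndBytesConfig accepts
    all of these keyword arguments.
    """
    from transformers import BitsAndBytesConfig  # deferred heavy import

    return BitsAndBytesConfig(**QUANT_CONFIG)
```

The resulting object would typically be passed as `quantization_config=build_bnb_config()` to `AutoModelForCausalLM.from_pretrained`.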

	### Framework versions

	- PEFT 0.4.0.dev0
