---
license: mit
datasets:
- mlabonne/guanaco-llama2-1k
language:
- en
metrics:
- bleu
library_name: adapter-transformers
tags:
- chemistry
- biology
- finance
- legal
- music
- art
- code
- climate
- medical
- text-generation-inference
---
# Deployed Model

**AjayMukundS/Llama-2-7b-chat-finetune**

## Model Description
This is a Llama 2 model with 7 billion parameters, fine-tuned on the mlabonne/guanaco-llama2-1k dataset. The training data consists of conversations between a human and an assistant, in which the human poses queries and the assistant responds to them in a suitable fashion. Llama 2 chat models use the following chat template:
```
<s>[INST] <<SYS>>
System prompt
<</SYS>>

User prompt [/INST] Model answer </s>
```
- **System prompt** (optional): guides the model.
- **User prompt** (required): carries the instruction or user query.
- **Model answer** (required): the response the model learns to produce.
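As a rough illustration, a small helper like the one below assembles a prompt in this template. The function name and defaults are hypothetical, not part of the actual training code:

```python
# Hypothetical helper: wraps an optional system prompt and a user query
# in the Llama 2 chat template shown above.
def format_prompt(user_prompt: str, system_prompt: str = "") -> str:
    if system_prompt:
        return (
            f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_prompt} [/INST]"
        )
    return f"<s>[INST] {user_prompt} [/INST]"

print(format_prompt("What is QLoRA?", "You are a helpful assistant."))
```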
## Training Data
The instruction dataset is reformatted to follow the Llama 2 template above.
- Original dataset: https://huggingface.co/datasets/timdettmers/openassistant-guanaco
- Reformatted dataset with 1K samples: https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k
- Complete reformatted dataset: https://huggingface.co/datasets/mlabonne/guanaco-llama2
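Loading the 1K-sample split is a one-liner with the `datasets` library. The sketch below assumes each row exposes a single pre-formatted `text` field, which is how the reformatted dataset is structured:

```python
from datasets import load_dataset

# Load the 1K-sample reformatted dataset from the Hugging Face Hub.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Each row carries one fully formatted conversation in the "text" field.
print(dataset[0]["text"])
```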
To drastically reduce VRAM usage, the model is fine-tuned in 4-bit precision, which is why QLoRA is used here. Fine-tuning was performed on an L4 GPU (Google Colab Pro).
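For reference, a typical 4-bit quantization setup with `transformers`' `BitsAndBytesConfig` looks like the following. The NF4 settings here are common QLoRA defaults, not necessarily the exact values used for this model:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config for QLoRA (illustrative defaults).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load weights in 4-bit precision
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16
    bnb_4bit_use_double_quant=False,        # no nested quantization
)
```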
## Process
- Load the dataset as defined above.
- Configure bitsandbytes for 4-bit quantization.
- Load the Llama 2 model in 4-bit precision on the GPU (L4, Google Colab Pro), along with the corresponding tokenizer.
- Load the QLoRA configuration, set the regular training parameters, and pass everything to the SFTTrainer.
- Start fine-tuning (see the end-to-end sketch after this list).
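A condensed, self-contained sketch of these steps is shown below. The base model checkpoint, LoRA hyperparameters, and training arguments are illustrative assumptions (the card does not state them), and the `SFTTrainer` call assumes an older `trl` API that accepts `dataset_text_field` and `max_seq_length` directly; newer `trl` versions move these into `SFTConfig`:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# Assumption: an ungated Llama 2 chat mirror; swap in your own checkpoint.
base_model = "NousResearch/Llama-2-7b-chat-hf"

# Step 1: load the reformatted dataset.
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

# Step 2: 4-bit quantization config (see above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Step 3: load the model in 4-bit precision plus its tokenizer.
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# Step 4: QLoRA adapter config and training arguments (illustrative values).
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=25,
)

# Step 5: hand everything to the SFTTrainer and start fine-tuning.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_args,
)
trainer.train()
```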