language:
  - en
library_name: transformers
pipeline_tag: question-answering
tags:
  - Finetuning

Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

This model is a fine-tuned version of Llama-2-7b-chat-hf, trained on company-specific question-answer data. It is distributed as a quantized GGUF file so that it can run on modest hardware while keeping output quality suitable for conversational AI applications.

Full Tutorial on Cheap Finetuning

https://github.com/VishanOberoi/FineTuningForTheGPUPoor?tab=readme-ov-file

Model Details

It was fine-tuned using QLoRA and PEFT. After fine-tuning, the adapters were merged into the base model, and the merged model was then quantized to GGUF.
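To make the "adapters were merged" step concrete, here is a toy sketch of what merging a LoRA adapter into a base weight means arithmetically: the low-rank update `B @ A`, scaled by `alpha / rank`, is added to the frozen weight. The function names (`matmul`, `merge_lora`) and all numbers are hypothetical and chosen only for illustration; this card does not state which tooling or call performed the actual merge.

```python
# Toy illustration of merging one LoRA adapter into one base weight matrix.
# All names and values are hypothetical; a real checkpoint holds many such matrices.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(base_weight, lora_a, lora_b, alpha, rank):
    """Return base_weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as base_weight
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base_weight, delta)
    ]

base_weight = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, 2 x 2
lora_a = [[1.0, 2.0]]                   # adapter A, rank x in_features = 1 x 2
lora_b = [[0.5], [0.0]]                 # adapter B, out_features x rank = 2 x 1

merged = merge_lora(base_weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is what allows the result to be exported as a single file and quantized.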

Model Sources

Uses

This model is optimized for direct use in conversational AI, particularly for generating responses based on company-specific data. It can be utilized effectively in customer service bots, FAQ bots, and other applications where accurate and contextually relevant answers are required.

Example with ctransformers:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    model_file="finetuned.gguf",
    model_type="llama",
    gpu_layers=50,         # number of layers to offload to the GPU
    max_new_tokens=2000,   # upper bound on generated tokens
    temperature=0.2,
    top_k=40,
    top_p=0.6,
    context_length=6000,   # maximum context length to use
)

system_prompt = "<<SYS>>You are a useful bot... <</SYS>>"

user_prompt = "Tell me about your company"

# Combine system prompt with user prompt

full_prompt = f"{system_prompt}\n[INST]{user_prompt}[/INST]"

# Generate the response

response = llm(full_prompt)

# Print the response

print(response)
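If the prompt is assembled in several places, the string layout used above can be wrapped in a small helper. The sketch below is only an illustration: `build_prompt` is a hypothetical name, not part of ctransformers, and it assumes the system text is passed without the `<<SYS>>` tags, which the helper adds itself.

```python
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Assemble a prompt in the same layout as the example in this card.

    Hypothetical helper: wraps the system text in <<SYS>> tags and the
    user text in [INST] tags, separated by a newline.
    """
    return f"<<SYS>>{system_prompt}<</SYS>>\n[INST]{user_prompt}[/INST]"


prompt = build_prompt("You are a useful bot", "Tell me about your company")
print(prompt)
```

The resulting string can be passed to `llm(...)` in place of `full_prompt`. The helper mirrors the layout shown in this card rather than Meta's reference Llama 2 chat template, which nests the system block inside the `[INST]` tags.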