Edit model card

Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

This model is a fine-tuned version of Llama-2-Chat-7b on company-specific question-answers data. It is designed for efficient performance while maintaining high-quality output, suitable for conversational AI applications.

Model Details

It was finetuned using QLORA and PEFT. After fine-tuning, the adapters were merged with the base model and then quantized to GGUF.

Model Sources

Uses

This model is optimized for direct use in conversational AI, particularly for generating responses based on company-specific data. It can be utilized effectively in customer service bots, FAQ bots, and other applications where accurate and contextually relevant answers are required.

Usage notebook

https://colab.research.google.com/drive/1885wYoXeRjVjJzHqL9YXJr5ZjUQOSI-w?authuser=4#scrollTo=TZIoajzYYkrg

Example with ctransformers:

from ctransformers import AutoModelForCausalLM, AutoTokenizer

llm = AutoModelForCausalLM.from_pretrained("vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF", model_file="finetuned.gguf", model_type="llama", gpu_layers = 50, max_new_tokens = 2000, temperature = 0.2, top_k = 40, top_p = 0.6, context_length = 6000)

system_prompt = '''<<SYS>>
You are a useful bot
<</SYS>>

user_prompt = "Tell me about your company"

Combine system prompt with user prompt

full_prompt = f"{system_prompt}\n[INST]{user_prompt}[/INST]"

Generate the response

response = llm(full_prompt)

Print the response

print(response)

Downloads last month
172
GGUF
Model size
6.74B params
Architecture
llama