Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

This model is a fine-tuned version of Llama-2-Chat-7b, trained on company-specific question-answer data. It is distributed in quantized GGUF format for efficient inference and is intended for conversational AI applications.

Full Tutorial on Cheap Finetuning

https://github.com/VishanOberoi/FineTuningForTheGPUPoor?tab=readme-ov-file

Model Details

The model was fine-tuned using QLoRA via PEFT. After fine-tuning, the adapters were merged into the base model, and the merged model was then converted to GGUF and quantized.
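The merge step described above can be sketched roughly as follows. This is a minimal illustration, not the author's actual script: the function name merge_adapter, its arguments, and the choice of float16 are assumptions, and it relies on the peft and transformers libraries being installed. The GGUF conversion and quantization are done afterwards with separate tooling (for example the scripts shipped with llama.cpp) and are not shown here.

```python
def merge_adapter(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Fold a trained LoRA adapter into its base model and save the result.

    All three arguments are hypothetical placeholders supplied by the caller.
    """
    # Imported lazily so the function can be defined without the heavy
    # dependencies (torch, peft, transformers) being loaded up front.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in half precision rather than 4-bit, so the
    # merged weights are ordinary tensors that can be converted later.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

    # Attach the adapter, then merge its weights into the base layers.
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Save model and tokenizer side by side for the conversion step.
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
```

A call might look like merge_adapter("meta-llama/Llama-2-7b-chat-hf", "./qlora-adapter", "./merged-model"), where the base model id is inferred from this repository's name and both local paths are made up for illustration.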

Model Sources

Uses

This model is intended for direct use in conversational AI, particularly for generating responses grounded in company-specific data. It suits customer service bots, FAQ bots, and other applications that need accurate, contextually relevant answers.

Example with ctransformers:

from ctransformers import AutoModelForCausalLM

# Load the GGUF weights from the Hub; generation settings are passed as config options
llm = AutoModelForCausalLM.from_pretrained(
    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
    model_file="finetuned.gguf",
    model_type="llama",
    gpu_layers=50,  # number of layers to offload to the GPU
    max_new_tokens=2000,
    temperature=0.2,
    top_k=40,
    top_p=0.6,
    context_length=6000,
)

system_prompt = "<<SYS>>You are a useful bot... <</SYS>>"
user_prompt = "Tell me about your company"

# Combine system prompt with user prompt
full_prompt = f"{system_prompt}\n[INST]{user_prompt}[/INST]"

# Generate the response
response = llm(full_prompt)

# Print the response
print(response)
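If prompts are built in more than one place, the string assembly from the example can be wrapped in a small helper. The sketch below is only an illustration: build_prompt is a hypothetical name, and it reproduces the layout used in the example above. The original Llama 2 chat format nests the <<SYS>> block inside [INST]; this card does not say which layout the fine-tuning data used, so the helper keeps the card's layout.

```python
def build_prompt(system_message: str, user_message: str) -> str:
    """Assemble a single-turn prompt in the layout shown in this card."""
    # System message wrapped in <<SYS>> markers, placed before the instruction.
    system_block = f"<<SYS>>{system_message}<</SYS>>"
    # User message wrapped in [INST] markers, on the following line.
    return f"{system_block}\n[INST]{user_message}[/INST]"


prompt = build_prompt("You are a useful bot...", "Tell me about your company")
print(prompt)
```

The returned string can be passed straight to the loaded model, for example llm(prompt).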
Format: GGUF. Model size: 6.74B params. Architecture: llama.