	# Model Card for vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF

	This model is a fine-tuned version of Llama-2-7b-chat, trained on company-specific question-answer data. It is quantized to GGUF for efficient inference while aiming to preserve output quality, and is intended for conversational AI applications.

	## Model Details

	The model was fine-tuned using QLoRA and PEFT. After fine-tuning, the adapters were merged with the base model, and the merged model was quantized to GGUF.

	- Developed by: Vishan Oberoi and Dev Chandan.
	- Model type: Transformer-based Large Language Model
	- Language(s) (NLP): English
	- License: MIT
	- Finetuned from model: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf
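
The card does not include the merge step itself, so the following is only a minimal sketch of one common way to fold LoRA adapters back into a base model with PEFT's `merge_and_unload`. The function name `merge_adapter`, the adapter directory, and the output directory are hypothetical, and loading the base model in half precision is an assumption rather than the authors' documented setting.

```python
def merge_adapter(base_id: str, adapter_dir: str, output_dir: str) -> None:
    """Merge LoRA adapter weights into the base model and save the result.

    Hypothetical helper: the paths and the float16 choice are assumptions.
    """
    # Imported lazily so this module can be loaded without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the unquantized base model (assumed: half precision).
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

    # Attach the trained adapters, then fold them into the base weights.
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()

    # Save the merged model and tokenizer in Hugging Face format.
    merged.save_pretrained(output_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(output_dir)


if __name__ == "__main__":
    # Example call with hypothetical local paths (commented out: needs the weights).
    # merge_adapter("meta-llama/Llama-2-7b-chat-hf", "./qlora-adapter", "./merged-model")
    pass
```

The saved directory could then be converted and quantized with the conversion and quantization tools that ship with llama.cpp; the exact commands the authors used are not stated in this card.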

	### Model Sources

	- Repository: [vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF](https://huggingface.co/vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF)
	- Links:
	  - LLaMA: [LLaMA Paper](https://arxiv.org/abs/2302.13971)
	  - QLoRA: [QLoRA Paper](https://arxiv.org/abs/2305.14314)
	  - llama.cpp: [llama.cpp repository](https://github.com/ggerganov/llama.cpp)

	## Uses


	This model is intended for direct use in conversational AI, particularly for generating responses based on company-specific data. It can be used in customer service bots, FAQ bots, and other applications that require accurate, contextually relevant answers.
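
For an FAQ-style bot, each incoming question has to be wrapped in a prompt before it is sent to the model. The sketch below mirrors the layout used in this card's `ctransformers` example: a `<<SYS>>` block followed by the user message between `[INST]` and `[/INST]`. The helper name `build_prompt` is hypothetical, and the card does not state which template was used during fine-tuning, so treat this layout as an assumption taken from the usage example.

```python
def build_prompt(user_message: str, system_message: str = "You are a useful bot") -> str:
    """Wrap a user message in the prompt layout shown in this card's usage example."""
    # System block delimited by <<SYS>> ... <</SYS>>, followed by a blank line.
    system_block = f"<<SYS>>\n{system_message}\n<</SYS>>\n\n"
    # The user turn sits between [INST] and [/INST]; the model answers after [/INST].
    return f"{system_block}\n[INST]{user_message.strip()}[/INST]"


if __name__ == "__main__":
    print(build_prompt("Tell me about your company"))
```

Keeping the template in one function makes it easier to change the system message per deployment without touching the generation code.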

	## Usage notebook

	https://colab.research.google.com/drive/1885wYoXeRjVjJzHqL9YXJr5ZjUQOSI-w?authuser=4#scrollTo=TZIoajzYYkrg

	#### Example with `ctransformers`:

	```python
	from ctransformers import AutoModelForCausalLM

	# Load the quantized GGUF weights from the Hub.
	# Note: Llama 2 was trained with a 4096-token context window.
	llm = AutoModelForCausalLM.from_pretrained(
	    "vishanoberoi/Llama-2-7b-chat-hf-finedtuned-to-GGUF",
	    model_file="finetuned.gguf",
	    model_type="llama",
	    gpu_layers=50,  # number of layers to offload to the GPU
	    max_new_tokens=2000,
	    temperature=0.2,
	    top_k=40,
	    top_p=0.6,
	    context_length=6000,
	)

	system_prompt = '''<<SYS>>
	You are a useful bot
	<</SYS>>

	'''

	user_prompt = "Tell me about your company"

	# Combine system prompt with user prompt
	full_prompt = f"{system_prompt}\n[INST]{user_prompt}[/INST]"

	# Generate the response
	response = llm(full_prompt)

	# Print the response
	print(response)
	```