Run in VS Code

#18
by ArunRaj000 - opened

Hi, I just successfully ran the Mistral 7B Instruct GGUF file using ctransformers, but the AI is not responding according to the user input.

How do I solve this?

Can you share how you ran it in VS Code?

I downloaded the GGUF files and used ctransformers to run the code, which is really computationally efficient, but it takes more time to print the output instead of printing in real time.
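(For reference, ctransformers can also yield tokens as they are generated by passing stream=True; a minimal, untested sketch, reusing the model file from the snippet further down this thread:)

from ctransformers import AutoModelForCausalLM

# stream=True makes the call return a generator of text chunks instead of one string
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    model_file="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=0,
)

for chunk in llm("[INST] Hello, who are you? [/INST]", stream=True):
    print(chunk, end="", flush=True)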

I also tried ctransformers, but it gave a segmentation fault; afterwards I tried llama-cpp-python and it worked. Can you share how you did it?
from ctransformers import AutoModelForCausalLM
import gradio as gr

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    model_file="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=0,
)

title = "Shivansh Model"

def llm_func(message, history):
    response = llm(message)
    return response

gr.ChatInterface(
    fn=llm_func,
    title=title,
).launch()
This gave a segmentation fault.

On the other hand:
from langchain.llms import LlamaCpp
import gradio as gr

def load_llm():
    llm = LlamaCpp(
        model_path="../model/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
        max_new_tokens=512,
        temperature=0.1,
    )
    return llm

title = "Shivansh Model"

def llm_func(message, history):
    llm = load_llm()
    response = llm(message)
    return response

gr.ChatInterface(
    fn=llm_func,
    title=title,
).launch()
This is working well.
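If you want the llama-cpp-python version to also print tokens as they arrive, LangChain's LlamaCpp accepts the same streaming stdout callback used further down this thread; a rough sketch (parameters assumed, not tested on this setup):

from langchain.llms import LlamaCpp
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# streaming=True plus a stdout callback prints each token as it is generated
llm = LlamaCpp(
    model_path="../model/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    temperature=0.1,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

llm("[INST] Hello! [/INST]")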

Could you share your code?

from langchain.llms import CTransformers
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
import time
import json

# Load model configuration from the specified path
config_path = "D:/Project File/restart/NEW AI/config.json"

with open(config_path, 'r') as config_file:
    model_config = json.load(config_file)

# Extract specific parameters
load_params = model_config.get('load_params', {})

# Use the extracted parameters in your main code
model_path = "D:/Project File/restart/NEW AI/mistral-7b-instruct-v0.1.Q2_K.gguf"

# Initialize LangChain's CTransformers with StreamingStdOutCallbackHandler
llm = CTransformers(
    model=model_path,
    callbacks=[StreamingStdOutCallbackHandler()]
)

# Initialize conversation history
conversation_history = []

prompt_template = {
    "pre_prompt": "You are an artificial intelligence called VICTOR (Virtual Intelligent Companion for Technological Optimization and Reinforcement), created by Arun Raj, a Physics student. You are a friend of the user, and you aim to keep the conversation very concise and engaging.",
    "pre_prompt_suffix": "",
    "pre_prompt_prefix": "",
    "input_prefix": "[INST]",
    "input_suffix": "[/INST]",
    "antiprompt": ["[INST]"],
}

while True:
    user_input = input("You: ")

    if user_input.lower() == "quit":
        break

    formatted_input = f"{prompt_template['input_prefix']}{user_input}{prompt_template['input_suffix']}"

    print("\nYou:", user_input)

    response = llm(prompt_template['pre_prompt'] + formatted_input)

    if conversation_history and response != conversation_history[-1][1]:  # Check if conversation history is not empty
        print()

    conversation_history.append(("User", user_input))
    conversation_history.append(("AI", response))

print("Chatbot session ended.")

In this code I am still facing a memory issue.

Can you help me with memory?

Can you share your hardware details so I can help you?

Ryzen 5 5600H, NVIDIA GTX 1650 and AMD Radeon GPU, 24 GB RAM

I mean conversational memory; the model doesn't remember previous interactions.

You can try out langchain.memory for conversational/contextual memory.
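A minimal sketch of that, assuming the same CTransformers llm object as above (ConversationBufferMemory simply replays the full chat history into each prompt):

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Wrap the existing llm in a chain that stores and replays previous turns
conversation = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory(),
)

print(conversation.predict(input="My name is Arun."))
print(conversation.predict(input="What is my name?"))  # history is included automatically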

Yeah, I tried that. Now the context window is 8k, but it doesn't have long-term memory, so I'm planning to integrate a database.
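A minimal sketch of persisting turns with the standard-library sqlite3 module (the table name and schema are just illustrative); the stored turns could later be searched and prepended to the prompt:

import sqlite3

conn = sqlite3.connect("conversation_history.db")
conn.execute("CREATE TABLE IF NOT EXISTS turns (role TEXT, content TEXT)")

def save_turn(role, content):
    # Append one message to the long-term store
    conn.execute("INSERT INTO turns VALUES (?, ?)", (role, content))
    conn.commit()

def load_turns(limit=20):
    # Fetch the most recent messages to prepend to the next prompt
    rows = conn.execute(
        "SELECT role, content FROM turns ORDER BY rowid DESC LIMIT ?", (limit,)
    ).fetchall()
    return list(reversed(rows))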

Do you know any open-source text-to-speech library for speaking the live-streamed output?

No, I have not explored that area yet.
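One offline, open-source library that comes up is pyttsx3 (not tested here); a minimal sketch of its basic usage — it doesn't speak token by token, so you would speak per sentence or per finished response:

import pyttsx3

engine = pyttsx3.init()          # uses the system speech engine (SAPI5 on Windows)
engine.say("Hello, this is the model's reply.")
engine.runAndWait()              # blocks until speaking is done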

It's ok... Can you tell me how you used llama.cpp?
I got some errors while installing llama.cpp... If you know how to fix them, please text me on Insta: Arun_luka
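For what it's worth, llama-cpp-python builds llama.cpp from source during pip install, so install errors on Windows are usually caused by a missing CMake or C/C++ compiler (e.g. Visual Studio Build Tools); installing those and rerunning `pip install llama-cpp-python` often resolves it.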
