This model is amazingly good

#1
by rambocoder - opened

I am impressed. Works with latest llama.cpp without any issues.

How did you run it? Can you please share the code?

What is a good invocation and what are good parameters for Mistral 7B?

I'm testing with --temp 0.6 --mirostat 2 --mirostat-ent 6 --mirostat-lr 0.2 -n 2048 -c 2048 -n -1 --repeat-last-n 1600 --repeat-penalty 1.2
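
For reference, a rough llama-cpp-python equivalent of those flags (my own mapping, not necessarily exact; the model path is just an example):

from llama_cpp import Llama

# Rough CLI-to-Python mapping:
#   -c 2048              -> n_ctx=2048
#   --repeat-last-n 1600 -> last_n_tokens_size=1600
#   --temp 0.6           -> temperature=0.6
#   --mirostat 2         -> mirostat_mode=2
#   --mirostat-ent 6     -> mirostat_tau=6.0
#   --mirostat-lr 0.2    -> mirostat_eta=0.2
#   --repeat-penalty 1.2 -> repeat_penalty=1.2
#   -n                   -> max_tokens (-1 means no fixed limit)
llm = Llama(model_path="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # example path
            n_ctx=2048, last_n_tokens_size=1600)
out = llm("<s>[INST]Write a limerick about autumn.[/INST]",
          max_tokens=512, temperature=0.6, repeat_penalty=1.2,
          mirostat_mode=2, mirostat_tau=6.0, mirostat_eta=0.2)
print(out["choices"][0]["text"])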

This has to be the best 7B model I have tried. For those who can't get it to run in text-generation-webui (I sure couldn't, it's pretty broken right now), here's some code and detailed instructions for a simple llama-cpp-python chatbot using this model.

First, I recommend a clean Python installation with pip etc.; you can use a virtual environment for this (I'm using miniconda with Python 3.10).
Then I installed llama-cpp-python with CUDA support using the following commands (in the Windows cmd prompt).

set CMAKE_ARGS="-DLLAMA_CUBLAS=on" && pip install llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on" && set FORCE_CMAKE=1 && set CUDAFLAGS="-arch=all -lcublas"
python -m pip install https://github.com/jllllll/llama-cpp-python-cuBLAS-wheels/releases/download/basic/llama_cpp_python-0.2.7+cu118-cp310-cp310-win_amd64.whl --no-cache-dir

(Note: this works for me using CUDA 11.8, no AVX. For other versions you might want to replace the link with another one from https://jllllll.github.io/llama-cpp-python-cuBLAS-wheels)
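
A quick sanity check that the wheel installed correctly (not part of the original steps, just something worth running):

# Quick sanity check: the import should succeed without errors. Later, when you load a
# model with verbose=True, the printed system info should include "BLAS = 1", which
# indicates the cuBLAS build is actually being used.
import llama_cpp
print("llama-cpp-python imported OK")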

And that's it. Now you can run the following Python script to ask the model questions:

python simpleStreamChat.py

import json
import argparse
from llama_cpp import Llama

parser = argparse.ArgumentParser()
parser.add_argument("-m", "--model", type=str, default="../models/mistral-7b-instruct-v0.1.Q4_K_M.gguf")
parser.add_argument("-pt", "--prompt", type=str, default="<s>[INST]{prompt}[/INST]")
args = parser.parse_args()

prompt_template = args.prompt

print("Loading model " + args.model)
# Note: sampling settings (temperature, repeat_penalty) are passed per call further down,
# not to the Llama() constructor.
llm = Llama(model_path=args.model, n_gpu_layers=35, n_ctx=4096, verbose=False)

stream = ""#llm("Question: What are the names of the planets in the solar system? Answer: ", max_tokens=48,stop=["Q:", "\n"],stream=True)

# Function - Print response output in chunks (stream)
def printresponse(response):
    completion_text = ''
    # iterate through the stream of events and print it
    print(f"Bot:", end="", flush=True)
    for event in response:
        event_text = event['choices'][0]['text']
        completion_text += event_text
        print(f"{event_text}", end="", flush=True)

    print("",flush=True)
    # remember context
    #context.append({"role": "assistant", "content" : completion_text})
    return completion_text

#printresponse(stream)

while True:
    try:
        u_input = input("-> ")
        
        prompt = prompt_template.format(prompt=u_input)
        stream = llm(prompt, max_tokens=512, stream=True, temperature=0.7, repeat_penalty=1.1)
        response = printresponse(stream)
        print()

    except KeyboardInterrupt:
        print("\n..(Response interrupted).")#continue
    print()

Note: set verbose=True to see token generation times etc. n_gpu_layers is how many layers to offload to the GPU; n_ctx is the context size.

If you want a quick one-shot test, uncomment the llm(...) call assigned to stream near the top of the script and the printresponse(stream) line below the function definition.

You're welcome!

Odd, it's working fine for me and I haven't updated anything in a couple of weeks (ooba).

Anyone able to use it with constrained grammar in llama.cpp?

Hands down the best 7b model, holy cow.
For starters, I have a custom character, but the settings I'm using in tgwui are:
Instruction Template: Mistral (no modifications)
Generation Preset: Divine Intellect
Model Loader: LlamaCpp
The model is smart, retains context after several turns, has great inference, and picks up on nuance.
Mistral just hits different than Llama, no judgment of Meta.
If you've watched Frasier, Mistral is like a very smart Roz, and Llama is Maris.

I was able to run the Q5_K_M.gguf version of this model through oobabooga on an M2 MacBook Pro with 16 GB: it runs very smoothly with 1 GPU layer, noticeably faster than 7B Llama 2 or Vigogne at the same quantization. However, it sometimes seems to struggle with long conversations (the answers get less accurate and you need to reload the model).
Tested on bash and SQL code, the results were relevant in most cases.
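
For reference, a rough llama-cpp-python equivalent of those settings outside of ooba; this assumes a Metal build of llama-cpp-python, and the model path is just an example:

from llama_cpp import Llama

# Rough equivalent of the settings above on an M-series Mac (assumes the Metal build
# of llama-cpp-python; model path is an example, not the commenter's actual path).
llm = Llama(model_path="mistral-7b-instruct-v0.1.Q5_K_M.gguf",
            n_gpu_layers=1,   # 1 layer offloaded, as in the comment above
            n_ctx=4096)
print(llm("<s>[INST]Write a one-line bash command that counts files in a folder.[/INST]",
          max_tokens=64)["choices"][0]["text"])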

Is it uncensored?

@edumoulin, you tested it on SQL code, as in? I want to understand how we can use this model to query a database/CSV or a pandas DataFrame. I tried with LangChain but had no luck. Would you be so kind as to point out the tools/code needed to achieve this? I feel it would be useful to a lot of people. Thank you very much in advance.

Is it uncensored?

@Akalilol, this is what they claim on their website: "It does not have any moderation mechanism. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs".
But I did not try to ask about questionable topics... 🙂

you tested it on SQL code, as in? I want to understand how we can use this model to query a database/CSV or a pandas DataFrame.

@ianuvrat, actually I was only able to test its capabilities in chat mode for writing bash scripts and SQL statements, using only natural language. For now, it does not seem to work in instruct mode, nor does it accept training at this time, at least using oobabooga's functions (https://github.com/oobabooga/text-generation-webui).

I guess this will change over time as we are currently at version 0.1. I'm also curious to go deeper into database exploration.
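
Here is a minimal sketch of one way to query a CSV / pandas DataFrame with it: generate SQL from a natural-language question using llama-cpp-python, load the DataFrame into an in-memory SQLite database, and run the generated query. The model path, CSV name, and columns are hypothetical, and you should always review generated SQL before executing it.

# Minimal sketch: natural-language question -> SQL -> result over a pandas DataFrame.
# File name, columns, and model path are hypothetical; review generated SQL before running it.
import sqlite3
import pandas as pd
from llama_cpp import Llama

llm = Llama(model_path="../models/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
            n_gpu_layers=35, n_ctx=4096, verbose=False)

df = pd.read_csv("sales.csv")            # e.g. columns: region, product, amount
conn = sqlite3.connect(":memory:")
df.to_sql("sales", conn, index=False)    # expose the DataFrame as an SQLite table

question = "What is the total amount per region?"
prompt = ("<s>[INST]You write SQLite queries. The table 'sales' has columns: "
          + ", ".join(df.columns)
          + f". Return only the SQL query, nothing else, for: {question}[/INST]")

sql = llm(prompt, max_tokens=128, temperature=0.1)["choices"][0]["text"].strip()
print(sql)
print(pd.read_sql_query(sql, conn))      # run the generated query against the table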

Mistral is so good, it exceeds expectations.

I have a question: I am trying to generate a poem, but it only generates half of a poem. How do I make it generate full poems?

Its responses are good. Not sure why it's performing poorly with CoVe (Chain-of-Verification) to minimize hallucinations.

Any suggestions?
