Help testing

#3
by Moemamoe - opened

Hey,

I'd love to help test this code model. I'm an expert in writing code, but not in running/developing AI models.
How can I start testing it meaningfully?

So, under the listed quantizations, you can always try the GGUF. You'll need llama.cpp and, if you want particularly good speeds, a GPU with at least ~5966 MB of VRAM for the full 16k context (8k context can run with about 1 GB less).
If your GPU isn't supported but you do have enough VRAM, I can give you my slightly edited copy of nomic-ai's Vulkan fork, which builds the server with Vulkan support (it's only a few lines changed/added at most).

OK, so for my solution at the moment: you need to build llama.cpp from source and then launch the server executable.

For example, with nomic-ai's Vulkan fork, it was:
mkdir build && cd build
cmake .. -DLLAMA_KOMPUTE=ON
(For CUDA, the configure step is instead, if I recall, cmake .. -DLLAMA_CUBLAS=ON; for OpenCL with CLBlast it's cmake .. -DLLAMA_CLBLAST=ON.)
cmake --build . --config Release
The binaries then end up in the bin folder. (This is probably obvious, sorry; just being thorough.)
./bin/server -ngl 32 --host 127.0.0.1 --port 8080 --path "./examples/server/public" -c 16384 -m {path-to-gguf-model}
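
If you want to sanity-check the server before wiring up any editor plugin, you can hit its HTTP API directly. Here's a minimal smoke test with curl against the example server's /completion endpoint (field names as of the version I built, so double-check them against the server README in your checkout):

curl -s http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Write a Python function that reverses a string.\n", "n_predict": 128, "temperature": 0.7}'

If that comes back as JSON with a "content" field full of generated text, the server side is fine and any remaining issues are on the editor/plugin end.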

From there, you can use a VS Code plugin. Well, this is the only one I found that works with llama.cpp's server implementation: https://marketplace.visualstudio.com/items?itemName=ppipada.flexigpt
You need to edit the plugin settings: set "Flexigpt: Default Provider" to "llamacpp" and point "Flexigpt › Llamacpp: Default Origin" to "http://127.0.0.1:8080", then, for good measure, maybe restart VS Code (a sketch of the equivalent settings.json entries is below). This is my current setup for demoing the model, at least.
I'll warn you it has quirks; for some reason, when used with llama.cpp, it almost always wraps the model's responses in a markdown code box even when they don't contain any markdown.
Do keep in mind it won't have the context awareness of Codeium, Cody, or Copilot. It has a few commands that take the highlighted code into account, but not all of them do, and your chat messages don't, either.
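
For reference, here's my guess at the equivalent settings.json entries for the two settings above. The keys are an assumption derived from how VS Code maps setting keys to the labels shown in the settings UI, so verify them in the extension's settings page if they don't take effect:

{
  // Assumed FlexiGPT keys (hypothetical until verified against the extension):
  "flexigpt.defaultProvider": "llamacpp",
  "flexigpt.llamacpp.defaultOrigin": "http://127.0.0.1:8080"
}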

You could also, of course, always use llama.cpp's main executable for more flexibility (though without the convenience of a VS Code plugin) with ./main -ins -c 16384 --color -ngl 32 -m {path-to-gguf-model}, or edit the plugin's source code to change the parameters it sends when calling the llama.cpp server over HTTP.

You can also try another ggml-compatible application/library like Koboldcpp, gpt4all (though, if I recall, that one only works with q4_0, q6_0, and q8_0), ctransformers, or Oobabooga, to name a few.
Other options exist, but since my RX 5700 XT doesn't seem to work with ROCm-, OpenCL-, or CUDA-based options (though Vulkan compute works excellently), I don't really know much about those. CPU inference on my Ryzen 7 1800X was painful outside llama.cpp back when I was getting started, so I got really accustomed to llama.cpp and related applications.

I don't know if this is "meaningful" in the way you asked, but if you meant something else in particular, do let me know! I've been enjoying the model so far, with what little I've experienced of it.

Hope this helps! Sorry if it's a bit long-winded, or if it doesn't help.

EDIT: I just saw that this was asked 11 days ago. Sorry if this is irrelevant now. Please let me know how it goes, whether that's everything working out or you needing more help; I'll try to check back in intermittently for a while.
Another edit: I forgot to mention that these instructions assume you're running on Linux and have the dependencies installed. If you run into trouble figuring that out, please let me know and I can assist further.

Thanks a lot for the detailed explanation! Almost everything works like a charm so far, running the model on my RTX 3080 in an LXC container. I got main running, and also the server with the VS Code plugin, as you explained. So far I haven't really tested the model itself, but I'm going to use it for my coding over the next few weeks and try it out.

Edit: For your information, I found the Continue extension for VS Code, which works with llama.cpp: https://marketplace.visualstudio.com/items?itemName=Continue.continue (https://continue.dev/docs/reference/Models/llamacpp)
It seems to work much better than the other one, at least at first glance.
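
In case it saves someone a minute: going by the linked docs, pointing Continue at the local llama.cpp server amounts to adding a model entry along these lines to its config. The exact schema has changed between extension versions, so treat this as a sketch and check the docs page for your version:

{
  "models": [
    {
      "title": "Local llama.cpp",
      "provider": "llama.cpp",
      "model": "{model-name}",
      "apiBase": "http://127.0.0.1:8080"
    }
  ]
}

Here "Local llama.cpp" is just a display name, {model-name} is a placeholder for whatever you loaded, and apiBase needs to match the --host/--port you launched ./server with.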

@Moemamoe @TaidanaHito Thanks for your answers; I've had a hectic few weeks, sorry for the late response.
@Moemamoe do you have any updates on the VS Code integration and your tests?

BR @Nondzu
