This model is still uploading. README will be here shortly.

If you're too impatient to wait for that (of course you are), to run these files you need:
1. llama.cpp as of [this commit or later](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb)
   - If you don't want to compile from source, you can use the binaries from [release master-3602ac4](https://github.com/ggerganov/llama.cpp/releases/tag/master-3602ac4)
2. To pass the new command-line parameter `-gqa 8`

Example command:
```
/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
```
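If you run the model repeatedly, a small wrapper can build the `[INST] <<SYS>>...<</SYS>>...[/INST]` prompt for you and always pass `-gqa 8`. The script below is a hypothetical sketch, not part of llama.cpp. It uses only the flags from the example command (`-m`, `-gqa`, `-t`, `-p`). The environment variables `LLAMA_MAIN`, `MODEL` and `THREADS` and the functions `build_prompt` and `run_chat` are illustrative names of my own, so adjust the defaults to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of llama.cpp): builds a Llama 2 chat prompt
# and runs llama.cpp's main binary with the -gqa 8 flag these files need.
set -euo pipefail

# Assumption: path to the llama.cpp main binary; override via LLAMA_MAIN.
LLAMA_MAIN="${LLAMA_MAIN:-./main}"
# Assumption: model path; override via MODEL.
MODEL="${MODEL:-llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin}"
# Thread count; 13 matches the example command, adjust to your CPU.
THREADS="${THREADS:-13}"

# Wrap a system message and a user message in the same
# [INST] <<SYS>>...<</SYS>>...[/INST] template as the example command.
build_prompt() {
  local system_msg="$1" user_msg="$2"
  printf '[INST] <<SYS>>%s<</SYS>>%s[/INST]' "$system_msg" "$user_msg"
}

# Build the prompt and invoke main with -gqa 8.
run_chat() {
  local prompt
  prompt="$(build_prompt "$1" "$2")"
  "$LLAMA_MAIN" -m "$MODEL" -gqa 8 -t "$THREADS" -p "$prompt"
}

# Run only when given exactly two arguments (system message, user message);
# with no arguments the script just defines the functions.
if [[ $# -eq 2 ]]; then
  run_chat "$1" "$2"
fi
```

Usage, under the same assumptions: `LLAMA_MAIN=/workspace/git/llama.cpp/main ./run-llama2-70b.sh "You are a helpful assistant" "Write a story about llamas"`.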

There is no CUDA support at this time, but it should be coming soon.

There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.