TheBloke / Llama-2-70B-Chat-GGML

	This model is still uploading. README will be here shortly.

	If you're too impatient to wait for that (of course you are), to run these files you need:
	1. llama.cpp as of [this commit or later](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb)
	- If you don't want to compile from source, you can use the binaries from [release master-3602ac4](https://github.com/ggerganov/llama.cpp/releases/tag/master-3602ac4)
	2. The new command-line parameter `-gqa 8`

	Example command:
	```
	/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
	```
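
If you would rather script the call than type it out, the example command can be assembled in Python. This is only a rough sketch: the helper names `build_prompt` and `build_llama_command` are made up for illustration, the paths are the ones from the example above, and it assumes a `main` binary built from a llama.cpp version that accepts `-gqa`.

```python
import shlex


def build_prompt(system, user):
    # Same prompt layout as the example command above.
    return f"[INST] <<SYS>>{system}<</SYS>>{user}[/INST]"


def build_llama_command(main_path, model_path, prompt, threads=13, gqa=8):
    # Hypothetical helper: returns the argument list for llama.cpp's `main`.
    return [
        main_path,
        "-m", model_path,
        "-gqa", str(gqa),      # the new parameter these 70B files need
        "-t", str(threads),    # CPU threads; 13 is just the value from the example
        "-p", prompt,
    ]


cmd = build_llama_command(
    "/workspace/git/llama.cpp/main",
    "llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin",
    build_prompt("You are a helpful assistant", "Write a story about llamas"),
)

# Print a shell-quoted version of the command for copy and paste.
print(shlex.join(cmd))
```

Once the binary and model file are in place, the list can be handed to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell quoting problems with the `<<SYS>>` markers in the prompt.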

	There is no CUDA support at this time, but it should hopefully be coming soon.

	There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.