Update README.md
README.md CHANGED
@@ -75,8 +75,9 @@ python3 convert.py ../LLaMA-2-7B-32K
     ./quantize ../LLaMA-2-7B-32K/ggml-model-f16.gguf \
         ../LLaMA-2-7B-32K/LLaMA-2-7B-32K-Q4_0.gguf Q4_0
     ```
-11. run any quantizations you need and stop the container when finished (you may even delete it as the
-    will remain available on your host computer)
+11. run any quantizations you need and stop the container when finished (you may even delete it as the
+    generated files will remain available on your host computer)
+12. the `basic-python` image may also be deleted unless you plan to use it again in the near future
 
 You are now free to move the quantization results to where you need them and run inferences with context
 lengths up to 32K (depending on the amount of memory you will have available - long contexts need an awful
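
For the cleanup described in steps 11 and 12, a minimal sketch using the standard Docker CLI from the host might look like the following; the container name `llama-quantize` is only a placeholder for whatever name you gave the container earlier in the walkthrough:

```
# stop the container used for conversion/quantization (container name is a placeholder)
docker stop llama-quantize

# optionally delete it - per step 11, the generated files remain available on the host
docker rm llama-quantize

# step 12: remove the basic-python image if you no longer need it
docker rmi basic-python
```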
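
To try the quantized model with a long context, an invocation along these lines should work with llama.cpp's `main` binary; the prompt and the 32768-token `-c` value are example values only, and you may need a smaller context size if memory is tight:

```
./main -m ../LLaMA-2-7B-32K/LLaMA-2-7B-32K-Q4_0.gguf \
       -c 32768 \
       -p "Summarize the following document: ..."
```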