lsetiawan committed
Commit
f1dc21d
1 Parent(s): c1e2ef8

Update README.md

Files changed (1)
  1. README.md +51 -3
README.md CHANGED
@@ -5,7 +5,55 @@ language:
  pipeline_tag: text-generation
  ---

- # OLMo-7B-Instruct-GGUF

- This repo contains GGUF files of the [ssec-uw/OLMo-7B-Instruct-hf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-hf) which is derived from [allenai/OLMo-7B-Instruct](https://huggingface.co/allenai/OLMo-7B-Instruct) model.
- These files can be used with llama.cpp or other software like ollama and LM Studio.
+ # OLMo 7B-Instruct-GGUF

+ > For more details on OLMo-7B-Instruct, refer to [Allen AI's OLMo-7B-Instruct model card](https://huggingface.co/allenai/OLMo-7B-Instruct).
+
+ OLMo is a series of **O**pen **L**anguage **Mo**dels designed to enable the science of language models.
+ The OLMo base models are trained on the [Dolma](https://huggingface.co/datasets/allenai/dolma) dataset.
+ The Instruct version is trained on the [cleaned version of the UltraFeedback dataset](https://huggingface.co/datasets/allenai/ultrafeedback_binarized_cleaned).
+
+ OLMo 7B Instruct is trained for better question answering and demonstrates the performance gains that OLMo base models can achieve with existing fine-tuning techniques.
+
+ This version of the model is derived from [ssec-uw/OLMo-7B-Instruct-hf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-hf) and converted to the [GGUF format](https://huggingface.co/docs/hub/en/gguf),
+ a binary format that is optimized for quick loading and saving of models, making it highly efficient for inference.
+
+ In addition to being in GGUF format, the model has been [quantized](https://huggingface.co/docs/optimum/en/concept_guides/quantization)
+ to reduce the computational and memory costs of running inference. *We are currently working on adding all of the [Quantization Types](https://huggingface.co/docs/hub/en/gguf#quantization-types).*
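+
+ For instance, you can list the GGUF files currently available in this repo, and hence the quantization types already uploaded, with the `huggingface_hub` client. This is a minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`):
+
+ ```python
+ from huggingface_hub import list_repo_files
+
+ # List every file in the repo and keep only the GGUF model files;
+ # the quantization type (e.g. Q4_K_M) is encoded in each filename.
+ gguf_files = [f for f in list_repo_files("ssec-uw/OLMo-7B-Instruct-GGUF") if f.endswith(".gguf")]
+ print("\n".join(gguf_files))
+ ```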
+
+ These files are designed for use with [GGML](https://ggml.ai/) and executors based on GGML such as [llama.cpp](https://github.com/ggerganov/llama.cpp).
+
+ ## Get Started
+
+ To get started with one of the GGUF files, you can simply use [llama-cpp-python](https://github.com/abetlen/llama-cpp-python),
+ a Python binding for `llama.cpp`.
+
+ 1. Install `llama-cpp-python` with pip.
+
+ ```bash
+ pip install llama-cpp-python
+ ```
+
+ 2. Download one of the GGUF files. In this example,
+ we will download [OLMo-7B-Instruct-Q4_K_M.gguf](https://huggingface.co/ssec-uw/OLMo-7B-Instruct-GGUF/resolve/main/OLMo-7B-Instruct-Q4_K_M.gguf?download=true),
+ which starts downloading when the link is clicked; a programmatic alternative is sketched below.
+
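+ If you prefer not to download through the browser, here is a minimal sketch of fetching the same file with the `huggingface_hub` client (again assuming `huggingface_hub` is installed):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Download the Q4_K_M quantized GGUF file from this repo;
+ # the returned value is the local path to the cached file.
+ model_path = hf_hub_download(
+     repo_id="ssec-uw/OLMo-7B-Instruct-GGUF",
+     filename="OLMo-7B-Instruct-Q4_K_M.gguf",
+ )
+ print(model_path)
+ ```
+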
+ 3. Open up a Python interpreter and run the following commands.
+ For example, we can ask it: `What is a solar system?`
+ *You will need to modify the `model_path` argument to point to where
+ the GGUF model is saved on your system.*
+
+ ```python
+ from llama_cpp import Llama
+
+ # Load the quantized GGUF model;
+ # update model_path to the location of the file on your system
+ llm = Llama(
+     model_path="path/to/OLMo-7B-Instruct-Q4_K_M.gguf"
+ )
+
+ # Run a text completion; echo=True includes the prompt in the output
+ result_dict = llm(prompt="What is a solar system?", echo=True, max_tokens=500)
+ print(result_dict['choices'][0]['text'])
+ ```
+
+ 4. That's it, you should see the result fairly quickly! Have fun! 🤖
+
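+ If you prefer a chat-style interface, `llama-cpp-python` also exposes `create_chat_completion`. The sketch below reuses the `llm` object from step 3; how the chat template is applied depends on the metadata stored in the GGUF file, so treat this as an illustration rather than the definitive usage:
+
+ ```python
+ # Chat-style completion with a single user message,
+ # reusing the Llama object created in step 3
+ response = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "What is a solar system?"}],
+     max_tokens=500,
+ )
+ print(response["choices"][0]["message"]["content"])
+ ```
+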
+ ## Contact
+
+ For errors in this model card, contact Don or Anant, {landungs, anmittal} at uw dot edu.