---
license: gpl
datasets:
- nomic-ai/gpt4all-j-prompt-generations
language:
- en
inference: false
---
# GPT4All-13B-snoozy-GGML
These files are GGML format model files of [Nomic.AI's GPT4all-13B-snoozy](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).
GGML files are for CPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).
## Repositories available
* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GPTQ).
* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/GPT4ALL-13B-snoozy-GGML).
* [Nomic.AI's original model in float32 HF for GPU inference](https://huggingface.co/nomic-ai/gpt4all-13b-snoozy).
## REQUIRES LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
llama.cpp recently made a breaking change to its quantisation methods.
I have re-quantised the GGML files in this repo. Therefore you will require llama.cpp compiled on May 12th or later (commit `b9fd7ee` or later) to use them.
The previous files, which will still work in older versions of llama.cpp, can be found in branch `previous_llama`.
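If you want a direct download link for a file on a particular branch, it can be built from the repo name, branch and file name. This is a minimal sketch that assumes Hugging Face's standard `resolve` URL layout; `file_url` is a hypothetical helper, and the file names in `previous_llama` are assumed (not confirmed) to match those in `main`.

```python
from urllib.parse import quote

# Repo name taken from the links in this README.
REPO_ID = "TheBloke/GPT4ALL-13B-snoozy-GGML"


def file_url(filename: str, revision: str = "main") -> str:
    """Build a direct download URL for one file on a given branch.

    Hypothetical helper: assumes the usual
    https://huggingface.co/<repo>/resolve/<revision>/<filename> layout.
    """
    return (
        f"https://huggingface.co/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# Assumption: the older branch uses the same file names as the main branch.
print(file_url("GPT4All-13B-snoozy.q4_0.bin", revision="previous_llama"))
```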
## Provided files
| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
| `GPT4All-13B-snoozy.q4_0.bin` | q4_0 | 4bit | 8.14GB | 10GB | 4-bit. |
| `GPT4All-13B-snoozy.q5_0.bin` | q5_0 | 5bit | 8.95GB | 11GB | 5-bit. Higher accuracy, higher resource usage and slower inference. |
| `GPT4All-13B-snoozy.q5_1.bin` | q5_1 | 5bit | 9.76GB | 12GB | 5-bit. Even higher accuracy, higher resource usage and slower inference. |
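One way to choose between these files is to take the most accurate one whose "RAM required" figure fits in the memory you have free. The sketch below hard-codes the figures from the table; the selection policy and the `pick_quant_file` name are assumptions for illustration only.

```python
from typing import Optional

# (file name, RAM required in GB), copied from the table above and
# ordered from lowest to highest accuracy.
QUANT_FILES = [
    ("GPT4All-13B-snoozy.q4_0.bin", 10),
    ("GPT4All-13B-snoozy.q5_0.bin", 11),
    ("GPT4All-13B-snoozy.q5_1.bin", 12),
]


def pick_quant_file(available_ram_gb: float) -> Optional[str]:
    """Return the most accurate file that fits, or None if none fit."""
    best = None
    for name, ram_gb in QUANT_FILES:
        if ram_gb <= available_ram_gb:
            best = name  # later entries are more accurate, so keep the last fit
    return best


print(pick_quant_file(16))
```

Slower inference is the trade-off for the larger files, so you may still prefer `q4_0` even when more RAM is available.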
## How to run in `llama.cpp`
I use the following command line; adjust for your tastes and needs:
```
./main -t 12 -m GPT4All-13B-snoozy.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write a story about llamas
### Response:"
```
Change `-t 12` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.
If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`.
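If you would rather launch llama.cpp from a script, the same command can be assembled in Python. This is only a sketch: `build_prompt` and `build_command` are hypothetical helpers, the flags are copied from the command above, and the thread count is passed in by hand because Python's standard library reports logical rather than physical cores.

```python
import shlex

# Prompt template copied from the command line above.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n"
    "{instruction}\n"
    "### Response:"
)


def build_prompt(instruction: str) -> str:
    """Wrap a single instruction in the prompt template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


def build_command(model_path, threads, instruction=None, binary="./main"):
    """Return the llama.cpp argument list.

    With an instruction, a one-shot `-p <PROMPT>` run is built;
    without one, the chat-style `-i -ins` flags are used instead.
    """
    args = [
        binary, "-t", str(threads), "-m", model_path, "--color",
        "-c", "2048", "--temp", "0.7", "--repeat_penalty", "1.1", "-n", "-1",
    ]
    if instruction is None:
        args += ["-i", "-ins"]
    else:
        args += ["-p", build_prompt(instruction)]
    return args


cmd = build_command(
    "GPT4All-13B-snoozy.q4_0.bin", threads=8,
    instruction="Write a story about llamas",
)
print(shlex.join(cmd))  # shell-quoted form, handy for checking before running
```

Passing the list to `subprocess.run(cmd, check=True)` would then start the model without any shell quoting issues.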
## How to run in `text-generation-webui`
Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).
Note: at this time text-generation-webui will not support the newly updated llama.cpp quantisation methods.
**Thireus** has written a [great guide on how to update it to the latest llama.cpp code](https://huggingface.co/TheBloke/wizardLM-7B-GGML/discussions/5) which may help get the newly updated llama.cpp quantisation methods working in text-gen-ui sooner.
# Original Model Card for GPT4All-13b-snoozy
An Apache-2 licensed chatbot trained over a massive curated corpus of assistant interactions including word problems, multi-turn dialogue, code, poems, songs, and stories.
## Model Details
### Model Description
This model has been finetuned from LLama 13B.
- **Developed by:** [Nomic AI](https://home.nomic.ai)
- **Model Type:** A LLama 13B model finetuned on assistant-style interaction data
- **Language(s) (NLP):** English
- **License:** Apache-2
- **Finetuned from model:** LLama 13B
This model was trained on `nomic-ai/gpt4all-j-prompt-generations` using `revision=v1.3-groovy`.
### Model Sources
- **Repository:** [https://github.com/nomic-ai/gpt4all](https://github.com/nomic-ai/gpt4all)
- **Base Model Repository:** [https://github.com/facebookresearch/llama](https://github.com/facebookresearch/llama)
- **Demo:** [https://gpt4all.io/](https://gpt4all.io/)
### Results
Results on common-sense reasoning benchmarks:
```
Model                    BoolQ    PIQA     HellaSwag  WinoGrande  ARC-e    ARC-c    OBQA
-----------------------  -------  -------  ---------  ----------  -------  -------  -------
GPT4All-J 6B v1.0        73.4     74.8     63.4       64.7        54.9     36.0     40.2
GPT4All-J v1.1-breezy    74.0     75.1     63.2       63.6        55.4     34.9     38.4
GPT4All-J v1.2-jazzy     74.8     74.9     63.6       63.8        56.6     35.3     41.0
GPT4All-J v1.3-groovy    73.6     74.3     63.8       63.5        57.7     35.0     38.8
GPT4All-J Lora 6B        68.6     75.8     66.2       63.5        56.4     35.7     40.2
GPT4All LLaMa Lora 7B    73.1     77.6     72.1       67.8        51.1     40.4     40.2
GPT4All 13B snoozy       *83.3*   79.2     75.0       *71.3*      60.9     44.2     43.4
Dolly 6B                 68.8     77.3     67.6       63.9        62.9     38.7     41.2
Dolly 12B                56.7     75.4     71.0       62.2        *64.6*   38.5     40.4
Alpaca 7B                73.9     77.2     73.9       66.1        59.8     43.3     43.4
Alpaca Lora 7B           74.3     *79.3*   74.0       68.8        56.6     43.9     42.6
GPT-J 6B                 65.4     76.2     66.2       64.1        62.2     36.6     38.2
LLama 7B                 73.1     77.4     73.0       66.9        52.5     41.4     42.4
LLama 13B                68.5     79.1     *76.2*     70.1        60.0     *44.6*   42.2
Pythia 6.9B              63.5     76.3     64.0       61.1        61.3     35.2     37.2
Pythia 12B               67.7     76.6     67.3       63.8        63.9     34.8     38.0
Vicuña T5                81.5     64.6     46.3       61.8        49.3     33.3     39.4
Vicuña 13B               81.5     76.8     73.3       66.7        57.4     42.7     43.6
Stable Vicuña RLHF       82.3     78.6     74.1       70.9        61.0     43.5     *44.4*
StableLM Tuned           62.5     71.2     53.6       54.8        52.4     31.1     33.4
StableLM Base            60.1     67.4     41.2       50.1        44.9     27.0     32.0
```
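For a rough single-number comparison, one option is the unweighted mean across the seven benchmarks. This aggregate is not part of the original results and is shown only as a sketch; the two rows below are copied from the table with the `*` markers stripped, and `mean_score` is a hypothetical helper.

```python
# Scores in table order: BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA.
ROWS = {
    "GPT4All 13B snoozy": [83.3, 79.2, 75.0, 71.3, 60.9, 44.2, 43.4],
    "LLama 13B": [68.5, 79.1, 76.2, 70.1, 60.0, 44.6, 42.2],
}


def mean_score(scores):
    """Unweighted mean over the benchmark columns."""
    return sum(scores) / len(scores)


for name, scores in ROWS.items():
    print(f"{name}: {mean_score(scores):.1f}")
```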