Crataco committed on
Commit
716d6ff
Parent: 917311a

Update README.md

Files changed (1):
  README.md +33 -11
README.md CHANGED
@@ -2,7 +2,8 @@
 license: agpl-3.0
 language:
 - en
-thumbnail:
+model_creator: PygmalionAI
+quantized_by: Crataco
 tags:
 - ggml
 - text generation
@@ -10,19 +11,40 @@ tags:
 inference: false
 ---
 
-*(Not to be confused with [Pygmalion 13B](https://huggingface.co/TehVenom/Pygmalion-13b-GGML).)*
-
-This is converted and quantized from [Pygmalion 1.3B](https://huggingface.co/PygmalionAI/pygmalion-1.3b), based on [an earlier version](https://huggingface.co/EleutherAI/pythia-1.4b-deduped-v0) of Pythia 1.4B Deduped.
-
-Notes:
-- Converted with ggerganov/ggml's gpt-neox conversion script, and tested with KoboldCpp.
-- I can't promise that this will work with other frontends, if at all. I've had problems with the tokenizer, which could be related to the ggml implementation of GPT-NeoX [(source)](https://github.com/ggerganov/ggml/tree/master/examples/gpt-neox#notes).
-
-### RAM USAGE (on KoboldCpp w/ OpenBLAS)
-Model | Initial RAM
-:--:|:--:
-ggml-pygmalion-1.3b-q4_0.bin | 1.1 GiB
-ggml-pygmalion-1.3b-q5_1.bin | 1.3 GiB
+*(Not to be confused with [Pygmalion 13B](https://huggingface.co/TehVenom/Pygmalion-13b-GGML) or [Pygmalion 2 13B](https://huggingface.co/TheBloke/Pygmalion-2-13B-GGUF).)*
+
+# Pygmalion 1.3B GGML
+### This repository contains quantized conversions of the Pygmalion 1.3B checkpoint.
+*For use with frontends that support GGML-quantized GPT-NeoX models, such as KoboldCpp and Oobabooga (with the CTransformers loader).*
+
+*Last updated on 2023-09-23.*
+
+Model | Startup RAM usage (KoboldCpp) | Startup RAM usage (Oobabooga)
+:--:|:--:|:--:
+pygmalion-1.3b.q4_0.bin | 1.0 GiB | 1.3 GiB
+pygmalion-1.3b.q4_1.bin | 1.1 GiB | 1.4 GiB
+pygmalion-1.3b.q5_0.bin | 1.2 GiB | 1.5 GiB
+pygmalion-1.3b.q5_1.bin | 1.3 GiB | 1.6 GiB
+pygmalion-1.3b.q8_0.bin | 1.7 GiB | 2.0 GiB
+pygmalion-1.3b.f16.bin | 2.9 GiB | 3.2 GiB
+
+**Recommended settings:**
+
+Pygmalion 1.3B is a limited model, left in the dust by the Pygmalion project's advancements since then. That's a shame, as it remains one of the few conversational models available for systems with less than 2 GB of RAM, at least until we get [TinyLlama](https://github.com/jzhang38/TinyLlama) and quantized [Phi-1.5](https://huggingface.co/microsoft/phi-1_5).
+
+Here are some tips for getting the best results out of this model (a usage sketch follows the diff):
+- Stick to a low temperature, preferably between 0.2 and 0.7.
+- Keep your repetition penalty between 1.0 and 1.02. These tiny values are required for models based on Pythia Deduped. Any higher and you'll get [this]().
+- If using SillyTavern, use these settings:
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6251b9851842c08ef3111c4f/Yqvgv428hA9V67jC9VZTp.png)
+- You'll also want to keep character descriptions to a few sentences, roughly in line with CharacterAI's 500-character description limit.
+
+**Notes:**
+- KoboldCpp [[bfc696f]](https://github.com/LostRuins/koboldcpp/tree/bfc696fcc452975dbe8967c39301ba856d04a030) was tested without OpenBLAS.
+- Oobabooga [[895ec9d]](https://github.com/oobabooga/text-generation-webui/tree/895ec9dadb96120e8202a83052bf9032ca3245ae) was tested with the `--model <model> --loader ctransformers --model_type gptneox` launch arguments.
+- ggerganov/ggml [[8ca2c19]](https://github.com/ggerganov/ggml/tree/8ca2c19a3bb8622954d858fbf6383522684eaf34) was used for conversion and quantization.
+- The original model is available at [PygmalionAI/pygmalion-1.3b](https://huggingface.co/PygmalionAI/pygmalion-1.3b).
+- Earlier ggmlv2 quantizations are available [here](https://huggingface.co/Crataco/Pygmalion-1.3B-GGML/tree/15d3aa5e07372e4200c598443d211a2976db47f9).
 
 Below is the original model card for Pygmalion 1.3B.
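As a usage reference for the CTransformers path mentioned in the diff's notes, here is a minimal Python sketch that loads one of these quantized files and generates with the recommended sampling settings. It is a sketch under assumptions, not part of the commit: the file name, prompt text, and persona layout are illustrative (the persona/dialogue format is described in the original Pygmalion model card), and it assumes `pip install ctransformers` and a locally downloaded `.bin` file.

```python
# Minimal sketch: load a GGML GPT-NeoX file with ctransformers and
# generate with the settings recommended above. File name is an
# example; any quantization from the table should work the same way.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "pygmalion-1.3b.q5_1.bin",  # local path to a quantized file
    model_type="gptneox",       # same model type passed to Oobabooga
)

# Pygmalion-style prompt: a short persona, then dialogue. Keeping the
# persona to a few sentences follows the tip above; the exact layout
# here is an assumption based on the original model card.
prompt = (
    "Assistant's Persona: A friendly, concise assistant.\n"
    "<START>\n"
    "You: Hello! Who are you?\n"
    "Assistant:"
)

text = llm(
    prompt,
    max_new_tokens=80,
    temperature=0.5,          # within the recommended 0.2-0.7 range
    repetition_penalty=1.01,  # within the recommended 1.0-1.02 range
    stop=["You:"],            # stop before the model writes the user's turn
)
print(text)
```

The same `.bin` files load unchanged in KoboldCpp, so a snippet like this only matters for programmatic use or for sanity-checking a download.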