bhenrym14 committed
Commit 74b22f0
Parent: 4ad66b7

Update README.md

Files changed (1): README.md +3 -3
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
 
 
  <!-- LoRA Weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-LoRA -->
- fp16 weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16
+ GPTQ weights can be found here: https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-GPTQ
 
  ## Overview
 
@@ -27,8 +27,8 @@ All training was performed with 1x RTX 6000 Ada.
 
  This model employs [Partial NTK Rope Scaling](https://github.com/jquesnelle/scaled-rope/pull/1). This methodology is not yet implemented natively in Transformers or Exllama (as of 7/21). There are three options to run this.
  1. Transformers (use bnb for quantization). Use [fp16 weights](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-fp16). This will require replacing the `LlamaEmbedding` with `LlamaPartNTKScaledRotaryEmbedding`, with `max_position_embeddings=16384` and `original_max_position_embeddings=4096`. A monkeypatch can be found [here](https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/scaledllama/llama_pntk_monkey_patch.py).
- 2. Autogptq/GPTQ-for-Llama. Use these quantized weights. Make the same replacement as in 1.
- 3. Use ExLLama, replacing the `model.py` file with the [modified version](https://github.com/bhenrym14/qlora-airoboros-longcontext/blob/main/exllama_pntk/model.py). Use `compress_pos_emb=1` and `alpha_value = 1` (defaults). The necessary scaling values should flow from the configuration file. If you have done this correctly, there should be a dump of indications in the console indicating the scaling factor used (should be 4). If not, be sure your client is importing exllama from where you replaced the file. (ooba was from sitepackages for me). I hacked this together very quickly so don't be surprised if something goes wrong.
+ 2. Autogptq/GPTQ-for-Llama. See the [GPTQ weights](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-GPTQ)
+ 3. Use ExLLama, see the [GPTQ weights](https://huggingface.co/bhenrym14/airophin-13b-pntk-16k-GPTQ)
 
  Please comment with any questions. This hasn't been extensively tested.
 
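
For option 1 in the diff above (Transformers with bitsandbytes quantization), a minimal sketch of the intended loading flow is below. The import path `llama_pntk_monkey_patch`, the entry point `replace_llama_rope_with_pntk_scaled_rope`, and its keyword arguments are assumptions about the linked monkeypatch file, not its confirmed interface; adjust to whatever the file actually exports.

```python
# Minimal sketch (untested) of option 1: apply the partial-NTK RoPE monkeypatch,
# then load the fp16 weights with 4-bit bitsandbytes quantization.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Hypothetical entry point of the linked llama_pntk_monkey_patch.py; the real
# function name/signature may differ. It must run BEFORE the model is built so
# LlamaPartNTKScaledRotaryEmbedding replaces the stock rotary embedding.
from llama_pntk_monkey_patch import replace_llama_rope_with_pntk_scaled_rope

replace_llama_rope_with_pntk_scaled_rope(
    max_position_embeddings=16384,
    original_max_position_embeddings=4096,
)

model_id = "bhenrym14/airophin-13b-pntk-16k-fp16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    ),
)
```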
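Option 2 is analogous but uses the GPTQ weights through AutoGPTQ; the same monkeypatch must be applied before the model is instantiated. Again a hedged sketch, not a confirmed recipe; the safetensors flag is an assumption about how the GPTQ repo is packaged.

```python
# Minimal sketch (untested) of option 2: the same RoPE monkeypatch call shown
# above must run first, then the GPTQ weights are loaded with AutoGPTQ.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "bhenrym14/airophin-13b-pntk-16k-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=True,  # assumption: the repo ships .safetensors weights
)
```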