bhenrym14 committed
Commit 6f1e4b6
1 Parent(s): d7530b9

Update README.md

Files changed (1)
  1. README.md +2 -5
README.md CHANGED
@@ -3,10 +3,6 @@ datasets:
  - jondurbin/airoboros-gpt4-1.4.1
  ---
 
- **7/6: This could be a little undertrained. I'll update the weights if I end up training it longer and/or with better hyperparameters.**
-
- ---------------
-
  # RoPE Scaled QLoRA Fine-tune of Llama-13b on airoboros-gpt4-1.4.1 (GPTQ)
 
  LoRA Weights can be found here: https://huggingface.co/bhenrym14/airoboros-13b-gpt4-1.4.1-PI-8192-LoRA
@@ -48,9 +44,10 @@ Recent advancements in extending context by RoPE scaling ([kaiokendev](https://k
 
 
  - For contexts shorter than the original 2048, the original model has lower perplexity. This is consistent with the literature. The gap shrinks with context length, with the original becoming incoherent beyond this point.
- - I haven't used models with the SuperHOT LoRA enough to have any sense of performance differences, but feedback on the 33b variant suggests it is noticeable, particularly with coherence at longer context lengths.
+ - In terms of perplexity, this model outperforms the SuperHOT variant at all tested context lengths. I haven't used models with the SuperHOT LoRA enough to have any sense of performance differences, but feedback on the 33b variant suggests it is particularly noticeable at longer context lengths.
  - This comparison isn't perfect. I did use the 1.4.1 dataset, the quantization method is slightly different, and the finetuning method is different (QLoRA vs full). In short, there are other potentially influential variables responsible for these performance differences.
 
+ This model could be a little undertrained. I'll update the weights if I end up training it longer and/or with better hyperparameters.
  ## Quantization:
 
  The merged model was quantized with AutoGPTQ (bits = 4, group_size = 128, desc_act = True).
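
For reference, the quantization step named above could be reproduced with AutoGPTQ roughly as in the sketch below. The settings (bits = 4, group_size = 128, desc_act = True) come from the README; the model path, output directory, and calibration text are placeholders I've made up, not values from this repo.

```python
# Sketch of 4-bit GPTQ quantization with AutoGPTQ using the settings named in
# the README (bits=4, group_size=128, desc_act=True). The paths and the tiny
# calibration set are illustrative placeholders, not values from this repo.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

merged_model_dir = "path/to/merged-llama-13b-airoboros"  # hypothetical path
quantized_dir = "path/to/gptq-output"                    # hypothetical path

quantize_config = BaseQuantizeConfig(
    bits=4,          # 4-bit weights
    group_size=128,  # per-group quantization granularity
    desc_act=True,   # activation-order ("act-order") quantization
)

tokenizer = AutoTokenizer.from_pretrained(merged_model_dir, use_fast=True)

# A real run would use a larger, representative calibration set.
examples = [
    tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
]

model = AutoGPTQForCausalLM.from_pretrained(merged_model_dir, quantize_config)
model.quantize(examples)             # run GPTQ calibration on the examples
model.save_quantized(quantized_dir)  # write quantized weights + quantize config
tokenizer.save_pretrained(quantized_dir)
```

Setting `desc_act=True` quantizes weight columns in activation order, which typically improves accuracy at a modest cost to inference speed with some kernels.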
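
The "RoPE Scaled" / "PI-8192" part of the model name refers to RoPE position interpolation: rotary position indices for the extended 8192-token context are divided by a scale factor (8192 / 2048 = 4) so they land inside the positional range the base Llama model was pretrained on. The snippet below is a minimal, self-contained sketch of that idea (standard Llama-style rotary embeddings with interpolated positions); it is illustrative only, not the exact patch used to train these weights.

```python
# Minimal sketch of linear RoPE position interpolation (the "PI" in PI-8192).
# Positions for an 8192-token context are compressed into the 0..2048 range
# the base model was trained on by dividing by a scale factor of 4.
import torch

def rope_tables(seq_len, dim, base=10000.0, scale=4.0):
    """Return cos/sin tables for RoPE with linearly interpolated positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() / scale  # position interpolation
    freqs = torch.outer(positions, inv_freq)           # (seq_len, dim/2)
    emb = torch.cat((freqs, freqs), dim=-1)            # (seq_len, dim)
    return emb.cos(), emb.sin()

def rotate_half(x):
    half = x.shape[-1] // 2
    return torch.cat((-x[..., half:], x[..., :half]), dim=-1)

def apply_rope(x, cos, sin):
    """Apply rotary position embedding to a (seq_len, dim) query or key tensor."""
    return x * cos + rotate_half(x) * sin

# Example: head_dim=128 (Llama-13b), context extended from 2048 to 8192 tokens.
cos, sin = rope_tables(seq_len=8192, dim=128, scale=8192 / 2048)
q = torch.randn(8192, 128)
q_rot = apply_rope(q, cos, sin)
print(q_rot.shape)  # torch.Size([8192, 128])
```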