---
license: mit
---

### SuperHOT Prototype 2 w/ 8K Context

This is a second prototype of SuperHOT, an NSFW-focused LoRA. This time it is 30B with 8K context and no RLHF, and it uses the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).

Tests have shown that the model does indeed make use of the extended 8K context.

#### Looking for Merged & Quantized Models?

- 30B 4-bit CUDA: [tmpupload/superhot-30b-8k-4bit-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-safetensors)
- 30B 4-bit CUDA 128g: [tmpupload/superhot-30b-8k-4bit-128g-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-128g-safetensors)

#### Using the Monkeypatch?

You will **NEED** to **apply the monkeypatch** or, if you are already using the monkeypatch, **change the scaling factor to 0.25 and the maximum sequence length to 8192**.
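
As a rough illustration of what the 0.25 scaling factor does, the sketch below scales position indices before computing rotary embedding angles, so that positions up to 8192 map into the 0 to 2048 range the base model was trained on. This is not the monkeypatch itself: the function name `rotary_angles`, the head dimension of 128, and the base of 10000 are assumptions chosen for the example.

```python
TRAINED_CONTEXT = 2048   # context length the base model was trained with
TARGET_CONTEXT = 8192    # extended context length used by this LoRA
SCALE = TRAINED_CONTEXT / TARGET_CONTEXT  # 0.25, the scaling factor


def rotary_angles(position, head_dim=128, base=10000.0, scale=1.0):
    """Return one rotary angle per frequency pair for a single position.

    Hypothetical helper for illustration; head_dim and base are assumed values.
    """
    scaled_position = position * scale  # interpolate the position index
    return [
        scaled_position / base ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


# With scaling, position 8188 produces the same angles as unscaled position 2047.
same = rotary_angles(8188, scale=SCALE) == rotary_angles(2047)

# The largest scaled position stays inside the trained range.
max_scaled_position = (TARGET_CONTEXT - 1) * SCALE
```

Here `same` evaluates to `True` and `max_scaled_position` stays below 2048, which is why the scaling factor and the maximum sequence length have to be changed together.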

#### Using Oobabooga with Exllama?

- `python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf`

Here `--compress_pos_emb 4` is the reciprocal of the 0.25 scaling factor (8192 / 2048 = 4).

#### Training Details

I trained the LoRA with the following configuration:

- 1200 samples (~400 of them longer than 2048 tokens)
- learning rate of 3e-4
- 3 epochs
- The exported modules are:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
- no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on a 4-bit base model
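
For readers who want to reproduce a similar setup, the hyperparameters above can be collected as plain Python dictionaries. This is only a sketch: the card does not say which training framework was used, so the key names are an assumption. They mirror the field names of `peft.LoraConfig` and the keyword arguments of `torch.optim.AdamW`.

```python
# LoRA settings; keys follow peft.LoraConfig field names (an assumption,
# the training framework is not stated in this card).
lora_kwargs = {
    "r": 4,                                                    # Rank = 4
    "lora_alpha": 8,                                           # Alpha = 8
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.0,                                       # no dropout
    "bias": "none",                                            # no bias
}

# Optimizer settings; keys follow torch.optim.AdamW keyword arguments.
optimizer_kwargs = {
    "lr": 3e-4,
    "betas": (0.9, 0.99),
    "eps": 1e-5,
    "weight_decay": 0.1,
}

num_epochs = 3

# In the standard LoRA formulation the update is scaled by alpha / rank.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

With Rank = 4 and Alpha = 8, `lora_scaling` works out to 2.0. If `peft` and PyTorch are installed, the dictionaries could be unpacked as `LoraConfig(**lora_kwargs)` and `AdamW(params, **optimizer_kwargs)`.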