---
license: mit
---

### SuperHOT Prototype 2 w/ 8K Context

This is a second prototype of SuperHOT, an NSFW-focused LoRA. This time it is 7B with 8K context and no RLHF, using the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).

#### Looking for Merged & Quantized Models?
Make some please :)

#### Using the monkeypatch?
You will **NEED** to **apply the monkeypatch** or, if you are already using the monkeypatch, **change the scaling factor to 0.25 and the maximum sequence length to 8192**.

The monkeypatch is only necessary if you are using a front-end/back-end that does not already support scaling and that front-end/back-end is Python-based (e.g. Hugging Face Transformers). To apply the patch, copy `llama_rope_scaled_monkey_patch.py` into your working directory and call the exported function `replace_llama_rope_with_scaled_rope` at the very start of your Python program. It modifies the Transformers library's implementation of RoPE so that the scaling factor is applied properly.
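To make the effect of the scaling factor more concrete, here is a minimal pure-Python sketch of scaled rotary angles. It is not the code from `llama_rope_scaled_monkey_patch.py`; the helper `rope_angles`, the toy dimension of 8, the RoPE base of 10000, and the 2048-token base context are assumptions chosen for illustration.

```python
import math

def rope_angles(position, dim=8, base=10000.0, scale=1.0):
    """Rotary angles for one token position; `scale` multiplies the position index."""
    # Standard RoPE inverse frequencies: base^(-2i/dim) for each pair of dimensions.
    inv_freq = [base ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [position * scale * f for f in inv_freq]

SCALE = 0.25          # scaling factor from this README
MAX_SEQ_LEN = 8192    # maximum sequence length from this README
BASE_CONTEXT = 2048   # assumption: context length of the unpatched base model

last_position = MAX_SEQ_LEN - 1
scaled = rope_angles(last_position, scale=SCALE)
unscaled_equivalent = rope_angles(last_position * SCALE)

# The scaled last position yields the same angles as fractional position 2047.75 ...
assert all(math.isclose(a, b) for a, b in zip(scaled, unscaled_equivalent))
# ... which stays inside the position range assumed for the base model.
assert last_position * SCALE < BASE_CONTEXT
```

In other words, with a factor of 0.25 all 8192 positions are squeezed into the position range the base model is assumed to already handle, which is why the patch has to be in place before the model is loaded.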

#### Using Oobabooga with Exllama?
Switch your loader to `exllama` or `exllama_hf`, then add the arguments `--max_seq_len 8192` and `--compress_pos_emb 4`. **While the model may work well with `compress_pos_emb 2`, it was trained on 4, so that is what I recommend using.**

Example in the command-line:
- `python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf`

In the UI, you will see the loader option in the `Models` tab. Once you select either `exllama` or `exllama_hf`, the `max_seq_len` and `compress_pos_emb` settings will appear.
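As a rough sanity check, the two settings can be related to the 0.25 scaling factor used by the monkeypatch. The sketch below assumes that `compress_pos_emb` is the inverse of that scaling factor and that the base model's native context is 2048 tokens; the helper name `settings_for` is hypothetical.

```python
NATIVE_CONTEXT = 2048  # assumption: native context length of the base model

def settings_for(max_seq_len, native_context=NATIVE_CONTEXT):
    """Return (compress_pos_emb, equivalent scaling factor) for a target length."""
    compress_pos_emb = max_seq_len / native_context
    scale = 1.0 / compress_pos_emb
    return compress_pos_emb, scale

compress, scale = settings_for(8192)
print(compress, scale)  # prints "4.0 0.25"
```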

#### Training Details
I trained the LoRA with the following configuration: 
- 1200 samples (~400 samples over 2048 sequence length)
- learning rate of 3e-4 
- 3 epochs
- The exported modules are:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
    - no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
- Trained on 4-bit base model
- Cutoff length: 4096
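
For anyone who wants to reproduce a similar setup, the list above can be written down as keyword arguments. This is only a sketch, not the training script that was actually used: the key names follow the parameters of `peft.LoraConfig` and `torch.optim.AdamW`, and mapping "exported modules" to `target_modules`, "no bias" to `bias="none"`, and the task type to `"CAUSAL_LM"` is my assumption.

```python
# Assumed mapping of the list above onto peft.LoraConfig parameters.
lora_kwargs = {
    "r": 4,                      # Rank = 4
    "lora_alpha": 8,             # Alpha = 8
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "lora_dropout": 0.0,         # no dropout
    "bias": "none",              # no bias
    "task_type": "CAUSAL_LM",    # assumption: causal language modelling
}

# Assumed mapping onto torch.optim.AdamW parameters.
adamw_kwargs = {
    "lr": 3e-4,
    "betas": (0.9, 0.99),
    "eps": 1e-5,
    "weight_decay": 0.1,
}

# Remaining values that belong to the training loop rather than the two configs.
training_kwargs = {"epochs": 3, "cutoff_len": 4096}

# In standard LoRA the adapter update is multiplied by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
```

If `peft` and `torch` are installed, these dictionaries could be unpacked as `LoraConfig(**lora_kwargs)` and `AdamW(model.parameters(), **adamw_kwargs)`, where `model` is a placeholder for the loaded 4-bit base model.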