---
license: mit
---

### SuperHOT Prototype 2 w/ 16K Context

This is a second prototype of SuperHOT, an NSFW-focused LoRA, this time with 16K context and no RLHF, using the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).
Tests have shown that the model does indeed leverage the extended context at 8K, so naturally, let's try going even further.

#### Looking for Merged & Quantized Models?
- 13B 16K GGML: [tmpupload/superhot-13b-16k-no-rlhf-test-GGML](https://huggingface.co/tmpupload/superhot-13b-16k-no-rlhf-test-GGML)
- 13B 16K CUDA (no groupsize): [tmpupload/superhot-13b-16k-no-rlhf-test-GPTQ](https://huggingface.co/tmpupload/superhot-13b-16k-no-rlhf-test-GPTQ)

#### Using the monkey-patch?
You will need to **apply the monkeypatch** or, if you are already using it, **change the scaling factor to 0.125 and the maximum sequence length to 16384**.

#### Using Oobabooga with Exllama?
- `python server.py --max_seq_len 16384 --compress_pos_emb 8 --loader exllama_hf`

I trained the LoRA with the following configuration:
- 1200 samples (~400 samples over 2048 sequence length)
- learning rate of 3e-4
- 3 epochs
- The exported modules are:
    - q_proj
    - k_proj
    - v_proj
    - o_proj
- no bias
- Rank = 4
- Alpha = 8
- no dropout
- weight decay of 0.1
- AdamW beta1 of 0.9 and beta2 0.99, epsilon of 1e-5
- Trained on 4-bit base model
- Cutoff of 4096
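
The scaling factor of 0.125 for the monkeypatch and the `--compress_pos_emb 8` flag for Exllama appear to be two views of the same ratio. The sketch below shows how the numbers relate, assuming the base model was trained with a 2048-token context (an assumption, not something stated in this card). It is not the actual monkeypatch code: the function names `scaling_factor` and `scaled_rotary_angles` are hypothetical, and the rotary base of 10000 is the commonly used default rather than a value confirmed here.

```python
# Minimal sketch of positional interpolation for rotary embeddings.
# Assumptions (not from the card): base context of 2048, rotary base 10000.

BASE_CONTEXT = 2048      # assumed original training context of the base model
TARGET_CONTEXT = 16384   # maximum sequence length given in the card


def scaling_factor(base_ctx: int, target_ctx: int) -> float:
    """Factor by which position indices are multiplied (hypothetical helper)."""
    return base_ctx / target_ctx


def scaled_rotary_angles(position: int, dim: int, scale: float,
                         base: float = 10000.0) -> list[float]:
    """Rotary angles for one token position after scaling the position index."""
    # One inverse frequency per pair of dimensions, as in standard RoPE.
    inv_freq = [1.0 / (base ** (i / dim)) for i in range(0, dim, 2)]
    t = position * scale  # interpolated (fractional) position
    return [t * f for f in inv_freq]


scale = scaling_factor(BASE_CONTEXT, TARGET_CONTEXT)  # monkeypatch scaling factor
compress_pos_emb = 1 / scale                          # value passed to the loader flag

# The last position of the extended window maps back inside the assumed base range.
last_scaled_position = (TARGET_CONTEXT - 1) * scale
```

Under these assumptions, every position in the 16384-token window is squeezed into the range the base model already saw during pretraining, which is why the two settings have to be changed together with the maximum sequence length.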