kaiokendev
/

superhot-30b-8k-no-rlhf-test


	---
	license: mit
	---

	### SuperHOT Prototype 2 w/ 8K Context

	This is a second prototype of SuperHOT, an NSFW-focused LoRA. This version is 30B with 8K context and no RLHF, and it uses the same technique described in [the GitHub blog](https://kaiokendev.github.io/til#extending-context-to-8k).
	Tests have shown that the model does indeed leverage the extended context at 8K.

	#### Looking for Merged & Quantized Models?
	- 30B 4-bit CUDA: [tmpupload/superhot-30b-8k-4bit-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-safetensors)
	- 30B 4-bit CUDA 128g: [tmpupload/superhot-30b-8k-4bit-128g-safetensors](https://huggingface.co/tmpupload/superhot-30b-8k-4bit-128g-safetensors)

	#### Using the monkeypatch?
	You will NEED to apply the monkeypatch. If you are already using the monkeypatch, change the scaling factor to 0.25 and the maximum sequence length to 8192.
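To illustrate what the scaling factor does, here is a minimal sketch of scaled rotary position angles. It is not the monkeypatch itself: the function name `rotary_angles`, the head dimension of 128, and the base of 10000 are assumptions chosen for illustration. The idea shown is that multiplying positions by 0.25 squeezes 8192 positions into the range the base model saw during pretraining.

```python
def rotary_angles(position, head_dim=128, base=10000.0, scale=1.0):
    """Return the rotary angle for each frequency pair at a scaled position.

    head_dim and base are illustrative defaults (assumptions), not values
    taken from this model card.
    """
    scaled_position = position * scale
    return [scaled_position / (base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]


MAX_SEQ_LEN = 8192  # maximum sequence length from the card
SCALE = 0.25        # scaling factor from the card

# The last position of an 8192-token sequence lands just below 2048.
largest_scaled_position = (MAX_SEQ_LEN - 1) * SCALE

# With scaling, position 4096 gets the same angles as unscaled position 1024.
patched = rotary_angles(4096, scale=SCALE)
unpatched = rotary_angles(1024, scale=1.0)
```

With these values, every position in an 8192-token prompt maps to a fractional position below 2048, so the model interpolates between positions rather than extrapolating past them.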

	#### Using Oobabooga with Exllama?
	- `python server.py --max_seq_len 8192 --compress_pos_emb 4 --loader exllama_hf`
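The `--compress_pos_emb 4` value appears to be the reciprocal of the 0.25 scaling factor used with the monkeypatch. The sketch below shows that relationship, assuming a base context of 2048 tokens; the helper `compression_settings` is hypothetical and is not part of Oobabooga or Exllama.

```python
def compression_settings(target_context, base_context=2048):
    """Derive the compression factor and the equivalent position scale.

    base_context=2048 is an assumption about the base model's original
    context length; adjust it if your base model differs.
    """
    if target_context % base_context != 0:
        raise ValueError("target_context should be a multiple of base_context")
    compress = target_context // base_context
    return compress, 1.0 / compress


compress_pos_emb, scale = compression_settings(8192)
print(f"--max_seq_len 8192 --compress_pos_emb {compress_pos_emb}  (scale {scale})")
```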

	#### Training Details
	I trained the LoRA with the following configuration:
	- 1200 samples (~400 samples over 2048 sequence length)
	- learning rate of 3e-4
	- 3 epochs
	- The exported modules are:
	  - q_proj
	  - k_proj
	  - v_proj
	  - o_proj
	- no bias
	- Rank = 4
	- Alpha = 8
	- no dropout
	- weight decay of 0.1
	- AdamW with beta1 of 0.9, beta2 of 0.99, and epsilon of 1e-5
	- Trained on 4-bit base model
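The configuration above can be collected into a small settings object, sketched below in plain Python. The class `LoraSettings` and the helper `adapter_parameter_count` are hypothetical names, not the training script used for this LoRA. The hidden size of 6656 and the 60 layers are assumptions about the 30B base model, used only to estimate the adapter size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Hyperparameters listed in the card, gathered in one place."""
    target_modules: tuple = ("q_proj", "k_proj", "v_proj", "o_proj")
    rank: int = 4
    alpha: int = 8
    dropout: float = 0.0
    bias: str = "none"
    learning_rate: float = 3e-4
    epochs: int = 3
    weight_decay: float = 0.1
    adam_beta1: float = 0.9
    adam_beta2: float = 0.99
    adam_epsilon: float = 1e-5

    @property
    def scaling(self):
        # Standard LoRA scaling applied to the low-rank update: alpha / rank.
        return self.alpha / self.rank


def adapter_parameter_count(settings, hidden_size=6656, num_layers=60):
    """Estimate trainable LoRA parameters for square attention projections.

    hidden_size and num_layers are assumed values for a 30B base model.
    Each adapted module adds two matrices: (hidden x rank) and (rank x hidden).
    """
    per_module = settings.rank * (hidden_size + hidden_size)
    return per_module * len(settings.target_modules) * num_layers


settings = LoraSettings()
print(settings.scaling)
print(adapter_parameter_count(settings))
```

Under these assumptions, the rank-4 adapter is tiny relative to the base model, which is consistent with training on a 4-bit base.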