adamo1139/LWM-7B-1M-1000000ctx-AEZAKMI-3_1-1702

adamo1139 committed on Feb 17

Commit fdb6787 • 1 Parent(s): 865ef9d

Update README.md

Files changed (1)

README.md +4 -1

@@ -1,4 +1,7 @@
 ---
 license: llama2
 ---
-LargeWorldModel 7B 1000000 ctx finetuned on AEZAKMI v3.1 dataset for epochs at max_seq_len of 4000 using QLoRA with lora_r 32 and cosine lr decaying from 0.00015
+LargeWorldModel 7B 1000000 ctx finetuned on AEZAKMI v3.1 dataset for epochs at max_seq_len of 4000 using QLoRA with lora_r 32 and cosine lr decaying from 0.00015.
+I will be uploading exl2 quants and base model in safetensors format soon.
+Fine-tuned with unsloth, FA2 on local RTX 3090 Ti. Training took around 6 hours. I think most of the long ctx capabilities remain.
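
The README mentions a "cosine lr decaying from 0.00015". As a rough illustration of what such a schedule looks like, here is a minimal sketch in plain Python. Only the peak value 0.00015 comes from the README; the total step count, the floor of 0.0, and the absence of warmup are assumptions made for the example, and `cosine_lr` is a hypothetical helper, not the function the training run used.

```python
import math

PEAK_LR = 0.00015  # starting learning rate, taken from the README


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under plain cosine decay.

    Assumptions (not stated in the README): no warmup phase, and the
    rate decays to `min_lr` exactly at `total_steps`.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = 1000  # hypothetical number of optimizer steps
    for s in (0, 250, 500, 750, 1000):
        print(s, f"{cosine_lr(s, total):.6e}")
```

The schedule starts at the peak rate, passes through half the peak at the midpoint, and reaches the floor at the final step. Real training setups often add a short warmup before the decay, which this sketch leaves out.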