Update README.md
README.md CHANGED
@@ -30,6 +30,8 @@ This model employs linear RoPE scaling, which now has native support in Transformers.
Please comment with any questions. I'll likely upload a GPTQ and (possibly) a GGML version soon, especially if anyone expresses interest.

+Ooba use: Be sure to increase the `Truncate the prompt up to this length` parameter to 8192 to utilize the full context capabilities.
+
## Motivation
Previous experiments have demonstrated that orca-like datasets yield substantial performance improvements on numerous benchmarks. Additionally, the PI method of context extension requires finetuning to minimize the performance impact relative to the original (non-context-extended) model. My most successful models for context extension with PI methods employ a pretraining phase on long sequences, but due to the compute requirements, I have not scaled this beyond roughly 200 iterations. Many groups (including OpenAssistant) have performed such training at scale, and this model uses one such model as its starting point.
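
For readers picking up the change above: a minimal sketch of loading a model that uses linear RoPE scaling via the native Transformers support mentioned in the hunk header. The model ID and scaling factor here are illustrative assumptions, not taken from this commit; models trained this way typically ship `rope_scaling` in their `config.json`, so the explicit argument is only needed as an override.

```python
# Minimal sketch, assuming a hypothetical model ID and an illustrative
# scaling factor -- substitute the values from the actual model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/your-rope-scaled-model"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Transformers (>= 4.31) understands `rope_scaling` natively for
# Llama-family models; "linear" corresponds to the PI method.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 2.0},  # factor is illustrative
    device_map="auto",
)

# Generate with the extended context window (up to 8192 tokens here).
prompt = "Summarize the following document:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```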