---
license: apache-2.0
language:
- en
---

# LongQLoRA: Efficient and Effective Method to Extend Context Length of LLMs

## Technical Report

Technical Report: [LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models](https://arxiv.org/abs/2311.04879)

## Introduction

LongQLoRA is a memory-efficient and effective method for extending the context length of large language models with fewer training GPUs.
**On a single 32GB V100 GPU**, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k.
After only 1000 finetuning steps, LongQLoRA achieves competitive perplexity on the PG19 and Proof-pile datasets: it outperforms LongLoRA and is very close to MPT-7B-8K.
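
As a rough illustration of the kind of setup described above, the sketch below assembles a QLoRA-style training configuration with Hugging Face `transformers` and `peft`: a 4-bit quantized LLaMA-2 base, linear RoPE position interpolation from 4096 to 8192, and LoRA adapters on the attention projections. The base model name and all hyperparameters (rank, alpha, scaling factor) are illustrative assumptions, not the exact configuration of the released checkpoints.

```python
# Hedged sketch: 4-bit (NF4) base model + linear RoPE position interpolation
# + LoRA adapters. Values here are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "meta-llama/Llama-2-7b-hf"  # assumption: standard HF LLaMA-2 checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # QLoRA-style 4-bit base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    # Linear position interpolation: 8192 target / 4096 native = factor 2.0
    rope_scaling={"type": "linear", "factor": 2.0},
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections (rank/alpha are assumptions)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```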

Evaluation perplexity on the PG19 validation set and the Proof-pile test set, with an evaluation context length of 8192:

| Model               | PG19     | Proof-pile |
|---------------------|----------|------------|
| LLaMA2-7B           | >1000    | >1000      |
| MPT-7B-8K           | 7.98     | 2.67       |
| LongLoRA-LoRA-7B-8K | 8.20     | 2.78       |
| LongLoRA-Full-7B-8K | 7.93     | 2.73       |
| **LongQLoRA-7B-8K** | **7.96** | **2.73**   |
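
For reference, a minimal sketch of one common way to compute fixed-context perplexity of this kind: split the text into non-overlapping 8192-token windows, average the token-level negative log-likelihood, and exponentiate. The exact evaluation protocol behind the table is defined in the technical report, so treat this as an assumption-laden illustration rather than the official script.

```python
# Hedged sketch: chunked perplexity at a fixed context length.
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text, context_len=8192, device="cuda"):
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, ids.size(0) - 1, context_len):
        chunk = ids[start : start + context_len].unsqueeze(0).to(device)
        if chunk.size(1) < 2:
            break
        # Labels equal inputs; the model shifts them internally for next-token loss
        out = model(chunk, labels=chunk)
        n = chunk.size(1) - 1               # number of predicted tokens in this window
        nll_sum += out.loss.item() * n
        n_tokens += n
    return math.exp(nll_sum / n_tokens)
```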