---
license: mit
---

# LongMem

Official implementation of our paper "[Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)".

Please cite our paper if you find this repository interesting or helpful:

```bibtex
@article{LongMem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```

## Environment Setup

* torch: Please follow the [official PyTorch installation guide](https://pytorch.org/get-started/previous-versions/). We recommend torch>=1.8.0; select the GPU build that matches your CUDA driver version.

* Faiss-GPU: On Nvidia V100 GPUs, simply install it via ``pip install faiss-gpu``. On Nvidia A100 and A6000 GPUs, run ``conda install faiss-gpu cudatoolkit=11.0 -c pytorch``. A100 GPUs are not officially supported by faiss-gpu and can raise errors; see this faiss [issue](https://github.com/facebookresearch/faiss/issues/2064) for help.

* fairseq: Run ``pip install --editable ./fairseq`` to install the revised `fairseq` and its dependencies. We strongly recommend Python 3.8 for stability.

* Other packages: ``pip install -r requirements.txt`` (a consolidated setup sketch follows this list).
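
Putting these steps together, a minimal setup might look like the sketch below; the conda environment name and the exact torch version are assumptions, so match them to your own CUDA driver.

```
# Hypothetical end-to-end setup; the environment name and versions are illustrative, not pinned by the authors.
conda create -n longmem python=3.8 -y
conda activate longmem
# torch: pick the GPU build matching your CUDA driver from the PyTorch guide above.
pip install "torch>=1.8.0"
# faiss-gpu: the pip wheel works on V100; use the conda build on A100/A6000.
conda install faiss-gpu cudatoolkit=11.0 -c pytorch -y
# Revised fairseq bundled with this repo, plus the remaining dependencies.
pip install --editable ./fairseq
pip install -r requirements.txt
```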

## Project Structure

* Pre-trained LLM Class (L24, E1024, ALiBi positional embedding): [`fairseq/fairseq/models/newgpt.py`](fairseq/fairseq/models/newgpt.py)

* Transformer Decoder with SideNetwork (L12, E1024): [`fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py`](fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py)

* Transformer Language Model with SideNetwork Class: [`fairseq/fairseq/models/transformer_lm_sidenet.py`](fairseq/fairseq/models/transformer_lm_sidenet.py)

* Memory Bank and Retrieval: [`fairseq/fairseq/modules/dynamic_memory_with_chunk.py`](fairseq/fairseq/modules/dynamic_memory_with_chunk.py)

* Joint Attention for Memory Fusion: [`fairseq/fairseq/modules/joint_multihead_attention_sum.py`](fairseq/fairseq/modules/joint_multihead_attention_sum.py)

## Memory-Augmented Adaptation Training

### Data Collection and Preprocessing

Please download the Pile from its [official release](https://pile.eleuther.ai/). Each sub-dataset in the Pile is organized as a set of JSON Lines splits. You can refer to [`preprocess/filter_shard_tnlg.py`](preprocess/filter_shard_tnlg.py) for how we sample the training set and binarize it with the standard fairseq preprocessing pipeline.
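
For illustration, the binarization step with ``fairseq-preprocess`` might look like the sketch below; the split file names, dictionary path, and destination directory are placeholders rather than the exact settings used for the paper.

```
# Hypothetical example: binarize one sampled Pile shard with standard fairseq preprocessing.
# File names and the dictionary path are placeholders.
fairseq-preprocess \
    --only-source \
    --trainpref data/pile_shard0.train.txt \
    --validpref data/pile_shard0.valid.txt \
    --srcdict gpt2_bpe/dict.txt \
    --destdir data-bin/pile_shard0 \
    --workers 16
```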

Then run memory-augmented adaptation training:

```
bash train_scripts/train_longmem.sh
```

## Evaluation

Please first download the checkpoints of the pre-trained [GPT-2-medium and LongMem models](https://huggingface.co/weizhiwang/LongMem-558M) to ``checkpoints/``.
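
One way to fetch them is to clone the Hugging Face repository with Git LFS, as sketched below; the local directory name is an assumption, so place the files wherever ``checkpoints/`` is expected.

```
# Hypothetical example: pull the released checkpoints into checkpoints/ via Git LFS.
git lfs install
git clone https://huggingface.co/weizhiwang/LongMem-558M checkpoints/LongMem-558M
```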

### Memory-Augmented In-Context Learning

```
# Evaluate gpt2 baseline
python eval_scripts/eval_longmem_icl.py --path /path/to/gpt2_pretrained_model
# Evaluate LongMem model
python eval_scripts/eval_longmem_icl.py --path /path/to/longmem_model --pretrained-model-path /path/to/gpt2_pretrained_model
```

## Credits

LongMem is developed on top of [fairseq](https://github.com/facebookresearch/fairseq). Thanks to the EleutherAI team for constructing the Pile, a large-scale, high-quality corpus.