---
license: mit
---
# LongMem

Official implementation of our paper "[Augmenting Language Models with Long-Term Memory](https://arxiv.org/abs/2306.07174)".

Please cite our paper if you find this repository interesting or helpful:
```bibtex
@article{LongMem,
  title={Augmenting Language Models with Long-Term Memory},
  author={Wang, Weizhi and Dong, Li and Cheng, Hao and Liu, Xiaodong and Yan, Xifeng and Gao, Jianfeng and Wei, Furu},
  journal={arXiv preprint arXiv:2306.07174},
  year={2023}
}
```

## Environment Setup
* torch: Please follow the official [torch installation guide](https://pytorch.org/get-started/previous-versions/). We recommend torch>=1.8.0. Select the GPU build of torch that matches your CUDA driver version.

* Faiss-GPU: For Nvidia V100 GPUs, simply install via ``pip install faiss-gpu``. For Nvidia A100 and A6000 GPUs, run ``conda install faiss-gpu cudatoolkit=11.0 -c pytorch``. The A100 GPU is not officially supported by faiss-gpu and can occasionally raise errors; see this faiss GitHub [issue](https://github.com/facebookresearch/faiss/issues/2064) for troubleshooting. A quick sanity check for the install is sketched after this list.

* fairseq: ``pip install --editable ./fairseq``. This installs the revised `fairseq` along with its dependency packages. We strongly recommend Python 3.8 for stability.

* Other packages: ``pip install -r requirements.txt``
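
To verify the Faiss-GPU install, the following minimal check (not part of this repository; the dimensions and data are arbitrary stand-ins) builds and queries an exact inner-product index on the GPU:

```python
import numpy as np
import faiss

d = 1024                                              # vector dimension
keys = np.random.random((4096, d)).astype("float32")  # stand-in memory keys
queries = np.random.random((8, d)).astype("float32")  # stand-in queries

res = faiss.StandardGpuResources()    # GPU scratch resources
index = faiss.GpuIndexFlatIP(res, d)  # exact inner-product search on GPU
index.add(keys)
scores, ids = index.search(queries, 4)  # top-4 neighbours per query
print(scores.shape, ids.shape)          # (8, 4) (8, 4)
```

If this runs without errors, faiss can allocate GPU resources on your device.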

## Project Structure
* Pre-trained LLM Class (L24, E1024, ALiBi positional embedding): [`fairseq/fairseq/models/newgpt.py`](fairseq/fairseq/models/newgpt.py)

* Transformer Decoder with SideNetwork (L12, E1024): [`fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py`](fairseq/fairseq/models/sidenet/transformer_decoder_sidenet.py)

* Transformer Language Model with SideNetwork Class: [`fairseq/fairseq/models/transformer_lm_sidenet.py`](fairseq/fairseq/models/transformer_lm_sidenet.py)

* Memory Bank and Retrieval: [`fairseq/fairseq/modules/dynamic_memory_with_chunk.py`](fairseq/fairseq/modules/dynamic_memory_with_chunk.py) (a toy sketch of this mechanism follows the list)

* Joint Attention for Memory Fusion: [`fairseq/fairseq/modules/joint_multihead_attention_sum.py`](fairseq/fairseq/modules/joint_multihead_attention_sum.py)
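
To make the memory bank component easier to follow, here is a minimal, self-contained sketch of the chunk-level caching and retrieval idea. It is an illustration only, not the repository's code: the class and parameter names (`ChunkedMemoryBank`, `chunk_size`, `memory_size`, `topk`) are invented, and the real module additionally handles multi-head shapes, uses faiss for the nearest-neighbour search, and hands the retrieved keys/values to the joint attention in `joint_multihead_attention_sum.py` for fusion with local context.

```python
import torch

class ChunkedMemoryBank:
    """Toy chunk-level key/value memory: cache, evict FIFO, retrieve top-k chunks."""

    def __init__(self, chunk_size=4, memory_size=64, dim=1024):
        self.chunk_size = chunk_size
        self.max_chunks = memory_size // chunk_size
        self.keys, self.values = [], []  # lists of (chunk_size, dim) tensors

    def add(self, k, v):
        # Split a segment's cached keys/values into fixed-size chunks
        # (any incomplete tail chunk is dropped for simplicity).
        for i in range(0, k.size(0) - self.chunk_size + 1, self.chunk_size):
            self.keys.append(k[i:i + self.chunk_size])
            self.values.append(v[i:i + self.chunk_size])
        # FIFO eviction keeps the bank bounded.
        self.keys = self.keys[-self.max_chunks:]
        self.values = self.values[-self.max_chunks:]

    def retrieve(self, q, topk=2):
        # Score each chunk by the inner product between the query and the
        # chunk's mean-pooled key, then gather the top-k chunks per query.
        reps = torch.stack([c.mean(dim=0) for c in self.keys])  # (n_chunks, dim)
        scores = q @ reps.t()                                   # (n_q, n_chunks)
        idx = scores.topk(min(topk, len(self.keys)), dim=-1).indices
        ks = torch.stack([torch.cat([self.keys[j] for j in row]) for row in idx.tolist()])
        vs = torch.stack([torch.cat([self.values[j] for j in row]) for row in idx.tolist()])
        return ks, vs  # each (n_q, topk * chunk_size, dim)

# Usage with random stand-ins for cached attention keys/values:
bank = ChunkedMemoryBank()
bank.add(torch.randn(16, 1024), torch.randn(16, 1024))
mem_k, mem_v = bank.retrieve(torch.randn(2, 1024), topk=2)
print(mem_k.shape)  # torch.Size([2, 8, 1024])
```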

## Memory-Augmented Adaptation Training
### Data Collection and Preprocessing
Please download the Pile from its [official release](https://pile.eleuther.ai/). Each sub-dataset in the Pile is organized as a set of jsonline splits. Refer to [`preprocess/filter_shard_tnlg.py`](preprocess/filter_shard_tnlg.py) for how we sample the training set and binarize it following the standard fairseq preprocessing pipeline; a simplified sampling sketch follows.
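
As a rough illustration of the sampling step only (the shard file name, sampling rate, and output path below are placeholder assumptions; the record layout follows the Pile's released jsonline format), one shard could be subsampled into plain text before binarization:

```python
import json
import random

random.seed(0)
sample_rate = 0.1  # placeholder; see preprocess/filter_shard_tnlg.py for the real logic

with open("00.jsonl") as fin, open("train.txt", "w") as fout:
    for line in fin:
        doc = json.loads(line)
        # Each Pile record carries the document under "text" and its
        # sub-dataset name under "meta" -> "pile_set_name".
        if random.random() < sample_rate:
            fout.write(doc["text"].replace("\n", " ") + "\n")
```

The resulting plain-text file can then be binarized with the standard `fairseq-preprocess` workflow.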

To run memory-augmented adaptation training:
```bash
bash train_scripts/train_longmem.sh
```

## Evaluation
Please first download the checkpoints of the pre-trained [GPT2-medium model and LongMem model](https://huggingface.co/weizhiwang/LongMem-558M) to ``checkpoints/``.

### Memory-Augmented In-Context Learning
```bash
# Evaluate the GPT-2 baseline
python eval_scripts/eval_longmem_icl.py --path /path/to/gpt2_pretrained_model
# Evaluate the LongMem model
python eval_scripts/eval_longmem_icl.py --path /path/to/longmem_model --pretrained-model-path /path/to/gpt2_pretrained_model
```

## Credits
LongMem is developed based on [fairseq](https://github.com/facebookresearch/fairseq). Thanks to the EleutherAI team for constructing the Pile, a large-scale, high-quality corpus.