Text Generation
English
Megatron-LM
nvidia
Retro
InstructRetro
48B
Commit 70f2354 by boxin-wbx · 1 parent: 6adc250

Update README.md

  1. README.md +130 -1
README.md CHANGED
@@ -13,4 +13,133 @@ tags:
  - InstructRetro
  - 48B
  library_name: Megatron-LM
- ---
+ ---
+
+ # InstructRetro
+
+ Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval augmentation.
+ Retro scales in practice to large-scale pretraining from scratch by retrieving from trillions of tokens.
+ Pretraining with retrieval stores factual knowledge more efficiently than storing it implicitly in the network's parameters, which substantially reduces the number of model parameters while achieving lower perplexity than a standard GPT.
+ Retro also provides the flexibility to update the knowledge stored in LMs [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762) by updating the retrieval database without retraining the LM.
+
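As a rough illustration of that last point, the sketch below shows how adding a chunk to a retrieval database changes what is retrieved for a query without touching any model weights. It is a toy stand-in and not the Megatron-LM implementation: the bag-of-words `embed`, the `retrieve` helper, and the example chunks are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, database: list[str], k: int = 1) -> list[str]:
    """Return the k database chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(database, key=lambda chunk: cosine(q, embed(chunk)), reverse=True)
    return ranked[:k]


# Hypothetical database contents; no model is trained or retrained here.
database = ["the capital of france is paris", "retro retrieves neighbor chunks"]
query = "when was project zeta released"

before = retrieve(query, database)

# "Updating the knowledge" amounts to appending a chunk to the database.
database.append("project zeta was released in march")
after = retrieve(query, database)
```

In this toy setup, `after` contains the newly added chunk while `before` cannot, which is the property the paragraph above describes.
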
+ InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) further scales Retro up to 48B parameters, making it the largest LLM pretrained with retrieval (as of December 2023).
+ The resulting foundation model, Retro 48B, substantially outperforms its GPT counterpart in terms of perplexity.
+ With instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and use the InstructRetro decoder backbone directly as a GPT model while achieving comparable results.
+
+ ## Model Overview
+
+ ### License
+
+ The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).
+
+ ### Supported Hardware
+
+ - H100
+ - A100 80GB, A100 40GB
+
+ ### Model Version(s)
+
+ `retro-48b-instruct-4k`: Pretrained Retro 48B LM with instruction tuning.
+
+ ### Toolkit
+
+ [Megatron-LM Framework](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro)
+
+ ## Environment
+
+ We recommend running the code in a Docker environment.
+
+ ### Docker image
+
+ We provide a [Dockerfile](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/examples/Dockerfile) for reproduction. The Docker image is based on `nvcr.io/nvidia/pytorch:23.09-py3`.
+
+ ### Install dependencies
+
+ Clone the Megatron-LM repo:
+
+ ```bash
+ git clone --branch InstructRetro https://github.com/NVIDIA/Megatron-LM.git
+ ```
+
+ If Docker is not available, we recommend starting from a clean conda environment with the following runtime dependencies:
+
+ - Python 3.10
+ - NVIDIA CUDA® 12.2.1
+ - NVIDIA cuBLAS 12.2.5.6
+ - NVIDIA cuDNN 8.9.5
+ - NVIDIA NCCL 2.18.5
+ - PyTorch 2.1.0a0+32f93b1
+
+ Then install the Retro-specific dependencies, including:
+
+ ```bash
+ pip install -U faiss-gpu
+ pip install -U transformers
+ pip install -U sentencepiece
+ pip install -U h5py
+ pip install -U nltk
+ pip install -U einops
+ ```
+
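After installation, a quick sanity check can confirm that the packages are importable. The snippet below is a minimal sketch and not part of the Megatron-LM repo; the helper name `missing_modules` is hypothetical, and the module list assumes each pip package is imported under the name shown (in particular, `faiss-gpu` is imported as `faiss`).

```python
import importlib.util

# Import names for the pip packages listed above (assumption: one module per package).
# Note that the faiss-gpu package is imported as `faiss`.
RETRO_MODULES = ["faiss", "transformers", "sentencepiece", "h5py", "nltk", "einops"]


def missing_modules(modules: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(RETRO_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not initialize CUDA.
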
+ ## Evaluation Command
+
+ Download our model checkpoint and tokenizer.
+
+ Fill in the blank arguments in the [tools/retro/text_generation/retro_generate.sh](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/text_generation/retro_generate.sh) script, including the model path, the Retro workdir, and the model-related parameters below.
+
+ | Parameter | Value | Explanation                   |
+ |-----------|-------|-------------------------------|
+ | mod_par   | 8     | Tensor parallelism            |
+ | layers    | 48    | Number of layers in the model |
+ | hid_dim   | 8192  | Hidden dimension size         |
+ | heads     | 64    | Number of attention heads     |
+ | pip_par   | 1     | Pipeline parallelism          |
+
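The values in this table imply a few derived quantities that are useful when sizing a run. The sketch below works them out under two assumptions that the table does not state explicitly: that the per-head dimension is `hid_dim / heads`, and that attention heads are split evenly across tensor-parallel ranks, as is usual for Megatron-style tensor parallelism. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetroParallelConfig:
    """Values copied from the parameter table; field names mirror the script variables."""

    mod_par: int = 8    # tensor parallelism
    layers: int = 48    # number of layers
    hid_dim: int = 8192  # hidden dimension size
    heads: int = 64     # number of attention heads
    pip_par: int = 1    # pipeline parallelism

    def head_dim(self) -> int:
        """Per-head dimension, assuming hid_dim is split evenly across heads."""
        if self.hid_dim % self.heads != 0:
            raise ValueError("hid_dim must be divisible by heads")
        return self.hid_dim // self.heads

    def heads_per_tensor_rank(self) -> int:
        """Attention heads handled by each tensor-parallel rank (assumed even split)."""
        if self.heads % self.mod_par != 0:
            raise ValueError("heads must be divisible by mod_par")
        return self.heads // self.mod_par

    def gpus_per_model_replica(self) -> int:
        """GPUs needed to hold one model replica: tensor ranks times pipeline stages."""
        return self.mod_par * self.pip_par


config = RetroParallelConfig()
summary = (config.head_dim(), config.heads_per_tensor_rank(), config.gpus_per_model_replica())
```

With the table's values, `summary` is `(128, 8, 8)`: 128-dimensional heads, 8 heads per tensor-parallel rank, and 8 GPUs per model replica.
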
+ We present an example command to run Retro generation with the InstructRetro checkpoints for the Natural Questions (NQ) task. The example command is for the 48B InstructRetro. Please specify the directory for the NQ dataset and update the command accordingly for other checkpoints.
+
+ ```bash
+ bash tools/retro/text_generation/retro_generate.sh nq 48b greedy test 0 20000 1000 5 pp1 <path/to/checkpoint> 2
+ ```
+
+ The generated responses will be saved in the corresponding checkpoint directory. For example, for the 48B InstructRetro, they will be saved to
+ `<path/to/retro>/retro-generate-nq_5_2_48b_test_greedy_0_20000_1000.txt`.
+
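When scripting over several runs, it can help to predict the output filename from the command-line arguments. The sketch below infers the mapping purely by lining up the example command with the example filename above; the script itself may build the name differently, and the function name and the placeholder variable names (`a5`, `a8`, and so on) are hypothetical because the meaning of each positional argument is not stated here.

```python
def output_filename(args: list[str]) -> str:
    """Build the expected output filename from the 11 positional script arguments.

    The ordering is inferred from the single documented example, so treat it
    as an assumption rather than the script's definitive behavior.
    """
    if len(args) != 11:
        raise ValueError(f"expected 11 positional arguments, got {len(args)}")
    # The 9th and 10th arguments (pp1, checkpoint path) do not appear in the filename.
    task, size, sampling, split, a5, a6, a7, a8, _pp, _checkpoint, a11 = args
    return f"retro-generate-{task}_{a8}_{a11}_{size}_{split}_{sampling}_{a5}_{a6}_{a7}.txt"


example_args = "nq 48b greedy test 0 20000 1000 5 pp1 <path/to/checkpoint> 2".split()
example_name = output_filename(example_args)
```

For the documented command, `example_name` matches the filename shown above.
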
+ To compute the F1 and Exact Match (EM) scores of the generated responses, we provide an example evaluation script for the NQ dataset. Please specify the directory for the NQ dataset and update the command accordingly for other checkpoints and downstream tasks.
+
+ ```bash
+ python3 tools/retro/text_generation/evaluate.py
+ ```
+
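For readers unfamiliar with the two metrics, the sketch below shows one common way to compute token-level F1 and Exact Match for short-form QA, using SQuAD-style answer normalization. It is an illustration only and is not taken from `evaluate.py`, which may normalize or aggregate differently; all function names here are hypothetical.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(prediction) == normalize(reference)


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, references: list[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(f1_score(prediction, ref) for ref in references)
```

Taking the maximum over references reflects that QA datasets often list several acceptable answers per question; corpus-level scores are then the mean over questions.
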
+ # Citations
+
+ See our papers for more details:
+
+ [Shall we Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.](https://arxiv.org/abs/2304.06762)
+
+ _Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro._ (EMNLP 2023)
+
+ [InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.](https://arxiv.org/abs/2310.07713)
+
+ _Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro._
+
+ Please cite the papers as follows if you use the data or code from this repo:
+
+ ```bibtex
+ @inproceedings{wang2023shall,
+   title     = {Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study},
+   author    = {Boxin Wang and Wei Ping and Peng Xu and Lawrence McAfee and Zihan Liu and Mohammad Shoeybi and Yi Dong and Oleksii Kuchaiev and Bo Li and Chaowei Xiao and Anima Anandkumar and Bryan Catanzaro},
+   booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing},
+   year      = {2023}
+ }
+
+ @article{wang2023instructretro,
+   title   = {InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining},
+   author  = {Boxin Wang and Wei Ping and Lawrence McAfee and Peng Xu and Bo Li and Mohammad Shoeybi and Bryan Catanzaro},
+   year    = {2023},
+   journal = {arXiv preprint arXiv:2310.07713}
+ }
+ ```