---
license: other
license_name: nv-ai-foundation-models-license
license_link: >-
  https://developer.download.nvidia.com/ai-foundation-models/nvidia-ai-foundation-models-license-10Nov2023.pdf
language:
- en
pipeline_tag: text-generation
tags:
  - nvidia
  - Megatron-LM
  - Retro
  - InstructRetro
  - 8B
library_name: Megatron-LM
---

# InstructRetro

Retro [(Borgeaud et al., 2022)](https://arxiv.org/abs/2112.04426) is an autoregressive decoder-only language model (LM) pretrained with retrieval augmentation.
Retro scales in practice to large-scale pretraining from scratch, retrieving from a database of trillions of tokens.
Pretraining with retrieval stores factual knowledge more efficiently than storing it implicitly within the network's parameters. This substantially reduces the number of model parameters needed while still achieving lower perplexity than a standard GPT.
Retro also makes it possible to update the knowledge stored in the LM [(Wang et al., 2023a)](https://arxiv.org/abs/2304.06762)
by updating the retrieval database, without retraining the LM.
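
To make the idea of knowledge living in an external database more concrete, here is a toy Python sketch. It is not Retro's retriever. It swaps in bag-of-words cosine similarity over a handful of text chunks, whereas Retro retrieves from trillions of tokens using learned embeddings and a nearest-neighbor index. The names `RetrievalDatabase`, `retrieve`, and `add` are hypothetical. The sketch illustrates that adding a chunk changes what gets retrieved, while no model weights are touched.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class RetrievalDatabase:
    """Hypothetical in-memory database of text chunks (brute-force search)."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def add(self, chunk: str) -> None:
        # A knowledge update is just a database edit; no model parameters change.
        self.chunks.append(chunk)

    def retrieve(self, query: str, k: int = 2) -> list:
        # Rank every chunk by similarity to the query and keep the top k.
        q = embed(query)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
        return ranked[:k]


if __name__ == "__main__":
    db = RetrievalDatabase([
        "The capital of France is Paris.",
        "Retro retrieves from a large text database.",
    ])
    print(db.retrieve("What is the capital of France?", k=1))
    db.add("InstructRetro scales Retro up to 48B parameters.")
    print(db.retrieve("How many parameters does InstructRetro have?", k=1))
```

In a retrieval-augmented LM, the retrieved chunks are fed to the model alongside the input, so editing the database changes the model's available knowledge without a training step.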

InstructRetro [(Wang et al., 2023b)](https://arxiv.org/abs/2310.07713) further scales Retro up to 48B parameters, making it the largest LLM pretrained with retrieval (as of December 2023).
The resulting foundation model, Retro 48B, substantially outperforms its GPT counterpart in terms of perplexity.
After instruction tuning, InstructRetro shows significant improvement over the instruction-tuned GPT on downstream tasks in the zero-shot setting. Specifically, InstructRetro improves on its GPT counterpart by 7% on average across 8 short-form QA tasks, and by 10% on average across 4 challenging long-form QA tasks. We also find that one can ablate the encoder from the InstructRetro architecture and directly use the InstructRetro decoder backbone as a GPT model, while achieving comparable results.

## Model Overview

### License

The use of this model is governed by the [NVIDIA AI Foundation Models Community License Agreement](https://developer.nvidia.com/downloads/nv-ai-foundation-models-license).

### Supported Hardware

- H100
- A100 80GB, A100 40GB

### Model Version(s)

`retro-8b-instruct-4k`: Pretrained Retro 8B LM with instruction tuning.


### Toolkit
[Megatron-LM Framework](https://github.com/NVIDIA/Megatron-LM/tree/InstructRetro)


## Environment

We recommend using a Docker environment to run the code.

### Docker image


We provide a Docker build file, [Dockerfile](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/examples/Dockerfile), for reproduction. The Docker image is based on `nvcr.io/nvidia/pytorch:23.09-py3`.


### Install dependencies

Clone the `InstructRetro` branch of the Megatron-LM repository:

```bash
git clone --branch InstructRetro https://github.com/NVIDIA/Megatron-LM.git
```

If Docker is not available, we recommend starting from a clean conda environment with the following runtime dependencies:

- Python 3.10
- NVIDIA CUDA® 12.2.1
- NVIDIA cuBLAS 12.2.5.6
- NVIDIA cuDNN 8.9.5
- NVIDIA NCCL 2.18.5
- PyTorch 2.1.0a0+32f93b1

Then install the Retro-specific dependencies:
```bash
pip install -U faiss-gpu
pip install -U transformers
pip install -U sentencepiece
pip install -U h5py
pip install -U nltk
pip install -U einops
```
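
Before running generation, it can help to confirm that the packages above are importable. This is a minimal sketch, assuming the import names `faiss` (installed by `faiss-gpu`), `transformers`, `sentencepiece`, `h5py`, `nltk`, and `einops`. `check_environment` is a hypothetical helper. It only checks that each module can be found; it does not verify versions against the list above.

```python
import importlib.util
import sys

# Import names assumed to correspond to the pip packages above
# (the faiss-gpu package provides the `faiss` module).
RETRO_MODULES = ("faiss", "transformers", "sentencepiece", "h5py", "nltk", "einops")


def check_environment(modules=RETRO_MODULES) -> dict:
    """Map each top-level module name to whether it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"Note: running Python {sys.version.split()[0]}; the listed runtime is Python 3.10")
    for name, ok in check_environment().items():
        print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```

Using `find_spec` instead of `import` keeps the check fast and avoids initializing CUDA-backed libraries just to test for their presence.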

## Evaluation Command

Download our model checkpoint and tokenizer.

Fill in the blank arguments in the [tools/retro/text_generation/retro_generate.sh](https://github.com/NVIDIA/Megatron-LM/blob/InstructRetro/tools/retro/text_generation/retro_generate.sh) script, including the model path, the Retro working directory (workdir), and the model-related parameters:

| Parameter | Value | Explanation                       |
|-----------|-------|-----------------------------------|
| mod_par   | 4     | Tensor-parallel size              |
| layers    | 32    | Number of layers in the model     |
| hid_dim   | 4096  | Hidden dimension size             |
| heads     | 32    | Number of attention heads         |
| pip_par   | 1     | Pipeline-parallel size            |
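
These values must be mutually consistent for the model to be split across GPUs. The sketch below checks divisibility constraints that Megatron-style tensor and pipeline parallelism generally require (attention heads split evenly across tensor-parallel ranks, layers split evenly across pipeline stages) and derives a few quantities from the table. `RetroGenConfig` and `check_config` are hypothetical names, and the constraints are our assumption rather than a copy of the script's own checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetroGenConfig:
    """Hypothetical container for the table's values (names mirror the table)."""
    mod_par: int = 4    # tensor-parallel size
    layers: int = 32    # number of transformer layers
    hid_dim: int = 4096 # hidden dimension
    heads: int = 32     # attention heads
    pip_par: int = 1    # pipeline-parallel size


def check_config(cfg: RetroGenConfig) -> dict:
    """Validate assumed divisibility constraints and return derived sizes."""
    if cfg.hid_dim % cfg.heads:
        raise ValueError("hid_dim must be divisible by heads")
    if cfg.heads % cfg.mod_par:
        raise ValueError("heads must be divisible by mod_par (tensor parallelism)")
    if cfg.layers % cfg.pip_par:
        raise ValueError("layers must be divisible by pip_par (pipeline parallelism)")
    return {
        "head_dim": cfg.hid_dim // cfg.heads,               # 4096 / 32 = 128
        "heads_per_tp_rank": cfg.heads // cfg.mod_par,      # 32 / 4 = 8
        "layers_per_pp_stage": cfg.layers // cfg.pip_par,   # 32 / 1 = 32
        "gpus_per_replica": cfg.mod_par * cfg.pip_par,      # 4 * 1 = 4
    }


if __name__ == "__main__":
    print(check_config(RetroGenConfig()))
```

With the table's values, one copy of the 8B model spans 4 GPUs (tensor-parallel size 4, pipeline-parallel size 1), with 8 attention heads per GPU.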


We present an example command for running Retro generation with the InstructRetro checkpoints on the Natural Questions (NQ) task. The example command is for the 8B InstructRetro. Specify the directory of the NQ dataset, and update the command accordingly for other checkpoints.

```bash
bash tools/retro/text_generation/retro_generate.sh nq 8b greedy test 0 20000 1000 5 pp1 <path/to/checkpoint> 2
```

The generated responses are saved in the corresponding checkpoint directory. For example, for the 8B InstructRetro, they are saved to
`<path/to/retro>/retro-generate-nq_5_2_8b_test_greedy_0_20000_1000.txt`.
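
The output filename encodes most of the command-line arguments, which helps when locating results for other tasks or checkpoints. As a hedged sketch, the helper below rebuilds that name from a `retro_generate.sh` command. The argument-to-filename ordering was inferred only by matching the values in the example command against the example filename; it is not read from the script, so check it against your copy of `retro_generate.sh`. `expected_output_name` is a hypothetical name.

```python
import shlex

# Example command from above; the checkpoint path stays a placeholder.
EXAMPLE = ("bash tools/retro/text_generation/retro_generate.sh "
           "nq 8b greedy test 0 20000 1000 5 pp1 <path/to/checkpoint> 2")


def expected_output_name(command: str) -> str:
    """Rebuild the generation output filename from the script's positional args.

    The ordering is inferred from the single example in this README,
    not taken from retro_generate.sh itself.
    """
    args = shlex.split(command)[2:]  # drop "bash" and the script path
    if len(args) != 11:
        raise ValueError(f"expected 11 positional arguments, got {len(args)}")
    # 0-based positions used in the name; args[8] ("pp1") and args[9]
    # (checkpoint path) do not appear in the example filename.
    order = [0, 7, 10, 1, 3, 2, 4, 5, 6]
    return "retro-generate-" + "_".join(args[i] for i in order) + ".txt"


if __name__ == "__main__":
    print(expected_output_name(EXAMPLE))
```

For the NQ example this returns `retro-generate-nq_5_2_8b_test_greedy_0_20000_1000.txt`, matching the path above.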

To compute the F1 and Exact Match (EM) scores of the generated responses, we provide an example evaluation script for the NQ dataset. Specify the directory of the NQ dataset, and update the command accordingly for other checkpoints and downstream tasks.

```bash
python3 tools/retro/text_generation/evaluate.py
```
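
For reference, F1 and EM for short-form QA are commonly computed SQuAD-style: normalize both strings (lowercase, drop punctuation and the articles a/an/the, collapse whitespace), then compare them exactly (EM) or by token overlap (F1), keeping the best score over all gold answers for a question. The sketch below implements that convention. We have not verified that `evaluate.py` uses exactly the same normalization, so treat it as an approximation; the function names are hypothetical.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1(prediction: str, gold: str) -> float:
    """Token-level F1 between normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(predictions, gold_answers) -> dict:
    """Average EM / F1 in percent; each question may have several gold answers."""
    pairs = list(zip(predictions, gold_answers))
    em = sum(max(exact_match(p, g) for g in golds) for p, golds in pairs)
    f = sum(max(f1(p, g) for g in golds) for p, golds in pairs)
    return {"EM": 100 * em / len(pairs), "F1": 100 * f / len(pairs)}


if __name__ == "__main__":
    print(score(["Paris", "1889"], [["Paris"], ["1887", "in 1889"]]))
```

Taking the maximum over gold answers means a prediction is credited if it matches any accepted phrasing, which matters for datasets like NQ that list several valid answers.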

# Citations

See our papers for more details:

[Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study.](https://arxiv.org/abs/2304.06762)

_Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro._ (EMNLP 2023)

[InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining.](https://arxiv.org/abs/2310.07713) 

_Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro._ 

Please cite the papers as follows if you use the data or code from this repo:

```bibtex
@inproceedings{wang2023shall,
    title   = {Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study},
    author  = {Boxin Wang and Wei Ping and Peng Xu and Lawrence McAfee and Zihan Liu and Mohammad Shoeybi and Yi Dong and Oleksii Kuchaiev and Bo Li and Chaowei Xiao and Anima Anandkumar and Bryan Catanzaro},
    booktitle = {The 2023 Conference on Empirical Methods in Natural Language Processing},
    year    = {2023}
}

@article{wang2023instructretro,
    title   = {InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining},
    author  = {Boxin Wang and Wei Ping and Lawrence McAfee and Peng Xu and Bo Li and Mohammad Shoeybi and Bryan Catanzaro},
    year    = {2023},
    journal = {arXiv preprint arXiv:2310.07713}
}
```