NPM
NPM is a nonparametric masked language model pretrained on English text. It was introduced in the paper "Nonparametric Masked Language Modeling" and first released in the facebookresearch/NPM repository.
Model description
NPM consists of an encoder and a reference corpus, and models a nonparametric distribution over that corpus. The key idea is to map all phrases in the corpus into a dense vector space using the encoder and, given a query with a MASK at inference time, use the encoder to locate the nearest phrase in the corpus and fill in the MASK.
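The following is a minimal toy sketch of this retrieve-and-fill idea, not the official NPM pipeline (which indexes token-level phrase representations at scale; see the original repo). The `roberta-large` checkpoint, the mean-pooled embeddings, and the four-entry "corpus" are all illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Stand-in encoder; NPM itself initializes from RoBERTa-large weights.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
encoder = AutoModel.from_pretrained("roberta-large")

def embed(text: str) -> torch.Tensor:
    # Mean-pool the last hidden states into a single dense vector
    # (a simplification of NPM's phrase encoding).
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)              # (dim,)

# Toy "reference corpus" of candidate phrases standing in for a real datastore.
corpus = ["Johannesburg", "Seattle", "the encoder", "1998"]
corpus_vecs = torch.stack([embed(p) for p in corpus])

# Encode a masked query, retrieve the nearest phrase, and fill in the mask.
query = "The 2010 FIFA World Cup final was held in <mask>."
q_vec = embed(query)
sims = torch.nn.functional.cosine_similarity(q_vec.unsqueeze(0), corpus_vecs)
print(query.replace("<mask>", corpus[int(sims.argmax())]))
```

Because the distribution is defined over corpus phrases rather than a fixed vocabulary, swapping in a different datastore changes what the model can predict without retraining the encoder.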
Intended uses & limitations
While this repo includes the encoder weights, NPM has to be used together with a datastore. For more details on how to use NPM, please refer to the original repo.
Note that this model is primarily intended for filling in a MASK token. Future work may investigate how to use NPM for text generation.
Training procedure
NPM was trained on English Wikipedia (August 2019) and the English portion of CC-News (Mackenzie et al., 2020; February 2019), which together contain 13B tokens. NPM uses the model architecture and initial weights of RoBERTa-large (Liu et al., 2019), consisting of 354M parameters. Training ran for 100,000 steps on thirty-two 32GB GPUs.
More details about training can be found in the paper, and the code for training NPM is available in the original repo.
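As a minimal sketch, the initialization described above can be reproduced with the Hugging Face `roberta-large` checkpoint; the actual training loop (masking scheme, objective, distributed setup) lives in the original repo and is not shown here.

```python
from transformers import RobertaModel

# Initialize the encoder from RoBERTa-large, as NPM does.
encoder = RobertaModel.from_pretrained("roberta-large")

# Count parameters; this is roughly the 354M figure reported above.
num_params = sum(p.numel() for p in encoder.parameters())
print(f"{num_params / 1e6:.0f}M parameters")
```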
Evaluation results
NPM is evaluated on nine closed-set tasks (tasks with a small set of answer options) and seven open-set tasks (tasks whose answers can be strings of arbitrary length). NPM consistently outperforms significantly larger models such as GPT-3, OPT, and T5. Detailed results can be found in the paper.
BibTeX entry and citation info
@article{min2022nonparametric,
  title={Nonparametric Masked Language Modeling},
  author={Min, Sewon and Shi, Weijia and Lewis, Mike and Chen, Xilun and Yih, Wen-tau and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
  year={2022}
}