NPM-single

NPM-single is a nonparametric masked language model pretrained on English text. It was introduced in the paper "Nonparametric Masked Language Modeling" and first released in the facebookresearch/NPM repository.

Model description

NPM consists of an encoder and a reference corpus, and models a nonparametric distribution over that corpus. The key idea is to map every phrase in the corpus into a dense vector space using the encoder. At inference, given a query containing a MASK, the encoder is used to locate the nearest phrase in the corpus, and that phrase fills in the MASK.

NPM-single is a variant of NPM that retrieves a token from the corpus, instead of a phrase.
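
The token-level retrieval described above can be illustrated with a minimal sketch. It is not the code from the original repo: the function name `fill_mask`, the three-token toy corpus, the hand-written vectors, and the use of inner-product similarity are all assumptions made for illustration. In practice the vectors would come from the encoder and the datastore would hold every token in the reference corpus.

```python
import numpy as np

def fill_mask(query_vec, corpus_vecs, corpus_tokens):
    """Return the corpus token most similar to the query vector.

    Similarity is the inner product (an assumption for this sketch).
    """
    scores = corpus_vecs @ query_vec      # one score per corpus token
    best = int(np.argmax(scores))         # index of the nearest token
    return corpus_tokens[best], float(scores[best])

# Hypothetical toy datastore: one vector per corpus token.
corpus_tokens = ["Seattle", "Paris", "Tokyo"]
corpus_vecs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical encoder output for the MASK position of a query.
query_vec = np.array([0.1, 0.9, 0.2])

token, score = fill_mask(query_vec, corpus_vecs, corpus_tokens)
print(token, score)
```

With a real corpus, the exhaustive matrix product would typically be replaced by an approximate nearest-neighbor index, since the datastore holds billions of vectors.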

Intended uses & limitations

While this repo includes the encoder weights, NPM-single has to be used together with a datastore. For more details on how to use NPM-single, please refer to the original repo.

Note that this model is primarily for filling in a MASK token. Future work can investigate how to use NPM-single for text generation.

Training procedure

NPM-single was trained on English Wikipedia (August 2019) and an English portion of CC-News (Mackenzie et al., 2020; February 2019), which together contain 13B tokens. NPM-single uses the model architecture and initial weights of RoBERTa large (Liu et al., 2019), which has 354M parameters. It was trained for 100,000 steps on thirty-two 32GB GPUs.

More details about training can be found in the paper. Code for training NPM-single can be found in the original repo.

Evaluation results

NPM-single is evaluated on nine closed-set tasks (tasks with a small set of options given). NPM-single consistently outperforms significantly larger models such as GPT-3 and T5. Detailed results can be found in the paper.
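
As a rough illustration of how a retrieval model can choose among a small set of options, the sketch below sums exponentiated similarities over the corpus tokens that match each option and normalizes across options. This is an assumption made for illustration, not necessarily the exact scoring used in the paper; the name `score_options`, the toy corpus, and the vectors are hypothetical.

```python
import numpy as np

def score_options(query_vec, corpus_vecs, corpus_tokens, options):
    """Score each option by aggregating similarity over matching corpus tokens."""
    sims = corpus_vecs @ query_vec
    # Subtract the max before exponentiating for numerical stability.
    weights = np.exp(sims - sims.max())
    totals = {opt: 0.0 for opt in options}
    for tok, w in zip(corpus_tokens, weights):
        if tok in totals:                 # tokens outside the option set are ignored
            totals[tok] += float(w)
    norm = sum(totals.values())
    return {opt: (t / norm if norm > 0 else 0.0) for opt, t in totals.items()}

# Hypothetical toy datastore; "great" occurs twice in the corpus.
corpus_tokens = ["great", "terrible", "great", "movie"]
corpus_vecs = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.8, 0.2],
    [0.5, 0.5],
])

# Hypothetical encoder output for the MASK in "The film was MASK."
query_vec = np.array([1.0, 0.0])

probs = score_options(query_vec, corpus_vecs, corpus_tokens, ["great", "terrible"])
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

Restricting the aggregation to the option set is what makes the task closed-set: corpus tokens such as "movie" contribute nothing to the final decision.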

BibTeX entry and citation info

@article{min2022nonparametric,
    title={Nonparametric Masked Language Modeling},
    author={Min, Sewon and Shi, Weijia and Lewis, Mike and Chen, Xilun and Yih, Wen-tau and Hajishirzi, Hannaneh and Zettlemoyer, Luke},
    year={2022}
}