Mega Masked LM on wikitext-103
This is the Hugging Face Hub repository for the Mega MLM checkpoint. I trained this model on the wikitext-103
dataset with standard BERT-style masked LM pretraining, using the original Mega repository, and initially
uploaded the weights to hf.co/mnaylor/mega-wikitext-103. The weights here are designed to be used with
MegaForMaskedLM once the implementation of Mega in Hugging Face's transformers is finished, and they are
compatible with the other (encoder-based) MegaFor* model classes.
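For readers unfamiliar with the term, "BERT-style masked LM pretraining" usually refers to the following recipe: pick roughly 15% of the tokens, replace 80% of those with the mask token, 10% with a random token, and leave 10% unchanged, then train the model to recover the originals. The sketch below illustrates that general recipe only; it is not the code from the original Mega repository, and the function name `mask_tokens`, its parameters, and the `-100` ignore label (the convention used by PyTorch's cross-entropy loss) are assumptions made for illustration.

```python
import random

IGNORE_INDEX = -100  # assumed label for positions that do not contribute to the loss


def mask_tokens(token_ids, mask_id, vocab_size, special_ids=(), mask_prob=0.15, seed=None):
    """Return (inputs, labels) after BERT-style masking of a list of token IDs."""
    rng = random.Random(seed)
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(inputs)
    for i, tok in enumerate(token_ids):
        # Never mask special tokens; otherwise select each token with probability mask_prob.
        if tok in special_ids or rng.random() >= mask_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_id                    # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels
```

As a usage example, `mask_tokens(ids, mask_id=50264, vocab_size=50265, special_ids={0, 1, 2})` would apply the recipe with what I believe are the RoBERTa base values for the mask token, vocabulary size, and special tokens; check them against the tokenizer before relying on them.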
This model uses the RoBERTa base tokenizer since the Mega paper does not implement a specific tokenizer aside from the character-level tokenizer used to illustrate long-sequence performance.
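A minimal sketch of how the checkpoint might be used to fill in a masked token, assuming an installed transformers release that ships the Mega classes and a working PyTorch install. The repository ID below is the one quoted in the text and may need to be replaced with the ID of the repository you are actually reading this in; the helper name `predict_masked_token` is hypothetical. The tokenizer is loaded from `roberta-base`, following the note above.

```python
MODEL_ID = "mnaylor/mega-wikitext-103"  # ID quoted in the text; assumed to still resolve
TOKENIZER_ID = "roberta-base"           # RoBERTa base tokenizer, as stated above


def predict_masked_token(text, model_id=MODEL_ID, tokenizer_id=TOKENIZER_ID):
    """Return the model's top prediction for each <mask> token in `text`."""
    # Imported lazily so the heavy dependencies are only needed when the function is called.
    import torch
    from transformers import AutoTokenizer, MegaForMaskedLM

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
    model = MegaForMaskedLM.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Locate the mask positions in the single input sequence and take the argmax there.
    mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    predicted_ids = logits[0, mask_positions].argmax(dim=-1)
    return tokenizer.decode(predicted_ids)
```

Calling `predict_masked_token("The capital of France is <mask>.")` downloads the weights on first use and returns the decoded prediction as a string.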