Mega Masked LM on wikitext-103

This is the Hugging Face Hub location for the Mega MLM checkpoint. I trained this model on the wikitext-103 dataset with standard BERT-style masked-LM pretraining using the original Mega repository, and initially uploaded the weights to hf.co/mnaylor/mega-wikitext-103. Once the implementation of Mega in Hugging Face's transformers is finished, the weights here are intended to be used with MegaForMaskedLM and to be compatible with the other (encoder-based) MegaFor* model classes (see the loading sketch below).
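
As a minimal sketch, loading the checkpoint might look like the following. This assumes the Mega port to transformers is merged and exposes MegaForMaskedLM alongside the other MegaFor* classes:

```python
# Sketch only: assumes the Mega implementation in transformers is complete
# and exposes MegaForMaskedLM, mirroring the other *ForMaskedLM classes.
from transformers import MegaForMaskedLM

# Load the pretrained masked-LM weights from this repository.
model = MegaForMaskedLM.from_pretrained("mnaylor/mega-base-wikitext")
model.eval()  # inference mode for mask filling
```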

This model uses the RoBERTa base tokenizer, since the Mega paper does not introduce a tokenizer of its own aside from the character-level tokenizer used to illustrate long-sequence performance.
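
For illustration, a mask-filling example with this checkpoint might look like the snippet below. It is a sketch that again assumes the merged transformers API (MegaForMaskedLM) and simply loads the stock roberta-base tokenizer:

```python
import torch
from transformers import AutoTokenizer, MegaForMaskedLM  # assumes merged Mega support

# The checkpoint reuses RoBERTa's vocabulary, so the stock tokenizer applies.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = MegaForMaskedLM.from_pretrained("mnaylor/mega-base-wikitext")

# Fill in RoBERTa's <mask> token.
inputs = tokenizer("The capital of France is <mask>.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the masked position.
mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```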

Model size: 7.33M params · Tensor type: F32 · Format: Safetensors

Model tree for mnaylor/mega-base-wikitext

Finetunes: 14 models