Monarch Mixer-BERT
The 260M checkpoint for M2-BERT-large from the paper Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture.
Check out our GitHub for instructions on how to download and fine-tune it!
How to use
You can load this model using Hugging Face AutoModel:
from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-260M', trust_remote_code=True)
This model uses the Hugging Face bert-base-uncased tokenizer:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
You can use this model with a pipeline for masked language modeling:
from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
mlm = AutoModelForMaskedLM.from_pretrained('alycialee/m2-bert-260M', trust_remote_code=True)
unmasker = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
unmasker('Every morning, I enjoy a cup of [MASK] to start my day.')
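As a rough sketch of how the result might be inspected: for a single input with one [MASK], the fill-mask pipeline returns a list of dictionaries that include a score and a token_str for each candidate. The helper name top_tokens and the sample predictions below are hypothetical, made up for illustration; they are not actual outputs of this model.

```python
def top_tokens(predictions, k=3):
    """Return the k highest-scoring (token_str, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(p["score"], 3)) for p in ranked[:k]]


# Hypothetical predictions shaped like fill-mask output.
# Only the two keys the helper reads are shown; the scores are invented.
sample = [
    {"score": 0.3, "token_str": "tea"},
    {"score": 0.6, "token_str": "coffee"},
    {"score": 0.1, "token_str": "milk"},
]

for token, score in top_tokens(sample, k=2):
    print(f"{token}: {score}")
```

With a loaded pipeline, the same helper could be applied to the list returned by the unmasker call above.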
Remote Code
This model requires trust_remote_code=True to be passed to the from_pretrained method. This is because we use custom PyTorch code (see our GitHub). You should consider passing a revision argument that specifies the exact git commit of the code, for example:
mlm = AutoModelForMaskedLM.from_pretrained(
    'alycialee/m2-bert-260M',
    trust_remote_code=True,
    revision='e8d17ae',
)
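One way to make the pinning harder to forget is a small wrapper that refuses mutable references such as branch names. This is only a sketch: the helper name pinned_kwargs is hypothetical, and the pattern of 7 to 40 lowercase hex characters is a heuristic assumed here for recognizing a git commit hash, not a rule taken from the Hub.

```python
import re

# Heuristic (an assumption, not a Hub rule): 7 to 40 lowercase hex characters.
_COMMIT_RE = re.compile(r"[0-9a-f]{7,40}")


def pinned_kwargs(revision):
    """Build from_pretrained keyword arguments pinned to a commit hash."""
    if not _COMMIT_RE.fullmatch(revision):
        raise ValueError(f"expected a git commit hash, got {revision!r}")
    return {"trust_remote_code": True, "revision": revision}


# Usage (needs network access and the transformers package):
# from transformers import AutoModelForMaskedLM
# mlm = AutoModelForMaskedLM.from_pretrained(
#     'alycialee/m2-bert-260M', **pinned_kwargs('e8d17ae')
# )
```

Passing a branch name such as 'main' raises a ValueError instead of silently loading whatever remote code that branch currently points to.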
Configuration
Note that use_flash_mm is false by default; FlashMM is not currently supported.
hyena_training_additions is turned off.