jacobfulano committed
Commit: d6f27fe
1 Parent(s): 3bb233a

Clarify how to load model in README

Files changed (1)
  1. README.md +36 -16
README.md CHANGED
@@ -34,42 +34,62 @@ The primary use case of these models is for research on efficient pretraining an
 
  April 2023
 
  ## Documentation
 
- * [Blog post](https://www.mosaicml.com/blog/mosaicbert)
- * [Github (mosaicml/examples/bert repo)](https://github.com/mosaicml/examples/tree/main/examples/bert)
 
  ## How to use
 
  ```python
- from transformers import AutoModelForMaskedLM
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', trust_remote_code=True)
- ```
 
- The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
 
- ```python
- from transformers import BertTokenizer
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
  ```
 
- To use this model directly for masked language modeling, use `pipeline`:
 
  ```python
- from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
 
- tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
- mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', trust_remote_code=True)
 
- classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
 
- classifier("I [MASK] to the store yesterday.")
- ```
 
  **To continue MLM pretraining**, follow the [MLM pre-training section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#mlm-pre-training).
 
  **To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#single-task-fine-tuning).
 
  ### Remote Code
 
  This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:
 
 
  April 2023
 
+ ## Model Date
+
+ April 2023
+
  ## Documentation
 
+ * [Project Page (mosaicbert.github.io)](https://mosaicbert.github.io)
+ * [Github (mosaicml/examples/tree/main/examples/benchmarks/bert)](https://github.com/mosaicml/examples/tree/main/examples/benchmarks/bert)
+ * [Paper (NeurIPS 2023)](https://openreview.net/forum?id=5zipcfLC2Z)
+ * Colab Tutorials:
+   * [MosaicBERT Tutorial Part 1: Load Pretrained Weights and Experiment with Sequence Length Extrapolation Using ALiBi](https://colab.research.google.com/drive/1r0A3QEbu4Nzs2Jl6LaiNoW5EumIVqrGc?usp=sharing)
+ * [Blog Post (March 2023)](https://www.mosaicml.com/blog/mosaicbert)
 
  ## How to use
 
  ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
+ from transformers import BertConfig
 
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') # MosaicBERT uses the standard BERT tokenizer
 
+ config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024') # the config needs to be passed in
+ mosaicbert = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', config=config, trust_remote_code=True)
+
+ # To use this model directly for masked language modeling
+ mosaicbert_classifier = pipeline('fill-mask', model=mosaicbert, tokenizer=tokenizer, device="cpu")
+ mosaicbert_classifier("I [MASK] to the store yesterday.")
  ```
 
+ Note that the tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.
+
+ In order to take advantage of ALiBi by extrapolating to longer sequence lengths, simply change the `alibi_starting_size` flag in the
+ config and reload the model.
 
  ```python
+ config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024')
+ config.alibi_starting_size = 2048 # maximum sequence length updated to 2048 from config default of 1024
 
+ mosaicbert = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024', config=config, trust_remote_code=True)
+ ```
 
+ This simply presets the non-learned linear bias matrix in every attention block to 2048 tokens (note that this particular model was trained with a sequence length of 1024 tokens).
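
As a quick sanity check of the extended context window, an input longer than 1024 tokens can be pushed through the reloaded model. The sketch below reuses the `tokenizer` and `mosaicbert` objects from the snippets above and assumes the remote modeling code returns a standard masked-LM output with a `logits` field:

```python
import torch

# Sketch: run a >1024-token input through the model reloaded with alibi_starting_size=2048.
long_text = "An illustrative long document about sequence length extrapolation. " * 200
inputs = tokenizer(long_text, return_tensors='pt', truncation=True, max_length=2048)
with torch.no_grad():
    outputs = mosaicbert(**inputs)
print(outputs.logits.shape)  # approximately (1, sequence_length, vocab_size)
```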
 
  **To continue MLM pretraining**, follow the [MLM pre-training section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#mlm-pre-training).
 
  **To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#single-task-fine-tuning).
 
+ ### [Update 1/2/2024] Triton Flash Attention with ALiBi
+
+ Note that by default, Triton Flash Attention is **not** enabled or required. In order to enable our custom implementation of Triton Flash Attention with ALiBi from March 2023,
+ set `attention_probs_dropout_prob: 0.0`. We are currently working on supporting Flash Attention 2 (see [PR here](https://github.com/mosaicml/examples/pull/440)).
+
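
In `transformers` terms, one way to apply this is on the Hugging Face config before loading the model. This is a minimal sketch, assuming the remote modeling code reads the standard `attention_probs_dropout_prob` attribute of `BertConfig`:

```python
import transformers
from transformers import AutoModelForMaskedLM

config = transformers.BertConfig.from_pretrained('mosaicml/mosaic-bert-base-seqlen-1024')
config.attention_probs_dropout_prob = 0.0  # per the update note above, dropout must be 0.0 for the Triton path
mosaicbert = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-1024', config=config, trust_remote_code=True
)
```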
  ### Remote Code
 
  This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:
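
A sketch of what such a call can look like; the `'<commit-hash>'` value below is a placeholder, not an actual commit of this repository:

```python
from transformers import AutoModelForMaskedLM

# Replace '<commit-hash>' with the exact commit of the remote code that you have reviewed.
mlm = AutoModelForMaskedLM.from_pretrained(
    'mosaicml/mosaic-bert-base-seqlen-1024',
    trust_remote_code=True,
    revision='<commit-hash>',
)
```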