expand usage instructions in README

#2
by sam-mosaic - opened
Files changed (1)
  1. README.md +36 -4
README.md CHANGED
@@ -8,26 +8,27 @@ inference: false
---

# MosaicBERT-Base model
+
MosaicBERT-Base is a new BERT architecture and training recipe optimized for fast pretraining.
MosaicBERT trains faster and achieves higher pretraining and finetuning accuracy when benchmarked against
Hugging Face's [bert-base-uncased](https://huggingface.co/bert-base-uncased).

- ### Model Date
+ ## Model Date

March 2023

## Documentation
+
* Blog post
* [Github (mosaicml/examples/bert repo)](https://github.com/mosaicml/examples/tree/main/examples/bert)

- # How to use
-
- We recommend using the code in the [mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert) for pretraining and finetuning this model.
+ ## How to use

```python
from transformers import AutoModelForMaskedLM
mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
```
+
The tokenizer for this model is simply the Hugging Face `bert-base-uncased` tokenizer.

```python
@@ -35,6 +36,37 @@ from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
```

+ To use this model directly for masked language modeling, use `pipeline`:
+
+ ```python
+ from transformers import AutoModelForMaskedLM, BertTokenizer, pipeline
+
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+ mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+
+ classifier = pipeline('fill-mask', model=mlm, tokenizer=tokenizer)
+
+ classifier("I [MASK] to the store yesterday.")
+ ```
+
+ **To continue MLM pretraining**, follow the [MLM pre-training section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#mlm-pre-training).
+
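+ If you just want a quick experiment inside `transformers`, the sketch below continues MLM pretraining with the Hugging Face `Trainer`. It is a minimal sketch only: the dataset, sequence length, and hyperparameters are placeholders rather than the recipe used to train this model, and the Composer-based recipe in the examples repo remains the recommended path.
+
+ ```python
+ # Minimal MLM-pretraining sketch with the Hugging Face Trainer (placeholder
+ # dataset and hyperparameters; not the recipe used to train this model).
+ from datasets import load_dataset
+ from transformers import (AutoModelForMaskedLM, BertTokenizer,
+                           DataCollatorForLanguageModeling, Trainer, TrainingArguments)
+
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+ mlm = AutoModelForMaskedLM.from_pretrained('mosaicml/mosaic-bert-base', trust_remote_code=True)
+
+ # Any text corpus works; wikitext-2 is used here purely as an example.
+ data = load_dataset('wikitext', 'wikitext-2-raw-v1', split='train')
+ data = data.map(lambda batch: tokenizer(batch['text'], truncation=True, max_length=128),
+                 batched=True, remove_columns=['text'])
+
+ # Dynamic masking with the standard 15% masking probability.
+ collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
+ args = TrainingArguments(output_dir='mlm-checkpoints',
+                          per_device_train_batch_size=16, num_train_epochs=1)
+ Trainer(model=mlm, args=args, train_dataset=data, data_collator=collator).train()
+ ```
+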
+ **To fine-tune this model for classification**, follow the [Single-task fine-tuning section of the mosaicml/examples/bert repo](https://github.com/mosaicml/examples/tree/main/examples/bert#single-task-fine-tuning).
+
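+ For a quick start without leaving `transformers`, a loop along the following lines should work, assuming the checkpoint's remote code exposes a sequence-classification head through `AutoModelForSequenceClassification`; if it does not, use the examples repo instead. The dataset (SST-2) and hyperparameters below are placeholders.
+
+ ```python
+ # Hypothetical single-task fine-tuning sketch with the Hugging Face Trainer.
+ # Assumes the remote code registers a sequence-classification head.
+ from datasets import load_dataset
+ from transformers import (AutoModelForSequenceClassification, BertTokenizer,
+                           Trainer, TrainingArguments)
+
+ tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
+ model = AutoModelForSequenceClassification.from_pretrained(
+     'mosaicml/mosaic-bert-base', trust_remote_code=True, num_labels=2)
+
+ sst2 = load_dataset('glue', 'sst2')
+ sst2 = sst2.map(lambda batch: tokenizer(batch['sentence'], truncation=True, max_length=128),
+                 batched=True)
+
+ args = TrainingArguments(output_dir='sst2-finetune', per_device_train_batch_size=32,
+                          num_train_epochs=3, evaluation_strategy='epoch')
+ # Passing the tokenizer lets the Trainer pad each batch dynamically.
+ Trainer(model=model, args=args, tokenizer=tokenizer,
+         train_dataset=sst2['train'], eval_dataset=sst2['validation']).train()
+ ```
+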
+ ### Remote Code
+
+ This model requires that `trust_remote_code=True` be passed to the `from_pretrained` method. This is because we train using [FlashAttention (Dao et al. 2022)](https://arxiv.org/pdf/2205.14135.pdf), which is not part of the `transformers` library and depends on [Triton](https://github.com/openai/triton) and some custom PyTorch code. Since this involves executing arbitrary code, you should consider passing a git `revision` argument that specifies the exact commit of the code, for example:
+
+ ```python
+ mlm = AutoModelForMaskedLM.from_pretrained(
+     'mosaicml/mosaic-bert-base',
+     trust_remote_code=True,
+     revision='24512df',
+ )
+ ```
+
+ However, if there are updates to this model or code and you specify a revision, you will need to manually check for them and update the commit hash accordingly.
+
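+ One way to check is to list the repository's commits with `huggingface_hub`; the snippet below is a sketch and assumes a recent version of the library that provides `list_repo_commits`:
+
+ ```python
+ # List commits on the model repo to see whether anything newer than the
+ # pinned revision exists (sketch; assumes huggingface_hub provides list_repo_commits).
+ from huggingface_hub import HfApi
+
+ for commit in HfApi().list_repo_commits('mosaicml/mosaic-bert-base'):
+     print(commit.commit_id[:7], commit.title)
+ ```
+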
## Model description

  In order to build MosaicBERT, we adopted architectural choices from the recent transformer literature.