google-bert/bert-large-uncased-whole-word-masking

@@ -58,34 +58,42 @@ You can use this model directly with a pipeline for masked language modeling:
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-large-uncased')
 >>> unmasker("Hello I'm a [MASK] model.")
-[{'sequence': "[CLS] hello i'm a fashion model. [SEP]",
-  'score': 0.1886913776397705,
-  'token': 4827,
-  'token_str': 'fashion'},
- {'sequence': "[CLS] hello i'm a professional model. [SEP]",
-  'score': 0.07157472521066666,
-  'token': 2658,
-  'token_str': 'professional'},
- {'sequence': "[CLS] hello i'm a male model. [SEP]",
-  'score': 0.04053466394543648,
-  'token': 3287,
-  'token_str': 'male'},
- {'sequence': "[CLS] hello i'm a role model. [SEP]",
-  'score': 0.03891477733850479,
-  'token': 2535,
-  'token_str': 'role'},
- {'sequence': "[CLS] hello i'm a fitness model. [SEP]",
-  'score': 0.03038121573626995,
-  'token': 10516,
-  'token_str': 'fitness'}]
 ```
 Here is how to use this model to get the features of a given text in PyTorch:
 ```python
 from transformers import BertTokenizer, BertModel
-tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
-model = BertModel.from_pretrained("bert-large-uncased")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
@@ -95,8 +103,8 @@ and in TensorFlow:
 ```python
 from transformers import BertTokenizer, TFBertModel
-tokenizer = BertTokenizer.from_pretrained('bert-large-uncased')
-model = TFBertModel.from_pretrained("bert-large-uncased")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
@@ -111,50 +119,72 @@ predictions:
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-large-uncased')
 >>> unmasker("The man worked as a [MASK].")
-[{'sequence': '[CLS] the man worked as a bartender. [SEP]',
-  'score': 0.10426565259695053,
-  'token': 15812,
-  'token_str': 'bartender'},
- {'sequence': '[CLS] the man worked as a waiter. [SEP]',
-  'score': 0.10232779383659363,
-  'token': 15610,
-  'token_str': 'waiter'},
- {'sequence': '[CLS] the man worked as a mechanic. [SEP]',
-  'score': 0.06281787157058716,
-  'token': 15893,
-  'token_str': 'mechanic'},
- {'sequence': '[CLS] the man worked as a lawyer. [SEP]',
-  'score': 0.050936125218868256,
-  'token': 5160,
-  'token_str': 'lawyer'},
- {'sequence': '[CLS] the man worked as a carpenter. [SEP]',
-  'score': 0.041034240275621414,
-  'token': 10533,
-  'token_str': 'carpenter'}]
 >>> unmasker("The woman worked as a [MASK].")
-[{'sequence': '[CLS] the woman worked as a waitress. [SEP]',
-  'score': 0.28473711013793945,
-  'token': 13877,
-  'token_str': 'waitress'},
- {'sequence': '[CLS] the woman worked as a nurse. [SEP]',
-  'score': 0.11336520314216614,
-  'token': 6821,
-  'token_str': 'nurse'},
- {'sequence': '[CLS] the woman worked as a bartender. [SEP]',
-  'score': 0.09574324637651443,
-  'token': 15812,
-  'token_str': 'bartender'},
- {'sequence': '[CLS] the woman worked as a maid. [SEP]',
-  'score': 0.06351090222597122,
-  'token': 10850,
-  'token_str': 'maid'},
- {'sequence': '[CLS] the woman worked as a secretary. [SEP]',
-  'score': 0.048970773816108704,
-  'token': 3187,
-  'token_str': 'secretary'}]
 ```
 This bias will also affect all fine-tuned versions of this model.

@@ -58,34 +58,42 @@ You can use this model directly with a pipeline for masked language modeling:
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-large-uncased')
 >>> unmasker("Hello I'm a [MASK] model.")
+[
+    {
+        'sequence': "[CLS] hello i'm a fashion model. [SEP]",
+        'score': 0.15813860297203064,
+        'token': 4827,
+        'token_str': 'fashion'
+    }, {
+        'sequence': "[CLS] hello i'm a cover model. [SEP]",
+        'score': 0.10551052540540695,
+        'token': 3104,
+        'token_str': 'cover'
+    }, {
+        'sequence': "[CLS] hello i'm a male model. [SEP]",
+        'score': 0.08340442180633545,
+        'token': 3287,
+        'token_str': 'male'
+    }, {
+        'sequence': "[CLS] hello i'm a super model. [SEP]",
+        'score': 0.036381796002388,
+        'token': 3565,
+        'token_str': 'super'
+    }, {
+        'sequence': "[CLS] hello i'm a top model. [SEP]",
+        'score': 0.03609578311443329,
+        'token': 2327,
+        'token_str': 'top'
+    }
+]
 ```
 Here is how to use this model to get the features of a given text in PyTorch:
 ```python
 from transformers import BertTokenizer, BertModel
+tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking')
+model = BertModel.from_pretrained("bert-large-uncased-whole-word-masking")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
@@ -95,8 +103,8 @@ and in TensorFlow:
 ```python
 from transformers import BertTokenizer, TFBertModel
+tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking')
+model = TFBertModel.from_pretrained("bert-large-uncased-whole-word-masking")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
@@ -111,50 +119,72 @@ predictions:
 >>> from transformers import pipeline
 >>> unmasker = pipeline('fill-mask', model='bert-large-uncased')
 >>> unmasker("The man worked as a [MASK].")
+[
+   {
+      "sequence":"[CLS] the man worked as a waiter. [SEP]",
+      "score":0.09823174774646759,
+      "token":15610,
+      "token_str":"waiter"
+   },
+   {
+      "sequence":"[CLS] the man worked as a carpenter. [SEP]",
+      "score":0.08976428955793381,
+      "token":10533,
+      "token_str":"carpenter"
+   },
+   {
+      "sequence":"[CLS] the man worked as a mechanic. [SEP]",
+      "score":0.06550426036119461,
+      "token":15893,
+      "token_str":"mechanic"
+   },
+   {
+      "sequence":"[CLS] the man worked as a butcher. [SEP]",
+      "score":0.04142395779490471,
+      "token":14998,
+      "token_str":"butcher"
+   },
+   {
+      "sequence":"[CLS] the man worked as a barber. [SEP]",
+      "score":0.03680137172341347,
+      "token":13362,
+      "token_str":"barber"
+   }
+]
 >>> unmasker("The woman worked as a [MASK].")
+[
+   {
+      "sequence":"[CLS] the woman worked as a waitress. [SEP]",
+      "score":0.2669651508331299,
+      "token":13877,
+      "token_str":"waitress"
+   },
+   {
+      "sequence":"[CLS] the woman worked as a maid. [SEP]",
+      "score":0.13054853677749634,
+      "token":10850,
+      "token_str":"maid"
+   },
+   {
+      "sequence":"[CLS] the woman worked as a nurse. [SEP]",
+      "score":0.07987703382968903,
+      "token":6821,
+      "token_str":"nurse"
+   },
+   {
+      "sequence":"[CLS] the woman worked as a prostitute. [SEP]",
+      "score":0.058545831590890884,
+      "token":19215,
+      "token_str":"prostitute"
+   },
+   {
+      "sequence":"[CLS] the woman worked as a cleaner. [SEP]",
+      "score":0.03834161534905434,
+      "token":20133,
+      "token_str":"cleaner"
+   }
+]
 ```
 This bias will also affect all fine-tuned versions of this model.
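
The two occupation lists above can also be compared programmatically. The following is a minimal sketch and is not part of the model card: it hard-codes the `token_str` and `score` values from the updated outputs shown above, so it runs without downloading the model, and reports which suggestions the two prompts share. The helper name `shared_tokens` and the variable names are assumptions introduced for illustration.

```python
# Top-5 predictions copied from the updated outputs above.
# Hard-coded so this sketch runs without loading the model.
man_predictions = [
    {"token_str": "waiter", "score": 0.09823174774646759},
    {"token_str": "carpenter", "score": 0.08976428955793381},
    {"token_str": "mechanic", "score": 0.06550426036119461},
    {"token_str": "butcher", "score": 0.04142395779490471},
    {"token_str": "barber", "score": 0.03680137172341347},
]
woman_predictions = [
    {"token_str": "waitress", "score": 0.2669651508331299},
    {"token_str": "maid", "score": 0.13054853677749634},
    {"token_str": "nurse", "score": 0.07987703382968903},
    {"token_str": "prostitute", "score": 0.058545831590890884},
    {"token_str": "cleaner", "score": 0.03834161534905434},
]


def shared_tokens(first, second):
    """Return the sorted token strings that appear in both prediction lists.

    Hypothetical helper: each argument is a list of dicts with a
    'token_str' key, the shape shown in the outputs above.
    """
    first_tokens = {prediction["token_str"] for prediction in first}
    second_tokens = {prediction["token_str"] for prediction in second}
    return sorted(first_tokens & second_tokens)


overlap = shared_tokens(man_predictions, woman_predictions)
print(f"Shared suggestions: {overlap}")  # prints "Shared suggestions: []"
```

For these two prompts the top-5 lists have no suggestion in common. Since the outputs shown above are lists of dicts with a `token_str` key, the return value of `unmasker(...)` could presumably be passed to `shared_tokens` directly in place of the hard-coded lists.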