lysandre (HF staff) committed on
Commit 0ba5a4b (1 parent: 03954ae)

XXLarge v1 -> xlarge v2

Files changed (1)
  1. README.md +11 -11
README.md CHANGED
@@ -36,15 +36,15 @@ classifier using the features produced by the ALBERT model as inputs.
36
 
37
  ALBERT is particular in that it shares its layers across its Transformer, so all layers have the same weights. Using repeating layers results in a small memory footprint; however, the computational cost remains similar to that of a BERT-like architecture with the same number of hidden layers, since it still has to iterate through the same number of (repeating) layers.
38
 
39
- This is the first version of the xxlarge model. Version 2 is different from version 1 due to different dropout rates, additional training data, and longer training. It has better results in nearly all downstream tasks.
40
 
41
  This model has the following configuration:
42
 
43
- - 12 repeating layers
44
  - 128 embedding dimension
45
- - 4096 hidden dimension
46
- - 64 attention heads
47
- - 223M parameters
48
 
49
  ## Intended uses & limitations
50
 
@@ -62,7 +62,7 @@ You can use this model directly with a pipeline for masked language modeling:
62
 
63
  ```python
64
  >>> from transformers import pipeline
65
- >>> unmasker = pipeline('fill-mask', model='albert-xxlarge-v1')
66
  >>> unmasker("Hello I'm a [MASK] model.")
67
  [
68
  {
@@ -102,8 +102,8 @@ Here is how to use this model to get the features of a given text in PyTorch:
102
 
103
  ```python
104
  from transformers import AlbertTokenizer, AlbertModel
105
- tokenizer = AlbertTokenizer.from_pretrained('albert-xxlarge-v1')
106
- model = AlbertModel.from_pretrained("albert-xxlarge-v1")
107
  text = "Replace me by any text you'd like."
108
  encoded_input = tokenizer(text, return_tensors='pt')
109
  output = model(**encoded_input)
@@ -113,8 +113,8 @@ and in TensorFlow:
113
 
114
  ```python
115
  from transformers import AlbertTokenizer, TFAlbertModel
116
- tokenizer = AlbertTokenizer.from_pretrained('albert-xxlarge-v1')
117
- model = TFAlbertModel.from_pretrained("albert-xxlarge-v1")
118
  text = "Replace me by any text you'd like."
119
  encoded_input = tokenizer(text, return_tensors='tf')
120
  output = model(encoded_input)
@@ -127,7 +127,7 @@ predictions:
127
 
128
  ```python
129
  >>> from transformers import pipeline
130
- >>> unmasker = pipeline('fill-mask', model='albert-xxlarge-v1')
131
  >>> unmasker("The man worked as a [MASK].")
132
 
133
  [
 
36
 
37
  ALBERT is particular in that it shares its layers across its Transformer, so all layers have the same weights. Using repeating layers results in a small memory footprint; however, the computational cost remains similar to that of a BERT-like architecture with the same number of hidden layers, since it still has to iterate through the same number of (repeating) layers.
38
 
39
+ This is the second version of the xlarge model. Version 2 is different from version 1 due to different dropout rates, additional training data, and longer training. It has better results in nearly all downstream tasks.
40
 
41
  This model has the following configuration:
42
 
43
+ - 24 repeating layers
44
  - 128 embedding dimension
45
+ - 2048 hidden dimension
46
+ - 16 attention heads
47
+ - 58M parameters
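
The layer-sharing claim can be sanity-checked against the configuration listed above with a rough parameter count. The sketch below is not part of the README and does not use the `transformers` library. It counts only the weights and biases of a standard Transformer encoder layer, and it assumes a feed-forward width of four times the hidden dimension, which is not stated in the list. The helper names are hypothetical.

```python
def transformer_layer_params(hidden: int, intermediate: int) -> int:
    """Count weights and biases of one standard Transformer encoder layer."""
    # Query, key, value and output projections, each hidden x hidden plus a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> intermediate and intermediate -> hidden.
    feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
    # Two LayerNorms, each with a scale and a bias vector.
    layer_norms = 2 * (2 * hidden)
    return attention + feed_forward + layer_norms


def encoder_params(num_layers: int, hidden: int, intermediate: int, shared: bool) -> int:
    """Encoder parameters when the layer is shared (stored once) or not (stored per layer)."""
    per_layer = transformer_layer_params(hidden, intermediate)
    return per_layer if shared else num_layers * per_layer


# Values from the xlarge v2 configuration list.
NUM_LAYERS = 24
HIDDEN = 2048
# Assumption: the usual 4x feed-forward width; not given in the list above.
INTERMEDIATE = 4 * HIDDEN

shared = encoder_params(NUM_LAYERS, HIDDEN, INTERMEDIATE, shared=True)
unshared = encoder_params(NUM_LAYERS, HIDDEN, INTERMEDIATE, shared=False)
print(f"shared layers:   {shared / 1e6:.1f}M encoder parameters")
print(f"unshared layers: {unshared / 1e6:.1f}M encoder parameters")
```

Under these assumptions the stored encoder weights shrink by a factor equal to the number of layers, while a forward pass still applies the layer 24 times, so the compute cost does not shrink with it. Embeddings and any task head are left out of the count, so the shared figure is expected to fall somewhat below the 58M total.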
48
 
49
  ## Intended uses & limitations
50
 
 
62
 
63
  ```python
64
  >>> from transformers import pipeline
65
+ >>> unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
66
  >>> unmasker("Hello I'm a [MASK] model.")
67
  [
68
  {
 
102
 
103
  ```python
104
  from transformers import AlbertTokenizer, AlbertModel
105
+ tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v2')
106
+ model = AlbertModel.from_pretrained("albert-xlarge-v2")
107
  text = "Replace me by any text you'd like."
108
  encoded_input = tokenizer(text, return_tensors='pt')
109
  output = model(**encoded_input)
 
113
 
114
  ```python
115
  from transformers import AlbertTokenizer, TFAlbertModel
116
+ tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v2')
117
+ model = TFAlbertModel.from_pretrained("albert-xlarge-v2")
118
  text = "Replace me by any text you'd like."
119
  encoded_input = tokenizer(text, return_tensors='tf')
120
  output = model(encoded_input)
 
127
 
128
  ```python
129
  >>> from transformers import pipeline
130
+ >>> unmasker = pipeline('fill-mask', model='albert-xlarge-v2')
131
  >>> unmasker("The man worked as a [MASK].")
132
 
133
  [