Milan Straka committed
Commit 20ab64f
1 Parent(s): 41cfe08

Describe that we dropped the pooler in v1.1.

Files changed (1): README.md (+7 -5)
README.md CHANGED
@@ -14,9 +14,10 @@ tags:
 ## Version History
 
 - **version 1.1**: Version 1.1 was released in Jan 2024, with a change to the
-  tokenizer; the model parameters were mostly kept the same, but the embeddings
-  were enlarged (by copying suitable rows) to correspond to the updated
-  tokenizer.
+  tokenizer described below; the model parameters were mostly kept the same, but
+  (a) the embeddings were enlarged (by copying suitable rows) to correspond to
+  the updated tokenizer, (b) the pooler was dropped (originally it was only
+  randomly initialized).
 
   The tokenizer in the initial release (a) contained a hole (51959 did not
   correspond to any token), and (b) mapped several tokens (unseen during training
@@ -29,8 +30,9 @@ tags:
   mapping all tokens to a unique ID. That also required increasing the
   vocabulary size and embeddings weights (by replicating the embedding of the
   `[UNK]` token). Without finetuning, version 1.1 and version 1.0 gives exactly
-  the same results on any input, and the tokens in version 1.0 that mapped to
-  a different ID than the `[UNK]` token map to the same ID in version 1.1.
+  the same embeddings on any input (apart from the pooler missing in v1.1),
+  and the tokens in version 1.0 that mapped to a different ID than the `[UNK]`
+  token map to the same ID in version 1.1.
 
   However, the sizes of the embeddings (and LM head weights and biases) are
   different, so the weights of the version 1.1 are not compatible with the
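The embedding enlargement the diff describes (growing the vocabulary and filling the new rows with copies of the `[UNK]` embedding, so the added IDs produce exactly the vectors `[UNK]` produced before) can be sketched as follows. This is a minimal illustration with toy sizes and a made-up `unk_id`, not the model's actual dimensions or the repository's conversion script.

```python
import numpy as np

def enlarge_embeddings(weights: np.ndarray, new_vocab_size: int, unk_id: int) -> np.ndarray:
    """Grow an embedding matrix to `new_vocab_size` rows by replicating the
    `[UNK]` row into every newly added position, so that without finetuning
    the new IDs behave identically to `[UNK]`."""
    old_vocab_size, dim = weights.shape
    extra = np.tile(weights[unk_id], (new_vocab_size - old_vocab_size, 1))
    return np.concatenate([weights, extra], axis=0)

# Toy example: 4-token vocabulary, 3-dimensional embeddings, [UNK] at ID 1.
emb = np.arange(12, dtype=np.float32).reshape(4, 3)
grown = enlarge_embeddings(emb, new_vocab_size=6, unk_id=1)
assert grown.shape == (6, 3)
assert (grown[4] == emb[1]).all() and (grown[5] == emb[1]).all()
```

The same row-copying would apply to the LM head weights and biases, which is why, as the diff notes, the version 1.1 weight shapes are incompatible with version 1.0 even though the outputs match.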