minhtriphan committed
Commit eda0730 (1 parent: c2664de)

Update README.md

Files changed (1): README.md (+2 -2)
README.md CHANGED
@@ -19,9 +19,9 @@ We're training the model again with more care and some tricks to enhance the sem
 Furthermore, the model is trained longer (~10 epochs~ 8 epochs). ~The new pre-trained model weights will be updated as soon as the training and validation are completed.~
 
 # Time and space efficiency
-We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with maximum length of 65536 tokens.
+We compare the time and space efficiency of this model and some competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with the maximum length of 65536 tokens.
 
-The experiments are implemented with an NVIDIA A100-SXM4-40GB. Batch size of 1. The figures show the time and memory needed to run one batch. In the training mode, forward pass and backpropagation is included. In the inferring model, only forward pass is included.
+The experiments are implemented with an NVIDIA A100-SXM4-40GB and a batch size of 1. The figures show the time and memory needed to run one batch. In the training mode, the forward pass and backpropagation are included; in the inference mode, only the forward pass is included.
 
 ## Training mode
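The diff above mentions cloning the positional embedding layers of the competitor models so they accept inputs of up to 65536 tokens. A minimal sketch of what such cloning could look like in PyTorch is below; the function name `extend_position_embeddings` and the tiling strategy are assumptions for illustration, not the repository's actual code:

```python
import torch


def extend_position_embeddings(emb: torch.nn.Embedding, new_len: int) -> torch.nn.Embedding:
    """Tile a learned positional-embedding table so that a baseline model
    can index positions up to new_len (a sketch, not the repo's method)."""
    old_len, dim = emb.weight.shape
    new_emb = torch.nn.Embedding(new_len, dim)
    with torch.no_grad():
        reps = -(-new_len // old_len)  # ceil(new_len / old_len) repetitions
        # Repeat the original table along the position axis, then truncate.
        new_emb.weight.copy_(emb.weight.repeat(reps, 1)[:new_len])
    return new_emb
```

The first `old_len` rows of the extended table match the original weights exactly, so short sequences behave as before; only the cloned tail positions are approximate.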
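The measurement protocol described above (one batch per measurement; forward pass plus backpropagation in training mode, forward pass only in inference mode) can be sketched as follows. The helper name `benchmark` and the warmup/iteration counts are hypothetical, not taken from the repository:

```python
import time

import torch


def benchmark(model, batch, training=True, warmup=2, iters=5):
    """Time one batch and record peak GPU memory (None when on CPU).

    Training mode runs forward + backward; inference mode runs forward only.
    """
    device = next(model.parameters()).device
    use_cuda = device.type == "cuda"
    model.train(training)
    start = None
    for step in range(warmup + iters):
        if step == warmup:  # discard warmup iterations before timing
            if use_cuda:
                torch.cuda.reset_peak_memory_stats(device)
                torch.cuda.synchronize(device)
            start = time.perf_counter()
        if training:
            model.zero_grad(set_to_none=True)
            model(batch).sum().backward()
        else:
            with torch.no_grad():
                model(batch)
    if use_cuda:
        torch.cuda.synchronize(device)
    seconds_per_batch = (time.perf_counter() - start) / iters
    peak_mib = torch.cuda.max_memory_allocated(device) / 2**20 if use_cuda else None
    return seconds_per_batch, peak_mib
```

On CUDA devices the explicit synchronization matters: kernel launches are asynchronous, so timing without it would under-report the true per-batch latency.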