This model was derived from the bert-base-uncased checkpoint by replacing the GELU activation function with ReLU, then continuing pre-training for several iterations so that the weights adapt to the new activation function.
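As a rough illustration of why the swap calls for further pre-training, the sketch below compares the two activations on a few inputs using only the Python standard library. It assumes the exact erf-based definition of GELU; the helper names `gelu`, `relu`, and `activation_gap` are hypothetical and are not taken from this model's training code.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU: passes positive inputs through and zeroes out the rest."""
    return max(0.0, x)


def activation_gap(xs):
    """Return (x, gelu(x), relu(x), relu(x) - gelu(x)) for each input x."""
    return [(x, gelu(x), relu(x), relu(x) - gelu(x)) for x in xs]


if __name__ == "__main__":
    # Illustrative sample points, chosen only for this sketch.
    for x, g, r, gap in activation_gap([-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0]):
        print(f"x={x:+.1f}  gelu={g:+.4f}  relu={r:+.4f}  gap={gap:+.4f}")
```

The two functions agree at zero and for inputs of large magnitude, but differ by up to about 0.17 near |x| ≈ 0.75. Weights trained with GELU therefore see slightly different intermediate values once ReLU is substituted, which is the mismatch the additional pre-training is meant to absorb.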