nreimers committed on
Commit
278590c
1 Parent(s): ac8d0d1

update readme

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
```diff
@@ -106,7 +106,7 @@ We then apply the cross entropy loss by comparing with true pairs.
 
 #### Hyper parameters
 
-We trained ou model on a TPU v3-8. We train the model during 920k steps using a batch size of 1024 (128 per TPU core).
+We trained our model on a TPU v3-8. We trained the model for 920k steps using a batch size of 512 (64 per TPU core).
 We use a learning rate warm up of 500. The sequence length was limited to 128 tokens. We used the AdamW optimizer with
 a 2e-5 learning rate. The full training script is accessible in this current repository: `train_script.py`.
```
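The hyperparameters listed in the diff (AdamW at a 2e-5 learning rate, a 500-step linear warm-up, corrected global batch size of 512 across 8 TPU cores) can be sketched as a small PyTorch setup. This is an illustrative sketch only, not the repository's actual `train_script.py`; the placeholder model and dummy loss are assumptions standing in for the real transformer and cross-entropy objective.

```python
import torch

# Values taken from the README's hyperparameter section.
LEARNING_RATE = 2e-5   # AdamW learning rate
WARMUP_STEPS = 500     # linear learning-rate warm-up
BATCH_SIZE = 512       # global batch size (64 per TPU core x 8 cores)
MAX_SEQ_LEN = 128      # maximum sequence length in tokens

# Placeholder model; the real script trains a transformer encoder.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

# Linear warm-up from 0 to LEARNING_RATE over WARMUP_STEPS, then held constant.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: min(1.0, step / WARMUP_STEPS),
)

def training_step(batch: torch.Tensor) -> None:
    # Dummy squared-error loss in place of the cross-entropy loss
    # over true pairs described in the README.
    loss = model(batch).pow(2).mean()
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```

After `WARMUP_STEPS` calls to `training_step`, the scheduler's multiplier reaches 1.0 and the learning rate stays at 2e-5 for the remainder of training.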