aseker00 committed on
Commit
5cf1afd
1 Parent(s): 9e947bd

Update README.md

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -38,6 +38,8 @@ alephbert.eval()
 
  Trained on a DGX machine (8 V100 GPUs) using the standard huggingface training procedure.
 
+ Since the larger part of our training data is based on tweets we decided to start by optimizing using Masked Language Model loss only.
+
  To optimize training time we split the data into 4 sections based on max number of tokens:
 
  1. num tokens < 32 (70M sentences)