pere committed
Commit 6388137
1 Parent(s): 24a1f9a
README.md CHANGED
@@ -24,4 +24,6 @@ widget:
 
 # Scandinavian XLM-RoBERTa (base-sized model)
 
- This model is currently being created. Do not use yet.
+ This model is currently being created. Do not use yet.
+
+ Adjusting lr down from 1e-4 to 5e-5 since we have some instability. Restarting Nov 6. Training only 500k steps.
events.out.tfevents.1667732309.t1v-n-101cf975-w-0.1839856.0.v2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:faa4b7b6481395d636dde622fbfede7a6d179ea93e0afdefe38f2c9e95ae8430
+ size 28280812
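
The README note above attributes the restart to training instability; this event file is the TensorBoard log where those loss spikes would be visible. A minimal sketch for inspecting it, assuming the actual LFS object has been downloaded (the scalar tag name below is a guess; list the tags first):

# Sketch: inspect the uploaded TensorBoard log for loss spikes.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator(
    "events.out.tfevents.1667732309.t1v-n-101cf975-w-0.1839856.0.v2"
)
ea.Reload()  # parse the event file from disk

print(ea.Tags()["scalars"])             # discover which scalar series were logged
for event in ea.Scalars("train_loss"):  # hypothetical tag name
    print(event.step, event.value)      # spikes here would show the instability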
flax_model.msgpack ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e4de7fc184dcf3cc1b013651a25806d1417baa8be6140839cd6bdb58b3f0d2bb
+ size 1113187999
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:949e34bfaa78f30854bf24df0994074ba8bf900ca0016370c1e54e04120c689c
+ size 1113251641
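
All three files added above are stored through Git LFS: the repository records only this three-line pointer (version, oid, size) while the payload lives in LFS storage. A minimal sketch, with hypothetical paths, of verifying that a downloaded object matches its pointer:

# Sketch: verify a downloaded artifact against its Git LFS pointer file.
import hashlib

def verify_lfs_object(pointer_path: str, object_path: str) -> bool:
    """Check object_path against the oid/size recorded in pointer_path."""
    meta = {}
    with open(pointer_path) as f:
        for line in f:
            key, _, value = line.strip().partition(" ")
            meta[key] = value

    expected_oid = meta["oid"].removeprefix("sha256:")
    expected_size = int(meta["size"])

    sha = hashlib.sha256()
    actual_size = 0
    with open(object_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            sha.update(chunk)
            actual_size += len(chunk)

    return sha.hexdigest() == expected_oid and actual_size == expected_size

# Hypothetical usage:
# verify_lfs_object("pytorch_model.bin.pointer", "pytorch_model.bin")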
run.sh CHANGED
@@ -8,10 +8,10 @@ python run_mlm_flax_stream.py \
  --weight_decay="0.01" \
  --per_device_train_batch_size="62" \
  --per_device_eval_batch_size="62" \
- --learning_rate="1e-4" \
+ --learning_rate="5e-5" \
  --warmup_steps="10000" \
  --overwrite_output_dir \
- --num_train_steps="1000000" \
+ --num_train_steps="500000" \
  --adam_beta1="0.9" \
  --adam_beta2="0.98" \
  --logging_steps="5000" \
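
Together with the README note, this halves both the peak learning rate and the step budget. For context, a minimal sketch of the AdamW setup these flags typically map to in the Hugging Face Flax MLM examples (linear warmup, then linear decay); the exact schedule inside run_mlm_flax_stream.py may differ:

# Sketch: optimizer implied by the updated run.sh flags (assumed schedule shape).
import optax

num_train_steps = 500_000  # was 1_000_000 before this commit
warmup_steps = 10_000
peak_lr = 5e-5             # lowered from 1e-4 to curb the instability

schedule = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, peak_lr, transition_steps=warmup_steps),
        optax.linear_schedule(
            peak_lr, 0.0, transition_steps=num_train_steps - warmup_steps
        ),
    ],
    boundaries=[warmup_steps],
)

optimizer = optax.adamw(
    learning_rate=schedule,
    b1=0.9,
    b2=0.98,
    weight_decay=0.01,
)

Keeping the 10k-step warmup but halving the peak rate is a common first response to mid-training loss spikes.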