dmitry-vorobiev committed 5b67024 (parent: f70d5ba): "upd readme"

README.md CHANGED
@@ -10,7 +10,7 @@ license: MIT
 
 ## Description
 *bert2bert* model, initialized with the `DeepPavlov/rubert-base-cased` pretrained weights and
-fine-tuned on the first 90% of ["Rossiya Segodnya" news dataset](https://github.com/RossiyaSegodnya/ria_news_dataset) for
+fine-tuned on the first 90% of ["Rossiya Segodnya" news dataset](https://github.com/RossiyaSegodnya/ria_news_dataset) for 3 epochs.
 
 ## Usage example
 
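The *bert2bert* initialization described above can be sketched with the `transformers` library's `EncoderDecoderModel`. A minimal, download-free sketch: the card initializes both halves from `DeepPavlov/rubert-base-cased`, which the comment below reproduces, while toy `BertConfig`s stand in for the real checkpoint so the wiring can be shown without fetching weights.

```python
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

# The card's actual initialization (downloads the checkpoint):
#   EncoderDecoderModel.from_encoder_decoder_pretrained(
#       "DeepPavlov/rubert-base-cased", "DeepPavlov/rubert-base-cased",
#       tie_encoder_decoder=True)  # mirrors the --tie_encoder_decoder flag
# The same encoder/decoder wiring, built from toy configs instead:
encoder_cfg = BertConfig(vocab_size=128, hidden_size=32, num_hidden_layers=2,
                         num_attention_heads=2, intermediate_size=64)
decoder_cfg = BertConfig(vocab_size=128, hidden_size=32, num_hidden_layers=2,
                         num_attention_heads=2, intermediate_size=64,
                         is_decoder=True,          # decoder attends causally
                         add_cross_attention=True) # and over encoder outputs
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder_cfg, decoder_cfg)
model = EncoderDecoderModel(config=config)
print(model.config.is_encoder_decoder)  # True
```

The sizes above are arbitrary illustration values, not the rubert-base-cased dimensions.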
@@ -50,27 +50,26 @@ print(headline)
 
 ## How it was trained?
 
-
+I used a free TPUv3 on Kaggle. The model was trained for 3 epochs with an effective batch size of 256 and soft restarts.
 
-1. [
-2. [
-3. [
-4. [1.6 ep](https://www.kaggle.com/dvorobiev/train-seq2seq?scriptVersionId=52876230)
+1. [1 ep](https://www.kaggle.com/dvorobiev/try-train-seq2seq-ria-tpu?scriptVersionId=53094837)
+2. [2 ep](https://www.kaggle.com/dvorobiev/try-train-seq2seq-ria-tpu?scriptVersionId=53109219)
+3. [3 ep](https://www.kaggle.com/dvorobiev/try-train-seq2seq-ria-tpu?scriptVersionId=53171375)
 
 Common train params:
 
 ```shell
 python nlp_headline_rus/src/train_seq2seq.py \
     --do_train \
-    --fp16 \
     --tie_encoder_decoder \
     --max_source_length 512 \
     --max_target_length 32 \
     --val_max_target_length 48 \
-    --
-    --
-    --
-    --
+    --tpu_num_cores 8 \
+    --per_device_train_batch_size 32 \
+    --gradient_accumulation_steps 1 \
+    --warmup_steps 500 \
+    --learning_rate 1e-3 \
     --adam_epsilon 1e-6 \
     --weight_decay 1e-5 \
 ```
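The new TPU flags pin down the "effective batch size 256" quoted in the prose. Assuming the usual `Trainer` convention (number of devices times per-device batch times accumulation steps), the arithmetic checks out:

```python
# Effective batch size implied by the flags above (assumed convention:
# num_devices * per_device_train_batch_size * gradient_accumulation_steps).
tpu_num_cores = 8
per_device_train_batch_size = 32
gradient_accumulation_steps = 1

effective_batch_size = (tpu_num_cores
                        * per_device_train_batch_size
                        * gradient_accumulation_steps)
print(effective_batch_size)  # 256
```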
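With `--warmup_steps 500` and `--learning_rate 1e-3`, the `Trainer` default schedule ramps the learning rate linearly to the peak over the first 500 steps, then decays it linearly to zero. A plain-Python sketch of that shape; the total step count here is a hypothetical stand-in, not a value from the card:

```python
def linear_warmup_decay_lr(step, peak_lr=1e-3, warmup_steps=500, total_steps=10_000):
    """LR at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_warmup_decay_lr(250))     # halfway through warmup
print(linear_warmup_decay_lr(500))     # peak rate, warmup finished
print(linear_warmup_decay_lr(10_000))  # fully decayed
```

The "soft restarts" mentioned above (resuming each Kaggle session from the previous checkpoint) would restart this schedule from step 0 on each run.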