aapot

Add 1M train step model

42db976 over 2 years ago

python3 build_pretraining_dataset.py \
  --corpus-dir /researchdisk/training_dataset_sentences/train_splitted/ \
  --vocab-file /researchdisk/convbert-base-finnish/vocab.txt \
  --output-dir /researchdisk/training_dataset_sentences/train_tokenized_512 \
  --max-seq-length 512 \
  --num-processes 64 \
  --no-lower-case \
  --no-strip-accents
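
The same invocation can be wrapped so that paths and sizes are overridable and the input locations are checked before a long tokenization run starts. This is only a minimal sketch: the environment variable names (`CORPUS_DIR`, `VOCAB_FILE`, `OUTPUT_DIR`, `MAX_SEQ_LENGTH`, `NUM_PROCESSES`) and the `DRY_RUN` switch are assumptions introduced here for illustration and are not part of `build_pretraining_dataset.py`. The flags and default values are copied from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper around the command above. All variable names and the
# DRY_RUN switch are illustrative assumptions, not options of the script.
set -eu

CORPUS_DIR="${CORPUS_DIR:-/researchdisk/training_dataset_sentences/train_splitted/}"
VOCAB_FILE="${VOCAB_FILE:-/researchdisk/convbert-base-finnish/vocab.txt}"
OUTPUT_DIR="${OUTPUT_DIR:-/researchdisk/training_dataset_sentences/train_tokenized_512}"
MAX_SEQ_LENGTH="${MAX_SEQ_LENGTH:-512}"
NUM_PROCESSES="${NUM_PROCESSES:-64}"

# Store the command in the positional parameters so every argument stays a
# single word, even if a path contains spaces.
set -- python3 build_pretraining_dataset.py \
  --corpus-dir "$CORPUS_DIR" \
  --vocab-file "$VOCAB_FILE" \
  --output-dir "$OUTPUT_DIR" \
  --max-seq-length "$MAX_SEQ_LENGTH" \
  --num-processes "$NUM_PROCESSES" \
  --no-lower-case \
  --no-strip-accents

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Default: only print the command that would be run.
  printf '%s ' "$@"
  printf '\n'
else
  # Fail early if the inputs are missing.
  [ -d "$CORPUS_DIR" ] || { echo "missing corpus dir: $CORPUS_DIR" >&2; exit 1; }
  [ -f "$VOCAB_FILE" ] || { echo "missing vocab file: $VOCAB_FILE" >&2; exit 1; }
  "$@"
fi
```

By default the wrapper only prints the assembled command. Setting `DRY_RUN=0` runs it, and overriding for example `NUM_PROCESSES=8` adapts it to a smaller machine without editing the file.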