convbert-base-finnish / build_data.sh
aapot
Add 1M train step model
42db976
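# Tokenize the sentence-split training corpus into pretraining examples of up
# to 512 tokens, using 64 worker processes. Case and accents are left as-is.
python3 build_pretraining_dataset.py \
  --corpus-dir /researchdisk/training_dataset_sentences/train_splitted/ \
  --vocab-file /researchdisk/convbert-base-finnish/vocab.txt \
  --output-dir /researchdisk/training_dataset_sentences/train_tokenized_512 \
  --max-seq-length 512 \
  --num-processes 64 \
  --no-lower-case \
  --no-strip-accents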
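
A tokenization run over a large corpus can take a long time, so it may be worth confirming that the input paths exist before starting. The following is a minimal sketch of such a pre-flight check. The `check_inputs` function is a hypothetical helper, not part of the original script or of `build_pretraining_dataset.py`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (an assumption, not in the original script).
# Returns 0 when the corpus directory and the vocab file both exist, 1 otherwise.
check_inputs() {
  local corpus_dir="$1"
  local vocab_file="$2"
  if [ ! -d "$corpus_dir" ]; then
    echo "missing corpus directory: $corpus_dir" >&2
    return 1
  fi
  if [ ! -f "$vocab_file" ]; then
    echo "missing vocab file: $vocab_file" >&2
    return 1
  fi
  return 0
}
```

One possible use is to call `check_inputs` with the `--corpus-dir` and `--vocab-file` values and run the `python3` command only if it succeeds, for example by chaining the two with `&&`.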