diff --git "a/log.txt" "b/log.txt"
--- "a/log.txt"
+++ "b/log.txt"
@@ -1,55058 +1,3 @@
-ARGUMENT: --model-name=bert-fromscratch-galician-tiny
-ARGUMENT: --dmodel=./models/tiny
-ARGUMENT: --tokens=./models/tiny
-ARGUMENT: --train-source=./models/corpus/glwiki.train.sorted.txt
-ARGUMENT: --test-source=./models/corpus/glwiki.test.txt
-ARGUMENT: --use-pretrained
-ARGUMENT: --epochs=15
-ARGUMENT: --batch-size=8
-torch.cuda.is_available True
-torch.cuda.device_count 1
-torch.cuda.current_device cuda
-torch.cuda.device #0
-torch.cuda.get_device_name #0 NVIDIA RTX A5000
-Device = cuda
-DModel = /home/pcjf/CESGA/works/lmodels/models/tiny
-Sources = ['/home/pcjf/CESGA/works/lmodels/models/corpus/glwiki.train.sorted.txt', '/home/pcjf/CESGA/works/lmodels/models/corpus/glwiki.test.txt']
-Config = /home/pcjf/CESGA/works/lmodels/models/tiny/config.json
-GPU memory occupied: 308 MB.
-/home/pcjf/miniconda3/envs/lmodels/lib/python3.7/site-packages/transformers/data/datasets/language_modeling.py:125: FutureWarning: This dataset will be removed from the library soon, preprocessing should be handled with the 🤗 Datasets library. You can have a look at this example script for pointers: https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
-  FutureWarning,
-/home/pcjf/miniconda3/envs/lmodels/lib/python3.7/site-packages/huggingface_hub/utils/_deprecation.py:97: FutureWarning: Deprecated argument(s) used in '__init__': private. Will not be supported from version '0.12'.
-  warnings.warn(message, FutureWarning)
-/home/pcjf/CESGA/works/lmodels/models/tiny is already a clone of https://huggingface.co/fpuentes/bert-fromscratch-galician-tiny. Make sure you pull the latest changes with `repo.git_pull()`.
-Using cuda_amp half precision backend
-/home/pcjf/miniconda3/envs/lmodels/lib/python3.7/site-packages/accelerate/memory_utils.py:26: FutureWarning: memory_utils has been reorganized to utils.memory. Import `find_executable_batchsize` from the main `__init__`: `from accelerate import find_executable_batch_size` to avoid this warning.
-  FutureWarning,
-***** Running training *****
-  Num examples = 2296488
-  Num Epochs = 15
-  Instantaneous batch size per device = 8
-  Total train batch size (w. parallel, distributed & accumulation) = 256
-  Gradient Accumulation steps = 32
-  Total optimization steps = 134550
-  Number of trainable parameters = 67356954
-  0%| | 0/134550 [00:00
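
For reference, below is a minimal sketch of the training setup this log appears to come from: a masked-LM Trainer run with the hyperparameters printed above (15 epochs, per-device batch 8, gradient accumulation 32, fp16). The model/tokenizer paths, block_size, mlm_probability, and the specific deprecated dataset class (LineByLineTextDataset) are assumptions inferred from the arguments and warnings in the log, not the author's actual script.

# Hypothetical reconstruction of the run; values not shown in the log are assumed.
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

model_dir = "./models/tiny"  # --dmodel / --tokens from the log
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForMaskedLM.from_pretrained(model_dir)  # --use-pretrained

# The FutureWarning from language_modeling.py suggests one of the deprecated
# dataset classes was used; LineByLineTextDataset is assumed here.
train_ds = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./models/corpus/glwiki.train.sorted.txt",
    block_size=128,  # assumed
)
eval_ds = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path="./models/corpus/glwiki.test.txt",
    block_size=128,  # assumed
)

# Standard MLM collator; 0.15 masking probability is the library default (assumed here).
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Hyperparameters taken from the "Running training" block:
# 8 per-device batch x 32 accumulation steps x 1 GPU = total batch size 256.
args = TrainingArguments(
    output_dir=model_dir,
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=32,
    fp16=True,  # "Using cuda_amp half precision backend"
    push_to_hub=True,
    hub_model_id="fpuentes/bert-fromscratch-galician-tiny",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=collator,
)
trainer.train()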