How to create a dataset for human-language learning, not a programming language

#2
by Deepjyoti120 - opened

Hi, I am not able to train with my own dataset, Deepjyoti120/AssameseDataTrain.
Sometimes it worked when I randomly added more than 100 rows, but when I try to build a good dataset on my own it does not work.
Please check my dataset and give me an example.
Thank you
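For reference, here is a minimal sketch of the CSV layout AutoTrain's SFT trainer expects by default: a single `text` column with one training example per row. The instruction/response template and the example strings below are assumptions, not taken from the actual AssameseDataTrain dataset; adjust them to your own prompt format.

```python
import csv

# Hypothetical rows -- one complete prompt+response per "text" cell.
rows = [
    {"text": "### Instruction: Translate to Assamese: Hello\n### Response: নমস্কাৰ"},
    {"text": "### Instruction: Translate to Assamese: Thank you\n### Response: ধন্যবাদ"},
]

# Write a train.csv that the --data_path repo can serve as its training split.
with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    writer.writerows(rows)
```

Because the responses contain newlines, the `csv` module quotes each cell, which loads cleanly with `datasets.load_dataset("csv", ...)`.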

I usually use this Colab notebook:
https://colab.research.google.com/drive/1Uk9eWkUNR-KxRJL4tkgryIXDKUpMGG6j?authuser=4#scrollTo=_kbS7nRxcMt7

I think it's working.

But try using a smaller max_seq_length.

Here is my command:
!autotrain llm --train --project_name AssamAiModelTrain --model TinyPixel/Llama-2-7B-bf16-sharded --data_path Deepjyoti120/AssameseDataTrain --use_peft --use_int4 --learning_rate 2e-4 --gradient_accumulation_steps 1 --train_batch_size 2 --num_train_epochs 3 --trainer sft --model_max_length 2048 --block_size 2048 --push_to_hub --repo_id myrepo
And thank you for
https://colab.research.google.com/drive/1Uk9eWkUNR-KxRJL4tkgryIXDKUpMGG6j?authuser=4#scrollTo=_kbS7nRxcMt7
but I am not able to access it.

Use --block_size 1024 or less, because the dataset is small.
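The intuition behind that advice, as a rough sketch: SFT packing concatenates all examples and slices the token stream into chunks of `block_size`, so the number of training samples is roughly `total_tokens // block_size`. The token count below is a made-up illustration, not measured from the actual dataset.

```python
def num_blocks(total_tokens: int, block_size: int) -> int:
    """Approximate number of packed training samples after chunking."""
    return total_tokens // block_size

total = 12_000  # hypothetical total token count of a small dataset

print(num_blocks(total, 2048))  # -> 5  (very few optimizer steps per epoch)
print(num_blocks(total, 1024))  # -> 11 (roughly twice as many)
```

A smaller block size gives the trainer more samples (and steps) from the same small dataset, at the cost of a shorter context per sample.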

https://colab.research.google.com/drive/19wL4kEWPRPJfYb1Idxsn7RyypWrUaaI7?usp=sharing

Thank you very much,
but I got an error:

INFO creating trainer
{'loss': 1.0515, 'learning_rate': 0.0002, 'epoch': 0.5}
{'loss': 1.0515, 'learning_rate': 0.00016, 'epoch': 1.0}
{'loss': 0.9777, 'learning_rate': 0.00012, 'epoch': 1.5}
{'loss': 0.9135, 'learning_rate': 8e-05, 'epoch': 2.0}
{'loss': 0.8651, 'learning_rate': 4e-05, 'epoch': 2.5}
{'loss': 0.8316, 'learning_rate': 0.0, 'epoch': 3.0}
{'train_runtime': 152.9057, 'train_samples_per_second': 0.078, 'train_steps_per_second': 0.039, 'train_loss': 0.9484844207763672, 'epoch': 3.0}
100% 6/6 [02:32<00:00, 25.48s/it]
INFO Finished training, saving model...
INFO Merging adapter weights...
INFO Loading adapter...
Loading checkpoint shards: 57% 8/14 [00:45<00:34, 5.73s/it]^C


It always ends with "Loading checkpoint shards: 57% 8/14 [00:45<00:34, 5.73s/it]^C" in Colab.
Any idea?
(Screenshot attached: Screenshot 2023-08-15 at 10.31.30 AM.png)

The free Colab runs out of memory during merging.
You should push the checkpoint-6 folder to the Hugging Face Hub
and later try to merge it there or on a machine with more RAM.

Thank you very much for your time and for sharing your knowledge with me πŸ™‚

Deepjyoti120 changed discussion status to closed
