How to create a dataset for human-language learning, not a programming language

#2
by Deepjyoti120 - opened

Hi, I am not able to train with my own dataset, Deepjyoti120/AssameseDataTrain.
Sometimes it worked when I randomly added more than 100 rows, but when I try to build a good dataset on my own it does not work.
Please check my dataset and give me an example.
Thank you
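For reference, here is a minimal sketch of the CSV layout AutoTrain's SFT trainer expects by default: a single `text` column with one training example per row. The instruction/response template and the example strings below are assumptions, not taken from the actual AssameseDataTrain dataset; adjust them to your own prompt format.

```python
import csv

# Hypothetical rows -- one complete prompt+response per "text" cell.
rows = [
    {"text": "### Instruction: Translate to Assamese: Hello\n### Response: নমস্কাৰ"},
    {"text": "### Instruction: Translate to Assamese: Thank you\n### Response: ধন্যবাদ"},
]

# Write a train.csv that the --data_path repo can serve as its training split.
with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    writer.writerows(rows)
```

Because the responses contain newlines, the `csv` module quotes each cell, which loads cleanly with `datasets.load_dataset("csv", ...)`.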

I usually use this Colab notebook:
https://colab.research.google.com/drive/1Uk9eWkUNR-KxRJL4tkgryIXDKUpMGG6j?authuser=4#scrollTo=_kbS7nRxcMt7

I think it's working.

But try using a smaller max_seq_length.

Here is my command:
!autotrain llm --train --project_name AssamAiModelTrain --model TinyPixel/Llama-2-7B-bf16-sharded --data_path Deepjyoti120/AssameseDataTrain --use_peft --use_int4 --learning_rate 2e-4 --gradient_accumulation_steps 1 --train_batch_size 2 --num_train_epochs 3 --trainer sft --model_max_length 2048 --block_size 2048 --push_to_hub --repo_id myrepo
And thank you for
https://colab.research.google.com/drive/1Uk9eWkUNR-KxRJL4tkgryIXDKUpMGG6j?authuser=4#scrollTo=_kbS7nRxcMt7
but I am not able to access it.

Use --block_size 1024 or less, because the dataset is small.
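The intuition behind that advice, as a rough sketch: SFT packing concatenates all examples and slices the token stream into chunks of `block_size`, so the number of training samples is roughly `total_tokens // block_size`. The token count below is a made-up illustration, not measured from the actual dataset.

```python
def num_blocks(total_tokens: int, block_size: int) -> int:
    """Approximate number of packed training samples after chunking."""
    return total_tokens // block_size

total = 12_000  # hypothetical total token count of a small dataset

print(num_blocks(total, 2048))  # -> 5  (very few optimizer steps per epoch)
print(num_blocks(total, 1024))  # -> 11 (roughly twice as many)
```

A smaller block size gives the trainer more samples (and steps) from the same small dataset, at the cost of a shorter context per sample.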

https://colab.research.google.com/drive/19wL4kEWPRPJfYb1Idxsn7RyypWrUaaI7?usp=sharing

Thank you very much,
but I got an error:

INFO creating trainer
{'loss': 1.0515, 'learning_rate': 0.0002, 'epoch': 0.5}
{'loss': 1.0515, 'learning_rate': 0.00016, 'epoch': 1.0}
{'loss': 0.9777, 'learning_rate': 0.00012, 'epoch': 1.5}
{'loss': 0.9135, 'learning_rate': 8e-05, 'epoch': 2.0}
{'loss': 0.8651, 'learning_rate': 4e-05, 'epoch': 2.5}
{'loss': 0.8316, 'learning_rate': 0.0, 'epoch': 3.0}
{'train_runtime': 152.9057, 'train_samples_per_second': 0.078, 'train_steps_per_second': 0.039, 'train_loss': 0.9484844207763672, 'epoch': 3.0}
100% 6/6 [02:32<00:00, 25.48s/it]
INFO Finished training, saving model...
INFO Merging adapter weights...
INFO Loading adapter...
Loading checkpoint shards: 57% 8/14 [00:45<00:34, 5.73s/it]^C


It always ends with "Loading checkpoint shards: 57% 8/14 [00:45<00:34, 5.73s/it]^C" in Colab.
Any idea?
(Screenshot attached: Screenshot 2023-08-15 at 10.31.30 AM.png)

The free Colab runs out of memory during merging.
You should push the checkpoint-6 folder to the Hugging Face Hub
and later try to merge it there or on a machine with more RAM.

Thank you very much for your time and for sharing your knowledge with me πŸ™‚

Deepjyoti120 changed discussion status to closed
