Any colab to reproduce the training

#2
opened by wilfoderek

Hi friend,
I was wondering if you have any Colab notebook to reproduce your experiment of training BLOOM on MSMARCO.

It should be as easy as:

git clone https://github.com/Muennighoff/sgpt.git
pip install git+https://github.com/huggingface/accelerate
accelerate config

cd sgpt/biencoder/nli_msmarco
cd sentence-transformers; pip install -e .
cd sentence_transformers/losses/GradCache; pip install --editable .
cd ../../..  # back to the sentence-transformers root so the training script path resolves
pip install wandb
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name bigscience/bloom-7b1 --train_batch_size 32 --eval_batch_size 16 --freezenonbias --specb --lr 4e-4 --wandb --wandbwatchlog gradients --pooling weightedmean --gradcache --chunksize 8

How many GPUs did you use for this training?
One other thing: I have tested your already fine-tuned BLOOM SGPT model for asymmetric semantic search with sentence embeddings, but it is not so good for the law domain. So I was thinking of building a new dataset like MSMARCO to get better results. What do you think about that? Thank you in advance.

The number of GPUs is defined in accelerate config & via CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 - here I'm using 8 (A100s with 80GB). You can use far fewer, but training will take longer. If you run out of memory, decrease `--chunksize`.
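For example, a minimal sketch of the same launch on a single GPU with smaller GradCache chunks (the chunk size here is just an illustrative value to tune against your memory budget, and you would also pick a single process when running accelerate config):

# Single GPU, smaller chunks so GradCache processes the batch in smaller pieces
CUDA_VISIBLE_DEVICES=0 accelerate launch examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name bigscience/bloom-7b1 --train_batch_size 32 --eval_batch_size 16 --freezenonbias --specb --lr 4e-4 --pooling weightedmean --gradcache --chunksize 2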

Yeah, I think it could help, especially if you have negatives, i.e. for each sample you want both a passage the embedding should be close to (a positive) and one it should stay far away from (a negative).
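If it helps, here is a rough sketch of what law-domain training triplets could look like; the JSONL layout, file name, and field names are assumptions purely for illustration, and you would still need to feed the data to the training script in whatever format it expects:

# Hypothetical law-domain triplets: query, a relevant passage (positive), an unrelated one (negative)
cat > law_triplets.jsonl << 'EOF'
{"query": "limitation period for breach of contract claims", "positive": "An action for breach of contract must be brought within ...", "negative": "The defendant was acquitted because the prosecution failed to ..."}
{"query": "formal requirements for a valid will", "positive": "A will is valid only if signed by the testator in the presence of two witnesses ...", "negative": "Value added tax applies to the supply of goods and services ..."}
EOF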

Thank you so much friend!


How much did it cost to run the training? And how long did it take?

Thanks

How much it costs depends on your cloud provider; in my case, using those 8 A100s w/ 80GB, it took about 5 hours.
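As a rough back-of-the-envelope estimate (the hourly rate below is an assumption; actual A100 prices vary a lot by provider):

# 8 GPUs x 5 hours at an assumed $2 per A100-hour (illustrative rate only)
echo "$((8 * 5 * 2)) USD"   # prints: 80 USD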
