running very long without results?

#53
by wirrkopp - opened

Hi there,
to compare trained images I tried (perhaps) too much:
52 images of one person and 5000 steps.
I let it run for 4 hours without a result.

The log has looked like this for quite a while already:

Downloading: 97%|█████████▋| 1.18G/1.22G [00:15<00:00, 73.6MB/s]

Downloading: 98%|█████████▊| 1.19G/1.22G [00:15<00:00, 73.5MB/s]

Downloading: 98%|█████████▊| 1.19G/1.22G [00:15<00:00, 74.1MB/s]

Downloading: 99%|█████████▉| 1.20G/1.22G [00:15<00:00, 74.7MB/s]

Downloading: 99%|█████████▉| 1.21G/1.22G [00:15<00:00, 75.1MB/s]
Downloading: 100%|██████████| 1.22G/1.22G [00:16<00:00, 75.8MB/s]

Fetching 5 files: 100%|██████████| 5/5 [00:16<00:00, 3.99s/it]
Fetching 5 files: 100%|██████████| 5/5 [00:16<00:00, 3.28s/it]
Running on local URL: http://0.0.0.0:7860

To create a public link, set share=True in launch().

The estimated costs looked like this:

You are going to train 1 person(s), with 52 images for 5 steps. The training should take around 6.25 seconds, or 0.1 minutes. The setup, compression and uploading the model can take up to 20 minutes.
As the T4-Small GPU costs US$0.6 for 1h, the estimated cost for this training is below US$0.24.
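As a rough sanity check on the quoted figures, the arithmetic can be sketched as below. This is not the Space's actual formula: the per-step time is inferred from the quoted "6.25 seconds for 5 steps", the overhead is taken as the full "up to 20 minutes", and the function name `estimate` is hypothetical.

```python
# Rough re-derivation of the quoted estimate; all constants are assumptions
# read off the message above, not values taken from the Space's code.
SECONDS_PER_STEP = 6.25 / 5      # inferred: 6.25 s quoted for 5 steps
OVERHEAD_MINUTES = 20            # "setup, compression and uploading ... up to 20 minutes"
T4_SMALL_USD_PER_HOUR = 0.6      # quoted T4-Small price


def estimate(steps):
    """Return (training seconds, upper-bound cost in USD) for a step count."""
    train_seconds = steps * SECONDS_PER_STEP
    total_hours = (train_seconds + OVERHEAD_MINUTES * 60) / 3600
    return train_seconds, total_hours * T4_SMALL_USD_PER_HOUR


for steps in (5, 5000):
    seconds, cost = estimate(steps)
    print(f"{steps} steps: ~{seconds / 60:.1f} min training, up to ~US${cost:.2f}")
```

Under these assumptions, 5000 steps would come to roughly 104 minutes of training, far more than the 0.1 minutes quoted, which suggests the estimate was computed for 5 steps rather than 5000.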

Any help would be very nice!

I had the same experience. Ended up rebooting and moving back to CPU after 27+ hours. Original estimate said 1 hour.

This issue should be fixed. Also, if an error occurs now, the GPU is removed automatically so that the Space does not stay on with a CUDA error.

multimodalart changed discussion status to closed
