teticio committed on
Commit 7c89b23 (1 parent: 825c8bf)

update README

Files changed (1): README.md (+5 −5)
README.md CHANGED
````diff
@@ -11,7 +11,7 @@ Audio can be represented as images by transforming to a [mel spectrogram](https:
 A DDPM model is trained on a set of mel spectrograms that have been generated from a directory of audio files. It is then used to synthesize similar mel spectrograms, which are then converted back into audio. See the `test-model.ipynb` notebook for an example.
 
 ## Generate Mel spectrogram dataset from directory of audio files
-### Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The `hop_length` should be set to 1024 for better results.
+#### Training can be run with Mel spectrograms of resolution 64x64 on a single commercial grade GPU (e.g. RTX 2080 Ti). The `hop_length` should be set to 1024 for better results.
 
 ```bash
 python src/audio_to_images.py \
@@ -21,7 +21,7 @@ python src/audio_to_images.py \
 --output_dir data-test
 ```
 
-### Generate dataset of 256x256 Mel spectrograms and push to hub (you will need to be authenticated with `huggingface-cli login`).
+#### Generate dataset of 256x256 Mel spectrograms and push to hub (you will need to be authenticated with `huggingface-cli login`).
 
 ```bash
 python src/audio_to_images.py \
@@ -31,7 +31,7 @@ python src/audio_to_images.py \
 --push_to_hub teticio\audio-diffusion-256
 ```
 ## Train model
-### Run training on local machine.
+#### Run training on local machine.
 
 ```bash
 accelerate launch --config_file accelerate_local.yaml \
@@ -48,7 +48,7 @@ accelerate launch --config_file accelerate_local.yaml \
 --mixed_precision no
 ```
 
-### Run training on local machine with `batch_size` of 1 and `gradient_accumulation_steps` 16 to compensate, so that 256x256 resolution model fits on commercial grade GPU.
+#### Run training on local machine with `batch_size` of 1 and `gradient_accumulation_steps` 16 to compensate, so that 256x256 resolution model fits on commercial grade GPU.
 
 ```bash
 accelerate launch --config_file accelerate_local.yaml \
@@ -65,7 +65,7 @@ accelerate launch --config_file accelerate_local.yaml \
 --mixed_precision no
 ```
 
-### Run training on SageMaker.
+#### Run training on SageMaker.
 
 ```bash
 accelerate launch --config_file accelerate_sagemaker.yaml \
````
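The headings touched by this diff relate image resolution to audio length: each spectrogram column covers `hop_length` samples, so a 64x64 image at `hop_length` 1024 spans roughly 64 × 1024 samples. A minimal sketch of that arithmetic follows; the 22050 Hz sample rate and the one-frame-per-column convention are assumptions of this sketch, not stated in the diff.

```python
# Sketch: how much audio one square mel-spectrogram image covers.
# Assumptions (not in the diff): sample rate of 22050 Hz, and one
# STFT frame per image column, i.e. samples ≈ width * hop_length.

def audio_seconds_per_image(width: int, hop_length: int,
                            sample_rate: int = 22050) -> float:
    """Approximate seconds of audio represented by one spectrogram image."""
    return width * hop_length / sample_rate

# 64x64 image at hop_length 1024 -> about 3 seconds of audio.
print(round(audio_seconds_per_image(64, 1024), 2))  # → 2.97
```

This is why the larger `hop_length` of 1024 is recommended: at a fixed image width, it lets each 64x64 training example cover a longer stretch of audio.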
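The second training command trades `batch_size` for `gradient_accumulation_steps`: memory only has to hold one 256x256 sample at a time, while the optimizer still steps on gradients averaged over 16 samples. A toy illustration of why the two are equivalent, using a linear least-squares loss rather than the repo's actual DDPM objective (an assumption of this sketch):

```python
import numpy as np

# Toy gradient-accumulation check: averaging the gradients of 16
# micro-batches of size 1 reproduces the gradient of one batch of 16,
# so the optimizer update is unchanged (for a loss that is a mean
# over samples, such as mean squared error here).

rng = np.random.default_rng(0)
w = rng.normal(size=3)          # model parameters
X = rng.normal(size=(16, 3))    # 16 samples, 3 features
y = rng.normal(size=16)

def grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

# Full batch of 16.
full = grad(w, X, y)

# batch_size 1 with gradient_accumulation_steps 16: accumulate the
# per-sample gradients, dividing by the number of accumulation steps.
accum = np.zeros_like(w)
for i in range(16):
    accum += grad(w, X[i:i+1], y[i:i+1]) / 16

assert np.allclose(full, accum)
```

The same bookkeeping is what `accelerate` performs internally when `gradient_accumulation_steps` is set, which is why the README describes the setting as compensating for the smaller `batch_size`.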