Training and inference details are provided in the following GitHub repo: https://github.com/atharva20038/music4all/blob/main/musicgen/Readme.md
Training Configuration
The table below provides an overview of the key hyperparameters and paths used in the training process; a hedged configuration sketch follows the table.
Parameter | Description | Value |
---|---|---|
Pretrained Model | Name of the pre-trained MusicGen model used for fine-tuning. | facebook/musicgen-medium |
Dataset Path | Path to the CSV file containing metadata for training. | /home/shivam.chauhan/Music/Atharva/Processed_Dataset/Makam_32KHz/metadata.csv |
Audio Base Path | Directory containing audio files for training. | /home/shivam.chauhan/Music/Atharva/Processed_Dataset/Makam_32KHz/ |
Model Save Path | Path where the fine-tuned model will be saved. | ./ModelsFinetuned/MusicgenMedium_with_adapters_EncoderDecoder_newMaqam.pt |
Adapter Bottleneck Dim | Size of the bottleneck layer in the adapter. | 32 |
Batch Size | Number of samples per training batch. | 4 |
Learning Rate | Step size for updating model weights. | 5e-5 |
Weight Decay | Regularization parameter to prevent overfitting. | 0.05 |
Number of Epochs | Total number of training iterations over the dataset. | 30 |
Dropout Probability | Probability of dropping units in adapter layers. | 0.1 |
Max Gradient Norm | Maximum norm for gradient clipping to prevent explosion. | 1.0 |
Train-Test Split Ratio | Proportion of data used for training vs validation. | 90:10 |
Early Stopping Patience | Number of epochs without improvement before stopping training. | 5 epochs |
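For reference, the same settings can be gathered into a plain Python dictionary. This is only a sketch mirroring the table above; the key names are illustrative and may not match the actual training script:

```python
# Hypothetical collection of the training hyperparameters listed above.
# Key names are illustrative; the training script may organise them differently.
training_config = {
    "pretrained_model": "facebook/musicgen-medium",
    "dataset_csv": "/home/shivam.chauhan/Music/Atharva/Processed_Dataset/Makam_32KHz/metadata.csv",
    "audio_base_path": "/home/shivam.chauhan/Music/Atharva/Processed_Dataset/Makam_32KHz/",
    "model_save_path": "./ModelsFinetuned/MusicgenMedium_with_adapters_EncoderDecoder_newMaqam.pt",
    "adapter_bottleneck_dim": 32,
    "batch_size": 4,
    "learning_rate": 5e-5,
    "weight_decay": 0.05,
    "num_epochs": 30,
    "dropout_prob": 0.1,
    "max_grad_norm": 1.0,
    "train_test_split": 0.9,       # 90:10 train/validation split
    "early_stopping_patience": 5,  # epochs without improvement
}
```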
Explanation of Key Components:
- Pretrained Model: A foundation model (`facebook/musicgen-medium`) that is fine-tuned for a specific task.
- Adapter Bottleneck: A technique that introduces lightweight trainable modules without retraining the entire model (see the sketch after this list).
- Batch Size: A lower batch size (4) is used, likely due to memory constraints with large audio models.
- Dropout: Helps prevent overfitting by randomly deactivating parts of the model during training.
- Gradient Clipping: Ensures stability in training by capping large gradient updates.
- Early Stopping: Prevents unnecessary training epochs if validation loss stops improving.
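The adapter bullet above refers to the standard bottleneck-adapter pattern: a small down-projection, non-linearity, and up-projection with a residual connection around the frozen backbone. Below is a minimal sketch under that assumption; the class name and exact placement are illustrative, not the repository's implementation:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter: down-project, non-linearity, dropout, up-project, residual.

    Hypothetical sketch of the general adapter pattern; the repository's own
    adapter modules may differ in details.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 32, dropout: float = 0.1):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project to small bottleneck
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back to model width
        self.act = nn.ReLU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        residual = hidden_states
        x = self.down(hidden_states)
        x = self.act(x)
        x = self.dropout(x)
        x = self.up(x)
        return residual + x  # residual connection preserves the frozen backbone's behaviour
```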
This configuration is optimized for fine-tuning MusicGen with adapter-based modifications for improved music generation capabilities.
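To show how the weight decay, gradient clipping, and early stopping settings interact, here is a hedged outline of a fine-tuning loop. The `model`, `train_loader`, and `val_loader` arguments are placeholders and the loss computation is illustrative, not the repository's actual training code:

```python
import torch
from torch.utils.data import DataLoader

def finetune(model: torch.nn.Module, train_loader: DataLoader, val_loader: DataLoader,
             save_path: str = "./ModelsFinetuned/MusicgenMedium_with_adapters_EncoderDecoder_newMaqam.pt"):
    """Hedged outline of the fine-tuning loop; the real training script may differ."""
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad),  # e.g. only adapter parameters
        lr=5e-5, weight_decay=0.05,
    )

    best_val_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(30):                     # number of epochs
        model.train()
        for batch in train_loader:              # batch size 4
            loss = model(**batch).loss          # placeholder loss computation
            loss.backward()
            # Gradient clipping caps large updates to keep training stable.
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
            optimizer.zero_grad()

        # Simple validation pass; the real script may compute this differently.
        model.eval()
        with torch.no_grad():
            val_loss = sum(model(**batch).loss.item() for batch in val_loader) / len(val_loader)

        # Early stopping: halt after 5 epochs without validation improvement.
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
            torch.save(model.state_dict(), save_path)
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= 5:
                break
```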
Inference Configuration
This table provides an overview of the key parameters used in the inference process for generating music.
Parameter | Description | Value |
---|---|---|
Pretrained Model | Name of the pre-trained MusicGen model used for inference. | facebook/musicgen-medium |
Fine-tuned Model Path | Path where the fine-tuned model is stored. | ./ModelsFinetuned/New/MusicgenMedium_with_adapters_EncoderDecoder.pt |
Output Audio Path | Path where the generated audio file is saved. | ./GeneratedAudios/1.wav |
Waveform Graph Path | Path where the waveform visualization is stored. | ./GeneratedGraphs/1.jpeg |
Sample Rate | Desired sample rate for the generated audio. | 16,000 Hz |
Adapter Bottleneck Dim | Size of the bottleneck layer in the adapter network. | 32 |
Max New Tokens | Controls the length of the generated music (512 tokens ≈ 10 sec). | 512 |
Device | Specifies whether to use GPU or CPU for inference. | CUDA if available, else CPU |
Use Fine-tuned Model | Determines whether to use the fine-tuned model or pre-trained. | True (uses fine-tuned model) |
Explanation of Key Components:
- Pretrained Model: Uses `facebook/musicgen-medium`, which is fine-tuned for customized music generation.
- Fine-tuned Model Path: If `use_finetuned_model = True`, the model loads from this path.
- Waveform Graph Path: Saves the waveform visualization as an image.
- Max New Tokens: Higher values generate longer music samples.
- Device Selection: Automatically chooses the GPU (if available) for faster inference; see the loading sketch after this list.
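Here is a minimal sketch of the device selection and conditional loading described in this list, assuming the fine-tuned `.pt` checkpoint stores a PyTorch state dict (the exact loading logic in the inference script may differ):

```python
import torch
from transformers import AutoProcessor, MusicgenForConditionalGeneration

# Automatically pick the GPU if one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

processor = AutoProcessor.from_pretrained("facebook/musicgen-medium")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-medium")

use_finetuned_model = True
if use_finetuned_model:
    # Assumption: the checkpoint is a state dict containing the adapter/model weights.
    state_dict = torch.load(
        "./ModelsFinetuned/New/MusicgenMedium_with_adapters_EncoderDecoder.pt",
        map_location=device,
    )
    model.load_state_dict(state_dict, strict=False)  # strict=False tolerates adapter-only keys

model.to(device)
model.eval()
```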
How the Inference Works:
- The model is loaded (pre-trained or fine-tuned, based on the configuration).
- The user inputs a text prompt describing the music to be generated.
- The model generates an audio waveform based on the text input.
- The generated music is saved as a `.wav` file.
- A waveform graph is plotted and saved for visualization (a code sketch of these steps follows the list).
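These steps can be sketched roughly as follows, continuing from the `model`, `processor`, and `device` objects in the previous snippet. The prompt text is illustrative, and the audio is written at MusicGen's native 32 kHz rate (the inference script may additionally resample to the configured 16 kHz target):

```python
import torch
import matplotlib.pyplot as plt
import scipy.io.wavfile

prompt = "a calm instrumental piece in a traditional makam style"  # illustrative prompt

inputs = processor(text=[prompt], padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    audio_values = model.generate(**inputs, max_new_tokens=512)  # 512 tokens, roughly 10 s

waveform = audio_values[0, 0].cpu().numpy()
sampling_rate = model.config.audio_encoder.sampling_rate  # 32 kHz for MusicGen

# Save the generated music as a .wav file.
scipy.io.wavfile.write("./GeneratedAudios/1.wav", rate=sampling_rate, data=waveform)

# Plot and save the waveform visualization.
plt.figure(figsize=(10, 3))
plt.plot(waveform)
plt.title("Generated waveform")
plt.xlabel("Sample index")
plt.ylabel("Amplitude")
plt.tight_layout()
plt.savefig("./GeneratedGraphs/1.jpeg")
```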
This setup ensures efficient, high-quality music generation using MusicGen with adapter-based fine-tuning.