CyborgPaloma
/

0Shot1Shot-v0.1

CyborgPaloma committed on May 17

Commit

b254148

•

1 Parent(s): c0d8e31

Update README.md

README.md +3 -3

@@ -21,12 +21,12 @@ The dataset is one of my own creation, and isn't... amazing. It's called CYPAL 1
 The dataset prep script validates audio files, resamples them, removes silence, and converts them into spectrograms for deep learning. Spectrograms are generated using Librosa, validated, and saved as NumPy arrays. The process includes augmentation with noise and transformations. A DataLoader and custom sampler efficiently batch the spectrograms. The training script then trains an audio classifier using a ResNet-based model on spectrogram data. It uses Optuna for hyperparameter optimization, running fifty trials of 50 epochs each. It then trains the model, evaluates its performance on a test set, and logs results. The resulting model includes an initial convolutional layer, followed by four residual blocks with increasing channels (64, 128, 256, 512). Each block contains two convolutional layers with batch normalization and ReLU activation. The network uses global average pooling, followed by a fully connected layer and a dropout layer, ending with a final fully connected layer for classification with softmax activation. Finally, to ensure convergence, training is continued for fifty more epochs under the best hyperparameters that Optuna found using weighted sampling.
-Included is a sample sorting script that sorts audio files using spectrograms that it creates, and a GUI built with Tkinter. The model is loaded and used to classify audio files converted to spectrograms. The classification results are used to sort the files into labeled folders, and the process is managed through a Tkinter interface that allows folder selection and displays a progress bar during sorting.
 V1 wants:
-more samples, cleaner dataset, more features (Crash, Ride, Rimshot, Tom, riser, fades, snaps, FX), higher accuracy.
 v2 wants:
-Melodic samples, instrument oneshot (for keygroups/pitched usage), breaks, loops, alternative percussion (bongos, conga, timps, shaker, rattle), Foley, Soundscapes,

 The dataset prep script validates audio files, resamples them, removes silence, and converts them into spectrograms for deep learning. Spectrograms are generated using Librosa, validated, and saved as NumPy arrays. The process includes augmentation with noise and transformations. A DataLoader and custom sampler efficiently batch the spectrograms. The training script then trains an audio classifier using a ResNet-based model on spectrogram data. It uses Optuna for hyperparameter optimization, running fifty trials of 50 epochs each. It then trains the model, evaluates its performance on a test set, and logs results. The resulting model includes an initial convolutional layer, followed by four residual blocks with increasing channels (64, 128, 256, 512). Each block contains two convolutional layers with batch normalization and ReLU activation. The network uses global average pooling, followed by a fully connected layer and a dropout layer, ending with a final fully connected layer for classification with softmax activation. Finally, to ensure convergence, training is continued for fifty more epochs under the best hyperparameters that Optuna found using weighted sampling.
+Included is a sample sorting script that sorts audio files using spectrograms that it creates. The model is loaded and used to classify the audio files after they are converted to spectrograms. Files whose classification confidence is above 90% are copied into labeled folders, and the process is managed through a Tkinter interface that allows folder selection and displays a progress bar during sorting.
 V1 wants:
+more samples, cleaner dataset, more features (Crash, Ride, Rimshot, Tom, riser, fades, snaps, FX), higher accuracy (currently around 87%, I think, even considering abysmal clap performance)
 v2 wants:
+Melodic samples, instrument oneshot (for keygroups/pitched usage), breaks, loops, alternative percussion (bongos, conga, timps, shaker, rattle), Foley, Soundscapes.
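
The main behavioral change in this commit is the 90% confidence gate on sorting. A minimal sketch of that gate follows, using only the standard library. It is not the repository's script: `softmax`, `sort_by_confidence`, and the `(path, logits)` input shape are hypothetical names chosen for illustration, and the logits stand in for whatever the real model returns for a file's spectrogram.

```python
import math
import shutil
from pathlib import Path

CONFIDENCE_THRESHOLD = 0.90  # the README's "above 90% confidence"


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sort_by_confidence(files_with_logits, labels, out_dir,
                       threshold=CONFIDENCE_THRESHOLD):
    """Copy each file into out_dir/<label> when its top class beats the threshold.

    files_with_logits: iterable of (path, logits) pairs (hypothetical shape;
    the real script would get the logits from the trained model).
    Returns a dict mapping each copied source path to its predicted label.
    """
    out_dir = Path(out_dir)
    copied = {}
    for path, logits in files_with_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] <= threshold:
            continue  # low confidence: leave the file unsorted
        target = out_dir / labels[best]
        target.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target / Path(path).name)  # copy, never move
        copied[str(path)] = labels[best]
    return copied
```

Files that fall below the threshold are simply skipped, so the source folder is left untouched and uncertain samples can be reviewed by hand. The class names passed as `labels` are up to the caller; this sketch does not assume the model's actual label set.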