# Model Card for audio2img
This is a Wav2Vec2-BERT model fine-tuned to act as an audio conditioning mechanism for Stable Diffusion in place of the CLIP text encoder: audio embeddings are used instead of text embeddings to guide image generation.
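For SD 1.x, the UNet's cross-attention expects conditioning shaped like the CLIP text encoder's output (77 tokens × 768 dims), so the audio encoder's hidden states must be mapped into that space. The sketch below illustrates the shape-matching idea with random tensors; the mean-pooling scheme, projection, and the `adapt` helper are illustrative assumptions, not the exact adapter used by this model:

```python
import numpy as np

# Assumed dims: Wav2Vec2-BERT hidden size 1024; SD 1.x CLIP context is 77 x 768.
AUDIO_DIM, SEQ_LEN, CLIP_DIM = 1024, 77, 768

rng = np.random.default_rng(0)

# Stand-in for Wav2Vec2-BERT outputs: (audio frames, hidden size).
audio_hidden = rng.standard_normal((312, AUDIO_DIM))

def adapt(hidden, proj):
    """Hypothetical adapter: pool frames to 77 'tokens', project to 768 dims."""
    # Average-pool the frame axis down to SEQ_LEN positions.
    idx = np.linspace(0, hidden.shape[0], SEQ_LEN + 1).astype(int)
    pooled = np.stack([hidden[a:b].mean(axis=0) for a, b in zip(idx[:-1], idx[1:])])
    return pooled @ proj  # (77, 1024) @ (1024, 768) -> (77, 768)

proj = rng.standard_normal((AUDIO_DIM, CLIP_DIM)) * 0.02
cond = adapt(audio_hidden, proj)
print(cond.shape)  # (77, 768) -- drop-in shape for SD cross-attention
```

Because the output matches the CLIP text embedding shape, it can be passed to the diffusion pipeline wherever the prompt embeddings would normally go.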
## Model Details

### Model Description
- Developed by: Youssef Kardous
- Model type: Enhanced Wav2Vec2-BERT with a custom loss function and adapter
- License: apache-2.0
- Finetuned from model: facebook/w2v-bert-2.0
### Model Sources
- Repository: https://github.com/kardSIM/audio2img
- Demo: https://huggingface.co/spaces/youzarsif/audio2img
## Results

Sample generations are available in the demo Space linked above.
## Uses
The audio2img project aims to enhance generative modeling by integrating audio embeddings into the conditioning process of models like Stable Diffusion. This integration allows for the exploration of new creative possibilities by leveraging the rich semantic information contained in audio data.
Potential users:
- Researchers and developers
- Artists and creatives
- Content creators
## Training Details

### Training Data
The model was trained on the FSD50K sound-event dataset: https://huggingface.co/datasets/nateraw/fsd50k
### Training Procedure
The core idea behind the training process is to achieve cross-modal alignment between audio and text embeddings using a two-stream architecture: the pretrained CLIPTextModel generates text embeddings that serve as target labels for the audio embeddings produced by the Wav2Vec2-BERT model.
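One common way to realize this kind of alignment objective is to minimize the distance between the audio stream's embeddings and the frozen CLIP text embeddings, for example with a weighted mix of mean-squared error and a cosine-similarity term. This is a hedged sketch of such a loss (the exact custom loss used by this model may differ); `alignment_loss` and `alpha` are illustrative names:

```python
import numpy as np

def alignment_loss(audio_emb, text_emb, alpha=0.5):
    """Pull audio embeddings toward frozen CLIP text embeddings.

    audio_emb, text_emb: (batch, seq, dim) arrays; text_emb is the target.
    Combines mean-squared error with a (1 - cosine similarity) term.
    """
    mse = np.mean((audio_emb - text_emb) ** 2)
    a = audio_emb / np.linalg.norm(audio_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    cos = np.mean(np.sum(a * t, axis=-1))
    return alpha * mse + (1 - alpha) * (1 - cos)

rng = np.random.default_rng(0)
text = rng.standard_normal((2, 77, 768))   # stand-in for CLIP text embeddings
print(alignment_loss(text.copy(), text))   # identical inputs -> loss ~ 0
```

During training the text branch stays fixed, so gradients from this loss only update the audio encoder and its adapter, pulling audio embeddings into the CLIP text embedding space.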
## Model Card Contact
- Email: youssefkardous1@gmail.com