---
title: Anything2Image
emoji: π
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 3.29.0
app_file: app.py
pinned: false
---
# Anything To Image
Generate images from anything with [ImageBind](https://github.com/facebookresearch/ImageBind)'s unified latent space and [stable-diffusion-2-1-unclip](https://huggingface.co/stabilityai/stable-diffusion-2-1-unclip).
- No training is needed.
- Integration with 🤗 [Diffusers](https://github.com/huggingface/diffusers).
- `imagebind` is copied directly from the [official repo](https://github.com/facebookresearch/ImageBind) with modifications.
- Gradio Demo.
## Audio to Image
| `assets/wav/bird_audio.wav` | `assets/wav/dog_audio.wav` | `assets/wav/cattle.wav` |
| --- | --- | --- |
| ![](assets/generated/bird_audio.png) | ![](assets/generated/dog_audio.png) | ![](assets/generated/cattle.png) |
```python
import imagebind
import torch
from diffusers import StableUnCLIPImg2ImgPipeline

# construct models
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# `variant="fp16"` selects the half-precision weights; float16 is intended for GPU use
pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-unclip", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to(device)

model = imagebind.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# generate image
with torch.no_grad():
    audio_paths = ["assets/wav/bird_audio.wav"]
    # embed the audio into ImageBind's shared latent space
    embeddings = model.forward({
        imagebind.ModalityType.AUDIO: imagebind.load_and_transform_audio_data(audio_paths, device),
    })
    embeddings = embeddings[imagebind.ModalityType.AUDIO]
    # feed the audio embedding to unCLIP in place of an image embedding
    images = pipe(image_embeds=embeddings.half()).images
    images[0].save("bird_audio.png")
```
```
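To process all three sample clips in one run, the output file names can be derived from the input paths. The sketch below is only an illustration: `output_path_for` is a hypothetical helper, not part of this repo, and the `assets/generated` directory is taken from the table above. It also assumes the pipeline returns one image per input embedding.

```python
from pathlib import Path


def output_path_for(audio_path, out_dir="assets/generated"):
    """Hypothetical helper: map an audio file to a PNG path with the same stem."""
    return Path(out_dir) / (Path(audio_path).stem + ".png")


audio_paths = [
    "assets/wav/bird_audio.wav",
    "assets/wav/dog_audio.wav",
    "assets/wav/cattle.wav",
]
output_paths = [output_path_for(p) for p in audio_paths]

# With `images` produced as in the snippet above (one per audio file, by assumption),
# each image could then be saved next to its name:
# for image, path in zip(images, output_paths):
#     image.save(path)
```

Passing the whole `audio_paths` list to `load_and_transform_audio_data` in the snippet above would then yield a batch of embeddings in the same order as `output_paths`.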
## More
Under construction
## Citation
Latent Diffusion
```bibtex
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Bj\"orn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
```
ImageBind
```bibtex
@inproceedings{girdhar2023imagebind,
title={ImageBind: One Embedding Space To Bind Them All},
author={Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang
and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
booktitle={CVPR},
year={2023}
}
```