Files changed (1): README.md (+64 −0)
@@ -33,10 +33,74 @@ These files are:
 Note that UniDiffuser-v0 and UniDiffuser-v1 share the same `autoencoder_kl.pth` and `caption_decoder.pth`. You only need to download them once.
 As for other components, they will be automatically downloaded.
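The shared checkpoints can also be fetched programmatically with `huggingface_hub`. A minimal sketch — the helper name `download_shared_checkpoints` is illustrative, and it assumes the `.pth` files are hosted in the `thu-ml/unidiffuser-v1` repository; adjust the repo id if the weights live elsewhere:

```python
from huggingface_hub import hf_hub_download

def download_shared_checkpoints(repo_id: str = "thu-ml/unidiffuser-v1"):
    """Fetch the checkpoints shared by UniDiffuser-v0 and UniDiffuser-v1.

    Returns the local cache paths of the downloaded files. Assumes the
    .pth files are hosted in `repo_id` (an assumption, not stated in
    this card); files already in the cache are not re-downloaded.
    """
    return [
        hf_hub_download(repo_id, filename)
        for filename in ("autoencoder_kl.pth", "caption_decoder.pth")
    ]
```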
 
+The `diffusers` pipeline for UniDiffuser-v1 can be downloaded as follows:
+
+```python
+from diffusers import UniDiffuserPipeline
+
+pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1")
+```
 
 ## Usage
 Use the model with [UniDiffuser codebase](https://github.com/thu-ml/unidiffuser).
 
+Here is an example using UniDiffuser-v1 with `diffusers`:
+
+```python
+import requests
+import torch
+from PIL import Image
+from io import BytesIO
+
+from diffusers import UniDiffuserPipeline
+
+device = "cuda"
+model_id_or_path = "thu-ml/unidiffuser-v1"
+pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path)
+pipe.to(device)
+
+# Joint image-text generation. The generation task is automatically inferred.
+sample = pipe(num_inference_steps=20, guidance_scale=8.0)
+image = sample.images[0]
+text = sample.text[0]
+image.save("unidiffuser_sample_joint_image.png")
+print(text)
+
+# The mode can be set manually. The following is equivalent to the above:
+pipe.set_joint_mode()
+sample2 = pipe(num_inference_steps=20, guidance_scale=8.0)
+
+# Note that if you set the mode manually, the pipeline will no longer attempt
+# to automatically infer the mode. You can re-enable this with reset_mode().
+pipe.reset_mode()
+
+# Text-to-image generation.
+prompt = "an elephant under the sea"
+
+sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
+t2i_image = sample.images[0]
+t2i_image.save("unidiffuser_sample_text2img_image.png")
+
+# Image-to-text generation.
+image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
+response = requests.get(image_url)
+init_image = Image.open(BytesIO(response.content)).convert("RGB")
+init_image = init_image.resize((512, 512))
+
+sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
+i2t_text = sample.text[0]
+print(i2t_text)
+
+# Image variation can be performed with an image-to-text generation followed by a text-to-image generation:
+sample = pipe(prompt=i2t_text, num_inference_steps=20, guidance_scale=8.0)
+final_image = sample.images[0]
+final_image.save("unidiffuser_image_variation_sample.png")
+
+# Text variation can be performed with a text-to-image generation followed by an image-to-text generation:
+sample = pipe(image=t2i_image, num_inference_steps=20, guidance_scale=8.0)
+final_prompt = sample.text[0]
+print(final_prompt)
+```
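The image-to-text example above converts the downloaded image to RGB and resizes it to 512×512. That preprocessing can be factored into a small self-contained helper — a sketch; the name `preprocess_image` is illustrative:

```python
from PIL import Image

def preprocess_image(image: Image.Image, size: int = 512) -> Image.Image:
    """Convert an image to RGB and resize it to the square resolution
    used in the image-to-text example above."""
    return image.convert("RGB").resize((size, size))
```

With this helper, the image-to-text call becomes `pipe(image=preprocess_image(init_image), num_inference_steps=20, guidance_scale=8.0)`.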
 
 ## Model Details
 - **Model type:** Diffusion-based multi-modal generation model