Files changed (1): README.md (+64 −0)
@@ -33,10 +33,74 @@ These files are:
 Note that UniDiffuser-v0 and UniDiffuser-v1 share the same `autoencoder_kl.pth` and `caption_decoder.pth`. You only need to download them once.
 As for other components, they will be automatically downloaded.
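The shared checkpoints can also be fetched programmatically with `huggingface_hub`. A minimal sketch — the helper name `download_shared_checkpoints` is illustrative, and it assumes the `.pth` files are hosted in the `thu-ml/unidiffuser-v1` repository; adjust the repo id if the weights live elsewhere:

```python
from huggingface_hub import hf_hub_download

def download_shared_checkpoints(repo_id: str = "thu-ml/unidiffuser-v1"):
    """Fetch the checkpoints shared by UniDiffuser-v0 and UniDiffuser-v1.

    Returns the local cache paths of the downloaded files. Assumes the
    .pth files are hosted in `repo_id` (an assumption, not stated in
    this card); files already in the cache are not re-downloaded.
    """
    return [
        hf_hub_download(repo_id, filename)
        for filename in ("autoencoder_kl.pth", "caption_decoder.pth")
    ]
```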
 
+The `diffusers` pipeline for UniDiffuser-v1 can be downloaded as follows:
+
+```python
+from diffusers import UniDiffuserPipeline
+
+pipe = UniDiffuserPipeline.from_pretrained("thu-ml/unidiffuser-v1")
+```
 
 ## Usage
 Use the model with [UniDiffuser codebase](https://github.com/thu-ml/unidiffuser).
 
+Here is an example using UniDiffuser-v1 with `diffusers`:
+
+```python
+import requests
+import torch
+from PIL import Image
+from io import BytesIO
+
+from diffusers import UniDiffuserPipeline
+
+device = "cuda"
+model_id_or_path = "thu-ml/unidiffuser-v1"
+pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path)
+pipe.to(device)
+
+# Joint image-text generation. The generation task is automatically inferred.
+sample = pipe(num_inference_steps=20, guidance_scale=8.0)
+image = sample.images[0]
+text = sample.text[0]
+image.save("unidiffuser_sample_joint_image.png")
+print(text)
+
+# The mode can be set manually. The following is equivalent to the above:
+pipe.set_joint_mode()
+sample2 = pipe(num_inference_steps=20, guidance_scale=8.0)
+
+# Note that if you set the mode manually, the pipeline will no longer attempt
+# to automatically infer the mode. You can re-enable this with reset_mode().
+pipe.reset_mode()
+
+# Text-to-image generation.
+prompt = "an elephant under the sea"
+
+sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
+t2i_image = sample.images[0]
+t2i_image.save("unidiffuser_sample_text2img_image.png")
+
+# Image-to-text generation.
+image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
+response = requests.get(image_url)
+init_image = Image.open(BytesIO(response.content)).convert("RGB")
+init_image = init_image.resize((512, 512))
+
+sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
+i2t_text = sample.text[0]
+print(i2t_text)
+
+# Image variation can be performed with an image-to-text generation followed by a text-to-image generation:
+sample = pipe(prompt=i2t_text, num_inference_steps=20, guidance_scale=8.0)
+final_image = sample.images[0]
+final_image.save("unidiffuser_image_variation_sample.png")
+
+# Text variation can be performed with a text-to-image generation followed by an image-to-text generation:
+sample = pipe(image=t2i_image, num_inference_steps=20, guidance_scale=8.0)
+final_prompt = sample.text[0]
+print(final_prompt)
+```
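The image-to-text example above converts the downloaded image to RGB and resizes it to 512×512. That preprocessing can be factored into a small self-contained helper — a sketch; the name `preprocess_image` is illustrative:

```python
from PIL import Image

def preprocess_image(image: Image.Image, size: int = 512) -> Image.Image:
    """Convert an image to RGB and resize it to the square resolution
    used in the image-to-text example above."""
    return image.convert("RGB").resize((size, size))
```

With this helper, the image-to-text call becomes `pipe(image=preprocess_image(init_image), num_inference_steps=20, guidance_scale=8.0)`.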
 
 ## Model Details
 - **Model type:** Diffusion-based multi-modal generation model