dg845 committed
Commit f2a0ef2
Parent: 4ecc957

Update README.md


Add usage example for UniDiffuser-v0

Files changed (1)
  1. README.md +55 -1
README.md CHANGED

@@ -25,7 +25,61 @@ We provide two versions of UniDiffuser:

  ## Example

- TODO
+ ```python
+ import requests
+ import torch
+ from PIL import Image
+ from io import BytesIO
+
+ from diffusers import UniDiffuserPipeline
+
+ device = "cuda"
+ model_id_or_path = "dg845/unidiffuser-diffusers-v0"
+ pipe = UniDiffuserPipeline.from_pretrained(model_id_or_path)
+ pipe.to(device)
+
+ # Joint image-text generation. The generation task is automatically inferred.
+ sample = pipe(num_inference_steps=20, guidance_scale=8.0)
+ image = sample.images[0]
+ text = sample.text[0]
+ image.save("unidiffuser_sample_joint_image.png")
+ print(text)
+
+ # The mode can also be set manually. The following is equivalent to the above:
+ pipe.set_joint_mode()
+ sample2 = pipe(num_inference_steps=20, guidance_scale=8.0)
+
+ # Note that if you set the mode manually, the pipeline will no longer attempt
+ # to infer the mode automatically. You can re-enable automatic inference with reset_mode().
+ pipe.reset_mode()
+
+ # Text-to-image generation.
+ prompt = "an elephant under the sea"
+
+ sample = pipe(prompt=prompt, num_inference_steps=20, guidance_scale=8.0)
+ t2i_image = sample.images[0]
+ t2i_image.save("unidiffuser_sample_text2img_image.png")
+
+ # Image-to-text generation.
+ image_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/unidiffuser/unidiffuser_example_image.jpg"
+ response = requests.get(image_url)
+ init_image = Image.open(BytesIO(response.content)).convert("RGB")
+ init_image = init_image.resize((512, 512))
+
+ sample = pipe(image=init_image, num_inference_steps=20, guidance_scale=8.0)
+ i2t_text = sample.text[0]
+ print(i2t_text)
+
+ # Image variation can be performed with an image-to-text generation followed by a text-to-image generation:
+ sample = pipe(prompt=i2t_text, num_inference_steps=20, guidance_scale=8.0)
+ final_image = sample.images[0]
+ final_image.save("unidiffuser_image_variation_sample.png")
+
+ # Text variation can be performed with a text-to-image generation followed by an image-to-text generation:
+ sample = pipe(image=t2i_image, num_inference_steps=20, guidance_scale=8.0)
+ final_prompt = sample.text[0]
+ print(final_prompt)
+ ```

  ## Model Details
  - **Model type:** Diffusion-based multi-modal generation model
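
As a usage note beyond the committed example: most diffusers pipelines accept a `torch_dtype` argument at load time and a `generator` argument at call time for reproducible sampling. The sketch below assumes `UniDiffuserPipeline` follows both of these standard diffusers conventions; the seed value and output filename are arbitrary.

```python
import torch

from diffusers import UniDiffuserPipeline

# Half-precision loading to reduce GPU memory usage (assumes the pipeline
# accepts torch_dtype, as standard diffusers pipelines do).
pipe = UniDiffuserPipeline.from_pretrained(
    "dg845/unidiffuser-diffusers-v0", torch_dtype=torch.float16
)
pipe.to("cuda")

# Seeded, reproducible joint generation (assumes the common `generator` argument).
generator = torch.Generator("cuda").manual_seed(0)
sample = pipe(num_inference_steps=20, guidance_scale=8.0, generator=generator)
sample.images[0].save("unidiffuser_seeded_joint_sample.png")
print(sample.text[0])
```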