dcher95 committed · verified
Commit 667991a · 1 Parent(s): e237035

Upload README.md with huggingface_hub

Files changed (1): README.md (+91 −0)
README.md ADDED
@@ -0,0 +1,91 @@
---
license: apache-2.0
tags:
- controlnet
- stable-diffusion
- satellite-imagery
- osm
- image-to-image
- diffusers
base_model: stabilityai/stable-diffusion-2-1-base
pipeline_tag: image-to-image
library_name: diffusers
---

# VectorSynth

**VectorSynth** is a ControlNet model that generates satellite imagery from OpenStreetMap (OSM) vector data embeddings. It conditions [Stable Diffusion 2.1 Base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) on control images rendered from CLIP embeddings of OSM text to synthesize realistic aerial imagery.

## Model Description

VectorSynth uses a two-stage pipeline:

1. **RenderEncoder**: Projects per-pixel 768-dim CLIP text embeddings of OSM text to 3-channel control images
2. **ControlNet**: Conditions Stable Diffusion 2.1 on the rendered control images

This model uses standard CLIP embeddings. For the COSA embedding variant, see [VectorSynth-COSA](https://huggingface.co/MVRL/VectorSynth-COSA).

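In shape terms, the two stages compose as follows (a minimal sketch, not the training code; the 512×512 size and random hint are placeholders, and `render_encoder` and `pipe` are loaded as shown in the Usage section below):

```python
import torch

hint = torch.randn(512, 512, 768)          # per-pixel CLIP embeddings of OSM text, (H, W, 768)
x = hint.unsqueeze(0).permute(0, 3, 1, 2)  # (1, 768, 512, 512), the RenderEncoder input layout
# control_image = render_encoder(x).sigmoid()  # stage 1 -> (1, 3, 512, 512) control image
# image = pipe("Satellite image", image=control_image).images[0]  # stage 2 -> satellite image
```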

## Usage

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, DDIMScheduler
from huggingface_hub import hf_hub_download

device = "cuda"

# Load the ControlNet weights
controlnet = ControlNetModel.from_pretrained("MVRL/VectorSynth", torch_dtype=torch.float16)

# Build the Stable Diffusion 2.1 pipeline around the ControlNet
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe = pipe.to(device)

# Load the RenderEncoder; render.py from this repo must be importable,
# since the checkpoint stores the pickled model object
render_path = hf_hub_download("MVRL/VectorSynth", "render_encoder/clip-render_encoder.pth")
checkpoint = torch.load(render_path, map_location=device, weights_only=False)
render_encoder = checkpoint["model"].to(device).eval()

# Your hint tensor should be (H, W, 768): per-pixel CLIP embeddings of OSM text
# hint = torch.load("your_hint.pt").to(device)
# hint = hint.unsqueeze(0).permute(0, 3, 1, 2)  # (1, 768, H, W)

# Render the 3-channel control image from the hint
# with torch.no_grad():
#     control_image = render_encoder(hint).sigmoid()

# Generate a satellite image conditioned on the control image
# output = pipe(
#     prompt="Satellite image of a city neighborhood",
#     image=control_image,
#     num_inference_steps=40,
#     guidance_scale=7.5,
# ).images[0]
```
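The repo does not ship a hint builder, so here is a minimal sketch of one way to produce a compatible `(H, W, 768)` hint: embed each OSM tag string with a standard CLIP text encoder and broadcast the embeddings over a rasterized label map. The `openai/clip-vit-large-patch14` checkpoint is an assumption (chosen because its text width is 768), and the tag strings and random label map are placeholders for real rasterized OSM geometry.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

device = "cuda"

# Assumption: a CLIP text encoder with 768-dim output, matching the RenderEncoder input
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()

# Placeholder OSM tag strings; real hints come from rasterized OSM vector data
tags = ["building=residential", "highway=primary", "natural=water"]
label_map = torch.randint(len(tags), (512, 512), device=device)  # per-pixel tag index

with torch.no_grad():
    tokens = tokenizer(tags, padding=True, return_tensors="pt").to(device)
    embeds = text_encoder(**tokens).pooler_output  # (num_tags, 768)

hint = embeds[label_map]  # (512, 512, 768) per-pixel CLIP embeddings
```

The Usage block above then turns `hint` into a control image via the RenderEncoder.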

## Files

- `config.json` - ControlNet configuration
- `diffusion_pytorch_model.safetensors` - ControlNet weights
- `render_encoder/clip-render_encoder.pth` - RenderEncoder weights
- `render.py` - RenderEncoder class definition

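Since `render.py` is needed to unpickle the RenderEncoder checkpoint, it can be convenient to fetch the whole repository in one call (a sketch using `snapshot_download`):

```python
from huggingface_hub import snapshot_download

# Downloads config.json, the ControlNet weights, render.py, and the
# RenderEncoder checkpoint into the local Hugging Face cache
local_dir = snapshot_download("MVRL/VectorSynth")
print(local_dir)  # e.g. add this directory to sys.path so render.py is importable
```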

## Citation

```bibtex
@misc{cher2025vectorsynth,
  title={VectorSynth: Fine-Grained Satellite Image Synthesis with Structured Semantics},
  author={Cher, Daniel and Wei, Brian and Sastry, Srikumar and Jacobs, Nathan},
  year={2025},
  eprint={2511.07744},
  archivePrefix={arXiv},
  note={arXiv preprint}
}
```

## Related Models

- [VectorSynth-COSA](https://huggingface.co/MVRL/VectorSynth-COSA) - COSA embedding variant
- [GeoSynth](https://huggingface.co/MVRL/GeoSynth) - Text-to-satellite image generation