Baptlem committed
Commit 39db156
1 Parent(s): b1d16a6

Update app.py

Files changed (1):
  1. app.py +31 -26
app.py CHANGED
@@ -19,54 +19,56 @@ if gr.__version__ != "3.28.3": #doesn't work...
  os.system("pip install gradio==3.28.3")

  title_description = """
- # SynDRoM
- ## Synthetic Data augmentation for Robotic Manipulation

  """

  description = """
- Our project is to use a diffusion model to change the texture of our robotic arm simulation.
- To do so, we first get our simulated images. We then process these images to get Canny Edge maps. Finally, we can get brand new images by using ControlNet.
- We are therefore able to change our simulation texture while keeping the image composition.

- Our objective for the sprint is to perform data augmentation using ControlNet, so we are looking for a model that can augment an image quickly.
- To do so, we trained several ControlNets from scratch on different datasets:
  * [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
  * [Bridge](https://sites.google.com/view/bridgedata)

- One way to accelerate the inference of a diffusion model is simply to generate small images, so we decided to work with low-resolution images.
- After downloading the datasets, we processed them by resizing images to a resolution of 128.
- The smallest side of the image (width or height) is resized to 128 and the other side is scaled to keep the initial aspect ratio.
- We then retrieve the Canny Edge map of the images. We performed this preprocessing for every dataset we used during the sprint.
-
-
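As an illustration, the resize-and-Canny step described above could look like the following minimal sketch (assuming OpenCV; the 100/200 thresholds are our own placeholder values, not ones stated here):

```python
import cv2

def preprocess(image, short_side=128, low=100, high=200):
    # Resize so the smallest side is `short_side`, keeping the aspect ratio.
    h, w = image.shape[:2]
    scale = short_side / min(h, w)
    resized = cv2.resize(image, (round(w * scale), round(h * scale)))
    # Extract the Canny Edge map used to condition ControlNet.
    return cv2.Canny(resized, low, high)
```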
- We trained four different ControlNets. For each of them, we processed the datasets differently. You can find a description of the processing in the readme file attached to the model repo:
- [Our ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

- For now, we benchmarked our model on a node of 4 Titan RTX 24 GB GPUs. We were able to generate a batch of 4 images in an average time of 1.3 seconds!
- We also have access to nodes composed of 8 A100 80 GB GPUs. The benchmark on one of these nodes will come soon.
-

  """

  traj_description = """
- We generated a trajectory of our simulated environment, which we then use with our different models.
- We made these videos on our Titan RTX node.
- The prompt we used for every video is "A robotic arm with a gripper and a small cube on a table, super realistic, industrial background".
  """
  perfo_description = """
- The table on the right shows the performance of our models running on different nodes.
- To run the benchmark, we loaded one of our models on every GPU of the node. We then retrieved an episode of our simulation.
- For every frame of the episode, we preprocess the image (resize, canny, ...) and process the Canny image on the GPUs.
- We repeated this procedure for different batch sizes (BS).

  We can see that the greater the BS, the greater the FPS. By increasing the BS, we take advantage of the parallelization of the GPUs.
-
  """
  def create_key(seed=0):
  return jax.random.PRNGKey(seed)
@@ -317,6 +319,9 @@ def create_demo(process, max_images=12, default_num_images=4):
  with gr.Column():
      gr.Image("./perfo_rtx.png",
               interactive=False)

  os.system("pip install gradio==3.28.3")
  title_description = """
+ # UCDR-Net
+ ## Unlimited Controlled Domain Randomization Network for Bridging the Sim2Real Gap in Robotics

  """

  description = """
+ While existing ControlNet and public diffusion models are predominantly geared towards high-resolution images (512x512 or above) and intricate artistic detail generation, there is untapped potential for these models in Automatic Data Augmentation (ADA).
+ By harnessing the inherent variance in prompt-conditioned generated images, we can significantly boost the visual diversity of training samples for computer vision pipelines.
+ This is particularly relevant in the field of robotics, where deep learning plays an increasingly pivotal role in training policies for robotic manipulation from images.

+ In this HuggingFace sprint, we present UCDR-Net (Unlimited Controlled Domain Randomization Network), a novel CannyEdge mini-ControlNet trained on Stable Diffusion 1.5 with mixed datasets.
+ Our model generates photorealistic and varied renderings from simplistic robotic simulation images, enabling real-time data augmentation for robotic vision training.

+ We specifically designed UCDR-Net to be fast and composition preserving, with an emphasis on lower resolution images (128x128) for online data augmentation in typical preprocessing pipelines.
+ Our choice of the Canny Edge version of ControlNet ensures shape and structure preservation in the image, which is crucial for visuomotor policy learning.
+
+ We trained ControlNet from scratch using only 128x128 images, preprocessing the training datasets and extracting Canny Edge maps.
+ We then trained four ControlNets with different mixtures of two datasets (Coyo-700M and Bridge Data) and showcased the results:
  * [Coyo-700M](https://github.com/kakaobrain/coyo-dataset)
  * [Bridge](https://sites.google.com/view/bridgedata)

+ Model Description and Training Process: Please refer to the readme file attached to the model repository.
+
+ Model Repository: [ControlNet repo](https://huggingface.co/Baptlem/baptlem-controlnet)

  """

  traj_description = """
+ To demonstrate UCDR-Net's capabilities, we generated a trajectory of our simulated robotic environment and present the resulting videos for each model.
+ We batched the frames for each video and performed independent inference for each frame, which explains the "wobbling" effect.
+ Prompt used for every video: "A robotic arm with a gripper and a small cube on a table, super realistic, industrial background"
+
  """
  perfo_description = """
+ Our model has been benchmarked on a node of 4 Titan RTX 24 GB GPUs, achieving an impressive 14 FPS image generation rate!
+ The table on the right shows the performance of our models running on different nodes.
+ To run the benchmark, we loaded one of our models on every GPU of the node. We then retrieved an episode of our simulation.
+ For every frame of the episode, we preprocess the image (resize, canny, ...) and process the Canny image on the GPUs.
+ We repeated this procedure for different batch sizes (BS).

  We can see that the greater the BS, the greater the FPS. By increasing the BS, we take advantage of the parallelization of the GPUs.

  """
+ conclusion_description = """
+ UCDR-Net is a natural step toward bridging the Sim2Real gap in robotics, providing real-time data augmentation for training visual policies.
+ We are excited to share our work with the HuggingFace community and contribute to the advancement of robotic vision training techniques.
+
+ """

  def create_key(seed=0):
      return jax.random.PRNGKey(seed)
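Since the benchmark loads one copy of the model per GPU, this key is typically split into one key per device before the parallel call (a usage sketch, not code from this diff):

```python
import jax

rng = create_key(0)
# One independent key per GPU, e.g. for a pmap'd sampling step.
device_rngs = jax.random.split(rng, jax.device_count())
```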
 
  with gr.Column():
      gr.Image("./perfo_rtx.png",
               interactive=False)
+
+ with gr.Row():
+     gr.Markdown(conclusion_description)