Spaces: szukevin/VISOR-GPT


szukevin committed on May 29, 2023

Commit

7900c16

1 Parent(s): dd1146d

upload


This view is limited to 50 files because it contains too many changes.

Files changed (50)

LICENSE +21 -0
README.md +64 -12
app.py +425 -0
demo/ckpts/controlnet/cldm_v15.yaml +79 -0
demo/ckpts/controlnet/control_v11p_sd15_openpose.pth +3 -0
demo/ckpts/controlnet/idle +0 -0
demo/ckpts/controlnet/v1-5-pruned-emaonly.safetensors +3 -0
demo/ckpts/gligen/diffusion_pytorch_model_box.bin +3 -0
demo/ckpts/gligen/idle +0 -0
demo/ckpts/visorgpt/idle +0 -0
demo/ckpts/visorgpt/visorgpt_dagger_ta_tb.pt +3 -0
requirements.txt +160 -0
train/README.md +1 -0
train/__init__.py +1 -0
train/__pycache__/__init__.cpython-38.pyc +0 -0
train/__pycache__/__init__.cpython-39.pyc +0 -0
train/beginning.txt +1 -0
train/corpora/CLUECorpusSmall_bert_sampled.txt +0 -0
train/corpora/CLUECorpusSmall_sampled.txt +0 -0
train/corpora/book_review.txt +0 -0
train/corpora/book_review_bert.txt +0 -0
train/corpora/book_review_cls.txt +0 -0
train/datasets/book_review/dev.tsv +0 -0
train/datasets/book_review/test.tsv +0 -0
train/datasets/book_review/test_nolabel.tsv +0 -0
train/datasets/book_review/train.tsv +0 -0
train/datasets/test_data/book_review/dev.tsv +201 -0
train/datasets/test_data/book_review/test.tsv +201 -0
train/datasets/test_data/book_review/test_nolabel.tsv +201 -0
train/datasets/test_data/book_review/train.tsv +501 -0
train/documents/llama.md +52 -0
train/finetune/run_c3.py +215 -0
train/finetune/run_chid.py +225 -0
train/finetune/run_classifier.py +366 -0
train/finetune/run_classifier_cv.py +173 -0
train/finetune/run_classifier_deepspeed.py +212 -0
train/finetune/run_classifier_grid.py +120 -0
train/finetune/run_classifier_mt.py +203 -0
train/finetune/run_classifier_multi_label.py +287 -0
train/finetune/run_classifier_prompt.py +308 -0
train/finetune/run_classifier_siamese.py +340 -0
train/finetune/run_cmrc.py +447 -0
train/finetune/run_dbqa.py +232 -0
train/finetune/run_image_classifier.py +195 -0
train/finetune/run_ner.py +339 -0
train/finetune/run_regression.py +199 -0
train/finetune/run_simcse.py +274 -0
train/finetune/run_speech2text.py +311 -0
train/finetune/run_text2text.py +314 -0
train/inference/run_c3_infer.py +94 -0

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2023 Jinheng Xie
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,13 +1,65 @@
----
-title: VISOR GPT
-emoji: 🔥
-colorFrom: yellow
-colorTo: pink
-sdk: gradio
-sdk_version: 3.32.0
-app_file: app.py
-pinned: false
-license: mit
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+<div align=center>
+<img src="visorgpt_title.png" width="400">
+</div>
+## Learning Visual Prior via Generative Pre-Training [[Arxiv](http://arxiv.org/abs/2305.13777)] [[Demo]()] [[Video](https://www.youtube.com/watch?v=8FDoBfxSY8I)]
+<img src="demo.gif" width="1000">
+## Updates
+- Gradio demo is available.
+- [Hugging Face demo will be available]().
+## Quick Start
+### Step 1
+```
+# clone the repo
+git clone https://github.com/Sierkinhane/VisorGPT.git
+# go to directory
+cd VisorGPT
+# create a new environment
+conda create -n visorgpt python=3.8
+# activate the new environment
+conda activate visorgpt
+# prepare the basic environments
+pip3 install -r requirements.txt
+# install controlnet and gligen
+cd demo/ControlNet
+pip3 install -v -e .
+cd ../GLIGEN
+pip3 install -v -e .
+```
+### Step 2 - Download pre-trained weights
+Download [visorgpt](https://drive.google.com/file/d/1Pk4UPNKBMH-0uRLmK5COYTca7FUrN8XY/view?usp=share_link), [controlnet-pose2img](https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_openpose.pth), [controlnet-sd](https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors), and [gligen-bbox2img](https://huggingface.co/gligen/gligen-generation-text-box/blob/main/diffusion_pytorch_model.bin), then place them as follows (the tree below expects the GLIGEN checkpoint under the name `diffusion_pytorch_model_box.bin`):
+```
+├── demo/
+|   ├── ckpts
+|   |   ├── controlnet
+|   |   |   ├── control_v11p_sd15_openpose.pth
+|   |   |   ├── v1-5-pruned-emaonly.safetensors
+|   |   ├── gligen
+|   |   |   ├── diffusion_pytorch_model_box.bin
+|   |   ├── visorgpt
+|   |   |   ├── visorgpt_dagger_ta_tb.pt
+```
+### Step 3 - Run demo
+```
+CUDA_VISIBLE_DEVICES=0 python3 gradio_demo.py
+```
+If you are using our code, please consider citing our paper.
+```
+@article{xie2023visorgpt,
+  title={VisorGPT: Learning Visual Prior via Generative Pre-Training},
+  author={Xie, Jinheng and Ye, Kai and Li, Yudong and Li, Yuexiang and Lin, Kevin Qinghong and Zheng, Yefeng and Shen, Linlin and Shou, Mike Zheng},
+  journal={arXiv preprint arXiv:2305.13777},
+  year={2023}
+}
+```
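Reviewer note: Step 2 of the README fixes a specific checkpoint layout, and `app.py` loads the same four paths at import time. A minimal sketch of a pre-flight check is given here, assuming it is run from the repository root. The helper name `missing_checkpoints` and the constant `EXPECTED_CKPTS` are hypothetical and not part of this commit.

```python
from pathlib import Path

# Relative paths copied from the directory tree in Step 2 of the README.
EXPECTED_CKPTS = [
    "demo/ckpts/controlnet/control_v11p_sd15_openpose.pth",
    "demo/ckpts/controlnet/v1-5-pruned-emaonly.safetensors",
    "demo/ckpts/gligen/diffusion_pytorch_model_box.bin",
    "demo/ckpts/visorgpt/visorgpt_dagger_ta_tb.pt",
]


def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint paths that are not files under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_CKPTS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoints:")
        for path in missing:
            print("  " + path)
    else:
        print("All checkpoints are in place.")
```

Running this before launching the demo should surface a misplaced or misnamed weight file earlier than a failure inside model construction.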

app.py ADDED

	@@ -0,0 +1,425 @@

+from share import *
+import gradio as gr
+import numpy as np
+import torch
+import re
+from PIL import Image
+from tqdm import tqdm
+from train.scripts.generate_lm_multiple import gen_sequence, build_visorgpt
+from utils.seq2coord import gen_cond_mask
+from gligen.gligen_inference_box import gligen_infer, build_gligen_model
+from controlnet.gradio_pose2image_v2 import control_infer, build_control_model, build_controlv11_model
+# init models
+visorgpt_config_path = 'train/models/gpt2/config.json'
+visorgpt_model_path = 'demo/ckpts/visorgpt/visorgpt_dagger_ta_tb.pt'
+visorgpt_vocab_path = 'train/models/google_uncased_en_coord_vocab.txt'
+# control_model_path = 'demo/ckpts/controlnet/control_sd15_openpose.pth'
+control_model_path = 'demo/ckpts/controlnet/control_v11p_sd15_openpose.pth' # v1.1
+control_sd_path = 'demo/ckpts/controlnet/v1-5-pruned-emaonly.safetensors'
+control_model_config = 'demo/ckpts/controlnet/cldm_v15.yaml'
+gligen_model_path = 'demo/ckpts/gligen/diffusion_pytorch_model_box.bin'
+visorgpt_args, visorgpt_model = build_visorgpt(model_config=visorgpt_config_path,
+                                                  model_path=visorgpt_model_path,
+                                                  vocab_path=visorgpt_vocab_path)
+control_model, ddim_sampler = build_controlv11_model(model_path=control_model_path,
+                                                     sd_path=control_sd_path,
+                                                  config_path=control_model_config)
+# build gligen model
+g_model, g_autoencoder, g_text_encoder, g_diffusion, \
+    g_config, g_grounding_tokenizer_input = build_gligen_model(ckpt=gligen_model_path)
+# maximum number of instances
+max_num_keypoint = 16
+max_num_bbox = 16
+max_num_mask = 8
+def generate_sequence(gen_type,
+                        data_type,
+                        instance_size,
+                        num_instance,
+                        object_name_inbox):
+    ctn = True
+    if gen_type == 'key point':
+        num_keypoint = 18
+        if num_instance > max_num_keypoint:
+            num_instance = max_num_keypoint
+        seq_prompt = '; '.join([gen_type, data_type, instance_size, str(num_instance), str(num_keypoint)]) + ' ; [person'
+    elif gen_type == 'box' or gen_type == 'mask':
+        if not object_name_inbox.strip():
+            if gen_type == 'mask':
+                object_name_inbox = "bottle; cup"
+            else:
+                if data_type == 'object centric':
+                    object_name_inbox = "great white shark"
+                else:
+                    object_name_inbox = "person; frisbee"
+        num_keypoint = 0
+        if gen_type == 'mask':
+            if num_instance > max_num_mask:
+                num_instance = max_num_mask
+        if gen_type == 'box':
+            if num_instance > max_num_bbox:
+                num_instance = max_num_bbox
+        if data_type == 'object centric':
+            num_instance = 1
+        objects = ', '.join(object_name_inbox.strip().split(";"))
+        seq_prompt = '; '.join([gen_type, data_type, instance_size,
+                                str(num_instance), str(num_keypoint)]) + '; ' + objects
+        if len(object_name_inbox.split(';')) > num_instance:
+            return {
+                raw_sequence: gr.update(
+                    value="The number of category names should not exceed the number of instances, please try again :)",
+                    visible=True)
+            }
+    print("input prompt: \n", seq_prompt)
+    sequence = gen_sequence(visorgpt_args, visorgpt_model, seq_prompt)
+    assert isinstance(sequence, list)
+    try:
+        cond_mask, cond_json = gen_cond_mask(sequence, ctn)
+        if gen_type == 'key point':
+            ori_sequence = cond_json[2]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'box':
+            ori_sequence = cond_json[0]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'mask':
+            ori_sequence = cond_json[1]['sequences'][0][0] + '[SEP]'
+    except Exception:
+        cond_mask, cond_json = gen_cond_mask(sequence, not ctn)
+        if gen_type == 'key point':
+            ori_sequence = cond_json[2]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'box':
+            ori_sequence = cond_json[0]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'mask':
+            ori_sequence = cond_json[1]['sequences'][0][0] + '[SEP]'
+    ret_img = Image.fromarray(cond_mask)
+    if not gen_type == 'mask':
+        return {
+            result_gallery: [ret_img],
+            raw_sequence: gr.update(value=ori_sequence, visible=True),
+            images_button: gr.update(visible=True),
+            text_container: cond_json,
+            sequence_container: ori_sequence
+        }
+    else:
+        return {
+            result_gallery: [ret_img],
+            raw_sequence: gr.update(value=ori_sequence, visible=True),
+            images_button: gr.update(visible=False),
+            text_container: cond_json,
+            sequence_container: ori_sequence
+        }
+def add_contents(gen_type,
+                        data_type,
+                        instance_size,
+                        num_instance,
+                        object_name_inbox,
+                        num_continuous_gen,
+                        global_seq):
+    ctn = True
+    if gen_type == 'key point':
+        num_keypoint = 18
+        seq_prompt = '; '.join([gen_type, data_type, instance_size, str(num_instance), str(num_keypoint)]) + ' ; [person'
+        if num_continuous_gen:
+            ctn = True
+            cur_instance = int(global_seq.split(';')[3].strip())
+            new_number = cur_instance + num_continuous_gen
+            if new_number > max_num_keypoint:
+                new_number = max_num_keypoint
+            # prompt type a
+            if global_seq.split(';')[5].find('[') == -1:
+                global_seq = global_seq.replace('[CLS]', '').replace('[SEP]', '')
+                objects = re.findall(re.compile(r'[\[](.*?)[]]', re.S), global_seq)
+                objects = ' '.join(['[ person' + x + ']' for x in objects])
+                seq_prompt = '; '.join([gen_type, data_type, instance_size, str(new_number), str(num_keypoint), objects])
+            # prompt type b
+            else:
+                global_seq = global_seq.replace('[CLS]', '').replace('[SEP]', '')
+                seq_list = global_seq.split(';')
+                seq_list[3] = str(new_number)
+                seq_prompt = ';'.join(seq_list)
+    elif gen_type == 'box' or gen_type == 'mask':
+        num_keypoint = 0
+        if data_type == 'object centric':
+            num_instance = 1
+        objects = ', '.join(object_name_inbox.strip().split(";"))
+        seq_prompt = '; '.join([gen_type, data_type, instance_size,
+                                str(num_instance), str(num_keypoint)]) + '; ' + objects
+        if len(object_name_inbox.split(';')) > num_instance:
+            return {
+                raw_sequence: gr.update(value="The number of category names should not exceed the number of instances, please try again :)", visible=True)
+            }
+        if num_continuous_gen:
+            cur_instance = int(global_seq.split(';')[3].strip())
+            new_number = cur_instance + num_continuous_gen
+            if gen_type == 'mask':
+                if new_number > max_num_mask:
+                    new_number = max_num_mask
+            if gen_type == 'box':
+                if new_number > max_num_bbox:
+                    new_number = max_num_bbox
+            # prompt type a
+            if global_seq.split(';')[5].find('[') == -1:
+                global_seq = global_seq.replace('[CLS]', '').replace('[SEP]', '')
+                coords = re.findall(re.compile(r'[\[](.*?)[]]', re.S), global_seq)
+                objects = global_seq.split(';')[5].split(',')
+                objects = ' '.join(['[ ' + objects[i] + coords[i] + ']' for i in range(len(coords))])
+                seq_prompt = '; '.join([gen_type, data_type, instance_size, str(new_number), str(num_keypoint), objects])
+            # prompt type b
+            else:
+                global_seq = global_seq.replace('[CLS]', '').replace('[SEP]', '')
+                seq_list = global_seq.split(';')
+                seq_list[3] = str(new_number)
+                seq_prompt = ';'.join(seq_list)
+    # import ipdb;ipdb.set_trace()
+    print("input prompt: \n", seq_prompt)
+    with torch.no_grad():
+        sequence = gen_sequence(visorgpt_args, visorgpt_model, seq_prompt)
+        torch.cuda.empty_cache()
+    assert isinstance(sequence, list)
+    try:
+        cond_mask, cond_json = gen_cond_mask(sequence, ctn)
+        if gen_type == 'key point':
+            ori_sequence = cond_json[2]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'box':
+            ori_sequence = cond_json[0]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'mask':
+            ori_sequence = cond_json[1]['sequences'][0][0] + '[SEP]'
+    except Exception:
+        cond_mask, cond_json = gen_cond_mask(sequence, not ctn)
+        if gen_type == 'key point':
+            ori_sequence = cond_json[2]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'box':
+            ori_sequence = cond_json[0]['sequences'][0][0] + '[SEP]'
+        elif gen_type == 'mask':
+            ori_sequence = cond_json[1]['sequences'][0][0] + '[SEP]'
+    ret_img = Image.fromarray(cond_mask)
+    if not gen_type == 'mask':
+        return {
+            result_gallery: [ret_img],
+            raw_sequence: gr.update(value=ori_sequence, visible=True),
+            images_button: gr.update(visible=True),
+            text_container: cond_json,
+            sequence_container: ori_sequence
+        }
+    else:
+        return {
+            result_gallery: [ret_img],
+            raw_sequence: gr.update(value=ori_sequence, visible=True),
+            images_button: gr.update(visible=False),
+            text_container: cond_json,
+            sequence_container: ori_sequence
+        }
+def generate_images(gen_type,
+                    num_samples,
+                    ddim_steps,
+                    object_prompt,
+                    seed,
+                    global_text,
+                    global_seq):
+    if gen_type == 'key point':
+        data = global_text[2]['keypoints']
+        idx = np.arange(len(data))
+        split_idx = list(np.array_split(idx, 1)[0])
+        for idx in tqdm(split_idx):
+            item = data[idx]
+            keypoint_list = []
+            for ins in item:
+                kv = list(ins.items())[0]
+                keypoint = (np.array(kv[1])).tolist()
+                keypoint_list.append(keypoint)
+        with torch.no_grad():
+            ret_img = control_infer(model=control_model,
+                                ddim_sampler=ddim_sampler,
+                                keypoint_list=keypoint_list,
+                                prompt=object_prompt.strip(),
+                                num_samples=num_samples,
+                                ddim_steps=ddim_steps,
+                                seed=seed)
+            torch.cuda.empty_cache()
+    elif gen_type == 'box':
+        data = global_text[0]['bboxes']
+        with torch.no_grad():
+            ret_img = gligen_infer(model=g_model,
+                               autoencoder=g_autoencoder,
+                               text_encoder=g_text_encoder,
+                               diffusion=g_diffusion,
+                               config=g_config,
+                               grounding_tokenizer_input=g_grounding_tokenizer_input,
+                               context_prompt=object_prompt.strip(),
+                               bbox_lists=data,
+                               ddim_steps=ddim_steps,
+                               batch_size=num_samples,
+                               seed=seed)
+            torch.cuda.empty_cache()
+    if not gen_type == 'mask':
+        return {
+            result_gallery: ret_img,
+            text_container: global_text,
+            sequence_container: global_seq
+        }
+    else:
+        return {
+            raw_sequence: "sequence to mask is not supported yet :)",
+            text_container: global_text,
+            sequence_container: global_seq
+        }
+def object_name_inbox_fn(gen_type):
+    if gen_type == 'key point':
+        return {
+            object_name_inbox: gr.update(visible=False),
+            data_type: gr.update(choices=['multiple instances']),
+            images_button: gr.update(value='Synthesize images using ControlNet'),
+            ddim_steps: gr.update(value=20),
+            object_prompt: gr.update(placeholder='in suit'),
+            num_instance: gr.update(visible=True, minimum=1, maximum=16, value=2, step=1),
+            sequence_container: None
+        }
+    elif gen_type == 'box':
+        return {
+            object_name_inbox: gr.update(visible=True, value='person; frisbee'),
+            data_type: gr.update(choices=['multiple instances', 'object centric']),
+            images_button: gr.update(value='Synthesize images using GLIGEN'),
+            ddim_steps: gr.update(value=50),
+            object_prompt: gr.update(placeholder='man and frisbee'),
+            num_instance: gr.update(visible=True, minimum=1, maximum=16, value=2, step=1),
+            sequence_container: None
+        }
+    elif gen_type == 'mask':
+        return {
+            object_name_inbox: gr.update(visible=True,
+                                         label="MS COCO categories to be generated (separated by semicolon)", value='bottle; cup'),
+            data_type: gr.update(choices=['multiple instances']),
+            images_button: gr.update(value='Synthesize images using GLIGEN'),
+            ddim_steps: gr.update(value=50),
+            object_prompt: gr.update(placeholder='bottle and cup'),
+            num_instance: gr.update(visible=True, minimum=1, maximum=8, value=2, step=1),
+            sequence_container: None
+        }
+def instance_type_change_fn(data_type):
+    if data_type == 'multiple instances':
+        return {
+            md_title: gr.update(visible=True),
+            num_continuous_gen: gr.update(visible=True),
+            continuous_btn: gr.update(visible=True),
+            object_name_inbox: gr.update(label="MS COCO categories to be generated (separated by semicolon)", value='person; frisbee'),
+            object_prompt: gr.update(placeholder='man and frisbee'),
+            num_instance: gr.update(visible=True, minimum=1, maximum=16, value=2, step=1),
+        }
+    elif data_type == 'object centric':
+        return {
+            md_title: gr.update(visible=False),
+            num_continuous_gen: gr.update(visible=False),
+            continuous_btn: gr.update(visible=False),
+            object_name_inbox: gr.update(label="ImageNet-1K categories to be generated", value='great white shark'),
+            object_prompt: gr.update(placeholder='great white shark'),
+            num_instance: gr.update(visible=False, value=1),
+        }
+block = gr.Blocks()
+with block:
+    text_container = gr.State()
+    sequence_container = gr.State()
+    gr.Markdown('<div align=center> <img src="file/visorgpt_title_all.jpg" width = "100%" height = "100%" /> </div>')
+    with gr.Row():
+        with gr.Column():
+            gr.Markdown("### Params to generate sequences")
+            gen_type = gr.inputs.Dropdown(choices=['key point', 'box', 'mask'], type='value', default='key point', label='Annotation Type')
+            data_type = gr.inputs.Dropdown(choices=['multiple instances'], type='value', default='multiple instances', label='Data Type')
+            instance_size = gr.inputs.Dropdown(choices=['small', 'medium', 'large'], type='value', default='large', label='Instance Size')
+            num_instance = gr.Slider(label="Number of instances per image", minimum=1, maximum=16, value=2, step=1)
+            object_name_inbox = gr.Textbox(label="MS COCO categories to be generated (separated by semicolon)", placeholder="person; frisbee", visible=False)
+            sequence_button = gr.Button(value="Customize sequential output")
+            md_title = gr.Markdown("### Continuous generation (Optional)")
+            num_continuous_gen = gr.Slider(label="Add instances to the current scene", minimum=1, maximum=16, value=1, step=1)
+            continuous_btn = gr.Button(value="Add instances to the current scene")
+            gr.Markdown("### Params to synthesize images")
+            object_prompt = gr.Textbox(label="Context Prompt", placeholder="in suit", visible=True)
+            num_samples = gr.Slider(label="Batch Size", minimum=1, maximum=36, value=1, step=1)
+            ddim_steps = gr.Slider(label="Steps", minimum=1, maximum=100, value=20, step=1)
+            seed = gr.Slider(label="Seed", minimum=-1, maximum=2147483647, step=1, randomize=True)
+            images_button = gr.Button(value="Synthesize images using ControlNet", visible=False)
+        with gr.Column():
+            raw_sequence = gr.Textbox(label="Raw Sequence", visible=False)
+            result_gallery = gr.Gallery(label='Output', show_label=False, elem_id="gallery").style(grid=2, height='auto', preview=True)
+    gen_type.change(object_name_inbox_fn, inputs=[gen_type],
+                    outputs=[object_name_inbox, data_type, images_button, ddim_steps, object_prompt, num_instance, sequence_container])
+    data_type.change(instance_type_change_fn, inputs=[data_type],
+                     outputs=[md_title, num_continuous_gen, continuous_btn, object_name_inbox, object_prompt, num_instance])
+    ips = [gen_type, data_type, instance_size, num_instance, object_name_inbox]
+    sequence_button.click(fn=generate_sequence, inputs=ips, outputs=[result_gallery, raw_sequence, images_button, text_container, sequence_container])
+    ips = [gen_type, data_type, instance_size, num_instance, object_name_inbox, num_continuous_gen, sequence_container]
+    continuous_btn.click(fn=add_contents, inputs=ips, outputs=[result_gallery, raw_sequence, images_button, text_container, sequence_container])
+    ips = [gen_type, num_samples, ddim_steps, object_prompt, seed, text_container, sequence_container]
+    images_button.click(fn=generate_images, inputs=ips, outputs=[result_gallery, raw_sequence, text_container, sequence_container])
+block.launch(server_name='0.0.0.0', server_port=10086, debug=False, share=False)
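Reviewer note: the prompt assembly in `generate_sequence` is interleaved with Gradio updates, which makes the resulting string hard to see. The sketch below isolates that logic so it can be exercised without loading any model. It is a simplification, not the code in this commit: the function name `build_prompt` and the `MAX_INSTANCES` table are hypothetical, category names are trimmed of surrounding whitespace, the empty-input defaults are omitted, and an over-long category list raises `ValueError` instead of returning a `gr.update`.

```python
# Per-type instance caps, mirroring max_num_keypoint / max_num_bbox / max_num_mask.
MAX_INSTANCES = {"key point": 16, "box": 16, "mask": 8}


def build_prompt(gen_type, data_type, instance_size, num_instance, object_names=""):
    """Assemble the text prompt that would be passed to the sequence model."""
    # Clamp the requested instance count to the cap for this annotation type.
    num_instance = min(num_instance, MAX_INSTANCES[gen_type])

    if gen_type == "key point":
        # Key-point prompts always use 18 key points and start a person instance.
        fields = [gen_type, data_type, instance_size, str(num_instance), "18"]
        return "; ".join(fields) + " ; [person"

    # Box and mask prompts carry no key points and list the category names.
    if data_type == "object centric":
        num_instance = 1
    names = [n.strip() for n in object_names.split(";") if n.strip()]
    if len(names) > num_instance:
        raise ValueError("more category names than instances")
    fields = [gen_type, data_type, instance_size, str(num_instance), "0"]
    return "; ".join(fields) + "; " + ", ".join(names)


if __name__ == "__main__":
    print(build_prompt("box", "multiple instances", "large", 2, "person; frisbee"))
    print(build_prompt("key point", "multiple instances", "large", 20))
```

Keeping the string construction in a pure function like this would also let `generate_sequence` and `add_contents` share it rather than duplicating the branches.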

demo/ckpts/controlnet/cldm_v15.yaml ADDED

	@@ -0,0 +1,79 @@

+model:
+  target: visor_controlnet.cldm.cldm.ControlLDM
+  params:
+    linear_start: 0.00085
+    linear_end: 0.0120
+    num_timesteps_cond: 1
+    log_every_t: 200
+    timesteps: 1000
+    first_stage_key: "jpg"
+    cond_stage_key: "txt"
+    control_key: "hint"
+    image_size: 64
+    channels: 4
+    cond_stage_trainable: false
+    conditioning_key: crossattn
+    monitor: val/loss_simple_ema
+    scale_factor: 0.18215
+    use_ema: False
+    only_mid_control: False
+    control_stage_config:
+      target: visor_controlnet.cldm.cldm.ControlNet
+      params:
+        image_size: 32 # unused
+        in_channels: 4
+        hint_channels: 3
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_heads: 8
+        use_spatial_transformer: True
+        transformer_depth: 1
+        context_dim: 768
+        use_checkpoint: True
+        legacy: False
+    unet_config:
+      target: visor_controlnet.cldm.cldm.ControlledUnetModel
+      params:
+        image_size: 32 # unused
+        in_channels: 4
+        out_channels: 4
+        model_channels: 320
+        attention_resolutions: [ 4, 2, 1 ]
+        num_res_blocks: 2
+        channel_mult: [ 1, 2, 4, 4 ]
+        num_heads: 8
+        use_spatial_transformer: True
+        transformer_depth: 1
+        context_dim: 768
+        use_checkpoint: True
+        legacy: False
+    first_stage_config:
+      target: visor_controlnet.ldm.models.autoencoder.AutoencoderKL
+      params:
+        embed_dim: 4
+        monitor: val/rec_loss
+        ddconfig:
+          double_z: true
+          z_channels: 4
+          resolution: 256
+          in_channels: 3
+          out_ch: 3
+          ch: 128
+          ch_mult:
+          - 1
+          - 2
+          - 4
+          - 4
+          num_res_blocks: 2
+          attn_resolutions: []
+          dropout: 0.0
+        lossconfig:
+          target: torch.nn.Identity
+    cond_stage_config:
+      target: visor_controlnet.ldm.modules.encoders.modules.FrozenCLIPEmbedder
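Reviewer note: the config sets `image_size: 64` for the latent and `ch_mult: [1, 2, 4, 4]` for the autoencoder. The arithmetic below sketches how those two values relate to the pixel resolution, assuming the autoencoder follows the usual latent-diffusion layout in which every level except the last halves the spatial size. The function names are hypothetical.

```python
def downsample_factor(ch_mult):
    """Spatial reduction of the encoder: one 2x downsample per level but the last."""
    return 2 ** (len(ch_mult) - 1)


def pixel_size(latent_size, ch_mult):
    """Pixel resolution that corresponds to a square latent of side latent_size."""
    return latent_size * downsample_factor(ch_mult)


if __name__ == "__main__":
    ch_mult = [1, 2, 4, 4]  # from first_stage_config.params.ddconfig
    print(downsample_factor(ch_mult))  # factor between pixels and latents
    print(pixel_size(64, ch_mult))     # image_size: 64 in the model params
```

Under that assumption, the 64x64 latent corresponds to 512x512 images.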

demo/ckpts/controlnet/control_v11p_sd15_openpose.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:db97becd92cd19aff71352a60e93c2508decba3dee64f01f686727b9b406a9dd
+size 1445235707
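Reviewer note: the three added lines are a Git LFS pointer, not the weights themselves. The pointer records the hash algorithm, the digest, and the byte size of the real file. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the size and digest named by pointer."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte checkpoints do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer_text = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:db97becd92cd19aff71352a60e93c2508decba3dee64f01f686727b9b406a9dd\n"
        "size 1445235707\n"
    )
    print(parse_lfs_pointer(pointer_text))
```

The same check applies to the other LFS-tracked checkpoints added in this commit.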

demo/ckpts/controlnet/idle ADDED

File without changes

demo/ckpts/controlnet/v1-5-pruned-emaonly.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa
+size 4265146304

demo/ckpts/gligen/diffusion_pytorch_model_box.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f5f3d2d5ec6e01c7ad7ca811a39904db675d1c5fccfeca9d34d63e4bf65ccd7b
+size 6775067861

demo/ckpts/gligen/idle ADDED

File without changes

demo/ckpts/visorgpt/idle ADDED

File without changes

demo/ckpts/visorgpt/visorgpt_dagger_ta_tb.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6ad8c442caf1ec58accc97dbc5b0636e4398a9853cf6e6475f5be01f087da245
+size 219003175

requirements.txt ADDED

	@@ -0,0 +1,160 @@

+absl-py==1.4.0
+addict==2.4.0
+aiofiles==23.1.0
+aiohttp==3.8.4
+aiosignal==1.3.1
+altair==4.2.2
+antlr4-python3-runtime==4.9.3
+anyio==3.6.2
+asttokens==2.2.1
+async-timeout==4.0.2
+attrs==22.2.0
+backcall==0.2.0
+basicsr==1.4.2
+blinker==1.5
+brotlipy==0.7.0
+cachetools==5.3.0
+certifi @ file:///croot/certifi_1671487769961/work/certifi
+cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
+charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
+click==8.1.3
+clip==0.2.0
+contourpy==1.0.7
+# Editable install with no version control (controlnet==1.0)
+-e /home/cvi_demo/PPSM/demo/ControlNet
+coqpit==0.0.17
+cryptography @ file:///croot/cryptography_1673298753778/work
+cycler==0.11.0
+decorator==5.1.1
+einops==0.6.0
+elasticsearch==1.9.0
+entrypoints==0.4
+executing==1.2.0
+fastapi==0.95.0
+ffmpy==0.3.0
+filelock==3.11.0
+flit-core @ file:///opt/conda/conda-bld/flit-core_1644941570762/work/source/flit_core
+fonttools==4.39.3
+frozenlist==1.3.3
+fsspec==2023.4.0
+ftfy==6.1.1
+future==0.18.3
+# Editable install with no version control (gligen==1.0)
+-e /home/cvi_demo/PPSM/demo/GLIGEN
+google-auth==2.17.2
+google-auth-oauthlib==1.0.0
+gradio==3.25.0
+gradio-client==0.0.10
+grpcio==1.51.3
+h11==0.14.0
+httpcore==0.17.0
+httpx==0.24.0
+huggingface-hub==0.13.4
+idna @ file:///croot/idna_1666125576474/work
+imageio==2.27.0
+importlib-metadata==6.1.0
+importlib-resources==5.12.0
+ipdb==0.13.13
+ipython==8.11.0
+iso8601==1.1.0
+jedi==0.18.2
+Jinja2==3.1.2
+jsonschema==4.17.3
+kiwisolver==1.4.4
+kornia==0.6.10
+lazy-loader==0.2
+lightning-utilities==0.8.0
+linkify-it-py==2.0.0
+lmdb==1.4.0
+Markdown==3.4.3
+markdown-it-py==2.2.0
+MarkupSafe==2.1.2
+matplotlib==3.7.1
+matplotlib-inline==0.1.6
+mdit-py-plugins==0.3.3
+mdurl==0.1.2
+mkl-fft==1.3.1
+mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work
+mkl-service==2.4.0
+multidict==6.0.4
+networkx==3.0
+numpy @ file:///tmp/abs_653_j00fmm/croots/recipe/numpy_and_numpy_base_1659432701727/work
+oauthlib==3.2.2
+omegaconf==2.3.0
+open-clip-torch==2.16.0
+opencv-python==4.7.0.72
+orjson==3.8.10
+packaging==23.0
+pandas==2.0.0
+parso==0.8.3
+pexpect==4.8.0
+pickleshare==0.7.5
+Pillow==9.3.0
+pkgutil-resolve-name==1.3.10
+prompt-toolkit==3.0.38
+protobuf==3.19.6
+psutil==5.9.4
+ptyprocess==0.7.0
+pure-eval==0.2.2
+pyasn1==0.4.8
+pyasn1-modules==0.2.8
+pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
+pydantic==1.10.7
+pyDeprecate==0.3.2
+pydub==0.25.1
+Pygments==2.14.0
+PyJWT==2.6.0
+pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
+pyparsing==3.0.9
+pyrsistent==0.19.3
+PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
+python-dateutil==2.8.2
+python-multipart==0.0.6
+pytorch-lightning==1.6.5
+pytz==2023.3
+PyWavelets==1.4.1
+PyYAML==6.0
+regex==2023.3.23
+requests @ file:///opt/conda/conda-bld/requests_1657734628632/work
+requests-oauthlib==1.3.1
+rsa==4.9
+scikit-image==0.20.0
+scipy==1.9.1
+semantic-version==2.10.0
+sentencepiece==0.1.97
+share==1.0.4
+six @ file:///tmp/build/80754af9/six_1644875935023/work
+sniffio==1.3.0
+soundfile==0.12.1
+stack-data==0.6.2
+starlette==0.26.1
+tb-nightly==2.13.0a20230410
+tensorboard==2.12.0
+tensorboard-data-server==0.7.0
+tensorboard-plugin-wit==1.8.1
+tensorboardX==2.6
+tifffile==2023.3.21
+timm==0.6.13
+tokenizers==0.13.2
+tomli==2.0.1
+toolz==0.12.0
+torch==1.12.1
+torchmetrics==0.11.4
+torchvision==0.13.1
+tqdm==4.65.0
+traitlets==5.9.0
+transformers==4.27.4
+typing-extensions @ file:///croot/typing_extensions_1669924550328/work
+tzdata==2023.3
+uc-micro-py==1.0.1
+urllib3 @ file:///croot/urllib3_1673575502006/work
+uvicorn==0.21.1
+wcwidth==0.2.6
+websockets==11.0.1
+Werkzeug==2.2.3
+yapf==0.32.0
+yarl==1.8.2
+zipp==3.15.0
+visor_controlnet
+visor-gligen==1.1
+visor-controlnet==1.1
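Reviewer note: several entries in this requirements file pin packages to `file:///` build paths or to editable installs under `/home/cvi_demo/...`. Entries of that kind generally resolve only on the machine that produced the freeze. The sketch below flags them so they can be reviewed before deployment. The function name `machine_specific_lines` is hypothetical, and the two patterns it checks are an assumption about what counts as machine-specific, not an exhaustive rule.

```python
def machine_specific_lines(requirements_text):
    """Return (line_number, line) pairs that point at local filesystem paths."""
    flagged = []
    for number, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        # Direct references to local build paths, or editable installs from
        # an absolute directory, do not exist on other machines.
        if "@ file://" in line or line.startswith("-e /"):
            flagged.append((number, line))
    return flagged


if __name__ == "__main__":
    sample = "\n".join([
        "absl-py==1.4.0",
        "certifi @ file:///croot/certifi_1671487769961/work/certifi",
        "# Editable install with no version control (controlnet==1.0)",
        "-e /home/cvi_demo/PPSM/demo/ControlNet",
    ])
    for number, line in machine_specific_lines(sample):
        print(number, line)
```

Flagged lines could then be replaced with plain version pins or removed where a published package (for example `visor-controlnet==1.1` at the end of the file) already covers them.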

train/README.md ADDED

	@@ -0,0 +1 @@


+The code is heavily based on [TencentPretrain](https://github.com/Tencent/TencentPretrain).

train/__init__.py ADDED

	@@ -0,0 +1 @@


+from .scripts import *

train/__pycache__/__init__.cpython-38.pyc ADDED

Binary file (152 Bytes).

train/__pycache__/__init__.cpython-39.pyc ADDED

Binary file (152 Bytes).

train/beginning.txt ADDED

	@@ -0,0 +1 @@


+key point; Multiple instances; 5; 14; medium;

train/corpora/CLUECorpusSmall_bert_sampled.txt ADDED