rsortino/trf-sg2im

rsortino committed on Jan 29

Commit d098f09 (1 parent: e8d4cbc)

Update README.md

Files changed (1)

README.md +30 -0

@@ -1,3 +1,33 @@
 ---
 license: apache-2.0
+datasets:
+- multi-train/coco_captions_1107
+- visual_genome
+language:
+- en
+pipeline_tag: text-to-image
+tags:
+- scene_graph
+- transformers
+- laplacian
+- autoregressive
+- vqvae
 ---
+# trf-sg2im
+Model card for the paper __"[Transformer-Based Image Generation from Scene Graphs](https://arxiv.org/abs/2303.04634)"__.
+Original GitHub implementation at [perceivelab/trf-sg2im](https://github.com/perceivelab/trf-sg2im).
+![teaser](docs/teaser.gif)
+## Model
+This model is a two-stage scene-graph-to-image approach. In the first stage, a transformer-based architecture with Laplacian Positional Encoding takes a scene graph as input and generates a layout.
+In the second stage, the estimated layout conditions an autoregressive GPT-like transformer that composes the image in a discrete latent space; a VQVAE then converts this latent representation into the final image.
+![architecture](docs/architecture.png)
+## Usage
+For usage instructions, please refer to the original [GitHub repo](https://github.com/perceivelab/trf-sg2im).
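
The Model section names Laplacian Positional Encoding without showing what it computes. The sketch below is one common formulation, not the code from the perceivelab/trf-sg2im repository: it assumes the scene graph is treated as an undirected graph over object nodes, builds the symmetric normalized Laplacian, and uses the eigenvectors with the smallest non-trivial eigenvalues as per-node position features. The function name `laplacian_positional_encoding`, its parameters, and the toy scene graph are all hypothetical and chosen for illustration only.

```python
import numpy as np


def laplacian_positional_encoding(num_nodes, edges, k):
    """Return a (num_nodes, k) array of Laplacian positional encodings.

    Hypothetical helper: `edges` is an iterable of (src, dst) node-index
    pairs, treated as undirected. This is an illustrative assumption,
    not the repository's implementation.
    """
    # Symmetric adjacency matrix, ignoring self-loops.
    adj = np.zeros((num_nodes, num_nodes))
    for src, dst in edges:
        if src != dst:
            adj[src, dst] = 1.0
            adj[dst, src] = 1.0

    # D^{-1/2}, leaving isolated nodes (degree 0) at zero.
    deg = adj.sum(axis=1)
    inv_sqrt_deg = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt_deg[connected] = deg[connected] ** -0.5

    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    lap = np.eye(num_nodes) - inv_sqrt_deg[:, None] * adj * inv_sqrt_deg[None, :]

    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(lap)

    # Skip the first eigenvector (trivial for a connected graph), keep the next k.
    pe = eigvecs[:, 1:k + 1]

    # Zero-pad when the graph has fewer than k + 1 nodes.
    if pe.shape[1] < k:
        pe = np.pad(pe, ((0, 0), (0, k - pe.shape[1])))
    return pe


# Toy scene graph (made up for this example): (subject, predicate, object).
objects = ["sheep", "grass", "sky", "tree"]
triples = [(0, "standing on", 1), (2, "above", 1), (3, "next to", 0)]
edges = [(subj, obj) for subj, _, obj in triples]

pe = laplacian_positional_encoding(len(objects), edges, k=2)
print(pe.shape)
```

Each row of `pe` would then be combined with the corresponding node embedding before the transformer. Eigenvectors are only defined up to sign, so the exact values can differ between runs or libraries; a model consuming them needs to be robust to that.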