---
library_name: transformers
tags: []
---

# Model Card for Model ID

This model is based on [LLaVA1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b) and is fine-tuned with LoRA on the [OpenCOLE 1.0 dataset](naoto0804/opencole) to generate text layouts.

## Model Details

### Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Language(s) (NLP):** English
- **License:** Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
- **Finetuned from model:** [LLaVA1.5-7b](https://huggingface.co/liuhaotian/llava-v1.5-7b)

### Model Sources

- **Repository:** [CyberAgentAILab/OpenCOLE](https://github.com/CyberAgentAILab/OpenCOLE)
- **Paper:** [OpenCOLE: Towards Reproducible Automatic Graphic Design Generation]()

## Uses

Please refer to the [OpenCOLE repository](https://github.com/CyberAgentAILab/OpenCOLE).

## Training Details

### Training Data

- About 18k image-text pairs extracted automatically from OpenCOLE.

Below is an example of a training sample.

```
[
  {
    "id": "592d203395a7a863ddcd9df1",
    "image": "images/592/592d203395a7a863ddcd9df1.png",
    "conversations": [
      {
        "from": "human",
        "value": "\nGiven an image and text input including set of keywords to be placed on the image and its properties (optional), plan the layout of the texts. The output should be formatted as a JSON instance that conforms to the JSON schema below.\n\nAs an example, for the schema {\"properties\": {\"foo\": {\"title\": \"Foo\", \"description\": \"a list of strings\", \"type\": \"array\", \"items\": {\"type\": \"string\"}}}, \"required\": [\"foo\"]}\nthe object {\"foo\": [\"bar\", \"baz\"]} is a well-formatted instance of the schema. The object {\"properties\": {\"foo\": [\"bar\", \"baz\"]}} is not well-formatted.\n\nHere is the output schema:\n```\n{\"properties\": {\"elements\": {\"title\": \"Elements\", \"default\": [], \"type\": \"array\", \"items\": {\"$ref\": \"#/definitions/Element\"}}}, \"definitions\": {\"Element\": {\"title\": \"Element\", \"type\": \"object\", \"properties\": {\"text\": {\"title\": \"Text\", \"description\": \"Dummy text\", \"type\": \"string\"}, \"width\": {\"title\": \"Width\", \"description\": \"range: 0 <= width <= 127\", \"type\": \"integer\"}, \"height\": {\"title\": \"Height\", \"description\": \"range: 0 <= height <= 127\", \"type\": \"integer\"}, \"left\": {\"title\": \"Left\", \"description\": \"range: 0 <= left <= 127\", \"type\": \"integer\"}, \"top\": {\"title\": \"Top\", \"description\": \"range: 0 <= top <= 127\", \"type\": \"integer\"}, \"font\": {\"title\": \"Font\", \"type\": \"string\"}, \"color\": {\"title\": \"Color\", \"description\": \"range: 0 <= color <= 127\", \"type\": \"integer\"}, \"text_align\": {\"title\": \"Text Align\", \"description\": \"choices: \\\"\\\", \\\"left\\\", \\\"center\\\", \\\"right\\\"\", \"type\": \"string\"}, \"capitalize\": {\"title\": \"Capitalize\", \"description\": \"choices: \\\"false\\\", \\\"true\\\"\", \"type\": \"string\"}, \"font_size\": {\"title\": \"Font Size\", \"description\": \"range: 0 <= font_size <= 127\", \"type\": \"integer\"}, \"angle\": {\"title\": \"Angle\", \"description\": \"range: 0 <= angle <= 127\", \"type\": \"integer\"}, \"letter_spacing\": {\"title\": \"Letter Spacing\", \"description\": \"range: 0 <= letter_spacing <= 127\", \"type\": \"integer\"}, \"line_height\": {\"title\": \"Line Height\", \"description\": \"range: 0 <= line_height <= 127\", \"type\": \"integer\"}}, \"required\": [\"text\", \"width\", \"height\", \"left\", \"top\", \"font\", \"color\", \"text_align\", \"capitalize\", \"font_size\", \"angle\", \"letter_spacing\", \"line_height\"]}}}\n``` Input: [\"WE DON'T HAVE\\nANOTHER PLANET\", \"GREEN\", \"GO\"]"
      },
      {
        "from": "gpt",
        "value": "{\"elements\": [{\"text\": \"GO\", \"width\": 62, \"height\": 40, \"left\": 11, \"top\": 43, \"font\": \"Cormorant Infant\", \"color\": 38, \"text_align\": \"center\", \"capitalize\": \"false\", \"font_size\": 79, \"angle\": 0, \"letter_spacing\": 61, \"line_height\": 27}, {\"text\": \"GREEN\", \"width\": 69, \"height\": 30, \"left\": 6, \"top\": 60, \"font\": \"Cormorant Infant\", \"color\": 56, \"text_align\": \"center\", \"capitalize\": \"false\", \"font_size\": 67, \"angle\": 0, \"letter_spacing\": 50, \"line_height\": 27}, {\"text\": \"WE DON'T HAVE\\nANOTHER PLANET\", \"width\": 71, \"height\": 37, \"left\": 3, \"top\": 74, \"font\": \"Cormorant Infant\", \"color\": 56, \"text_align\": \"center\", \"capitalize\": \"false\", \"font_size\": 39, \"angle\": 0, \"letter_spacing\": 29, \"line_height\": 47}]}"
      }
    ]
  },
  ...
]
```
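For illustration, here is a minimal sketch of how a JSON answer in the format above could be converted into pixel-space text boxes. It assumes the quantized integer fields (0 to 127 in the schema) map linearly onto the target canvas size; the exact de-quantization used by OpenCOLE may differ, so check the repository for the reference implementation. The `Box` dataclass and `parse_layout` helper are hypothetical names, not part of the OpenCOLE codebase.

```python
import json
from dataclasses import dataclass

QUANT_MAX = 127  # geometry fields in the schema are integers in [0, 127]


@dataclass
class Box:
    text: str
    left: float
    top: float
    width: float
    height: float
    font: str


def parse_layout(model_output: str, canvas_w: int, canvas_h: int) -> list[Box]:
    """Parse the model's JSON answer and rescale quantized geometry to pixels.

    Assumption: quantized values map linearly onto the canvas
    (value / 127 * canvas size); the official OpenCOLE pipeline may
    use a different de-quantization.
    """
    layout = json.loads(model_output)
    boxes = []
    for el in layout.get("elements", []):
        boxes.append(
            Box(
                text=el["text"],
                left=el["left"] / QUANT_MAX * canvas_w,
                top=el["top"] / QUANT_MAX * canvas_h,
                width=el["width"] / QUANT_MAX * canvas_w,
                height=el["height"] / QUANT_MAX * canvas_h,
                font=el["font"],
            )
        )
    return boxes


# Example with one element from the "gpt" answer shown above:
answer = '{"elements": [{"text": "GO", "width": 62, "height": 40, "left": 11, "top": 43, "font": "Cormorant Infant", "color": 38, "text_align": "center", "capitalize": "false", "font_size": 79, "angle": 0, "letter_spacing": 61, "line_height": 27}]}'
for box in parse_layout(answer, canvas_w=1024, canvas_h=1024):
    print(box)
```

Only the geometry fields are handled here; the remaining attributes (e.g. `font_size`, `color`, `letter_spacing`) use the same 0 to 127 encoding per the schema, and mapping them back to renderable values is left to the OpenCOLE tooling.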
## Citation

```
@inproceedings{inoue2024opencole,
  title={{OpenCOLE: Towards Reproducible Automatic Graphic Design Generation}},
  author={Naoto Inoue and Kento Masui and Wataru Shimoda and Kota Yamaguchi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  year={2024},
}
```

## Model Card Contact

[Naoto Inoue](https://github.com/naoto0804)