arxiv:2403.09029

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Published on Mar 14 · Featured in Daily Papers on Mar 15

Abstract

Using vision-language models (VLMs) in web development presents a promising strategy to increase efficiency and unblock no-code solutions: given a screenshot or a sketch of a UI, a VLM could generate the code to reproduce it, for instance in a language like HTML. Despite the advancements in VLMs for various tasks, the specific challenge of converting a screenshot into corresponding HTML code has been minimally explored. We posit that this is mainly due to the absence of a suitable, high-quality dataset. This work introduces WebSight, a synthetic dataset consisting of 2 million pairs of HTML code and corresponding screenshots. We fine-tune a foundational VLM on our dataset and demonstrate its proficiency in converting webpage screenshots to functional HTML code. To accelerate research in this area, we open-source WebSight.
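Since the dataset is open-sourced, here is a minimal sketch of how one might inspect a few screenshot/HTML pairs with the `datasets` library. The repo id `HuggingFaceM4/WebSight` and the `image`/`text` column names are assumptions based on the Hugging Face Hub listing, not details stated in the abstract.

```python
from datasets import load_dataset

# Stream the dataset so the full 2M pairs are not downloaded up front.
# Assumed repo id and split; adjust to the actual Hub release if they differ.
ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)

# Each record is assumed to pair a rendered screenshot with the HTML that
# produced it, under the "image" and "text" columns respectively.
example = next(iter(ds))
example["image"].save("screenshot.png")  # PIL image of the rendered page
print(example["text"][:500])             # first 500 chars of the HTML source
```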

Community

Congrats on the great work! Our arXiv paper https://arxiv.org/abs/2305.14637, from one year ago, is one of the earliest works addressing the same problem. Looking forward to more work on the topic!

Paper author

Thanks @zhoutianyi for the reference; we indeed missed your paper.
We'll add it to the related work section if we update this technical report after the next iteration!

