Add model card for GUI-AIMA-3B

#1
by nielsr HF Staff - opened

This PR adds a comprehensive model card for the GUI-AIMA-3B model, as presented in the paper GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

It includes:

  • Essential metadata: license: cc-by-nc-4.0; pipeline_tag: image-text-to-text, for discoverability; and library_name: transformers, chosen because the model appears to be compatible with that library, which enables automated code snippet generation.
  • Direct links to the paper, the project page, and the GitHub repository.
  • A concise summary of the model from the paper's abstract, alongside key architectural and result images from the GitHub README.
  • The detailed "Main Results" section, including associated images, to highlight the model's performance.
  • A "Sample Usage" code snippet, extracted from eval/example_inference.py in the official GitHub repository, to help users get started easily.
  • The relevant BibTeX citation.
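On the Hub, metadata like the fields listed above usually lives in a YAML front-matter block delimited by --- lines at the top of README.md. The sketch below renders only the three fields named in this PR into that form. It assumes simple scalar string values (so no YAML library is needed) and is illustrative only; the card in this PR may contain additional fields, and the helper name render_front_matter is hypothetical.

```python
# Hypothetical helper: renders the metadata named in this PR as YAML front matter
# for a Hugging Face model card (README.md). Assumes scalar string values only;
# nested fields (lists, dicts) would need a real YAML serializer.

def render_front_matter(metadata: dict[str, str]) -> str:
    """Return a '---'-delimited YAML front-matter block, one 'key: value' per line."""
    lines = ["---"]
    for key, value in metadata.items():  # dicts keep insertion order
        lines.append(f"{key}: {value}")
    lines.append("---")
    return "\n".join(lines) + "\n"


# The three metadata fields listed in the PR description.
card_metadata = {
    "license": "cc-by-nc-4.0",
    "pipeline_tag": "image-text-to-text",
    "library_name": "transformers",
}

header = render_front_matter(card_metadata)
print(header)
```

The rest of the card (summary, links, results, usage, citation) would follow this block as ordinary Markdown.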

This model card documents the GUI-AIMA-3B model on the Hugging Face Hub and makes it easier to discover and use.

Please review and merge this PR if everything looks good.

Cannot merge
This branch has merge conflicts in the following files:
  • README.md
