Add model card for GUI-AIMA-3B

by nielsr HF Staff - opened Nov 5, 2025


This PR adds a comprehensive model card for the GUI-AIMA-3B model, as presented in the paper GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding.

It includes:

Essential metadata: license: cc-by-nc-4.0; pipeline_tag: image-text-to-text, for discoverability; and library_name: transformers, for automated code snippet generation (chosen based on compatibility evidence).
Direct links to the paper, the project page, and the GitHub repository.
A concise summary of the model from the paper's abstract, alongside key architectural and result images from the GitHub README.
The detailed "Main Results" section, including associated images, to highlight the model's performance.
A "Sample Usage" code snippet, extracted from eval/example_inference.py in the official GitHub repository, to help users get started easily.
The relevant BibTeX citation.
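
For reference, the metadata listed above lives in the YAML front matter at the top of the model card's README.md. A minimal sketch, assuming only the three fields named in this PR (the actual card may carry additional fields not shown here):

```yaml
---
# Sketch of the front matter; only fields named in this PR description are shown.
license: cc-by-nc-4.0
pipeline_tag: image-text-to-text
library_name: transformers
---
```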

This model card significantly improves the documentation and usability of the GUI-AIMA-3B model on the Hugging Face Hub.

Please review and merge this PR if everything looks good.


