nielsr (HF Staff) committed
Commit 3dc426b · verified · 1 Parent(s): 68da431

Add transformers library tag and model description


Hi! I'm Niels from the Hugging Face community science team.

This PR adds `library_name: transformers` to the model metadata, as the `Qwen2_5_VL` architecture is supported by the Transformers library. I've also added a brief description of the model's capabilities (autonomous planning and reasoning for video understanding) based on the paper abstract to provide better context for users.
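As a minimal sketch of what this change means for the model card, the snippet below parses the `---`-delimited YAML front matter the way the Hub reads it, using only simple `key: value` splitting (no PyYAML; the `front_matter` helper and the inlined README string are illustrative, not part of this repository):

```python
# Minimal sketch: how the README front matter reads after this PR.
# Assumes simple one-line "key: value" entries, as in this model card.

readme = """---
pipeline_tag: video-text-to-text
library_name: transformers
---

# EVA: Efficient Reinforcement Learning for End-to-End Video Agent
"""

def front_matter(text: str) -> dict:
    """Extract the leading '---' ... '---' metadata block as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

meta = front_matter(readme)
print(meta["library_name"])  # transformers
print(meta["pipeline_tag"])  # video-text-to-text
```

With `library_name: transformers` present, the Hub can surface Transformers-specific usage widgets and loading instructions for the `Qwen2_5_VL` architecture.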

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
```diff
@@ -1,13 +1,17 @@
 ---
 pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 # EVA: Efficient Reinforcement Learning for End-to-End Video Agent
 
 [![Paper](https://img.shields.io/badge/Paper-Link-b31b1b.svg)](https://arxiv.org/abs/2603.22918)
 [![GitHub](https://img.shields.io/badge/GitHub-Repository-black.svg)](https://github.com/wangruohui/EfficientVideoAgent)
 [![Model](https://img.shields.io/badge/Model-Link-blue.svg)](https://huggingface.co/WRHC/EfficientVideoAgent/)
 
-This repository contains the official evaluation code for the model proposed in our paper. The code is available on GitHub and the model weights are available on Hugging Face.
+This repository contains the official evaluation code for the model proposed in the paper [EVA: Efficient Reinforcement Learning for End-to-End Video Agent](https://arxiv.org/abs/2603.22918).
+
+EVA (Efficient Video Agent) is an end-to-end framework that enables "planning-before-perception" through iterative summary-plan-action-reflection reasoning. Unlike passive recognizers, EVA autonomously decides what to watch, when to watch, and how to watch, achieving query-driven and efficient video understanding.
 
 ![EVA Overview](fig1.png)
```