nielsr (HF Staff) committed
Commit 3dc426b · verified · 1 Parent(s): 68da431

Add transformers library tag and model description


Hi! I'm Niels from the Hugging Face community science team.

This PR adds `library_name: transformers` to the model metadata, as the `Qwen2_5_VL` architecture is supported by the Transformers library. I've also added a brief description of the model's capabilities (autonomous planning and reasoning for video understanding) based on the paper abstract to provide better context for users.
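As a minimal sketch of what this change means for the model card, the snippet below parses the `---`-delimited YAML front matter the way the Hub reads it, using only simple `key: value` splitting (no PyYAML; the `front_matter` helper and the inlined README string are illustrative, not part of this repository):

```python
# Minimal sketch: how the README front matter reads after this PR.
# Assumes simple one-line "key: value" entries, as in this model card.

readme = """---
pipeline_tag: video-text-to-text
library_name: transformers
---

# EVA: Efficient Reinforcement Learning for End-to-End Video Agent
"""

def front_matter(text: str) -> dict:
    """Extract the leading '---' ... '---' metadata block as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front-matter block
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

meta = front_matter(readme)
print(meta["library_name"])  # transformers
print(meta["pipeline_tag"])  # video-text-to-text
```

With `library_name: transformers` present, the Hub can surface Transformers-specific usage widgets and loading instructions for the `Qwen2_5_VL` architecture.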

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
```diff
@@ -1,13 +1,17 @@
 ---
 pipeline_tag: video-text-to-text
+library_name: transformers
 ---
+
 # EVA: Efficient Reinforcement Learning for End-to-End Video Agent
 
 [![Paper](https://img.shields.io/badge/Paper-Link-b31b1b.svg)](https://arxiv.org/abs/2603.22918)
 [![GitHub](https://img.shields.io/badge/GitHub-Repository-black.svg)](https://github.com/wangruohui/EfficientVideoAgent)
 [![Model](https://img.shields.io/badge/Model-Link-blue.svg)](https://huggingface.co/WRHC/EfficientVideoAgent/)
 
-This repository contains the official evaluation code for the model proposed in our paper. The code is available on GitHub and the model weights are available on Hugging Face.
+This repository contains the official evaluation code for the model proposed in the paper [EVA: Efficient Reinforcement Learning for End-to-End Video Agent](https://arxiv.org/abs/2603.22918).
+
+EVA (Efficient Video Agent) is an end-to-end framework that enables "planning-before-perception" through iterative summary-plan-action-reflection reasoning. Unlike passive recognizers, EVA autonomously decides what to watch, when to watch, and how to watch, achieving query-driven and efficient video understanding.
 
 ![EVA Overview](fig1.png)
```