---
license: apache-2.0
library_name: transformers
pipeline_tag: image-to-image
---

This repository contains the model presented in the paper [UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface](https://hf.co/papers/2503.01342).

UFO unifies object-level detection, pixel-level segmentation, and image-level vision-language tasks into a single model by transforming all perception targets into the language space. It introduces a novel embedding retrieval approach that relies solely on the language interface to support segmentation tasks.
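
To make the two ideas above more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the code from the UFO repository: the function names `box_to_text` and `retrieve_mask`, the `num_bins` quantization, the text layout of the serialized box, and the use of a thresholded dot product for retrieval are all hypothetical choices made for this example.

```python
import numpy as np


def box_to_text(label, box, image_size, num_bins=1000):
    """Serialize a box as plain text (hypothetical format, for illustration).

    Coordinates are normalized by the image size and quantized into
    integer bins so that they can be emitted as ordinary text tokens.
    """
    width, height = image_size
    x0, y0, x1, y1 = box
    scale = num_bins - 1
    coords = [
        round(x0 / width * scale),
        round(y0 / height * scale),
        round(x1 / width * scale),
        round(y1 / height * scale),
    ]
    return f"{label}," + ",".join(str(c) for c in coords)


def retrieve_mask(mask_embedding, image_features, threshold=0.0):
    """Retrieve a binary mask by embedding similarity (assumed mechanism).

    mask_embedding: array of shape (D,), e.g. the embedding of a mask token.
    image_features: array of shape (H, W, D), one feature vector per location.
    Returns a boolean array of shape (H, W) that is True wherever the
    dot-product similarity exceeds the threshold.
    """
    similarity = image_features @ mask_embedding  # shape (H, W)
    return similarity > threshold


# Toy usage: a 2x2 feature map with 3-dimensional features.
features = np.zeros((2, 2, 3))
features[0, 0] = [1.0, 0.0, 0.0]   # aligned with the mask embedding
features[1, 1] = [-1.0, 0.0, 0.0]  # opposed to the mask embedding
embedding = np.array([1.0, 0.0, 0.0])

mask = retrieve_mask(embedding, features)
text = box_to_text("cat", (0, 0, 100, 100), (100, 100))
```

In this sketch both outputs stay within what a language interface can express: the box becomes a short string, and the mask is recovered from a single embedding rather than from a dedicated segmentation head. For the actual formulation, see the paper and the GitHub repository.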

For more details, please refer to the original paper and the GitHub repository:

- Paper: [UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface](https://hf.co/papers/2503.01342)
- GitHub: [https://github.com/nnnth/UFO](https://github.com/nnnth/UFO)