license: mit

ConText

This is ConText, a powerful generalist framework for text removal and text segmentation. It is also a first exploration of establishing a visual in-context learning (V-ICL) paradigm for fine-grained text recognition tasks, including text segmentation and removal. To achieve this, we first sought a single-task-targeted baseline solution based on the prevailing V-ICL frameworks, which typically formulate in-context inference as a query-label-reconstruction process. Beyond simple task-specific fine-tuning, we propose an end-to-end in-context generalist elicited from a task-chaining prompt, which explicitly chains tasks together into one enriched demonstration and leverages inter-task correlations to improve in-context reasoning. Through quantitative and qualitative experiments, we demonstrate the effectiveness and superiority of our framework across various in-domain and out-of-domain text recognition tasks, where it outperforms both current generalists and specialists. Overall, we hope this pioneering work will encourage further development of V-ICL in text recognition.
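To make the idea of a task-chaining prompt more concrete, the sketch below shows one way such a canvas could be assembled: a demonstration row that chains an image with its segmentation and removal labels, stacked on top of a query row whose label slots are left for the model to reconstruct. This is only an illustration under stated assumptions and not the ConText implementation. The grid layout, the `MASK_VALUE` placeholder, and every function name here are hypothetical, and images are represented as plain nested lists to keep the example dependency-free.

```python
from typing import List

# A grayscale image represented as rows of pixel values (illustrative only).
Tile = List[List[float]]

# Hypothetical placeholder marking pixels the model is asked to reconstruct.
MASK_VALUE = -1.0


def blank_like(tile: Tile) -> Tile:
    """Return a tile with the same size as `tile`, filled with MASK_VALUE."""
    return [[MASK_VALUE for _ in row] for row in tile]


def hconcat(tiles: List[Tile]) -> Tile:
    """Concatenate equally tall tiles from left to right."""
    height = len(tiles[0])
    if any(len(t) != height for t in tiles):
        raise ValueError("all tiles must have the same height")
    return [sum((t[r] for t in tiles), []) for r in range(height)]


def build_chained_prompt(
    demo_image: Tile, demo_seg: Tile, demo_removal: Tile, query_image: Tile
) -> Tile:
    """Assemble a two-row canvas (assumed layout, not the official one).

    Row 1: demonstration image | segmentation label | removal label
    Row 2: query image         | masked slot        | masked slot
    """
    demo_row = hconcat([demo_image, demo_seg, demo_removal])
    query_row = hconcat(
        [query_image, blank_like(demo_seg), blank_like(demo_removal)]
    )
    if len(demo_row[0]) != len(query_row[0]):
        raise ValueError("demonstration and query rows must have equal width")
    # Stacking the rows vertically yields the full in-context canvas.
    return demo_row + query_row


# Tiny 2x2 tiles standing in for real images.
demo_image = [[1.0, 1.0], [1.0, 1.0]]
demo_seg = [[0.0, 1.0], [1.0, 0.0]]
demo_removal = [[2.0, 2.0], [2.0, 2.0]]
query_image = [[3.0, 3.0], [3.0, 3.0]]

canvas = build_chained_prompt(demo_image, demo_seg, demo_removal, query_image)
print(len(canvas), len(canvas[0]))  # 4 6
```

In this layout, a single-task prompt would carry only one label column, while the chained version places both labels in the same demonstration so that the two tasks are presented together.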

The source code is available here.

Model Weights & Usage

Here we provide the weights of ConText and ConTextV. Download these checkpoints and follow the process described here to perform OCR-level removal and segmentation. Have fun!

Model Performance

ConText reaches state-of-the-art (SOTA) performance on all of the text segmentation and removal benchmarks we evaluated.

Model Card Contact

Feel free to contact ferenas@sjtu.edu.cn if you have any questions or problems.