license: mit
ConText
This is ConText, a powerful generalist framework for text removal and segmentation. To our knowledge, this work is also the first exploration of establishing a visual in-context learning (V-ICL) paradigm for fine-grained text recognition tasks, including text segmentation and removal. To achieve this, we first sought a single-task-targeted baseline built on prevailing V-ICL frameworks, which typically formulate in-context inference as a query-label-reconstruction process. Beyond simple task-specific fine-tuning, we propose an end-to-end in-context generalist elicited by a task-chaining prompt that explicitly chains tasks up into one enriched demonstration, leveraging inter-task correlations to improve in-context reasoning capabilities. Through quantitative and qualitative experiments, we demonstrate the effectiveness and superiority of our framework across various in-domain and out-of-domain text recognition tasks, outperforming both current generalists and specialists. Overall, we hope this pioneering work will encourage further development of V-ICL in text recognition.
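To give a concrete sense of what a task-chaining prompt could look like, below is a minimal, illustrative sketch: the demonstration image is chained with both its segmentation and removal labels into one enriched demonstration row, while the query row's label slots are left blank for the model to reconstruct. The function name and tile layout are assumptions for illustration only, not the layout used in the code repository.

```python
# Illustrative sketch only (not the official implementation): assembling a
# task-chaining in-context prompt as one enriched demonstration.
import numpy as np
from PIL import Image

def make_task_chained_prompt(demo_image, demo_seg_mask, demo_removal,
                             query_image, size=(256, 256)):
    """Tile the demonstration triplet (image -> segmentation -> removal) above
    the query; the blank slots in the query row are what the in-context model
    is expected to reconstruct."""
    tiles = [
        demo_image, demo_seg_mask, demo_removal,   # chained demonstration row
        query_image,
        Image.new("RGB", size),                    # placeholder: query segmentation
        Image.new("RGB", size),                    # placeholder: query removal
    ]
    tiles = [np.asarray(t.resize(size).convert("RGB")) for t in tiles]
    top = np.concatenate(tiles[:3], axis=1)        # demonstration row
    bottom = np.concatenate(tiles[3:], axis=1)     # query row (labels masked out)
    return Image.fromarray(np.concatenate([top, bottom], axis=0))
```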
The source code is available here.
Model Weights & Usage
Here we provide the weights of ConText and ConTextV. You can download these checkpoints and follow the process described here to perform OCR-level text removal and segmentation. Have fun!
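As a minimal sketch of fetching the checkpoints, assuming they are hosted on the Hugging Face Hub (the repository id below is a placeholder, not a confirmed path):

```python
# Minimal sketch, assuming the checkpoints are hosted on the Hugging Face Hub.
from huggingface_hub import snapshot_download

# Replace the placeholder repo id with the actual ConText / ConTextV repository.
local_dir = snapshot_download(repo_id="<org>/ConText")
print(f"Checkpoints downloaded to: {local_dir}")

# From here, follow the inference instructions in the linked code repository
# to run OCR-level text removal and segmentation.
```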
Model Performance
ConText reaches state-of-the-art (SOTA) performance on all text segmentation and removal benchmarks.
Model Card Contact
Feel free to contact ferenas@sjtu.edu.cn if you have any problems!