Typo in IFM-TTE-7B results for ViDoRe-V2 under Visual Doc
@Hrant
Thanks for pointing it out! I’ve pinged the author who added this model here: https://huggingface.co/spaces/TIGER-Lab/MMEB-Leaderboard/discussions/69
Hi
@haoyubu
@ziyjiang
IFM-TTE-7B demonstrates outstanding overall performance on the visdoc task. I noticed that its scores on several datasets are significantly better than those of other models:
- ViDoRe_esg_reports_human_labeled_v2: +21%
- ViDoRe_esg_reports_v2_multilingual: +22%
- VisRAG_PlotQA: +19%
- ViDoSeek-page: +27%
- MMLongBench-page: +12%
I found that these datasets all contain many additional corpus-ids that do not appear in the qrels. According to the official evaluation script, these additional corpus-ids should also be added to the candidate set as negative samples. Could you confirm whether IFM-TTE-7B used only the corpus-ids from the qrels as the candidate set, excluding the additional corpus-ids from the corpus?
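To make the candidate-set concern concrete, here is a minimal sketch (hypothetical ids and scores, not the official MMEB/VLM2Vec evaluation script) of why excluding the extra corpus-ids can inflate a retrieval metric: the unjudged documents act as hard negatives, and removing them can only raise the rank of the relevant document.

```python
def recall_at_1(scores, candidate_ids, relevant_ids):
    """Rank candidates by similarity score (descending) and
    return 1.0 if the top-ranked candidate is relevant, else 0.0."""
    top = max(candidate_ids, key=lambda c: scores[c])
    return 1.0 if top in relevant_ids else 0.0

# Hypothetical setup for one query:
qrels_ids = {"c0", "c1", "c2"}   # corpus-ids that appear in the qrels
extra_ids = {"c3", "c4"}         # additional corpus-ids, absent from qrels
relevant = {"c0"}                # the single relevant document

# Hypothetical model scores: an unjudged extra doc outranks the relevant one.
scores = {"c0": 0.80, "c1": 0.40, "c2": 0.30, "c3": 0.95, "c4": 0.10}

print(recall_at_1(scores, qrels_ids, relevant))              # qrels-only candidates -> 1.0
print(recall_at_1(scores, qrels_ids | extra_ids, relevant))  # full candidate set   -> 0.0
```

With the qrels-only candidate set the relevant doc `c0` ranks first, but once the extra corpus-ids are included, `c3` outscores it and recall@1 drops, which is why the two protocols are not comparable.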
Hi
@kekekeke
, thanks for raising this! From the VLM2Vec/MMEB side, I can confirm that these additional corpus_ids are included in the candidate set during evaluation (https://github.com/TIGER-AI-Lab/VLM2Vec/blob/main/src/data/eval_dataset/vidore_dataset.py#L59). I think IFM-TTE-7B follows the same approach, but I’ll leave the final confirmation to the authors of that paper.
