About multi-stage training

#21
by xxxcliu

Reading the paper, I'm a bit confused: is the pipeline first RetroMAE pre-training, then dense-retrieval training on unsupervised data, and then unified training of the three retrieval modes with self-distillation?

Beijing Academy of Artificial Intelligence org

Yes: RetroMAE -> dense retrieval -> unified fine-tuning.
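
For readers who want a concrete picture of the last stage, below is a minimal sketch of the self-distillation idea: the three retrieval modes' scores are integrated into a teacher signal, and each individual mode is trained to match it. The names `s_dense`, `s_sparse`, `s_colbert` and the exact loss weighting are illustrative assumptions, not the authors' actual code.

```python
import torch
import torch.nn.functional as F

def unified_self_distill_loss(s_dense, s_sparse, s_colbert, labels):
    """Sketch of unified fine-tuning with self-distillation.

    Each s_* is a [batch, num_candidates] tensor of similarity logits
    from one retrieval mode (dense, sparse, multi-vector); `labels` holds
    the index of the positive candidate for each query.
    """
    # Teacher: integrate the three modes' scores (simple average here;
    # the paper's exact combination may differ).
    s_teacher = (s_dense + s_sparse + s_colbert) / 3
    teacher_probs = F.softmax(s_teacher.detach(), dim=-1)

    loss = 0.0
    for s in (s_dense, s_sparse, s_colbert):
        # Each mode learns from the integrated teacher distribution...
        loss = loss + F.kl_div(
            F.log_softmax(s, dim=-1), teacher_probs, reduction="batchmean"
        )
        # ...and still gets the standard contrastive supervision.
        loss = loss + F.cross_entropy(s, labels)
    return loss
```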


Thanks for your reply! Do both bge and bge-m3 adopt a bi-encoder architecture? If so, is there actually a single RoBERTa encoder, or two separate models?

Beijing Academy of Artificial Intelligence org

bge and bge-m3 are both bi-encoder models; the query and passage share the same encoder.
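
To make the shared-encoder point concrete, here is a minimal sketch using the Hugging Face transformers library: the same set of weights encodes both queries and passages, rather than two separate towers. CLS pooling plus L2 normalization is a common choice for BGE-style dense retrieval, but treat the pooling details here as an assumption rather than the model's canonical usage.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
model = AutoModel.from_pretrained("BAAI/bge-m3")  # one set of weights

def embed(texts):
    # The SAME tokenizer and model are used for queries and passages.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # CLS pooling followed by L2 normalization (assumed here).
    emb = outputs.last_hidden_state[:, 0]
    return torch.nn.functional.normalize(emb, dim=-1)

query_emb = embed(["what is a bi-encoder?"])
passage_emb = embed(["A bi-encoder encodes queries and passages independently."])
score = query_emb @ passage_emb.T  # cosine similarity via dot product
print(score)
```

Because the two inputs are encoded independently, passage embeddings can be pre-computed and indexed offline, which is the main practical advantage of the bi-encoder design over a cross-encoder.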
