About the ratio of positive and negative examples when fine-tuning

#26
by Kaguya-19 - opened

bge-reranker is excellent work, and we plan to fine-tune it on our own business data. However, we noticed that the fine-tuning script in FlagEmbedding uses a positive-to-negative ratio of 1:15. Since general classification tasks usually use a 1:1 ratio, were there any experiments that support using multiple negatives per positive? Thank you very much!


Kaguya-19 changed the discussion title from the Chinese-only "On setting the positive/negative ratio for fine-tuning" to a bilingual version that adds "About the ratio of pos and negs"
Beijing Academy of Artificial Intelligence org

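@Kaguya-19, we use a multi-class (softmax) cross-entropy loss, not a binary sigmoid loss: each group of one positive and its negatives is treated as a single classification problem whose correct class is the positive. So there is no positive/negative imbalance problem. For contrastive learning, more negatives usually give better results.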
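To make the difference concrete, here is a minimal sketch in plain Python, assuming a group of 16 candidate scores (1 positive + 15 negatives) with the positive at index 0. The function names `group_softmax_ce` and `pointwise_bce` are hypothetical and this is not FlagEmbedding's actual training code; it only contrasts the two loss types mentioned above.

```python
import math
from typing import Sequence


def _softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def group_softmax_ce(scores: Sequence[float], pos_index: int = 0) -> float:
    """Softmax cross-entropy for one group (hypothetical helper).

    All candidates compete in one softmax; the positive is the single
    target class, so each group contributes exactly one label.
    """
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[pos_index]


def pointwise_bce(scores: Sequence[float], pos_index: int = 0) -> float:
    """Mean binary sigmoid cross-entropy (hypothetical helper).

    Each candidate is an independent yes/no example: label 1 for the
    positive, label 0 for every negative, so 15 of 16 labels are 0.
    """
    total = 0.0
    for i, s in enumerate(scores):
        # -log(sigmoid(s)) == softplus(-s); -log(1 - sigmoid(s)) == softplus(s)
        total += _softplus(-s) if i == pos_index else _softplus(s)
    return total / len(scores)


if __name__ == "__main__":
    untrained = [0.0] * 16
    all_low = [-5.0] * 16  # model scores every candidate as irrelevant
    print(group_softmax_ce(untrained), group_softmax_ce(all_low))
    print(pointwise_bce(untrained), pointwise_bce(all_low))
```

With the binary loss, scoring every candidate low (including the positive) reduces the loss, because negatives dominate the labels; this is the imbalance the question worries about. The softmax group loss stays at log(16) for any uniform shift of the scores, since it depends only on how far the positive's score sits above the negatives, so the 1:15 grouping does not bias it toward predicting "negative".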

Kaguya-19 changed discussion status to closed
