Update README.md
README.md
@@ -1127,10 +1127,12 @@ The finetune loss uses a triple contrastive loss with added hard negatives. Neg num is
Note: We set different max lengths for query and passage; the max length of the query is always kept at 64.

### Others

+ Some useful tricks:

1. Reducing GPU memory: fp16 + gradient checkpointing + ZeRO Stage 1 (Stage 2 does not support gradient checkpointing with the two-tower structure); for the related issue, see https://github.com/microsoft/DeepSpeed/issues/988. A config sketch follows this list.
2. Dataset sampler: we adopt M3E's dataset sampler to guarantee that all samples in a batch come from the same dataset, which makes the in-batch negatives more valuable. A sampler sketch follows this list.
3. Instruction: in our experiments, instructions bring a large performance gain on retrieval tasks; we prepend an instruction such as '查询: ' ('query: ') or '结果: ' ('result: ') to every training sample.
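Trick 1 can be wired up roughly as below. This is a minimal sketch assuming a Hugging Face `transformers` + DeepSpeed training setup; the encoder name, output directory, and `"auto"` values are placeholders, not this repository's actual configuration.

```python
# Minimal sketch of trick 1: fp16 + gradient checkpointing + ZeRO Stage 1.
# Assumes `transformers` and `deepspeed` are installed; the model name and
# output_dir are placeholders, not the repository's real training setup.
from transformers import AutoModel, TrainingArguments

ds_config = {
    "fp16": {"enabled": True},              # mixed-precision training
    "zero_optimization": {"stage": 1},      # ZeRO Stage 1: shard optimizer states only
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model = AutoModel.from_pretrained("bert-base-chinese")  # placeholder encoder
model.gradient_checkpointing_enable()                   # trade recompute for memory

training_args = TrainingArguments(
    output_dir="output",
    fp16=True,
    deepspeed=ds_config,  # accepts a dict or a path to a DeepSpeed JSON config
)
```

With a bi-encoder, `gradient_checkpointing_enable()` is applied to both towers (or to the shared encoder if weights are tied), which is the combination that, per the linked issue, does not work under ZeRO Stage 2.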
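Tricks 2 and 3 can be sketched together as below. This is an illustrative re-implementation, not the actual M3E sampler or this repository's code: `SingleDatasetBatchSampler`, `add_instruction`, the assumption that several sub-datasets are concatenated into one training set, and the assignment of '查询: ' to queries and '结果: ' to passages are all ours.

```python
# Sketch of tricks 2 and 3: every batch is drawn from a single sub-dataset
# (in the spirit of M3E's dataset sampler), and instruction prefixes are
# prepended to each query/passage. Names are illustrative only.
import random
from torch.utils.data import Sampler


class SingleDatasetBatchSampler(Sampler):
    """Yield index batches such that each batch comes from exactly one dataset."""

    def __init__(self, dataset_sizes, batch_size):
        # dataset_sizes: number of examples in each concatenated sub-dataset
        self.batch_size = batch_size
        self.batches = []
        offset = 0
        for size in dataset_sizes:
            indices = list(range(offset, offset + size))
            random.shuffle(indices)
            # Drop the last incomplete batch so in-batch negatives stay homogeneous.
            for i in range(0, size - batch_size + 1, batch_size):
                self.batches.append(indices[i:i + batch_size])
            offset += size
        random.shuffle(self.batches)  # mix datasets across steps, not within a batch

    def __iter__(self):
        return iter(self.batches)

    def __len__(self):
        return len(self.batches)


def add_instruction(example):
    # Trick 3: prepend retrieval instructions to every training sample.
    return {
        "query": "查询: " + example["query"],      # '查询: ' == 'query: '
        "passage": "结果: " + example["passage"],  # '结果: ' == 'result: '
    }
```

The sampler would be passed to a PyTorch `DataLoader` through its `batch_sampler` argument; shuffling whole batches rather than individual examples keeps every batch homogeneous, so the in-batch negatives always come from the same source dataset.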