Shitao committed on
Commit c0d5b82
1 Parent(s): 519af80

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -245,6 +245,20 @@ with torch.no_grad():
 
 ## Fine-tune
 
+### Data Format
+
+Train data should be a JSON Lines file, where each line is a JSON object like this:
+
+```
+{"query": str, "pos": List[str], "neg": List[str], "prompt": str}
+```
+
+`query` is the query, `pos` is a list of positive texts, `neg` is a list of negative texts, and `prompt` describes the relationship between the query and the texts. If a query has no negative texts, you can randomly sample some from the entire corpus as negatives.
+
+See [toy_finetune_data.jsonl](https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding/llm_reranker/toy_finetune_data.jsonl) for a toy data file.
+
+### Train
+
 You can fine-tune the reranker with the following code:
 
 **For llm-based reranker**
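The Data Format section added in this commit describes only the per-line schema. The following is a minimal sketch in plain Python. It is not part of FlagEmbedding, and every function name in it is hypothetical. It checks records against the `query` / `pos` / `neg` / `prompt` layout, writes and reads them one JSON object per line, and fills an empty `neg` list by uniform random sampling of non-positive passages from a corpus. That sampling step is one possible reading of the README's suggestion. The prompt string, the corpus, and the sample size `k` are made-up assumptions.

```python
import json
import random
from typing import Dict, List

# Field names and types come from the README schema; the function names and
# the uniform sampling strategy below are hypothetical illustrations.
SCHEMA = {"query": str, "pos": list, "neg": list, "prompt": str}


def validate_record(record: Dict) -> None:
    """Raise ValueError if `record` does not match the documented schema."""
    for field, expected_type in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing field: {field!r}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"field {field!r} must be a {expected_type.__name__}")
    for field in ("pos", "neg"):
        if not all(isinstance(text, str) for text in record[field]):
            raise ValueError(f"every entry in {field!r} must be a string")


def fill_random_negatives(record: Dict, corpus: List[str], k: int,
                          rng: random.Random) -> Dict:
    """If `neg` is empty, return a new record with up to k negatives drawn
    uniformly at random from `corpus` (excluding positives); otherwise
    return `record` unchanged."""
    if record["neg"]:
        return record  # keep negatives that were already provided
    positives = set(record["pos"])
    # dict.fromkeys drops duplicate corpus entries while keeping their order.
    candidates = [text for text in dict.fromkeys(corpus) if text not in positives]
    return {**record, "neg": rng.sample(candidates, min(k, len(candidates)))}


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON object per line, validating each record first."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            validate_record(record)
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> List[Dict]:
    """Read and validate a JSON Lines file, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                validate_record(record)
                records.append(record)
    return records


if __name__ == "__main__":
    corpus = [
        "Paris is the capital of France.",
        "The Nile flows through Egypt.",
        "Python is a programming language.",
    ]
    record = {
        "query": "What is the capital of France?",
        "pos": ["Paris is the capital of France."],
        "neg": [],
        # Made-up prompt text, not the prompt used by FlagEmbedding.
        "prompt": "Judge whether the passage answers the query.",
    }
    filled = fill_random_negatives(record, corpus, k=2, rng=random.Random(42))
    print(json.dumps(filled, ensure_ascii=False))
```

Uniform random negatives are the simplest option the README mentions. Whether the repository's own tooling mines harder negatives is not stated in this commit, so treat the sampling step as a placeholder.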