prhegde committed
Commit 0ea5861
1 Parent(s): 0bbbf1c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -29,13 +29,13 @@ Query rewriting for search (web, e-commerce), Virtual assistants and chatbots, I
 
 Training Procedure
 
-1. The training process begins by initializing the sequence-to-sequence model with Google's T5-base model (https://huggingface.co/google-t5/t5-base).
-2. Initially, the model undergoes supervised training using the MS-MARCO query pairs dataset (https://github.com/Narabzad/msmarco-query-reformulation/tree/main/datasets/queries)
+1. The training process begins by initializing the sequence-to-sequence model with Google's [T5-base model ](https://huggingface.co/google-t5/t5-base).
+2. Initially, the model undergoes supervised training using the [MS-MARCO query pairs dataset](https://github.com/Narabzad/msmarco-query-reformulation/tree/main/datasets/queries)
 3. Subsequently, the model is fine-tuned using a reinforcement learning (RL) framework to enhance its ability to generate queries that are both diverse and relevant.
 4. It uses a policy gradient approach to fine-tune the model. For a given input query, a set of trajectories (reformulated queries) are sampled from the model and reward is computed. Policy gradient algorithm is applied to update the model.
 5. Rewards are heuristically computed to enhance the model's paraphrasing capability. However, these rewards can be substituted with other domain-specific or goal-specific reward functions as needed.
 
-Refer https://github.com/PraveenSH/RL-Query-Reformulation for more details.
+Refer [here](https://github.com/PraveenSH/RL-Query-Reformulation) for more details.
 
 
 ### Model Sources
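
Steps 1–2 of the procedure above amount to a standard supervised warm start: T5-base is loaded and trained with ordinary cross-entropy on (query, reformulation) pairs. The sketch below illustrates that stage only; the in-memory pair list and the learning rate are illustrative assumptions and stand in for the MS-MARCO files and the repository's actual training script.

```python
# Minimal supervised warm-start sketch (steps 1-2); data and hyperparameters are illustrative.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base")
model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy stand-in for the MS-MARCO query pairs (source query -> reformulated query).
pairs = [("cheap flights nyc", "affordable flights to new york city")]

model.train()
for query, target in pairs:
    inputs = tokenizer(query, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```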
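Steps 3–5 describe the RL stage: for each input query, sample a set of reformulations (trajectories) from the current model, score them with a heuristic paraphrase reward, and apply a policy-gradient (REINFORCE-style) update. The sketch below shows one such update step under stated assumptions; the `paraphrase_reward` heuristic, the sampling settings, and the mean-reward baseline are placeholders and do not reproduce the reward used in the linked repository.

```python
# One policy-gradient (REINFORCE-style) update step for query reformulation (steps 3-5).
# paraphrase_reward(), sampling settings, and hyperparameters are illustrative assumptions.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = T5Tokenizer.from_pretrained("google-t5/t5-base")
model = T5ForConditionalGeneration.from_pretrained("google-t5/t5-base").to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)


def paraphrase_reward(source: str, candidate: str) -> float:
    """Placeholder heuristic: reward candidates that keep word overlap with the
    source (relevance proxy) but are not verbatim copies (diversity proxy)."""
    src, cand = set(source.lower().split()), set(candidate.lower().split())
    if not cand:
        return 0.0
    overlap = len(src & cand) / len(src | cand)
    novelty = 0.0 if candidate.strip().lower() == source.strip().lower() else 1.0
    return 0.5 * overlap + 0.5 * novelty


def reinforce_step(query: str, num_samples: int = 4, max_len: int = 32):
    """Sample trajectories, score them, and minimise -(reward - baseline) * log p(trajectory)."""
    inputs = tokenizer(query, return_tensors="pt").to(device)

    # 1. Sample a set of reformulations (trajectories) from the current policy.
    with torch.no_grad():
        samples = model.generate(
            **inputs,
            do_sample=True,
            top_k=50,
            num_return_sequences=num_samples,
            max_length=max_len,
        )

    # 2. Compute a reward for each sampled reformulation.
    texts = tokenizer.batch_decode(samples, skip_special_tokens=True)
    rewards = torch.tensor([paraphrase_reward(query, t) for t in texts], device=device)
    rewards = rewards - rewards.mean()  # mean reward as a simple variance-reducing baseline

    # 3. Re-score the samples with gradients enabled to get per-sequence log-probs.
    #    generate() prepends the decoder start token, so drop it before using samples as labels.
    labels = samples[:, 1:].clone()
    labels[labels == tokenizer.pad_token_id] = -100
    enc_ids = inputs["input_ids"].repeat(num_samples, 1)
    enc_mask = inputs["attention_mask"].repeat(num_samples, 1)
    logits = model(input_ids=enc_ids, attention_mask=enc_mask, labels=labels).logits
    log_probs = logits.log_softmax(dim=-1)
    token_ll = log_probs.gather(2, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    mask = (labels != -100).float()
    seq_ll = (token_ll * mask).sum(dim=1)  # log-probability of each full trajectory

    # 4. Policy-gradient loss and parameter update.
    loss = -(rewards * seq_ll).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return texts, loss.item()


reformulations, loss = reinforce_step("how to improve search recall for long queries")
print(reformulations, loss)
```

Swapping `paraphrase_reward` for a domain- or goal-specific scorer changes the objective without touching the update step, which is the substitution that step 5 alludes to.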