Update README.md
README.md
### Training Procedure
1. The training process begins by initializing the sequence-to-sequence model with Google's [T5-base model](https://huggingface.co/google-t5/t5-base).
2. Initially, the model undergoes supervised training using the [MS-MARCO query pairs dataset](https://github.com/Narabzad/msmarco-query-reformulation/tree/main/datasets/queries).
3. Subsequently, the model is fine-tuned using a reinforcement learning (RL) framework to enhance its ability to generate queries that are both diverse and relevant.
4. Fine-tuning uses a policy gradient approach: for a given input query, a set of trajectories (reformulated queries) is sampled from the model, a reward is computed for each, and the policy gradient update is applied to the model.
5. Rewards are computed heuristically to enhance the model's paraphrasing capability; however, they can be substituted with other domain-specific or goal-specific reward functions as needed.
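The sample-score-update loop in steps 4 and 5 can be sketched with a toy policy in PyTorch. This is a minimal illustration only: the real setup decodes from T5-base, and `policy_logits`, `heuristic_reward`, and the diversity-based reward below are illustrative stand-ins, not the repository's actual components.

```python
# REINFORCE-style policy gradient sketch: sample trajectories,
# score them with a heuristic reward, update the policy.
import torch

torch.manual_seed(0)

VOCAB, MAX_LEN, N_SAMPLES = 16, 5, 4

# Toy "policy": per-step logits over a tiny vocabulary
# (stand-in for the seq2seq model's decoder).
policy_logits = torch.zeros(MAX_LEN, VOCAB, requires_grad=True)
optimizer = torch.optim.SGD([policy_logits], lr=0.1)

def heuristic_reward(tokens: torch.Tensor) -> float:
    # Stand-in reward: unique-token ratio, rewarding diverse
    # outputs (in the spirit of the paraphrasing heuristic).
    return len(set(tokens.tolist())) / len(tokens)

# One policy-gradient update over a batch of sampled trajectories.
log_probs, rewards = [], []
for _ in range(N_SAMPLES):
    dist = torch.distributions.Categorical(logits=policy_logits)
    sample = dist.sample()                      # one "reformulated query"
    log_probs.append(dist.log_prob(sample).sum())
    rewards.append(heuristic_reward(sample))

rewards_t = torch.tensor(rewards)
baseline = rewards_t.mean()                     # variance-reduction baseline
loss = -torch.stack(log_probs) @ (rewards_t - baseline)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Swapping `heuristic_reward` for a domain-specific scoring function is exactly the substitution step 5 describes.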
Refer to the [RL-Query-Reformulation repository](https://github.com/PraveenSH/RL-Query-Reformulation) for more details.
### Model Sources