History consideration for reformulation
How many of the previous questions are taken into consideration for reformulation of the input query?
Currently it is trained to take just the 1 last query as input, but as soon as I get time to build a dataset, I will release v2.
If you want to contribute, mail me at vinit2005singh@gmail.com and I will share the system prompt and the pipeline to automate dataset creation.
I think we need to consider at least the last 2 queries.
Sure, I was also thinking of implementing multiple queries as input. Can you tell me your use case, so I can better design the pipeline around how it is going to be used?
The use case I am thinking of is improving a RAG pipeline with query reformulation.
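Roughly along these lines; this is just a sketch of the flow I mean, and every function name here is a placeholder rather than a real API:

```python
# Sketch of the intended RAG flow: rewrite the question using recent chat history,
# then retrieve with the rewritten query. Placeholder functions only.

def reformulate(history: list[str], question: str) -> str:
    """Placeholder for the fine-tuned T5: history + question -> standalone search query."""
    return question  # identity fallback; the real model call goes here

def retrieve(query: str, top_k: int = 5) -> list[str]:
    """Placeholder for a vector-database lookup returning the top-k chunks."""
    return []

def answer_with_rag(history: list[str], question: str) -> str:
    query = reformulate(history, question)   # needs the last 2+ turns, not just the question
    context = retrieve(query)                # retrieval quality depends on the rewritten query
    prompt = "\n".join(context) + "\n\nQuestion: " + question
    return prompt  # in the real pipeline this prompt goes to the answer-generating LLM
```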
I have fine-tuned this model to convert a user question into an internet search query. If you can share your RAG pipeline, or explain more about what you are expecting from the model, I can fine-tune it better.
I am also interested in your RAG project, as I am building my own chat assistant, so it will be helpful if you share more about it.
Sure, I understand the problem. I can build a T5 model that reformulates the question for the vector database to find the most relevant answer, but for that I need your help building a dataset. It should consist of multiple examples of what we will feed the model as input and what we expect from it as output. For that I have built a prompt template, which will be:
<system>system_instructions_here</system><user>user_query_1</user><assistant>assistant_response_1</assistant><user>user_query_2</user><assistant>assistant_response_2</assistant>
... and so on.
The conversation can include as many queries as will fit within T5's context limit of 512 tokens. I tried going beyond that, to 1024+, but it reduces the quality a lot.
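To make that format concrete, here is a rough sketch of how examples could be serialized into the template and filtered to the 512-token limit; the tokenizer checkpoint ("t5-base") and the function names are my placeholders, not the final pipeline:

```python
# Sketch: build one training input string in the <system>/<user>/<assistant> template
# and keep it only if it fits T5's 512-token context. "t5-base" is an assumed checkpoint.
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
MAX_TOKENS = 512  # going to 1024+ reduced quality a lot in my tests

def build_example(system_instructions: str, turns: list[tuple[str, str]]) -> str:
    """turns = [(user_query, assistant_response), ...] in chronological order."""
    parts = [f"<system>{system_instructions}</system>"]
    for user_query, assistant_response in turns:
        parts.append(f"<user>{user_query}</user>")
        parts.append(f"<assistant>{assistant_response}</assistant>")
    return "".join(parts)

def fits_context(example: str) -> bool:
    """Drop examples that exceed the T5 context window."""
    return len(tokenizer(example).input_ids) <= MAX_TOKENS
```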
Suggestions:
If you are using this model, try adding the prefix "reformulate query: " to your input. It will make the model perform better, since it was trained with it (see the usage sketch after this list).
If you want a larger context window and have no hardware limitations, I would go with microsoft phi-2, as it performs better than this T5.
If you provide me the dataset, I will try to train the model for you. If you have any questions about the dataset, I will answer them.
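To show what the prefix suggestion looks like in practice, here is a minimal transformers usage sketch; the model id below is a placeholder, so substitute the actual checkpoint from this repo:

```python
# Minimal inference sketch using the "reformulate query: " prefix.
# "your-username/t5-query-reformulation" is a placeholder model id, not the real repo.
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_id = "your-username/t5-query-reformulation"
tokenizer = T5Tokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

question = "and how much does it cost there?"
inputs = tokenizer("reformulate query: " + question, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```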
I think you should add a model card for this model. It will be useful to people who are testing it.