CARL Fine-tuning

#3
by mehdi1964 - opened

Dear community,
What structure should the dataset have for fine-tuning?
For example, is it a good idea to use text data with the following structure in a .txt file?

User:
My wife and mother are at loggerheads ...
CARL:
What you're describing is what psychologists call triangulation...

.....

Or is the following structure, stored in a .csv file, better?

ID,Type,Utterance,Dialog_Act
27_0,T,"Okay, I want to thank you for your participation so far in this intake...
27_1,P,yeah.,gc
27_2,T,Are you adopted?,ynq
27_3,P,No.,yna

Or do you have any other recommendations?

@mehdi1964 If you are planning to fine-tune the existing CARL model, it is better to stick with the conversation format { "from": "human", "value": "xxx...." }, { "from": "gpt", "value": "xxx..." }
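A minimal sketch of converting the "User:" / "CARL:" transcript from the question into that from/value format. It assumes each speaker label sits alone on its own line, as in the example, and maps "User" to "human" and "CARL" to "gpt". Wrapping the turns in a "conversations" key is an assumption (a common convention for this style of data), not something stated in the reply.

```python
import json

# Assumed mapping from the transcript's speaker labels to the reply's roles.
ROLE_MAP = {"User": "human", "CARL": "gpt"}

def parse_transcript(text: str) -> list[dict]:
    """Turn 'User:' / 'CARL:' blocks into a list of {"from", "value"} turns."""
    turns = []
    current_role = None
    buffer = []
    for line in text.splitlines():
        stripped = line.strip()
        label = stripped.rstrip(":")
        if stripped.endswith(":") and label in ROLE_MAP:
            # A new speaker label closes the previous turn, if any.
            if current_role is not None:
                turns.append({"from": current_role, "value": "\n".join(buffer).strip()})
            current_role = ROLE_MAP[label]
            buffer = []
        elif current_role is not None:
            # Any other line belongs to the current speaker's turn.
            buffer.append(line)
    # Flush the final turn.
    if current_role is not None:
        turns.append({"from": current_role, "value": "\n".join(buffer).strip()})
    return turns

sample = """User:
My wife and mother are at loggerheads ...
CARL:
What you're describing is what psychologists call triangulation..."""

# "conversations" is an assumed wrapper key for one training record.
record = {"conversations": parse_transcript(sample)}
print(json.dumps(record, indent=2))
```

Each parsed dialogue becomes one record; with many dialogues, each record would go on its own line of a .jsonl file.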

@ajibawa-2023 Which file extension should I save the data with (*.txt or *.csv)?
Does it matter at all?

.json or .jsonl are preferred.
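To move the CSV data from the question into .jsonl form, here is a sketch that groups rows by the dialogue number before the underscore in ID (e.g. "27" in "27_0") and writes one JSON object per line. The role mapping is hypothetical: it assumes "P" is the client (human side) and "T" is the therapist (the side the model should learn). The Dialog_Act column is simply dropped, and turns keep the order they have in the file.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical mapping: "P" = client (human side), "T" = therapist (model side).
ROLE_MAP = {"P": "human", "T": "gpt"}

def csv_to_jsonl(csv_text: str) -> str:
    """Group utterances by dialogue ID and emit one JSON object per line."""
    dialogues = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # IDs look like "27_0", "27_1": the part before "_" names the dialogue.
        dialogue_id, _turn_index = row["ID"].split("_", 1)
        # Turns are appended in file order; Dialog_Act is not used.
        dialogues[dialogue_id].append(
            {"from": ROLE_MAP[row["Type"]], "value": row["Utterance"]}
        )
    # JSONL: one complete JSON record per line, no enclosing array.
    return "\n".join(
        json.dumps({"id": d_id, "conversations": turns})
        for d_id, turns in dialogues.items()
    )

sample_csv = (
    "ID,Type,Utterance,Dialog_Act\n"
    '27_0,T,"Okay, thanks.",x\n'
    "27_1,P,yeah.,gc\n"
    "27_2,T,Are you adopted?,ynq\n"
    "27_3,P,No.,yna\n"
)
print(csv_to_jsonl(sample_csv))
```

Note that this dialogue opens with a therapist turn; depending on the training script, you may need to drop or merge such leading turns so each conversation starts with "human".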

ajibawa-2023 changed discussion status to closed
