Sentence Transformers

This task lets you easily train or fine-tune a Sentence Transformer model on your own dataset.

AutoTrain supports the following types of Sentence Transformer fine-tuning:

pair: dataset with two sentence columns, anchor and positive
pair_class: dataset with two sentence columns, premise and hypothesis, plus a target label
pair_score: dataset with two sentence columns, sentence1 and sentence2, plus a target score
triplet: dataset with three sentence columns, anchor, positive, and negative
qa: dataset with two sentence columns, query and answer
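
Before training, it can help to check that a file has the columns its trainer type expects. The following is a minimal sketch using only the Python standard library. The names EXPECTED_COLUMNS and missing_columns are hypothetical helpers written for this illustration and are not part of AutoTrain; the column names are assumed to match the ones listed above.

```python
import csv
import io

# Columns each trainer type expects, taken from the list above.
# EXPECTED_COLUMNS and missing_columns are hypothetical helpers,
# not part of the AutoTrain API.
EXPECTED_COLUMNS = {
    "pair": ["anchor", "positive"],
    "pair_class": ["premise", "hypothesis", "label"],
    "pair_score": ["sentence1", "sentence2", "score"],
    "triplet": ["anchor", "positive", "negative"],
    "qa": ["query", "answer"],
}


def missing_columns(csv_text, trainer):
    """Return the expected columns that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [col for col in EXPECTED_COLUMNS[trainer] if col not in header]


sample = "anchor,positive\nhello,hi\n"
print(missing_columns(sample, "pair"))     # []
print(missing_columns(sample, "triplet"))  # ['negative']
```

An empty result means the header contains every expected column for that trainer type.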

Data Format

Sentence Transformers fine-tuning accepts data in CSV or JSONL format. You can also use a dataset from the Hugging Face Hub.

For pair training, the data should be in the following format:

anchor	positive
hello	hi
how are you	I am fine
What is your name?	My name is Abhishek
Which is the best programming language?	Python
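
The same pair data can be stored in either supported file format. The sketch below writes two of the rows above to a CSV file and to a JSONL file using only the Python standard library, then reads both back. The file names are hypothetical, and the JSONL layout (one JSON object per line, with keys matching the column names) is an assumption rather than something this section specifies.

```python
import csv
import json
import os
import tempfile

# Two rows from the pair example above.
rows = [
    {"anchor": "hello", "positive": "hi"},
    {"anchor": "how are you", "positive": "I am fine"},
]

# Hypothetical file names inside a temporary directory.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "train.csv")
jsonl_path = os.path.join(out_dir, "train.jsonl")

# CSV: a header row followed by one line per example.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["anchor", "positive"])
    writer.writeheader()
    writer.writerows(rows)

# JSONL (assumed layout): one JSON object per line.
with open(jsonl_path, "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read both files back to confirm they hold the same examples.
with open(csv_path, newline="", encoding="utf-8") as f:
    csv_rows = [dict(r) for r in csv.DictReader(f)]
with open(jsonl_path, encoding="utf-8") as f:
    jsonl_rows = [json.loads(line) for line in f if line.strip()]

print(csv_rows == jsonl_rows)
```

The other trainer types would follow the same pattern with their own column names.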

For pair_class training, the data should be in the following format:

premise	hypothesis	label
hello	hi	1
how are you	I am fine	0
What is your name?	My name is Abhishek	1
Which is the best programming language?	Python	1

For pair_score training, the data should be in the following format:

sentence1	sentence2	score
hello	hi	0.8
how are you	I am fine	0.2
What is your name?	My name is Abhishek	0.9
Which is the best programming language?	Python	0.7
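
Since the score column holds numeric values, a quick pre-check can catch rows where the score does not parse as a number. This is a minimal sketch with a hypothetical helper, load_pair_score, that is not part of AutoTrain. This section does not state whether scores must fall within a particular range, so the sketch only checks that each one is numeric.

```python
import csv
import io


def load_pair_score(csv_text):
    """Parse pair_score rows, converting the score column to float.

    Hypothetical helper for illustration; not part of AutoTrain.
    """
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line numbers are approximate: the header is line 1 and each row
    # is assumed to occupy a single line.
    for line_no, row in enumerate(reader, start=2):
        try:
            score = float(row["score"])
        except (TypeError, ValueError):
            raise ValueError(
                f"line {line_no}: score {row['score']!r} is not a number"
            )
        rows.append((row["sentence1"], row["sentence2"], score))
    return rows


sample = "sentence1,sentence2,score\nhello,hi,0.8\nhow are you,I am fine,0.2\n"
pairs = load_pair_score(sample)
print(pairs[0])  # ('hello', 'hi', 0.8)
```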

For triplet training, the data should be in the following format:

anchor	positive	negative
hello	hi	bye
how are you	I am fine	I am not fine
What is your name?	My name is Abhishek	What's it to you?
Which is the best programming language?	Python	Javascript

For qa training, the data should be in the following format:

query	answer
hello	hi
how are you	I am fine
What is your name?	My name is Abhishek
Which is the best programming language?	Python