

Creating data for training sentence similarity models

These notebooks demonstrate how to create synthetic data for training sentence similarity models.

  • 01_dataset_preparation covers the initial processing steps that prepare a dataset for synthetic data creation. The notebook uses LlamaIndex to chunk texts into sections that serve as inputs for generating the synthetic dataset.
  • 02_synthetic_data_creation.ipynb covers synthetic data creation for training sentence similarity models. The notebook uses Outlines to generate structured data and `vLLM` to run the LLM.
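
The two steps above can be roughly sketched without any third-party dependencies. This is not the notebooks' code: the notebooks use LlamaIndex for chunking and Outlines with `vLLM` for generation, whereas the sketch below swaps in a naive regex sentence splitter and a plain JSON check purely to illustrate the data flow. The names `chunk_text`, `SimilarityPair`, `parse_pair`, and the `query` field are hypothetical and chosen only for illustration.

```python
import json
import re
from dataclasses import dataclass
from typing import List


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    # (Assumption: a stand-in for a real splitter such as LlamaIndex's.)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


@dataclass
class SimilarityPair:
    """One training example: a source chunk and a generated query for it."""
    anchor: str  # the text chunk given to the LLM
    query: str   # hypothetical field: text the LLM generated for the chunk


def parse_pair(anchor: str, raw_json: str) -> SimilarityPair:
    """Validate an LLM response that is expected to be {"query": "<text>"}."""
    data = json.loads(raw_json)
    query = data["query"]
    if not isinstance(query, str) or not query.strip():
        raise ValueError("'query' must be a non-empty string")
    return SimilarityPair(anchor=anchor, query=query.strip())
```

In the real workflow, structured generation constrains the model so that its output already matches the expected schema; the manual check in `parse_pair` only stands in for that guarantee here.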