Daniel van Strien

davanstrien

AI & ML interests

Machine Learning Librarian

How can we use open LLMs to create data for training sentence similarity models?

One of the most exciting use cases for LLMs is generating synthetic datasets that can be used to train non-LLM models. In the past, gathering enough data was one of the most significant barriers to training task-specific models. LLMs can potentially lower that barrier by generating the training examples themselves.

I've just written a new blog post on using meta-llama/Meta-Llama-3-70B-Instruct to generate synthetic similarity data, based on the approach from the paper Retrieving Texts based on Abstract Descriptions (arXiv:2305.12517).

https://huggingface.co/blog/davanstrien/synthetic-similarity-datasets
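
To give a rough feel for the general idea, here is a minimal sketch of turning LLM output into (anchor, positive, negative) triplets for a sentence similarity model. It is not the code from the blog post: the prompt wording, the JSON keys, and the names `build_prompt`, `parse_response`, `make_triplets`, and `fake_generate` are all hypothetical, and the LLM call is replaced by a stub so the example runs offline.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt template; the wording is an assumption for illustration.
PROMPT_TEMPLATE = (
    "Given the sentence below, write one sentence with a similar meaning "
    "and one sentence with a different meaning.\n"
    'Reply with JSON only, using the keys "similar" and "dissimilar".\n\n'
    "Sentence: {sentence}"
)


def build_prompt(sentence: str) -> str:
    """Fill the template with the anchor sentence."""
    return PROMPT_TEMPLATE.format(sentence=sentence)


def parse_response(anchor: str, raw: str) -> Optional[dict]:
    """Turn a raw model reply into a triplet, or None if the reply is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # the model did not return valid JSON
    if not isinstance(data, dict):
        return None
    positive = data.get("similar")
    negative = data.get("dissimilar")
    if not isinstance(positive, str) or not isinstance(negative, str):
        return None
    if not positive.strip() or not negative.strip():
        return None
    return {
        "anchor": anchor,
        "positive": positive.strip(),
        "negative": negative.strip(),
    }


def make_triplets(sentences: list, generate: Callable[[str], str]) -> list:
    """Prompt the generator once per sentence and keep only well-formed triplets."""
    triplets = []
    for sentence in sentences:
        triplet = parse_response(sentence, generate(build_prompt(sentence)))
        if triplet is not None:
            triplets.append(triplet)
    return triplets


def fake_generate(prompt: str) -> str:
    """Stand-in for a real LLM call; returns canned text so the sketch runs offline."""
    if "broken" in prompt:
        return "Sorry, I cannot help with that."
    return json.dumps(
        {
            "similar": "A cat is sleeping on the couch.",
            "dissimilar": "The stock market fell today.",
        }
    )


if __name__ == "__main__":
    for row in make_triplets(["The cat naps on the sofa.", "broken input"], fake_generate):
        print(row)
```

In practice `fake_generate` would be swapped for a call to an instruction-tuned model such as meta-llama/Meta-Llama-3-70B-Instruct. Dropping replies that fail to parse, rather than trying to repair them, keeps malformed generations out of the training set at the cost of some wasted calls.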