davanstrien posted an update 25 days ago
How can we use open LLMs to create data for training sentence similarity models?

One of the most exciting use cases for LLMs is generating synthetic datasets that can be used to train non-LLM models. In the past, gathering enough data was one of the most significant barriers to training task-specific models. LLMs can potentially lower that barrier by generating the training examples themselves.

I've just written a new blog post on using meta-llama/Meta-Llama-3-70B-Instruct to generate synthetic similarity data, based on the approach from the paper "Retrieving Texts based on Abstract Descriptions" (arXiv:2305.12517).

https://huggingface.co/blog/davanstrien/synthetic-similarity-datasets