Feedback and ideas from first tests
#4, opened by patrickfleith
Issues
- I noticed a tendency to generate many duplicate prompts.
- I struggled to get good diversity in the prompts, even after a few attempts at prompt engineering. I tried to force it with "user queries are not just about X, but cover various topics among A, B, C, D, ...", and this gave better results.
Some ideas to consider
- A prompt template with a placeholder variable, such as "Users ask queries related to {topic}. Questions are short and concise.", where 'topic' comes from a user-defined list. As a domain expert myself, I can easily come up with a list of 100 topics in my domain. Alternatively, an intermediate prompt could generate a list of topics for a specific domain (but this is probably going too far for the purpose of DataCraft?).
- Post-generation deduplication: a simple approach using exact matching on prompts, or a more subtle one using pairwise embedding similarities.
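A minimal Python sketch of the topic-template idea, assuming the template string quoted above and a hypothetical helper `build_prompts` (not a DataCraft function). The topic list is shuffled once and then cycled, so every topic is used about equally often instead of the model drifting back to the same few subjects.

```python
import random

# Hypothetical template; {topic} is filled from a user-defined list.
PROMPT_TEMPLATE = (
    "Users ask queries related to {topic}. "
    "Questions are short and concise."
)


def build_prompts(topics, n, seed=None):
    """Return n prompts, cycling through a shuffled copy of `topics`
    so each topic appears roughly n / len(topics) times."""
    if not topics:
        raise ValueError("topics must be non-empty")
    rng = random.Random(seed)
    order = list(topics)
    rng.shuffle(order)  # randomize order without mutating the caller's list
    return [PROMPT_TEMPLATE.format(topic=order[i % len(order)]) for i in range(n)]


if __name__ == "__main__":
    for prompt in build_prompts(["billing", "shipping", "returns"], 6, seed=0):
        print(prompt)
```

Cycling rather than sampling topics independently guarantees even coverage; the topic list could equally come from an intermediate LLM call, as suggested above.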
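A minimal sketch of the two deduplication strategies, using hypothetical helper names (`normalize`, `bow_embed`, `cosine`, `deduplicate`) that are not DataCraft functions. To stay self-contained it uses bag-of-words count vectors as a stand-in for a real sentence-embedding model; with an actual model you would swap in its vectors and a dense cosine similarity. The 0.9 threshold is an arbitrary illustrative value, not a tuned one.

```python
import math
import re
from collections import Counter


def normalize(text):
    # Lowercase and keep only word characters so trivial variants
    # (case, punctuation, extra spaces) become an exact match.
    return " ".join(re.findall(r"\w+", text.lower()))


def bow_embed(text):
    # Stand-in for a sentence-embedding model: bag-of-words counts.
    return Counter(normalize(text).split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def deduplicate(prompts, threshold=0.9):
    """Drop exact duplicates (after normalization), then drop prompts whose
    similarity to any already-kept prompt is >= threshold."""
    seen = set()
    kept, kept_vecs = [], []
    for p in prompts:
        key = normalize(p)
        if key in seen:  # exact duplicate
            continue
        seen.add(key)
        vec = bow_embed(p)
        if any(cosine(vec, v) >= threshold for v in kept_vecs):  # near duplicate
            continue
        kept.append(p)
        kept_vecs.append(vec)
    return kept
```

Comparing each new prompt against every kept one is quadratic in the number of prompts, which is fine for a few thousand; much larger sets would call for an approximate nearest-neighbour index.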
These are just food for thought. Again, amazing work!
Very cool, nice
patrickfleith changed discussion status to closed