Feedback and ideas from first tests

#4
by patrickfleith - opened

Issues

  • I noticed a tendency to generate many duplicate prompts.
  • I struggled to get good prompt diversity even after a few attempts at prompt engineering. I tried to force it with "user queries are not just about X, but cover various topics among A, B, C, D, ...". This gave better results.

Some ideas to consider

  • A prompt template using a placeholder variable, like "Users ask queries related to {topic}. Questions are short and concise", where 'topic' comes from a user-defined list. As a domain expert myself, it is easy to come up with a list of 100 topics in my domain. Alternatively, an intermediate prompt could generate a list of topics for a specific domain (but this is probably going too far for the purpose of DataCraft?).
  • Post-generation deduplication: a simple approach using exact match on the prompt, or a more subtle one using pairwise embedding similarities.
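
To make the two ideas concrete, here is a rough sketch in plain Python. Everything in it is hypothetical and not part of DataCraft: the function names, the example topics, and the 0.8 threshold are made up for illustration. The `embed` function is only a bag-of-words stand-in so the sketch runs without dependencies; a real setup would presumably swap in a sentence-embedding model.

```python
import math
import re
from collections import Counter


def expand_template(template, topics):
    """Fill the {topic} placeholder once per entry of a user-defined topic list."""
    return [template.format(topic=t) for t in topics]


def normalize(text):
    """Lowercase and drop punctuation so trivial variants compare equal."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def dedupe_exact(prompts):
    """Keep the first occurrence of each prompt, comparing normalized text."""
    seen, kept = set(), []
    for p in prompts:
        key = normalize(p)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


def embed(text):
    """ASSUMPTION: bag-of-words counts as a stand-in for a real embedding."""
    return Counter(normalize(text).split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dedupe_similar(prompts, threshold=0.8):
    """Greedy pairwise filter: drop a prompt if it is too close to one already kept."""
    kept, kept_vecs = [], []
    for p in prompts:
        v = embed(p)
        if all(cosine(v, kv) < threshold for kv in kept_vecs):
            kept.append(p)
            kept_vecs.append(v)
    return kept


# Hypothetical usage: one system prompt per topic, then clean up the generations.
system_prompts = expand_template(
    "Users ask queries related to {topic}. Questions are short and concise.",
    ["orbits", "propulsion"],
)
generated = [
    "What is a geostationary orbit?",
    "what is a geostationary orbit",
    "What is a geostationary orbit exactly?",
    "How do ion thrusters work?",
]
unique = dedupe_similar(dedupe_exact(generated))
```

The pairwise filter is quadratic in the number of prompts, which should be fine for a few thousand generations but would need an approximate nearest-neighbour index beyond that.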

These are just food for thought. Again, amazing work!

@patrickfleith We just shipped a new version, which might solve some of these issues.

Very cool, nice

patrickfleith changed discussion status to closed
