Alexander Watson committed on
Commit 2106945 • 1 Parent(s): 06594f2

doc updates

Files changed (1)
  1. app.py +19 -31
app.py CHANGED
@@ -32,19 +32,25 @@ logger.addHandler(handler)
 
  SAMPLE_DATASET_URL = "https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/llm-training-data/dolly-examples-qa-with-context.csv"
  WELCOME_MARKDOWN = """
- Gretel Navigator is an interface designed to help you create high-quality, diverse training data examples through synthetic data generation techniques. It aims to assist in scenarios where you have limited training data or want to enhance the quality and diversity of your existing dataset.
+ Gretel Navigator is a compound AI system designed to help you create high-quality, diverse training data examples through synthetic data generation techniques. It aims to assist in scenarios where you have limited training data or want to enhance the quality and diversity of your existing dataset.
 
- ## 🎯 Key Use Cases
+ Key Use Cases
 
- 1. **Augment Existing Training Data**: Expand your existing training data with additional synthetic examples generated by Gretel Navigator. This can help improve the robustness and generalization of your AI models.
+ 1. **Create Diverse Training or Evaluation Data from a seed**: Generate diverse training or evaluation data from plain text or seed examples. This ensures your AI models are exposed to a wide range of scenarios and edge cases during training.
+ 2. **Enhance Limited Training Data**: Expand your existing training data with additional synthetic examples generated by Gretel Navigator. This can help improve the robustness and generalization of your AI models.
+ 3. **Mitigate Bias and Toxicity**: Generate training examples that are unbiased and non-toxic by incorporating diverse perspectives and adhering to ethical guidelines. This promotes fairness and responsible AI development.
+ 4. **Enhance Model Performance**: Improve the performance of your AI models across various tasks by training them on domain specific synthetic data generated by Gretel Navigator.
 
- 2. **Create Diverse Training or Evaluation Data**: Generate diverse training or evaluation data from plain text or seed examples. This ensures your AI models are exposed to a wide range of scenarios and edge cases during training.
+ ## 🌟 Synthetic Data Generation
 
- 3. **Address Data Limitations**: Generate additional examples to fill gaps in your dataset, particularly for underrepresented classes, rare events, or challenging scenarios. This helps improve your model's ability to handle diverse real-world situations.
+ Gretel Navigator utilizes an agent-based system to generate high-quality synthetic data:
 
- 4. **Mitigate Bias and Toxicity**: Generate training examples that are unbiased and non-toxic by incorporating diverse perspectives and adhering to ethical guidelines. This promotes fairness and responsible AI development.
+ - Diverse Instruction and Response Generation
+ - Quality Evaluation and Ranking
+ - AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
+ - Co-teach, suggestions, and self-teaching for iterative improvement.
 
- 5. **Enhance Model Performance**: Improve the performance of your AI models across various tasks by training them on diverse synthetic data generated by Gretel Navigator.
+ Leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
 
  ## 🔧 Getting Started
 
@@ -57,30 +63,15 @@ To start using Gretel Navigator, you'll need:
 
  Gretel Navigator supports the following formats for input data:
 
- - Existing AI training or evaluation data formats:
- - Input/Output pair format (or instruction/response) with any number of ground truth or "context fields".
- - Plain text data.
- - File formats:
- - Hugging Face dataset
- - CSV
- - JSON
- - JSONL
+ - Seed data
+ - Input/Output pairs (or instruction/response) with any number of ground truth or "context fields".
+ - Plain text (ground truth data)
+ - File formats: Hugging Face dataset, CSV, JSON, JSONL
 
  ## 📤 Output
 
  Gretel Navigator generates one additional training example per row in the input/output pair format. You can specify requirements for the input and output pairs in the configuration. Run the process multiple times to scale your data to any desired level.
 
- ## 🌟 AI Alignment Techniques
-
- Gretel Navigator incorporates AI alignment techniques to generate high-quality synthetic data:
-
- - Diverse Instruction and Response Generation
- - AI-Aligning-AI Methodology (AAA) for iterative data quality enhancement
- - Quality Evaluation
- - Bias and Toxicity Detection
-
- By leveraging these techniques, Gretel Navigator helps you create training data that leads to more robust, unbiased, and high-performing AI models.
-
  ---
 
  Ready to enhance your AI training data and unlock the full potential of your models? Let's get started with Gretel Navigator! 🚀
@@ -89,9 +80,9 @@ Ready to enhance your AI training data and unlock the full potential of your mod
 
  def main():
  st.set_page_config(page_title="Gretel", layout="wide")
- st.title("🎨 Gretel Navigator: Enhance Your AI Training Data")
+ st.title("🎨 Gretel Navigator: Create Synthetic Data from a Prompt")
  st.write(
- "Generate diverse synthetic training data from text or existing datasets to improve the performance and robustness of your AI models."
+ "Generate diverse synthetic training data from text or existing datasets to improve or evaluate AI models."
  )
 
  with st.expander("Introduction", expanded=False):
@@ -347,9 +338,6 @@ def main():
  st.markdown("---")
  st.markdown("### Format Prompts")
 
- st.markdown("---")
- st.markdown("### Format Prompts")
-
  system_prompt = st.text_area(
  "System Prompt",
  value=st.session_state.get(
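
The input formats named in the updated WELCOME_MARKDOWN text (input/output pairs with optional context fields, stored as CSV or JSONL) might look like the following in practice. This is a minimal sketch using only the Python standard library. The column names `context`, `instruction`, and `response`, the sample rows, and the helper functions are hypothetical, since the commit does not specify a schema, and nothing here calls Gretel Navigator itself.

```python
import csv
import io
import json

# Hypothetical seed rows: one ground-truth "context field" plus an
# instruction/response pair. The column names are assumptions.
rows = [
    {
        "context": "The Eiffel Tower is located in Paris.",
        "instruction": "Where is the Eiffel Tower?",
        "response": "It is in Paris.",
    },
    {
        "context": "Water boils at 100 degrees Celsius at sea level.",
        "instruction": "At what temperature does water boil at sea level?",
        "response": "At 100 degrees Celsius.",
    },
]


def to_jsonl(records):
    """Serialize records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_csv(records):
    """Serialize records as CSV with a header row taken from the first record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_jsonl(rows))
    print(to_csv(rows))
```

Either serialization keeps one example per row, which matches the description of the output as one additional training example per input row.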