davidberenstein1957 HF staff committed on
Commit
d6f9651
1 Parent(s): 4a432be

docs: Updated readme

Files changed (1)
  1. README.md +46 -1
README.md CHANGED
@@ -18,4 +18,49 @@ hf_oauth_scopes:
18
  - inference-api
19
  ---
20
 
21
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
21
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
22
+
23
+ <div class="header-container">
24
+ <div class="logo-container">
25
+ <a href="https://github.com/argilla-io/distilabel" target="_blank" rel="noopener noreferrer">
26
+ <img src="https://distilabel.argilla.io/latest/assets/distilabel-black.svg" alt="Distilabel Logo" style="width: 150px; height: auto;">
27
+ </a>
28
+ </div>
29
+ <div class="title-container">
30
+ <h1 style="margin: 0; font-size: 2em;">🧬 Synthetic Data Generator</h1>
31
+ <p style="margin: 10px 0 0 0; color: #666; font-size: 1.1em;">Build datasets using natural language</p>
32
+ </div>
33
+ </div>
34
+ <br>
35
+ This repository contains the code for the [free Synthetic Data Generator app](https://huggingface.co/spaces/argilla/synthetic-data-generator), which is hosted on the Hugging Face Hub.
36
+
37
+ ## How does it work?
38
+
39
+ ![Synthetic Data Generator](https://huggingface.co/spaces/argilla/synthetic-data-generator/resolve/main/assets/flow.png)
40
+
41
+ Distilabel Synthetic Data Generator is an experimental tool for creating high-quality datasets for training and fine-tuning language models. It uses distilabel and LLMs to generate synthetic data tailored to your needs.
42
+
43
+ This tool simplifies the process of creating custom datasets, enabling you to:
44
+
45
+ - Define the characteristics of your desired application
46
+ - Generate system prompts and tasks automatically
47
+ - Create sample datasets for quick iteration
48
+ - Produce full-scale datasets with customizable parameters
49
+ - Push your generated datasets directly to the Hugging Face Hub
50
+
51
+ By using Distilabel Synthetic Data Generator, you can rapidly prototype and create datasets, accelerating your AI development process.
52
+
53
+ ## Do you want to run this locally?
54
+
55
+ Clone the repository, then install the dependencies and start the app:
56
+
57
+ ```bash
58
+ pip install -r requirements.txt
59
+ python app.py
60
+ ```
61
+
62
+ ## Do you need more control?
63
+
64
+ Each pipeline is based on a distilabel component, so you can easily run it locally or with other LLMs.
65
+
66
+ Check out the [distilabel library](https://github.com/argilla-io/distilabel) for more information.