---
title: Dadc
emoji: 🏢
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
license: bigscience-bloom-rail-1.0
---

A basic example of dynamic adversarial data collection with a Gradio app.

*Instructions for adapting this to your own project:*

**Setting up the Space**

1. Clone this repo and deploy it as your own Hugging Face Space.
2. Add one of your Hugging Face tokens to the secrets for your Space, under the name `HF_TOKEN`. Then create an empty Hugging Face dataset on the Hub (it can be private or public) and put its URL in the secrets for your Space, under the name `DATASET_REPO_URL`. When you run this Space on MTurk in the steps below, the app will use your token to automatically store new HITs in your dataset (sketched in the appendix below).

**Running Data Collection**

1. In your local clone of the repo, create a copy of `config.py.example` named just `config.py`, and put the keys for your AWS account in it. The keys should belong to an AWS account with the AmazonMechanicalTurkFullAccess permission, and you also need an MTurk requester account associated with that AWS account (sketched in the appendix below).
2. Run `python collect.py` locally. With the `--live_mode` flag, it launches HITs on MTurk, using the app you deployed on the Space as the data collection UI and backend. NOTE: this means you will need to pay real workers. Without the `--live_mode` flag, it runs the HITs on the MTurk sandbox, which is identical to normal MTurk but intended only for testing; you can create a worker account on the sandbox to try out your HIT yourself (sketched in the appendix below).

**Profit**

Now you should see HITs arriving in your Hugging Face dataset automatically!
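
**Appendix: illustrative sketches**

The sketches below are hedged illustrations of the moving parts described above; the actual code in `app.py` and `collect.py` may differ. First, a minimal sketch of one way a Space can append newly collected examples to the dataset repo with `huggingface_hub`. The secrets `HF_TOKEN` and `DATASET_REPO_URL` are the ones described above; the file name `data.jsonl` and the helper `store_example` are hypothetical.

```python
import json
import os

from huggingface_hub import Repository

# Space secrets described above, read from the environment.
DATASET_REPO_URL = os.environ["DATASET_REPO_URL"]
HF_TOKEN = os.environ["HF_TOKEN"]

# Clone the dataset repo locally so new examples can be committed and pushed.
repo = Repository(
    local_dir="data",
    clone_from=DATASET_REPO_URL,
    use_auth_token=HF_TOKEN,
)
DATA_FILE = os.path.join("data", "data.jsonl")  # hypothetical file name


def store_example(example: dict) -> None:
    """Append one collected example and push it to the Hub."""
    with open(DATA_FILE, "a") as f:
        f.write(json.dumps(example) + "\n")
    repo.push_to_hub(commit_message="Add new example")
```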
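Second, an illustration of what the copied `config.py` might contain; the actual variable names in `config.py.example` may differ.

```python
# config.py -- keep this local and never commit your real keys.
# Credentials for an AWS account with the AmazonMechanicalTurkFullAccess permission.
MTURK_KEY = "your-aws-access-key-id"         # hypothetical variable name
MTURK_SECRET = "your-aws-secret-access-key"  # hypothetical variable name
```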
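Third, a rough sketch of how a script like `collect.py` can switch between the MTurk sandbox and live MTurk with a `--live_mode` flag, and point each HIT at the deployed Space through an ExternalQuestion. This shows the general shape only, not the actual script; the Space URL, reward, and other HIT parameters are placeholders.

```python
import argparse

import boto3

from config import MTURK_KEY, MTURK_SECRET  # hypothetical names, see the sketch above

parser = argparse.ArgumentParser()
parser.add_argument(
    "--live_mode",
    action="store_true",
    help="Launch HITs on real MTurk (paid workers) instead of the sandbox.",
)
args = parser.parse_args()

# The sandbox endpoint behaves like normal MTurk but is free and only for testing.
MTURK_URL = (
    "https://mturk-requester.us-east-1.amazonaws.com"
    if args.live_mode
    else "https://mturk-requester-sandbox.us-east-1.amazonaws.com"
)

mturk = boto3.client(
    "mturk",
    aws_access_key_id=MTURK_KEY,
    aws_secret_access_key=MTURK_SECRET,
    region_name="us-east-1",
    endpoint_url=MTURK_URL,
)

# An ExternalQuestion embeds the deployed Space as the HIT's UI and backend.
SPACE_URL = "https://huggingface.co/spaces/<your-username>/<your-space>"  # placeholder
question = f"""
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>{SPACE_URL}</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Dynamic adversarial data collection",
    Description="Try to fool the model.",
    Reward="0.15",
    MaxAssignments=1,
    LifetimeInSeconds=3600,
    AssignmentDurationInSeconds=600,
    Question=question,
)
print("Created HIT:", hit["HIT"]["HITId"])
```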