---
title: Dadc
emoji: 🏢
colorFrom: red
colorTo: gray
sdk: gradio
sdk_version: 3.0.17
app_file: app.py
pinned: false
license: bigscience-bloom-rail-1.0
---

A basic example of dynamic adversarial data collection with a Gradio app.

*Instructions for adapting this Space for your own project:*

**Setting up the Space**
1. Clone this repo and deploy it as your own Hugging Face Space.
2. Add one of your Hugging Face tokens to your Space's secrets under the name
   `HF_TOKEN`. Then create an empty Hugging Face dataset on the Hub and put its
   URL in your Space's secrets under the name `DATASET_REPO_URL`. The dataset
   can be private or public. When you launch HITs on MTurk in the steps below,
   the app uses your token to automatically store the data from new HITs in
   this dataset (see the sketch after this list).
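
The exact storage code lives in `app.py` and isn't reproduced here, but a
minimal sketch of how a Gradio callback could read those two secrets and push
a collected example to the dataset repo might look like the following. The
helper name `store_example`, the field names, and the `data.jsonl` filename
are illustrative assumptions, not part of this repo:

```python
import json
import os

from huggingface_hub import Repository

# Both values come from the Space secrets configured above.
HF_TOKEN = os.environ["HF_TOKEN"]
DATASET_REPO_URL = os.environ["DATASET_REPO_URL"]

# Clone the dataset repo once at startup so the app can append to it.
repo = Repository(
    local_dir="data", clone_from=DATASET_REPO_URL, use_auth_token=HF_TOKEN
)
DATA_FILE = os.path.join("data", "data.jsonl")


def store_example(prompt: str, model_output: str, label: str) -> None:
    """Append one collected example and push it back to the Hub."""
    with open(DATA_FILE, "a") as f:
        record = {"prompt": prompt, "output": model_output, "label": label}
        f.write(json.dumps(record) + "\n")
    repo.push_to_hub(commit_message="Add new example")
```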

**Running Data Collection**
1. In your local clone of the repo, copy `config.py.example` to `config.py`
   and fill in your AWS credentials. The keys must belong to an AWS account
   with the AmazonMechanicalTurkFullAccess permission, and you also need an
   MTurk Requester account associated with that AWS account.
2. Run `python collect.py` locally. With the `--live_mode` flag, it launches
   HITs on MTurk, using the app you deployed to the Space as the data
   collection UI and backend. NOTE: this means you will need to pay real
   workers. Without the `--live_mode` flag, the HITs run on the MTurk sandbox,
   which is identical to live MTurk but free and intended for testing; you can
   create a sandbox worker account to try out your HIT (see the sketch after
   this list).
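
The contents of `collect.py` aren't reproduced here, but the live/sandbox
switch typically comes down to which MTurk endpoint the boto3 client points
at. A minimal sketch, assuming `config.py` exposes `AWS_ACCESS_KEY_ID` and
`AWS_SECRET_ACCESS_KEY` (the flag handling and variable names are
illustrative, not necessarily how this repo's script is written):

```python
import argparse

import boto3
from config import AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY

# MTurk exposes two endpoints: the live one charges real money, the sandbox
# one is free and meant for testing HITs end to end.
LIVE_URL = "https://mturk-requester.us-east-1.amazonaws.com"
SANDBOX_URL = "https://mturk-requester-sandbox.us-east-1.amazonaws.com"

parser = argparse.ArgumentParser()
parser.add_argument("--live_mode", action="store_true",
                    help="Launch HITs on live MTurk instead of the sandbox.")
args = parser.parse_args()

mturk = boto3.client(
    "mturk",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name="us-east-1",
    endpoint_url=LIVE_URL if args.live_mode else SANDBOX_URL,
)

# Sanity check: prints your (sandbox or live) requester account balance.
print(mturk.get_account_balance()["AvailableBalance"])
```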

**Profit**
Now you should see data from completed HITs arriving in your Hugging Face
dataset automatically!