<!DOCTYPE html>
<html lang="en">
<head>
  <title>Bootstrap Example</title>
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/css/bootstrap.min.css">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.4.1/js/bootstrap.min.js"></script>
  <style>
.faded {
  margin: 0 auto;
  background: var(--window-color);
  box-shadow: 0 0 5px 5px var(--window-color);
  font-family: "Gill Sans", sans-serif;
  display: inline-block;
}
.padded {
    width: 100%;
    max-width: 800px;
    text-align: left;
}
.demo_title {
   font-size: 32px;
   box-shadow: 0 0 5px 5px var(--window-color);
   font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Helvetica,Arial,
                sans-serif,Apple Color Emoji,Segoe UI Emoji;
}
.demo_text {
   font-size: 16px;
   box-shadow: 0 0 5px 5px var(--window-color);
   font-family: -apple-system,BlinkMacSystemFont,Segoe UI,Helvetica,Arial,
                sans-serif,Apple Color Emoji,Segoe UI Emoji;
}
.tab-group {
    font-size: 15px;
}
.tab-content {
    margin-top: 16px;
}
ul > li {
    margin: 3px 0;
}
ol > li {
    margin: 5px 0;
}
  </style>
</head>
<body>

<div class="tab-group" style="width: 100%; margin:0 auto;">
    <div>
        <!-- Nav tabs -->
        <ul class="nav nav-tabs" role="tablist">
            <li role="presentation" class="active"><a href="#tab1" aria-controls="tab1" role="tab" data-toggle="tab">Efficient Training</a></li>
            <li role="presentation"><a href="#tab2" aria-controls="tab2" role="tab" data-toggle="tab">Security</a></li>
            <li role="presentation"><a href="#tab3" aria-controls="tab3" role="tab" data-toggle="tab">Make Your Own</a></li>
        </ul>

        <!-- Tab panes -->
        <div class="tab-content">
            <div role="tabpanel" class="tab-pane active" id="tab1">
                <span class="padded faded text">
                <b> TODO 1</b> Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
                </span>
            </div>
            <div role="tabpanel" class="tab-pane" id="tab2">
                <p>In this section, we discuss common concerns related to the security of collaborative training.</p>

                <p>
                <b>Q: If I join a collaborative training, do I allow other people to execute code on my computer?</b>
                </p>

                <p>
                <b>A:</b> During the training, participants only exchange data (gradients, statistics, model weights) and never send code to each other.
                No other peer can execute code on your computer.
                </p>

                <p>
                To join the training, you typically need to run the code (implementing the model, data streaming, training loop, etc.)
                from a repository or a Colab notebook provided by the authors of the experiment.
                This is no different from running any other open source project or Colab notebook.
                </p>

                <p>
                <b>Q: Can a malicious participant influence the training outcome?</b>
                </p>

                <p>
                <b>A:</b> It is indeed possible unless we use a defense mechanism.
                For instance, a malicious participant can damage model weights by sending large numbers instead of the correct gradients.
                The same can happen due to broken hardware or misconfiguration.
                </p>

                <ul>
                <li>
                    <p>
                    One possible defense is using <b>authentication</b> combined with <b>model checkpointing</b>.
                    In this case, participants should log in (e.g. with their Hugging Face account) to interact with the rest of the collaboration.
                    In turn, moderators can screen potential participants and add them to an allowlist.
                    If something goes wrong (e.g. if a participant sends invalid gradients and the model diverges),
                    the moderators remove them from the list and revert the model to the latest checkpoint unaffected by the attack.
                    </p>

                    <p><b>Spoiler (TODO): How to implement authentication in a decentralized system efficiently?</b></p>

                    <p>
                    Nice bonus: using this data, the moderators can acknowledge the personal contribution of each participant.
                    </p>
                </li>
                <li>
                <p>
                    Another defense is replacing the naive averaging of the peers' gradients with an <b>aggregation technique robust to outliers</b>.
                    <a href="https://arxiv.org/abs/2012.10333">Karimireddy et al. (2020)</a>
                    suggested such a technique (named CenteredClip) and proved that it does not significantly affect the model's convergence.
                    </p>

                    <p><b>Spoiler (TODO): How does CenteredClip protect from outliers? (Interactive Demo)</b></p>

                    <p>
                    In our case, CenteredClip is useful but not sufficient on its own to protect against malicious participants,
                    since it assumes that the CenteredClip procedure itself is performed by a trusted server.
                    In contrast, in our decentralized system, any participant may aggregate a part of the gradients,
                    and we cannot assume all of them to be trusted (see the sketch of the clipping step after this list).
                    </p>

                    <p>
                    Recently, <a href="https://arxiv.org/abs/2106.11257">Gorbunov et al. (2021)</a>
                    proposed a robust aggregation protocol for decentralized systems that does not require this assumption.
                    This protocol uses CenteredClip as a subroutine but is able to detect and ban participants who performed it incorrectly.
                    </p>
                </li>
                </ul>
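                <p>To make the aggregation idea concrete, here is a minimal NumPy sketch of the CenteredClip step.
                The function names, the zero initialization, the fixed number of iterations, and the toy data are
                our own illustration, not the reference implementation from either paper:</p>
<pre><code># CenteredClip (Karimireddy et al., 2020): instead of a plain mean, move the
# estimate v by the *clipped* residuals, so one huge (malicious or broken)
# gradient has bounded influence on the aggregate.
import numpy as np

def clip(z, tau):
    """Scale z down so that its L2 norm does not exceed tau."""
    norm = np.linalg.norm(z)
    return z * min(1.0, tau / norm) if norm > 0 else z

def centered_clip(gradients, tau=1.0, n_iters=10):
    # Start from zero (in practice, from the previous round's estimate).
    v = np.zeros_like(gradients[0])
    for _ in range(n_iters):
        v = v + np.mean([clip(g - v, tau) for g in gradients], axis=0)
    return v

# A peer sending arbitrarily large values barely moves the result:
rng = np.random.default_rng(0)
honest = [rng.normal(size=4) for _ in range(9)]
malicious = [1e6 * np.ones(4)]
print(centered_clip(honest + malicious))    # stays close to the honest mean
print(np.mean(honest + malicious, axis=0))  # the naive mean is ruined (~1e5)
</code></pre>
                <p>Each residual is clipped to norm at most <code>tau</code> before averaging, so a single peer can
                shift the estimate by at most <code>tau / n</code> per iteration, no matter what values it sends.</p>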
            </div>
            <div role="tabpanel" class="tab-pane" id="tab3">
                <p>In this section, we provide a roadmap for you to run the collaborative training yourself.</p>
                <p>
                    <b>Got confused?</b> Feel free to ask any questions at our <a href="https://discord.gg/uGugx9zYvN">Discord</a>!
                </p>
                <ol>
                    <li>
                        Set up dataset streaming:
                        <ul>
                            <li>
                                <a href="https://huggingface.co/docs/datasets/share_dataset.html">Upload</a> your dataset to Hugging Face Hub
                                in a streaming-friendly format (<a href="https://huggingface.co/datasets/laion/laion_100m_vqgan_f8">example</a>).
                            </li>
                            <li>Set up dataset streaming (see the "Efficient Training" section and the sketch below).</li>
                        </ul>
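                        <p>A minimal sketch of consuming a dataset in streaming mode with the Hugging Face
                        <code>datasets</code> library (the dataset name here is a placeholder):</p>
<pre><code># Stream examples from the Hub instead of downloading the whole dataset:
from datasets import load_dataset

dataset = load_dataset("your-org/your-dataset", split="train", streaming=True)
for example in dataset.take(3):  # examples are fetched on the fly
    print(example)
</code></pre>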
                    </li>
                    <li>
                        Write the code for training peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_trainer.py">example</a>):
                        <ul>
                            <li>Implement your model, set up dataset streaming, and write the training loop.</li>
                            <li>
                                Get familiar with the hivemind library
                                (e.g., via the <a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>).
                            </li>
                            <li>
                                In the training loop, wrap your PyTorch optimizer with
                                <a href="https://learning-at-home.readthedocs.io/en/latest/modules/optim.html#hivemind.optim.experimental.optimizer.Optimizer">hivemind.Optimizer</a>
                                (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/task.py#L121">example</a>; see also the sketch below).
                            </li>
                        </ul>
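                        <p>A condensed sketch of this wrapping, following the
                        <a href="https://learning-at-home.readthedocs.io/en/latest/user/quickstart.html">quickstart</a>
                        (the model, run ID, and hyperparameters are placeholders):</p>
<pre><code>import torch
import hivemind

model = torch.nn.Linear(64, 2)  # your model goes here
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Join the swarm; subsequent peers pass initial_peers=[...] of an existing one.
dht = hivemind.DHT(start=True)

opt = hivemind.Optimizer(
    dht=dht,
    run_id="my_experiment",    # peers with the same run_id train together
    optimizer=opt,             # the local optimizer being wrapped
    batch_size_per_step=32,    # samples this peer processes per opt.step()
    target_batch_size=10000,   # global batch size that triggers averaging
    use_local_updates=True,    # apply local updates, average in background
    matchmaking_time=3.0,
    averaging_timeout=10.0,
)
# The training loop stays the usual forward / backward / opt.step().
</code></pre>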
                    </li>
                    <li>
                        <b>(optional)</b> Write the code for auxiliary peers (<a href="https://github.com/learning-at-home/dalle-hivemind/blob/main/run_aux_peer.py">example</a>):
                        <ul>
                            <li>
                                Auxiliary peers are a special kind of peer responsible for
                                logging loss and other metrics (e.g., to <a href="https://wandb.ai/">Weights & Biases</a>)
                                and uploading model checkpoints (e.g., to <a href="https://huggingface.co/docs/transformers/model_sharing">Hugging Face Hub</a>).
                            </li>
                            <li>
                                Such peers don't need to calculate gradients and may be run on cheap machines without GPUs.
                            </li>
                            <li>
                                They can serve as a convenient entry point to
                                <a href="https://learning-at-home.readthedocs.io/en/latest/modules/dht.html">hivemind.DHT</a>
                                (i.e., their address can be specified as <code>initial_peers</code>).
                            </li>
                            <li>
                                It is useful to fix their address by providing the <code>host_maddrs</code> and <code>identity_path</code>
                                arguments to <code>hivemind.DHT</code>
                                (these are forwarded to the underlying <a href="https://libp2p.io/">libp2p</a> daemon; see the sketch below).
                            </li>
                        </ul>
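                        <p>A minimal sketch of an auxiliary peer with a fixed, reproducible address
                        (the port and key path are placeholders):</p>
<pre><code>import hivemind

dht = hivemind.DHT(
    host_maddrs=["/ip4/0.0.0.0/tcp/31337"],  # listen on a fixed TCP port
    identity_path="./peer_id.key",           # reuse the same libp2p identity
    start=True,
)
# Publish these addresses so training peers can pass them as initial_peers:
print([str(addr) for addr in dht.get_visible_maddrs()])
</code></pre>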
                    </li>
                    <li>
                        <b>(optional)</b> Make it easier for other people to join:
                        <ul>
                            <li>
                                Create notebooks for free GPU providers (Google Colab, Kaggle, AWS SageMaker, etc.).
                                People can run them online or download and run them on their own hardware.
                            </li>
                            <li>
                                <a href="https://huggingface.co/organizations/new">Create</a> a Hugging Face organization
                                with all resources related to the training
                                (dataset, model, inference demo, links to a dashboard with loss and other metrics, etc.).
                                Look at <a href="https://huggingface.co/training-transformers-together">ours</a> as an example.
                            </li>
                            <li>
                                Set up an authentication system (see the "Security" section).
                                For example, you can ask people to join your organization with their Hugging Face accounts
                                (Hugging Face lets you share an invite link or manually approve new participants).
                                This allows you to screen participants,
                                acknowledge their contributions (e.g., make a leaderboard), and
                                ban accounts that behave maliciously.
                            </li>
                            <li>
                                Set up an inference demo for your model (e.g., using <a href="https://huggingface.co/spaces">Spaces</a>) or
                                a script that periodically uploads the inference results to show the training progress.
                            </li>
                        </ul>
                    </li>
                </ol>
            </div>
        </div>

    </div>
</div>
</body>
</html>