|
# PROTECT THE WEAK: CLASS-FOCUSED ONLINE LEARNING FOR ADVERSARIAL TRAINING
|
|
|
**Anonymous authors** |
|
Paper under double-blind review |
|
|
|
ABSTRACT |
|
|
|
Adversarial training promises a defense against adversarial perturbations in terms of average accuracy. In this work, we identify that the focus on the average accuracy metric can create vulnerabilities to the “weakest” class. For instance, on CIFAR10, where the average accuracy is 47%, the worst class accuracy can be as low as 14%. The performance sacrifice of the weakest class can be detrimental for real-world systems, if indeed the threat model can adversarially choose the class to attack. To this end, we propose to explicitly minimize the worst class error, which results in a min-max-max optimization formulation. We provide high probability convergence guarantees of the worst class loss for our method, dubbed class focused online learning (CFOL), using techniques from the online learning community. CFOL can be plugged into existing training setups with virtually no overhead in computation. We observe significant improvements on the worst class accuracy of 30% for CIFAR10. We also observe consistent behavior across CIFAR100 and STL10. Intriguingly, we find that minimizing the worst case can even sometimes improve the average.
|
|
|
1 INTRODUCTION |
|
|
|
The susceptibility of neural networks to adversarial noise (Goodfellow et al., 2014; Szegedy et al., 2013) has been a grave concern over the launch of such systems in real-world applications. Several techniques that defend against such attacks by optimizing the average performance have been proposed (Papernot et al., 2016; Raghunathan et al., 2018; Guo et al., 2017; Madry et al., 2017; Zhang et al., 2019). In response, even stronger attacks have been proposed (Carlini & Wagner, 2016; Engstrom et al., 2018; Carlini, 2019). Indeed, recent studies demonstrate that regardless of the defense there exists an attack that can lower the average performance of the system (Shafahi et al., 2018).
|
|
|
In this work, we argue that the average performance is not the only criterion that is of interest for real-world applications. For classification, in particular, optimizing the average performance provides no guarantees for the “weakest” class. This is critical in scenarios where an attacker can pick the class adversarially in addition to the adversarial perturbation. It turns out that the worst performing class can indeed be much worse than the average in adversarial training. This difference is already present in clean training, but we critically observe that the gap between the average and the worst is greatly exacerbated in adversarial training. This gap can already be observed on CIFAR10, where the accuracy across classes is far from uniform, with 47% average robust accuracy while the worst class is at 14% (see Figure 1). The effect is even more prevalent when more classes are present, as in CIFAR100, where we observe that the worst class has zero accuracy while the best has 70% (see Appendix C, where we include multiple other datasets). Despite the focus on adversarial training, we note that the same effect can be observed for robust evaluation after clean training (see Figure 4).
|
|
|
This dramatic drop in accuracy for the weakest classes begs for different approaches than classical empirical risk minimization (ERM), which focuses squarely on the average loss. We suggest a simple alternative, which we dub class focused online learning (CFOL), that can be plugged into existing adversarial training procedures. Instead of minimizing the average performance over the dataset, we sample from an adversarial distribution over classes that is learned jointly with the model parameters. In this way we aim at becoming robust to an attacker that can adversarially choose the class. The focus of this paper is thus on the robust accuracy of the weakest classes instead of the average robust accuracy.
|
|
|
|
|
|
|
|
[Figure: two panels, “Clean training” and “Adversarial training”, showing per-class accuracy on a 0.0–1.0 scale.]

Figure 1: The error across classes is already not perfectly uniform in clean training on CIFAR10. However, this phenomenon is significantly worsened in adversarial training when considering the robust accuracy. That is, some classes perform much worse than the average. The worst class accuracy and average accuracy are depicted with a red and black line respectively.
|
|
|
Concretely, we make the following contributions: |
|
|
|
|
|
|
|
- We propose a simple solution which relies on the classical bandit algorithm from the online learning literature, namely the Exponential-weight algorithm for Exploration and Exploitation (Exp3) (Auer et al., 2002). The method is directly compatible with standard adversarial training procedures (Madry et al., 2017), by replacing the empirical distribution with an adaptively learned adversarial distribution over classes (see the sketch following this list).

- We carry out extensive experiments comparing CFOL against three strong baselines across three datasets, where we consistently observe that CFOL improves the weakest classes.

- We support the empirical results with high probability convergence guarantees for the worst class accuracy and establish a direct connection with the conditional value at risk (CVaR) (Rockafellar et al., 2000) uncertainty set from distributional robust optimization.
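As a concrete illustration of this sampling swap, the sketch below contrasts uniform example sampling (as in ERM) with sampling a class from an adaptively learned distribution p and then drawing an example of that class. It is only a minimal sketch under assumed data structures (the dataset layout and helper names are hypothetical placeholders); how p itself is updated is described in Section 3.

```python
import random
from collections import defaultdict

import numpy as np


def sample_batch_uniform(dataset, batch_size):
    """Standard ERM sampling: example indices are drawn uniformly at random."""
    return random.sample(range(len(dataset)), batch_size)


def sample_batch_class_focused(dataset, batch_size, p):
    """Class-focused sampling: draw a class y ~ p, then an example of class y.

    `dataset` is assumed to be a sequence of (example, label) pairs and `p` a
    probability vector over the k classes; both are illustrative placeholders.
    Harder classes receive larger probability and are hence sampled more often.
    """
    by_class = defaultdict(list)
    for idx, (_, y) in enumerate(dataset):
        by_class[y].append(idx)
    classes = sorted(by_class)
    indices = []
    for _ in range(batch_size):
        y = classes[np.random.choice(len(classes), p=p)]  # class from the adaptive distribution
        indices.append(random.choice(by_class[y]))        # example uniform within the class
    return indices
```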
|
|
|
**Overview** We define our new objective in Section 2, followed by the description of the proposed method in Section 3. In Section 3.1 we prove convergence guarantees. We then move to the empirical work in Section 4, where we observe and resolve the issue of robust overfitting in the context of the worst class. Our main empirical findings are covered in Section 5 and we conclude in Section 6.
|
|
|
|
|
1.1 RELATED WORK |
|
|
|
**Adversarial examples** Goodfellow et al. (2014); Szegedy et al. (2013) are the first to make the important observation that deep neural networks are vulnerable to small adversarial perturbations of the input. Since then, there has been a growing body of literature addressing this safety critical issue, spanning from certified robust models (Raghunathan et al., 2018), distillation (Papernot et al., 2016), and input augmentation (Guo et al., 2017), to adversarial training (Madry et al., 2017; Zhang et al., 2019). We focus on adversarial training in this paper. While certified robustness is desirable, adversarial training remains one of the most successful defenses in practice.
|
|
|
In a concurrent work, Tian et al. (2021) independently observe the non-uniform accuracy over classes in adversarial training, further strengthening the case that the lack of class-wise robustness is indeed an issue. However, they mainly focus on constructing an attack that can enlarge this disparity.
|
|
|
**Minimizing the maximum** Connected to our method is focused online learning (FOL) (Shalev-Shwartz & Wexler, 2016), which similarly takes a bandit approach, but instead re-weights the distribution over the N training examples, independently of the class label. This naturally leads to a convergence rate in terms of the number of examples N instead of the number of classes k, where usually k ≪ N. We compare in more detail theoretically and empirically in Section 3.1 and Section 5 respectively.
|
|
|
Interpolations between the average and the maximum loss have been considered in various other settings: for class imbalanced datasets (Lin et al., 2017), in federated learning (Li et al., 2019), and more generally in tilted empirical risk minimization (Li et al., 2020; Lee et al., 2020).
|
|
|
|
|
|
|
|
**Distributional robust optimization** The accuracy over the worst class can be seen as a particular re-weighting of the data distribution which adversarially assigns all weight to a single class. Worst case perturbations of the data distribution have more generally been studied under the framework of distributional robust stochastic optimization (DRO) (Ben-Tal et al., 2013; Shapiro, 2017). Instead of attempting to minimize the empirical risk on a training distribution $P_0$, this framework considers some uncertainty set $\mathcal{U}(P_0)$ around the training distribution and seeks to minimize the worst case risk within this set, $\sup_{Q\in\mathcal{U}(P_0)} \mathbb{E}_{x\sim Q}[\ell(x)]$.
|
|
|
A choice of uncertainty set, which has been given significant attention in the community, is conditional value at risk (CVaR), which aims at minimizing the weighted average of the tail risk (Rockafellar et al., 2000; Levy et al., 2020; Kawaguchi & Lu, 2020; Fan et al., 2017; Curi et al., 2019). CVaR has been specialized to a re-weighting over class labels, namely labeled conditional value at risk (LCVaR) (Xu et al., 2020). This was originally derived in the context of imbalanced datasets to re-balance the classes. It is still applicable in our setting and we thus provide a comparison. The original empirical work of Xu et al. (2020) only considers the full-batch setting. We complement this by demonstrating LCVaR in a stochastic setting.
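For reference, a standard way of writing CVaR at level α ∈ (0, 1] (this formulation is not spelled out above; it is the usual variational form from Rockafellar et al. (2000) together with its re-weighting interpretation) is

$$\mathrm{CVaR}_\alpha(\ell) \;=\; \inf_{\lambda\in\mathbb{R}}\Big\{\lambda + \tfrac{1}{\alpha}\,\mathbb{E}_{x\sim P_0}\big[(\ell(x)-\lambda)_+\big]\Big\} \;=\; \sup_{Q\,:\,\frac{\mathrm{d}Q}{\mathrm{d}P_0}\le\frac{1}{\alpha}}\mathbb{E}_{x\sim Q}[\ell(x)],$$

so CVaR corresponds to an uncertainty set of re-weightings whose density with respect to $P_0$ is bounded by $1/\alpha$.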
|
|
|
Duchi et al. (2019) and Duchi & Namkoong (2018) are, similarly to our setting, interested in uniform performance over various groups. However, these groups are assumed to be latent subpopulations, which introduces significant complications. Their work is thus concerned with a different setting, an example being training on a dataset implicitly consisting of multiple text corpora.
|
|
|
CFOL can also be formulated in the framework of DRO by choosing an uncertainty set that can re-weight the k class-conditional risks. The precise definition is given in Appendix B.1. We further establish a direct connection between the uncertainty sets of CFOL and CVaR, which we make precise in Appendix B.1; this appendix also contains a summary of the most relevant related methods in Table 4.
|
|
|
2 PROBLEM FORMULATION AND PRELIMINARIES |
|
|
|
**Notation** The underlying data distribution is denoted by $\mathcal{D}$ with examples $x \in \mathbb{R}^d$ and classes $y \in [k]$. A given iteration is characterized by $t \in [T]$, while $p^t_y$ indicates the $y$th index of the $t$th iterate. The indicator function is denoted with $\mathbb{1}\{\text{boolean}\}$ and $\mathrm{unif}(n)$ indicates the uniform distribution over $n$ elements. An overview of the notation is provided in Appendix A.
|
|
|
In classification, we are normally interested in minimizing the population risk $\mathbb{E}_{(x,y)\sim\mathcal{D}}[\ell(\theta, x, y)]$ over our model parameters $\theta \in \mathbb{R}^p$, where $\ell$ is some loss function of $\theta$ and example $x \in \mathbb{R}^d$ with an associated class $y \in [k]$. Madry et al. (2017) formalized adversarial training by modifying this objective with an adversarial perturbation to each example. That is, we instead want to find a parameterization $\theta$ of our predictive model which solves the following optimization problem:

$$\min_\theta \; L(\theta) := \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\delta\in\mathcal{S}} \ell(\theta, x+\delta, y)\Big], \tag{1}$$

where each $x$ is now perturbed by adversarial noise $\delta \in \mathcal{S} \subseteq \mathbb{R}^d$. Common choices of $\mathcal{S}$ include norm-ball constraints as in Madry et al. (2017) or bounding some notion of perceptual distance (Laidlaw et al., 2020). When the distribution over classes is uniform, this implicitly minimizes the average loss over all classes. This does not guarantee high accuracy for the worst class as illustrated in Figure 1, since we only know with certainty that $\max \ge \mathrm{avg}$.
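For concreteness, the inner maximization in Equation 1 is typically approximated with a projected gradient method. The following is a minimal PyTorch-style sketch for an ℓ∞ ball, in the spirit of Madry et al. (2017); the radius, step size, and number of steps are illustrative placeholders rather than the settings used in our experiments.

```python
import torch


def linf_pgd_attack(model, loss_fn, x, y, eps=8 / 255, step_size=2 / 255, steps=10):
    """Approximate max_{delta in S} loss(theta, x + delta, y) for an l_inf ball S.

    Minimal sketch: all hyperparameters are illustrative placeholders.
    """
    delta = torch.zeros_like(x).uniform_(-eps, eps)  # random start inside the ball
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += step_size * grad.sign()          # gradient ascent on the loss
            delta.clamp_(-eps, eps)                   # project back onto the l_inf ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep pixels in the valid range
    return (x + delta).detach()
```

In standard adversarial training the model parameters are then updated on the loss evaluated at the perturbed input returned by such an attack rather than at the clean input.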
|
|
|
Instead, we will focus on a different objective, namely minimizing the worst class-conditioned risk:

$$\min_\theta \max_{y\in[k]} \; L_y(\theta) := \mathbb{E}_{x\sim\mathcal{D}(\cdot\,|\,y)}\Big[\max_{\delta\in\mathcal{S}} \ell(\theta, x+\delta, y)\Big]. \tag{2}$$

This follows the philosophy that “a chain is only as strong as its weakest link”. In our particular setting it models a scenario where the attacker can adversarially choose what class the model is evaluated on. In safety critical applications, such as autonomous driving, getting even just a single class wrong can still have catastrophic consequences.

As the maximum in Equation 2 is a discrete maximization problem, its treatment requires more care. We will take a common approach and construct a convex relaxation to this problem in Section 3.
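To make the difference between the two objectives concrete, the following sketch evaluates robust accuracy per class and reports both the average (the quantity behind Equation 1) and the worst class (the quantity behind Equation 2). It is a minimal sketch with assumed helper names; `attack` can be any callable `attack(model, x, y) -> x_adv`, such as a wrapper around the PGD sketch above.

```python
from collections import defaultdict

import torch


def per_class_robust_accuracy(model, loader, attack, num_classes):
    """Per-class robust accuracy, plus average and worst-class summaries."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for x, y in loader:
        x_adv = attack(model, x, y)
        with torch.no_grad():
            is_correct = model(x_adv).argmax(dim=1) == y
        for c in range(num_classes):
            mask = y == c
            total[c] += int(mask.sum())
            correct[c] += int(is_correct[mask].sum())
    acc = {c: correct[c] / max(total[c], 1) for c in range(num_classes)}
    avg_acc = sum(acc.values()) / num_classes   # average robust accuracy
    worst_acc = min(acc.values())               # worst class robust accuracy
    return acc, avg_acc, worst_acc
```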
|
|
|
|
|
|
|
|
[Figure: plot titled “Class sampling distribution” showing the learned probability p for each CIFAR10 class (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) on a scale from 0.00 to 0.15.]

Figure 2: Contrary to ERM, which samples the examples uniformly, CFOL samples from an adaptive distribution. The learned adversarial distribution is non-uniform over the classes in CIFAR10 when using adversarial training. As expected, the hardest classes are also most frequently sampled.
|
|
|
|
|
3 METHOD |
|
|
|
Since we do not have access to the true distribution $\mathcal{D}$, we will instead minimize over the provided empirical distribution. Let $N_y$ be the set of data point indices for class $y$ such that the total number of examples is $N = \sum_{y=1}^k |N_y|$. Then, we are interested in minimizing the maximum empirical class-conditioned risk,

$$\max_{y\in[k]} \; \widehat{L}_y(\theta) := \max_{y\in[k]} \; \frac{1}{|N_y|} \sum_{i\in N_y} \max_{\delta\in\mathcal{S}} \ell(\theta, x_i + \delta, y). \tag{3}$$
|
|
|
|
|
We relax this discrete problem to a continuous problem over the simplex $\Delta_k$,

$$\max_{y\in[k]} \; \widehat{L}_y(\theta) \;\le\; \max_{p\in\Delta_k} \; \sum_{y=1}^k p_y \widehat{L}_y(\theta). \tag{4}$$

Note that equality is attained when $p$ is a Dirac on the argmax over classes.
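To spell this out, for fixed losses the right-hand side of Equation 4 is a linear function over the simplex and is therefore maximized at a vertex,

$$\max_{p\in\Delta_k}\sum_{y=1}^k p_y\widehat{L}_y(\theta) \;=\; \max_{y\in[k]}\widehat{L}_y(\theta), \qquad \text{attained at } p=e_{y^\star} \text{ with } y^\star\in\arg\max_{y\in[k]}\widehat{L}_y(\theta),$$

so the relaxation in Equation 4 is in fact tight.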
|
|
|
Equation 4 leaves us with a two-player zero-sum game between the model parameters θ and the class distribution p. From the perspective of p it is just a linear objective, albeit adversarially picked by the model. This immediately lets us use a no-regret algorithm under the simplex constraint, namely Hedge (Freund & Schapire, 1997):
|
|
|
|
|
$$w_y^t = w_y^{t-1} - \eta \widehat{L}_y(\theta^t), \qquad p_y^t = \exp\!\big(w_y^t\big) \Big/ \sum_{y'=1}^k \exp\!\big(w_{y'}^t\big). \tag{Hedge}$$
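A minimal sketch of how such an adaptive class distribution can be maintained alongside training is given below. It is an illustration only: the step size is a placeholder, the sign convention is chosen so that classes with larger observed loss receive more probability mass (consistent with the behaviour shown in Figure 2), and the Exp3-based method of this paper uses bandit (importance-weighted) loss estimates rather than the full-information update sketched here.

```python
import numpy as np


class AdaptiveClassDistribution:
    """Exponential-weights sketch of an adversarial distribution over k classes."""

    def __init__(self, num_classes, eta=0.1):
        self.eta = eta                    # placeholder step size
        self.w = np.zeros(num_classes)    # log-weights, one per class

    @property
    def p(self):
        """Softmax of the log-weights, p_y proportional to exp(w_y)."""
        z = np.exp(self.w - self.w.max())  # shift for numerical stability
        return z / z.sum()

    def sample_class(self, rng=np.random):
        return rng.choice(len(self.w), p=self.p)

    def update(self, y, observed_loss):
        # Up-weight the sampled class proportionally to its observed loss,
        # so that harder classes are sampled more often in subsequent steps.
        self.w[y] += self.eta * observed_loss
```

During training one would then draw a class with `sample_class()`, sample a mini-batch from that class, compute the adversarial loss, update the model parameters, and finally call `update()` with the observed loss.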
|
|
|
|