File size: 9,368 Bytes
9375c9a |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 |
// Copyright (C) 2017 Davis E. King (davis@dlib.net)
// License: Boost Software License See LICENSE.txt for the full license.
#undef DLIB_UPPER_bOUND_FUNCTION_ABSTRACT_Hh_
#ifdef DLIB_UPPER_bOUND_FUNCTION_ABSTRACT_Hh_
#include "../matrix.h"
#include <limits>
namespace dlib
{
// ----------------------------------------------------------------------------------------
struct function_evaluation
{
/*!
WHAT THIS OBJECT REPRESENTS
This object records the output of a real valued function in response to
some input.
In particular, if you have a function F(x) then the function_evaluation is
simply a struct that records x and the scalar value F(x).
!*/
function_evaluation() = default;
function_evaluation(const matrix<double,0,1>& x, double y) :x(x), y(y) {}
matrix<double,0,1> x;
double y = std::numeric_limits<double>::quiet_NaN();
};
// ----------------------------------------------------------------------------------------
class upper_bound_function
{
/*!
WHAT THIS OBJECT REPRESENTS
This object represents a piecewise linear non-parametric function that can
be used to define an upper bound on some more complex and unknown function.
To describe this precisely, lets assume there is a function F(x) which you
are capable of sampling from but otherwise know nothing about, and that you
would like to find an upper bounding function U(x) such that U(x) >= F(x)
for any x. It would also be good if U(x)-F(x) was minimal. I.e. we would
like U(x) to be a tight upper bound, not something vacuous like U(x) =
infinity.
The upper_bound_function class is a tool for creating this kind of upper
bounding function from a set of function_evaluations of F(x). We do this
by considering only U(x) of the form:
U = [](matrix<double,0,1> x) {
double min_ub = infinity;
for (size_t i = 0; i < POINTS.size(); ++i) {
function_evaluation p = POINTS[i]
double local_bound = p.y + sqrt(noise_terms[i] + trans(p.x-x)*M*(p.x-x))
min_ub = min(min_ub, local_bound)
}
return min_ub;
}
Where POINTS is an array of function_evaluation instances drawn from F(x),
M is a diagonal matrix, and noise_terms is an array of scalars.
To create an upper bound U(x), the upper_bound_function takes a POINTS array
containing evaluations of F(x) as input and solves the following quadratic
program to find the parameters of U(x):
min_{M,noise_terms}: sum(squared(M)) + sum(squared(noise_terms/relative_noise_magnitude))
s.t. U(POINTS[i].x) >= POINTS[i].y, for all i
noise_terms[i] >= 0
min(M) >= 0
M is a diagonal matrix
Therefore, the quadratic program finds the U(x) that always upper bounds
F(x) on the supplied POINTS, but is otherwise as small as possible.
The inspiration for the upper_bound_function object came from the AdaLIPO
algorithm from this excellent paper:
Global optimization of Lipschitz functions
Malherbe, Cédric and Vayatis, Nicolas
International Conference on Machine Learning - 2017
In that paper, they propose to use a simpler U(x) where noise_terms is
always 0 and M is a diagonal matrix where each diagonal element is the same
value. Therefore, there is only a single scalar parameter for U(x) in
their formulation of the problem. This causes difficulties if F(x) is
stochastic or has discontinuities since, without the noise term, M will
become really huge and the upper bound becomes vacuously large. It is also
problematic if the gradient of F(x) with respect to x contains elements of
widely varying magnitude since the simpler formulation of U(x) assumes a
uniform rate of change regardless of which dimension is varying.
!*/
public:
upper_bound_function(
);
/*!
ensures
- #num_points() == 0
- #dimensionality() == 0
!*/
explicit upper_bound_function(
const std::vector<function_evaluation>& points,
const double relative_noise_magnitude = 0.001,
const double solver_eps = 0.0001
);
/*!
requires
- all the x vectors in points must have the same non-zero dimensionality.
- relative_noise_magnitude >= 0
- solver_eps > 0
ensures
- Creates an upper bounding function U(x), as described above, assuming that
the given points are drawn from F(x).
- Uses the provided relative_noise_magnitude when solving the QP, as
described above. Note that relative_noise_magnitude can be set to 0. If
you do this then all the noise terms are constrained to 0. You should
only do this if you know F(x) is non-stochastic and continuous
everywhere.
- When solving the QP used to find the parameters of U(x), the upper
bounding function, we solve the QP to solver_eps accuracy. It's
possible that large enough solver_eps can lead to upper bounds that don't
upper bound all the supplied points. But for reasonable epsilon values
this shouldn't be a problem.
- #num_points() == points.size()
- #dimensionality() == points[0].x.size()
!*/
upper_bound_function(
const double relative_noise_magnitude,
const double solver_eps
);
/*!
requires
- relative_noise_magnitude >= 0
- solver_eps > 0
ensures
- #num_points() == 0
- #dimensionality() == 0
- This destructor is the same as calling the above constructor with points.size()==0
!*/
void add (
const function_evaluation& point
);
/*!
requires
- num_points() == 0 || point.x.size() == dimensionality()
- point.x.size() != 0
ensures
- Adds point to get_points().
- Incrementally updates the upper bounding function with the given function
evaluation. That is, we assume that F(point.x)==point.y and solve the QP
described above to find the new U(x) that upper bounds all the points
this object knows about (i.e. all the points in get_points() and the new point).
- Calling add() is much faster than recreating the upper_bound_function
from scratch with all the points. This is because we warm start with the
previous solution to the QP. This is done by discarding any non-active
constraints and solving the QP again with only the previously active
constraints and the new constraints formed by all the pairs of the new
point and the old points. This means the QP solved by add() is much
smaller than the QP that would be solved by a fresh call to the
upper_bound_function constructor.
!*/
const std::vector<function_evaluation>& get_points(
) const;
/*!
ensures
- returns the points from F(x) used to define this upper bounding function.
These are all the function_evaluation objects given to this object via
its constructor and add().
!*/
long num_points(
) const;
/*!
ensures
- returns the number of points used to define the upper bounding function.
(i.e. returns get_points().size())
!*/
long dimensionality(
) const;
/*!
ensures
- returns the dimensionality of the input vectors to the upper bounding function.
!*/
double operator() (
const matrix<double,0,1>& x
) const;
/*!
requires
- num_points() > 0
- x.size() == dimensionality()
ensures
- return U(x)
(i.e. returns the upper bound on F(x) at x given by our upper bounding function)
!*/
};
// ----------------------------------------------------------------------------------------
}
#endif // DLIB_UPPER_bOUND_FUNCTION_ABSTRACT_Hh_
|