# Applying Transformations


```python
import param
import numpy as np
import holoviews as hv
from holoviews import opts

hv.extension('bokeh', 'matplotlib')
```

HoloViews objects provide a convenient way of wrapping your data along with some metadata for exploration and visualization. For the simplest visualizations, you can simply declare a small collection of elements which can then be composed or placed in an appropriate container. As soon as the task becomes more complex, it is natural to write functions that output HoloViews objects.

In this user guide, we will introduce two related concepts for expressing transformations of data: first `dim` transforms, which express simple transforms, and then ``Operation`` classes, which express more complex transformations. Operations provide a consistent structure for such code, making it possible to write general functions that can process HoloViews objects. This enables powerful new ways of specifying HoloViews objects computed from existing data, allowing the construction of flexible data processing pipelines. Examples of such operations are ``histogram``, ``rolling``, ``datashade`` or ``decimate``, which apply some computation to certain types of Element and return a new Element with the transformed data.

In this user guide we will discover how transforms and operations work, how to control their parameters and how to chain them. The [Data Processing Pipelines](./14-Data_Pipelines.ipynb) guide builds on what we learn here to demonstrate how operations can be applied lazily by using the ``dynamic`` flag, letting us define deferred processing pipelines that can drive highly complex visualizations and dashboards.

## Transforms

A transform is expressed using a `dim` expression, which we originally introduced in the context of the [Style Mapping](./04-Style_Mapping.ipynb) user guide. A `dim` expression describes a deferred computation on a HoloViews Element, which makes it a quick and easy way to transform data. Let us start by declaring a `Dataset` with a single dimension `x`:


```python
ds = hv.Dataset(np.linspace(0, np.pi), 'x')
ds
```

The `x` values of the `Dataset` are 50 monotonically increasing values from 0 to π (50 is the default number of samples for `np.linspace`). We can now define a transform that takes these values and transforms them:


```python
expr = np.sin(hv.dim('x')*10+5)
expr
```

This expression multiplies these values by 10, adds 5 and then calculates the sine. Using the `.transform` method we can now apply this expression to the `Dataset` and assign the result to a newly created `y` dimension by supplying it as a keyword argument (in the same way we could override the `x` dimension):


```python
transformed = ds.transform(y=expr)
transformed
```

We can see the result of this by casting it to a `Curve`:


```python
hv.Curve(transformed)
```
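Once it is applied, the expression above is ordinary NumPy arithmetic on the `x` values. The following sketch uses NumPy only and does not call HoloViews. It computes the values that `ds.transform(y=expr)` should place in `y`, assuming the same `np.linspace(0, np.pi)` input:

```python
import numpy as np

# The same 50 points that hv.Dataset(np.linspace(0, np.pi), 'x') wraps
x_vals = np.linspace(0, np.pi)

# Eager NumPy equivalent of the deferred expression np.sin(hv.dim('x')*10+5)
y_vals = np.sin(x_vals * 10 + 5)

print(x_vals.shape, y_vals.shape)  # (50,) (50,)
print(y_vals[0])                   # sin(5), roughly -0.9589
```

In HoloViews itself, a `dim` object can also be evaluated directly against a dataset with its `apply` method (for example `expr.apply(ds)`). This is a convenient way to inspect the values before calling `transform`.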

This allows almost any mathematical transformation to be expressed and applied to a `Dataset` in a deferred way. The regular `dim` expression supports all the standard mathematical operators and NumPy array methods. However, if we want to use methods that exist only on specific datatypes, we can invoke them using `.df` or `.xr`, which let you make pandas DataFrame and xarray API calls (methods and accessors) respectively. Let us, for example, load an xarray Dataset, which has a number of custom methods to perform complex computations on the data, e.g. the `quantile` method:


```python
import xarray as xr

air_temp = xr.tutorial.load_dataset('air_temperature')
print(air_temp.quantile.__doc__)
```

We can construct an expression to apply this method on the data and compute the 95th percentile of air temperatures along the 'time' dimension:


```python
quantile_expr = hv.dim('air').xr.quantile(0.95, dim='time')
quantile_expr
```

Now we can apply this to the `Dataset` using the `transform` method. In the resulting dataset we can see that the `time` dimension has been dropped:


```python
temp_ds = hv.Dataset(air_temp, ['lon', 'lat', 'time'])

transformed_ds = temp_ds.transform(air=quantile_expr)

transformed_ds
```

To visualize this data we will cast it to an `Image`:


```python
hv.Image(transformed_ds)
```
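Conceptually, `quantile(0.95, dim='time')` reduces the data along one axis, which is why `time` disappears from the result. The sketch below shows the same reduction with NumPy on a synthetic array. The `(time, lat, lon)` layout and sizes are illustrative assumptions, not the shape of the real tutorial dataset, and `np.quantile` with `axis=0` stands in for reducing over `time`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the 'air' variable, laid out as (time, lat, lon)
n_time, n_lat, n_lon = 100, 5, 8
air_arr = 280 + 10 * rng.standard_normal((n_time, n_lat, n_lon))

# 95th percentile along the time axis; that axis is removed from the result
air_q95 = np.quantile(air_arr, 0.95, axis=0)

print(air_arr.shape)  # (100, 5, 8)
print(air_q95.shape)  # (5, 8)
```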

The real power of `dim` transforms comes from combining them with parameters. We will look at this in more detail later as part of the [Pipeline user guide](14-Data_Pipelines.ipynb), but let's quickly see what this looks like. We will create a [Panel](https://panel.holoviz.org) slider to control the `q` value in the call to the `quantile` method:


```python
import panel as pn

q = pn.widgets.FloatSlider(name='quantile')

quantile_expr = hv.dim('air').xr.quantile(q, dim='time')
quantile_expr
```

Now that we have expressed this dynamic `dim` transform, let us apply it using `.apply.transform`:


```python
temp_ds = hv.Dataset(air_temp, ['lon', 'lat'])
transformed = temp_ds.apply.transform(air=quantile_expr).apply(hv.Image)

pn.Column(q, transformed.opts(colorbar=True, width=400))
```

`dim` expressions provide a very powerful way to apply transforms on your data either statically or controlled by some external parameter, e.g. one driven by a Panel widget.

## Operations are parameterized

When a simple transform is not sufficient, or you want to encapsulate a transformation more rigorously, an `Operation` lets you bundle a transform together with its parameters in a function-like object. Operations in HoloViews are subclasses of ``Operation``, which transform one Element or ``Overlay`` of Elements by returning a new Element that may be a transformation of the original. All operations are parameterized using the [param](https://github.com/holoviz/param) library, which allows easy validation and documentation of the operation arguments. In particular, operations are instances of ``param.ParameterizedFunction``, which allows operations to be used in the same way as normal Python functions.

This approach has several advantages. One is that we can manipulate the parameters of operations at several different levels: at the class level, at the instance level or when the operation is called. Another is that parameterized operations can be inspected just like any other HoloViews object using ``hv.help``. We will now do this for the ``histogram`` operation:


```python
from holoviews.operation import histogram
hv.help(histogram)
```

## Applying operations

Above we can see a listing of all the parameters of the operation, with the defaults, the expected types and detailed docstrings for each one. The ``histogram`` operation can be applied to any Element and will by default generate a histogram for the first value dimension defined on the object it is applied to. As a simple example, we can create a ``BoxWhisker`` Element containing samples from a normal distribution and then apply the ``histogram`` operation to those samples in two ways: 1) by creating an instance on which we change ``num_bins`` and 2) by passing ``bin_range`` directly when calling the operation:


```python
boxw = hv.BoxWhisker(np.random.randn(10000))
histop_instance = histogram.instance(num_bins=50)

boxw + histop_instance(boxw).relabel('num_bins=50') + histogram(boxw, bin_range=(0, 3)).relabel('bin_range=(0, 3)')
```

We can see that these two ways of using operations give us convenient control over how the parameters are applied. An instance allows us to persist some defaults, which will be used in all subsequent calls, while passing keyword arguments to the operation applies the parameters for just that particular call.

The third way to manipulate parameters is to set them at the class level. If we always want to use ``num_bins=30`` instead of the default of ``num_bins=20`` shown in the help output above, we can simply set ``histogram.num_bins=30``. 
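The precedence between the three levels can be modelled in plain Python. `ToyHistogram` below is a hypothetical stand-in, not how `param` or HoloViews implement parameters. Call-time keywords override instance values, and instance values override class defaults. Binning with `np.histogram` is an illustrative assumption:

```python
import numpy as np


class ToyHistogram:
    """Hypothetical model of class/instance/call-level parameter precedence."""

    # Class-level default, analogous to histogram.num_bins
    num_bins = 20

    def __init__(self, **params):
        # Instance-level overrides, analogous to histogram.instance(num_bins=50)
        self._instance_params = params

    def resolve(self, **call_params):
        # Later updates win: class default < instance values < call keywords
        resolved = {"num_bins": type(self).num_bins}
        resolved.update(self._instance_params)
        resolved.update(call_params)
        return resolved

    def __call__(self, samples, **call_params):
        p = self.resolve(**call_params)
        counts, edges = np.histogram(samples, bins=p["num_bins"])
        return counts, edges


toy_samples = np.random.default_rng(1).standard_normal(1000)

toy_default = ToyHistogram()
toy_instance = ToyHistogram(num_bins=50)

print(len(toy_default(toy_samples)[0]))                # 20 (class default)
print(len(toy_instance(toy_samples)[0]))               # 50 (instance value)
print(len(toy_instance(toy_samples, num_bins=10)[0]))  # 10 (call keyword)

# Class-level change: affects every instance that has no override of its own
ToyHistogram.num_bins = 30
print(len(toy_default(toy_samples)[0]))                # 30
```

`param` adds type and range validation on top of this precedence, and HoloViews operations expose the merged values through `self.p`.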

## Operations on containers

``Operations`` in HoloViews are applied to individual elements, which means that when you apply an operation to a container object (such as ``NdLayout``, ``GridSpace`` and ``HoloMap``) the operation is applied once per element. For this to work, all the elements must be of the same type. The operation is then effectively mapped over all the contained elements. As a simple example, we can define a HoloMap of ``BoxWhisker`` Elements by varying the width of the distribution via the ``Sigma`` value and then apply the histogram operation to it:


```python
holomap = hv.HoloMap({(i*0.1+0.1): hv.BoxWhisker(np.random.randn(10000)*(i*0.1+0.1)) for i in range(5)},
                     kdims='Sigma')
holomap + histogram(holomap)
```

As you can see, the operation has generated a ``Histogram`` for each value of ``Sigma`` in the ``HoloMap``. In this way we can apply the operation to the entire parameter space defined by a ``HoloMap``, ``GridSpace`` or ``NdLayout``.
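Applying an operation to a `HoloMap` behaves much like mapping a function over the values of a dictionary while keeping the keys. The NumPy-only sketch below illustrates the idea. `np.histogram` stands in for the `histogram` operation (an assumption: it does not reproduce HoloViews' binning options), and it uses 20 bins to mirror the default `num_bins` mentioned above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Dictionary analogue of the HoloMap: Sigma -> samples from N(0, Sigma^2)
# (keys are rounded here for readability; the HoloMap above uses raw floats)
sigmas = [round(i * 0.1 + 0.1, 1) for i in range(5)]
samples_by_sigma = {s: rng.standard_normal(10000) * s for s in sigmas}


def histogram_counts(samples, num_bins=20):
    """Stand-in for the per-element operation: returns (counts, edges)."""
    return np.histogram(samples, bins=num_bins)


# "Applying the operation to the container" = mapping it over every element
hist_by_sigma = {s: histogram_counts(v) for s, v in samples_by_sigma.items()}

for s, (counts, edges) in hist_by_sigma.items():
    print(s, counts.sum(), round(edges[-1] - edges[0], 2))
```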

## Combining operations

Since operations take a HoloViews object as input and return another HoloViews object we can very easily chain and combine multiple operations to perform complex analyses quickly and easily, while instantly visualizing the output.

In this example we'll work with operations on timeseries. We first define a small function to generate a random, noisy timeseries:


```python
from holoviews.operation import timeseries

def time_series(T=1, N=100, mu=0.1, sigma=0.1, S0=20):
    """Parameterized noisy time series"""
    dt = float(T)/N
    t = np.linspace(0, T, N)
    W = np.random.standard_normal(size=N)
    W = np.cumsum(W)*np.sqrt(dt)       # standard brownian motion
    X = (mu-0.5*sigma**2)*t + sigma*W
    S = S0*np.exp(X)                   # geometric brownian motion
    return S

curve = hv.Curve(time_series(N=1000)).opts(width=600)
```

Now we will start applying some operations to this data. HoloViews ships with two ready-to-use timeseries operations: the ``rolling`` operation, which applies a function over a rolling window, and the ``rolling_outlier_std`` operation, which computes outlier points in a timeseries by excluding points that lie less than ``sigma`` standard deviations away from the rolling mean:


```python
smoothed = curve * timeseries.rolling(curve) * timeseries.rolling_outlier_std(curve)
smoothed.opts(opts.Scatter(color='black'))
```
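To make the two operations less opaque, the following NumPy sketch shows what they compute conceptually. It is not HoloViews' implementation. It assumes a centred, unweighted window and the population standard deviation, and details such as edge handling may differ. `rolling_mean` and `rolling_outliers` are hypothetical helper names:

```python
import numpy as np


def _rolling(y, window, func):
    """Apply func over a centred window; positions without a full window stay NaN."""
    y = np.asarray(y, dtype=float)
    out = np.full(y.shape, np.nan)
    half = window // 2
    for i in range(half, len(y) - window + half + 1):
        out[i] = func(y[i - half:i - half + window])
    return out


def rolling_mean(y, window=10):
    return _rolling(y, window, np.mean)


def rolling_outliers(y, window=10, sigma=2.0):
    """Indices of points more than `sigma` rolling std devs from the rolling mean."""
    y = np.asarray(y, dtype=float)
    mean = _rolling(y, window, np.mean)
    std = _rolling(y, window, np.std)
    # Comparisons with NaN are False, so incomplete-window edges are never flagged
    return np.flatnonzero(np.abs(y - mean) > sigma * std)


# Smooth signal with one injected spike at index 50
spiky = np.sin(np.linspace(0, 4 * np.pi, 100))
spiky[50] += 5.0

spiky_smoothed = rolling_mean(spiky, window=10)
print(rolling_outliers(spiky, window=10, sigma=2.0))
```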

In the next section we will define a custom operation that composes with the ``smoothed`` overlay above to form a short operation pipeline.

## Defining custom operations

We can now define our own custom ``Operation``, which, as you may recall, can process either elements or overlays. This means we can define a simple operation that takes our ``smoothed`` overlay and computes the difference between the raw and smoothed curves it contains. This subtraction gives us the residual between the smoothed and unsmoothed ``Curve`` elements, removing long-term trends and leaving the short-term variation.

Defining an operation is very simple. An ``Operation`` subclass should define a ``_process`` method, which accepts an ``element`` argument (and an optional ``key``). Optionally we can also define parameters on the operation, which we can access using the ``self.p`` attribute on the operation. In this case we define a ``String`` parameter called ``label``, which specifies the name of the value dimension that holds the subtracted values on the returned Element.


```python
from holoviews.operation import Operation

class residual(Operation):
    """
    Subtracts two curves from one another.
    """
    
    label = param.String(default='Residual', doc="""
        Defines the name of the value dimension of the returned Element.""")

    def _process(self, element, key=None):
        # Get first and second Element in overlay
        el1, el2 = element.get(0), element.get(1)

        # Get x-values and y-values of curves
        xvals  = el1.dimension_values(0)
        yvals  = el1.dimension_values(1)
        yvals2 = el2.dimension_values(1)

        # Return new Element with subtracted y-values,
        # using the label parameter as the value dimension name
        return el1.clone((xvals, yvals-yvals2),
                         vdims=self.p.label)
```

Having defined the residual operation, let's try it out right away by applying it to our original and smoothed ``Curve``. We'll place the two objects on top of each other so they can share an x-axis and we can compare them directly:


```python
(smoothed + residual(smoothed).opts(xaxis=None)).cols(1)
```

In this view we can immediately see that only a very small residual is left when applying this level of smoothing. However, we have only tried one particular ``rolling_window`` value, the default value of ``10``. To assess how this parameter affects the residual, we can evaluate the operation over a number of different parameter settings, as we will see in the next section.
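Stripped of the HoloViews containers, `residual` is elementwise subtraction of two y-arrays that share an x-axis. The NumPy sketch below shows why subtracting a smoothed curve removes the long-term trend. It makes two simplifying assumptions: a noiseless linear trend plus a period-5 oscillation, and a centred moving average (via `np.convolve`) whose window equals that period. Under these conditions, away from the edges the smoothed curve recovers the trend exactly and the residual keeps only the oscillation:

```python
import numpy as np

x_axis = np.arange(100, dtype=float)
trend = 0.5 * x_axis + 3.0                      # long-term trend
wiggle = 0.2 * np.sin(2 * np.pi * x_axis / 5)   # short-term variation, period 5
y_raw = trend + wiggle

# Centred moving average over an odd window; mode="same" keeps the length, but the
# first and last window // 2 points are distorted by the implicit zero padding
window = 5
kernel = np.ones(window) / window
y_smooth = np.convolve(y_raw, kernel, mode="same")

# Residual = raw minus smoothed, the same subtraction the residual operation performs
y_resid = y_raw - y_smooth

interior = slice(window // 2, len(x_axis) - window // 2)
print(np.abs(y_resid[interior] - wiggle[interior]).max())  # close to 0
```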

## Evaluating operation parameters

When applying an operation there are often parameters to vary. Using traditional plotting approaches it's often difficult to evaluate them interactively to get a detailed understanding of what they do. Here we will apply the ``rolling`` operations with varying ``rolling_window`` widths and ``window_type``s across a ``HoloMap``:


```python
rolled = hv.HoloMap({(w, str(wt)): timeseries.rolling(curve, rolling_window=w, window_type=wt)
                     for w in [10, 25, 50, 100, 200] for wt in [None, 'hamming', 'triang']},
                    kdims=['Window', 'Window Type'])
rolled
```

This visualization is already useful, since we can compare the effect of various parameter values by moving the slider and trying different window options. However, since we can also chain operations, we can easily compute the residual and view the two together.

To do this we simply overlay the ``HoloMap`` of smoothed curves on top of the original curve and pass it to our new ``residual`` function. Then we can combine the smoothed view with the original and see how the smoothing and residual curves vary across parameter values:


```python
(curve * rolled + residual(curve * rolled)).cols(1)
```
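The `HoloMap` above is keyed by every combination of window width and window type. The NumPy sketch below performs the same kind of sweep with weighted moving averages via `np.convolve`, building the dictionary with `itertools.product`. It rests on a few assumptions. `None` is treated as a flat window. `'hamming'` uses `np.hamming` weights. The `'triang'` option is omitted because NumPy has no direct equivalent of SciPy's triangular window. `weighted_rolling` is a hypothetical helper:

```python
import itertools

import numpy as np

rng = np.random.default_rng(3)
walk = np.cumsum(rng.standard_normal(1000))  # random-walk stand-in for the curve


def weighted_rolling(y, window, window_type=None):
    """Approximately centred weighted moving average; None means equal weights."""
    weights = np.ones(window) if window_type is None else np.hamming(window)
    weights = weights / weights.sum()
    return np.convolve(y, weights, mode="same")


windows = [10, 25, 50, 100, 200]
window_types = [None, "hamming"]

# One entry per (Window, Window Type) combination, like the HoloMap keys
rolled_arrays = {(w, str(wt)): weighted_rolling(walk, w, wt)
                 for w, wt in itertools.product(windows, window_types)}

# Chaining: compute the residual for every parameter combination
resid_arrays = {key: walk - arr for key, arr in rolled_arrays.items()}

print(len(rolled_arrays), len(resid_arrays))  # 10 10
```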

With a few additional lines we have now evaluated the operation over a number of different parameter values, allowing us to process the data with different smoothing parameters. In addition, by interacting with this visualization we can gain a better understanding of the operation parameters as well as insights into the structure of the underlying data.

## Operations on 2D elements

Let's look at another example of operations in action, this time applying a simple filter to an `Image`. The basic idea is the same as above, although accessing the values to be transformed is a bit more complicated. First, we prepare an example image:


```python
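hv.output(backend='matplotlib', size=200)

try:
    from scipy.datasets import ascent  # SciPy >= 1.10; fetches the image via the pooch package
except ImportError:
    from scipy.misc import ascent      # older SciPy releases

ascent_image = ascent()
stairs_image = hv.Image(ascent_image[200:500, :], bounds=[0, 0, ascent_image.shape[1], 300], label="stairs")
stairs_image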
```

We'll define a simple ``Operation``, which takes an ``Image`` and applies a high-pass or low-pass filter. We then use this to build a ``HoloMap`` of images filtered with different sigma values:


```python
from scipy import ndimage

class image_filter(hv.Operation):
    
    sigma = param.Number(default=5)
    
    type_ = param.String(default="low-pass")

    def _process(self, element, key=None):
        xs = element.dimension_values(0, expanded=False)
        ys = element.dimension_values(1, expanded=False)
        
        # setting flat=False will preserve the matrix shape
        data = element.dimension_values(2, flat=False)
        
        if self.p.type_ == "high-pass":
            new_data = data - ndimage.gaussian_filter(data, self.p.sigma)
        else:
            new_data = ndimage.gaussian_filter(data, self.p.sigma)
        
        label = element.label + " ({} filtered)".format(self.p.type_)
        # make an exact copy of the element with all settings, just with different data and label:
        element = element.clone((xs, ys, new_data), label=label)
        return element

stairs_map = hv.HoloMap({sigma: image_filter(stairs_image, sigma=sigma)
                         for sigma in range(0, 12, 1)}, kdims="sigma")

stairs_map.opts(framewise=True)
```

Just as in the previous example, it is quite straightforward to build a HoloMap containing results for different parameter values. Inside the ``_process()`` method, the given parameters can be accessed as ``self.p.<parameter_name>``. Note that ``self.<parameter_name>`` only holds the class or instance value and ignores values passed when calling the operation, so it would not see the ``sigma`` values used above. Since we did not specify the ``type_`` parameter, it defaulted to "low-pass".
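The filter arithmetic in `_process` does not depend on HoloViews. The high-pass output is the input minus its low-pass version, so the two always add back up to the original. The NumPy-only sketch below uses a simple box blur in place of `ndimage.gaussian_filter`. This is a swapped-in filter, chosen only so the example needs no SciPy, and the helper names are hypothetical:

```python
import numpy as np


def box_blur(data, size=3):
    """Low-pass filter: mean over a size x size neighbourhood, edges padded by reflection."""
    pad = size // 2
    padded = np.pad(data, pad, mode="reflect")
    out = np.zeros_like(data, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + data.shape[0], dx:dx + data.shape[1]]
    return out / size**2


def filter_image(data, size=3, type_="low-pass"):
    """Mirror of image_filter's branching, with box_blur standing in for the Gaussian."""
    low = box_blur(data, size)
    return data - low if type_ == "high-pass" else low


img = np.random.default_rng(4).random((64, 48))
low = filter_image(img, type_="low-pass")
high = filter_image(img, type_="high-pass")

print(low.shape, high.shape)         # (64, 48) (64, 48)
print(np.allclose(low + high, img))  # True
```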

There are some peculiarities when applying operations to two-dimensional elements:

- Understanding the ``dimension_values()`` method: In principle, one could use ``element.data`` to access the element's data, however, since HoloViews can wrap a wide range of data formats, ``dimension_values()`` provides an API that lets you access the data without having to worry about the type of the data. The first parameter specifies the dimension to be returned. On a 2D element like an Image or Raster the first two dimensions reference the key dimensions, so passing an index of 0 or 1 will return the x- and y-axis values respectively. Any subsequent dimensions will be value dimensions, e.g. on an Image index value 2 will refer to the intensity values and on an RGB index values 2, 3, and 4 will return the Red, Green and Blue intensities instead. Setting ``expanded=False`` yields only the axis, while the default setting ``expanded=True`` returns a value for every pixel. Specifying ``flat=False`` means that the data's matrix shape will be preserved, which is what we need for this kind of filter.
- ``Image`` and related classes come with convenient methods to convert between matrix indices and data coordinates and vice versa: ``matrix2sheet()`` and ``sheet2matrix()``. This is useful when searching for features such as peaks.
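The idea behind `matrix2sheet()` and `sheet2matrix()` is a linear mapping between array indices and the coordinates set by `bounds`. The sketch below is not the HoloViews implementation, and the helper names are hypothetical. It assumes `(left, bottom, right, top)` bounds, row 0 at the top of the image, and coordinates that refer to pixel centres. It uses the layout of the stairs image above: 300 x 512 pixels with bounds `(0, 0, 512, 300)`.

```python
import numpy as np


def matrix_to_coords(row, col, shape, bounds):
    """Hypothetical: centre coordinates (x, y) of pixel (row, col)."""
    left, bottom, right, top = bounds
    nrows, ncols = shape
    x = left + (col + 0.5) * (right - left) / ncols
    y = top - (row + 0.5) * (top - bottom) / nrows  # row 0 is the top edge
    return float(x), float(y)


def coords_to_matrix(x, y, shape, bounds):
    """Hypothetical inverse: integer (row, col) of the pixel containing (x, y)."""
    left, bottom, right, top = bounds
    nrows, ncols = shape
    col = int(np.floor((x - left) * ncols / (right - left)))
    row = int(np.floor((top - y) * nrows / (top - bottom)))
    return row, col


# Locate the brightest pixel of an array and convert it to data coordinates
peaks_arr = np.zeros((300, 512))
peaks_arr[120, 400] = 1.0
peak_row, peak_col = np.unravel_index(np.argmax(peaks_arr), peaks_arr.shape)

bounds = (0, 0, 512, 300)
x_peak, y_peak = matrix_to_coords(peak_row, peak_col, peaks_arr.shape, bounds)
print(x_peak, y_peak)                                              # 400.5 179.5
print(coords_to_matrix(x_peak, y_peak, peaks_arr.shape, bounds))  # (120, 400)
```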

A very powerful aspect of operations is the fact that they understand HoloViews data structures. This means it is very straightforward to apply an operation to every element in a container. As an example, let's apply an additional high-pass filter to our HoloMap:


```python
image_filter(stairs_map, type_="high-pass").opts(framewise=True)
```

Note that the sigma value for the high-pass filter has defaulted to 5, while the sigma value in the HoloMap still corresponds to the original low-pass filter.


## Benefits of using ``Operation``

Now that we have seen some operations in action, we can get some appreciation of what makes them useful. When working with data interactively we often end up applying a lot of ad-hoc data transforms, which provide maximum flexibility but are neither reproducible nor maintainable. Operations allow us to encapsulate analysis code using a well-defined interface that is well suited for building complex analysis pipelines:

1. ``Operation`` parameters are well defined by declaring parameters on the class. These parameters can be easily documented, automatically validate the types and ranges of the inputs, and can be inspected using ``hv.help``.

2. Both inputs and outputs of an operation are instantly visualizable, because the data **is** the visualization. This means you're not constantly context switching between data processing and visualization: visualization comes for free as you build your data processing pipeline.

3. Operations understand HoloViews data structures and can be immediately applied to any appropriate collection of elements, allowing you to evaluate the operation with permutations of parameter values. This flexibility makes it easy to assess how operation parameters affect your data.

4. As we will discover in the [Data processing pipelines](./14-Data_Pipelines.ipynb) guide, operations can be applied lazily to build up complex deferred data-processing pipelines, which can aid your data exploration and drive interactive visualizations and dashboards.

## Other types of operation

As we have seen ``Operation`` is defined at the level of processing HoloViews elements or overlays of elements. In some situations, you may want to compute a new HoloViews datastructure from a number of elements contained in a structure other than an overlay, such as a HoloMap or a Layout. 

One such pattern is an operation that accepts and returns a ``HoloMap`` where each output element depends on all the data in the input ``HoloMap``. For situations such as these, subclassing ``Operation`` is not appropriate and we recommend defining your own function. These custom operation types won't automatically gain support for lazy pipelines as described in the [Data processing pipelines](./14-Data_Pipelines.ipynb) guide, and how these custom operations are pipelined is left as a design decision for the user. Note that as long as these functions return simple elements or containers, their output can be used by subclasses of ``Operation`` as normal.

What we *do* recommend is that you subclass from ``param.ParameterizedFunction`` so that you can declare well-documented and validated parameters, add a description of your operation with a class level docstring and gain automatic documentation support via ``hv.help``.
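As an illustration of an operation where every output depends on the whole input, consider normalizing every frame of a `HoloMap`-like collection by the global maximum across all frames. The sketch stays in NumPy, with a plain dictionary standing in for the `HoloMap`. `normalize_by_global_max` is a hypothetical name. In practice you would wrap the same logic in a `param.ParameterizedFunction` subclass, as recommended above, and rebuild a `HoloMap` from the results.

```python
import numpy as np


def normalize_by_global_max(frames):
    """Scale every frame by the maximum absolute value over *all* frames, keeping keys.

    Unlike a per-element Operation, each output depends on every input frame.
    """
    global_max = max(np.max(np.abs(arr)) for arr in frames.values())
    if global_max == 0:
        # Nothing to scale; return float copies unchanged
        return {key: np.array(arr, dtype=float) for key, arr in frames.items()}
    return {key: np.asarray(arr, dtype=float) / global_max for key, arr in frames.items()}


rng = np.random.default_rng(5)
frames = {sigma: rng.standard_normal(1000) * sigma for sigma in (0.5, 1.0, 2.0)}
normalized = normalize_by_global_max(frames)

for sigma, arr in normalized.items():
    print(sigma, round(float(np.abs(arr).max()), 3))
```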