# Indexing and Selecting data

As explained in the [Building composite objects](./06-Building_Composite_Objects.ipynb) and [Dimensioned Containers](./05-Dimensioned_Containers.ipynb) guides, HoloViews allows building up hierarchical containers that express the natural relationships between data items, in whatever multidimensional space best characterizes the application domain.  Once your data is in such containers, individual visualizations are then made by choosing subregions of this multidimensional space, either smaller numeric ranges (as in cropping of photographic images), or lower-dimensional subsets (as in selecting frames from a movie, or a specific movie from a large library), or both (as in selecting a cropped version of a frame from a specific movie from a large library).  

In this user guide, we show how to specify such selections, using five different (but related) operations that can act on an element ``e``:

| Operation      | Example syntax   |  Description |
|:---------------|:----------------:|:-------------|
| **indexing**   | e[5.5], e[3,5.5] | Selecting a single data value, returning one actual numerical value from the existing data
| **slice**      | e[3:5.5], e[3:5.5,0:1] | Selecting a contiguous portion from an Element, returning the same type of Element
| **sample**     | e.sample(y=5.5),<br>e.sample((3,3)) |  Selecting one or more regularly spaced data values, returning a new type of Element
| **select**     | e.select(y=5.5),<br>e.select(y=(3,5.5)) | More verbose notation covering all supported slice and index operations by dimension name.
| **iloc**       | e.iloc[2, :],<br>e.iloc[2:5, :]  | Indexes and slices by integer row and column position, supporting integers, slices, lists, and boolean arrays.

These operations are all concerned with selecting some subset of the data values, without combining across data values (e.g. averaging) or otherwise transforming the actual data. The [Tabular Data](./08-Tabular_Datasets.ipynb) user guide covers operations that reduce, summarize, or transform the data in other ways.
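Selection by dimension name can be pictured as a boolean mask built up one dimension at a time. The following is a minimal sketch in plain NumPy, not the HoloViews implementation: the helper ``select_rows``, the sample data, and the half-open range convention (``low <= value < high``) are all assumptions made for illustration.

```python
import numpy as np

def select_rows(columns, **conditions):
    """Filter a dict of equal-length columns by per-dimension conditions.

    A scalar keeps rows equal to that value; a (low, high) tuple keeps rows
    with low <= value < high (the half-open range is an assumption).
    """
    n_rows = len(next(iter(columns.values())))
    mask = np.ones(n_rows, dtype=bool)
    for name, cond in conditions.items():
        values = np.asarray(columns[name])
        if isinstance(cond, tuple):
            low, high = cond
            if low is not None:
                mask &= values >= low
            if high is not None:
                mask &= values < high
        else:
            mask &= values == cond
    return {name: np.asarray(col)[mask] for name, col in columns.items()}

# Hypothetical two-column data set
data = {'x': np.array([0.0, 1.0, 2.0, 3.0, 4.0]),
        'y': np.array([5.5, 3.0, 5.5, 4.0, 6.0])}

in_range = select_rows(data, y=(3, 5.5))  # rows with 3 <= y < 5.5
exact = select_rows(data, y=5.5)          # rows with y == 5.5
```

Both calls return the same kind of container they were given, which mirrors how a selection keeps the Element type rather than returning a bare number.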

We'll go through each operation in detail, providing a visual illustration to help make its semantics clear. This user guide assumes that you are familiar with continuous and discrete coordinate systems, so please review our [Continuous Coordinates](Continuous_Coordinates.ipynb) guide if you have not done so already.


```python
import numpy as np
import holoviews as hv
from holoviews import opts

hv.extension('bokeh', 'matplotlib')

opts.defaults(
    opts.Bounds(line_width=2, color='red', axiswise=True),
    opts.Image(cmap='Blues'),
    opts.Points(size=8, padding=0.1),
    opts.Text(text_font_size='16pt'), opts.Scatter(size=5))
```

# Indexing and slicing Elements

In the [Dimensioned Containers](./05-Dimensioned_Containers.ipynb) guide we saw examples of how to select individual elements embedded in a multi-dimensional space.  The [Continuous Coordinates](Continuous_Coordinates.ipynb) user guide covered slicing and indexing in Elements representing continuous coordinate systems such as ``Image`` types. Here we'll go through indexing and slicing for each kind of Element in full detail, with a visual illustration of each.

How the ``Element`` may be indexed depends on the key dimensions (or ``kdims``) of the ``Element``. It is thus important to consider the nature and dimensionality of your data when choosing the ``Element`` type for it.

## 1D Elements: Slicing and indexing

Certain Chart elements support both single-dimensional indexing and slicing: ``Scatter``, ``Curve``, ``Histogram``, and ``ErrorBars``. Here we'll look at how we can easily slice a ``Histogram`` to select a subregion of it:


```python
np.random.seed(42)
edges, data = np.histogram(np.random.randn(100))
hist = hv.Histogram((edges, data))
subregion = hist[0:1]
hist * subregion
```

The two bins in a different color show the selected region, overlaid on top of the full histogram.  We can also access the value for a specific bin in the ``Histogram``. A continuous-valued index that falls inside a particular bin will return the corresponding value or frequency.


```python
hist[0.25], hist[0.5], hist[0.55]
```
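To make the bin lookup concrete, here is a minimal NumPy sketch of mapping a continuous index onto a bin. The edges, counts, and the helper ``bin_value`` are hypothetical, and treating bins as closed on the left and open on the right is an assumption rather than a statement about how ``Histogram`` resolves values that fall exactly on an edge.

```python
import numpy as np

# Hypothetical histogram: three bins with edges at 0, 0.5, 1.0 and 1.5
edges = np.array([0.0, 0.5, 1.0, 1.5])
counts = np.array([7, 3, 5])

def bin_value(x):
    """Return the count of the bin containing x, with bins as [left, right)."""
    # searchsorted finds where x would be inserted among the edges;
    # subtracting one gives the index of the bin whose left edge is <= x
    i = int(np.searchsorted(edges, x, side='right')) - 1
    if i < 0 or i >= len(counts):
        raise IndexError(f'{x} lies outside the binned range')
    return int(counts[i])

values = [bin_value(0.25), bin_value(0.5), bin_value(0.55)]
```

In this sketch 0.25 falls in the first bin, while 0.5 and 0.55 both fall in the second, so the last two lookups return the same count.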

We can slice a ``Curve`` the same way:


```python
xs = np.linspace(0, np.pi*2, 21)
curve = hv.Curve((xs, np.sin(xs)))
subregion = curve[np.pi/2:np.pi*1.5]
curve * subregion * hv.Scatter(curve)
```

Here again the region in a different color is the specified subregion. We've also marked each discrete point with a dot using the ``Scatter`` ``Element``.  As before we can also get the value for a specific sample point; whatever x-index is provided will snap to the closest sample point and return the dependent value:


```python
curve[4.05], curve[4.1], curve[4.17], curve[4.3]
```
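The snapping behavior can be sketched with a nearest-neighbor lookup over the sample positions. This is an illustration in plain NumPy using the same 21 samples as the ``Curve`` above; the helper ``nearest_sample`` is hypothetical and is not how HoloViews is implemented internally.

```python
import numpy as np

xs = np.linspace(0, np.pi * 2, 21)
ys = np.sin(xs)

def nearest_sample(x):
    """Return (row position, y-value) of the sample whose x is closest to x."""
    i = int(np.argmin(np.abs(xs - x)))
    return i, ys[i]

# Row positions that each requested x-index snaps to
positions = [nearest_sample(x)[0] for x in (4.05, 4.1, 4.17, 4.3)]
```

The first three requests all snap to the same sample, so they share one dependent value, while the last request is closer to the next sample along.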

It is important to note that an index (or a list of indices, as for the 2D and 3D cases below) will always return the raw indexed (dependent) value, i.e. a number.  A slice (indicated with `:`), on the other hand, will retain the Element type even in cases where the plot might not be useful, such as having only a single value, two values, or no value at all in that range:


```python
curve[4:4.5]
```
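The difference between an index and a slice can be sketched as a mask over the sample positions: a slice always returns arrays, even when few or no samples fall in the range. The helper ``value_slice`` is hypothetical, and the half-open range is an assumption made for the sketch.

```python
import numpy as np

xs = np.linspace(0, np.pi * 2, 21)
ys = np.sin(xs)

def value_slice(low, high):
    """Keep the samples with low <= x < high; the result may be empty."""
    mask = (xs >= low) & (xs < high)
    return xs[mask], ys[mask]

two_x, two_y = value_slice(4, 4.5)      # only two samples fall in this range
none_x, none_y = value_slice(4.1, 4.3)  # no sample falls in this range
```

Both results have the same structure as the full data, which is the analogue of a slice retaining the Element type regardless of how many values survive.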

## 2D and 3D Elements: slicing

For data defined in a 2D space, there are 2D equivalents of the 1D ``Curve`` and ``Scatter`` types. ``Points``, for example, can be thought of as a number of points in a 2D space.


```python
r = np.arange(0, 1, 0.005)
xs, ys = (r * fn(85*np.pi*r) for fn in (np.cos, np.sin))
paths = hv.Points((xs, ys))
paths + paths[0:1, 0:1]
```

However, indexing is not supported in this space, because there could be many possible points near a given set of coordinates, and finding the nearest one would require a search across potentially incommensurable dimensions, which is poorly defined and difficult to support.
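A small sketch shows why "nearest point" is poorly defined when the two dimensions have unrelated units: the answer changes when one axis is rescaled. The points, the query, and the helper ``nearest`` are hypothetical.

```python
import numpy as np

points = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
query = np.array([0.0, 0.0])

def nearest(points, query, scale):
    """Row position of the point closest to query after scaling each axis."""
    scaled = (points - query) * scale
    return int(np.argmin(np.hypot(scaled[:, 0], scaled[:, 1])))

same_units = nearest(points, query, np.array([1.0, 1.0]))  # first point wins
rescaled_y = nearest(points, query, np.array([1.0, 0.1]))  # second point wins
```

Expressing y in different units (here, a factor of ten) flips which point counts as nearest, so no single answer is correct without an agreed way to compare the dimensions.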

Slicing in 3D works much like slicing in 2D, but indexing is not supported for the same reason as in 2D:


```python
xs = np.linspace(0, np.pi*8, 201)
scatter = hv.Scatter3D((xs, np.sin(xs), np.cos(xs)))
layout = scatter + scatter[5:10, :, 0:]
hv.output(layout, backend='matplotlib')
```

## 2D Raster and Image: slicing and indexing

``Raster`` and the various other image-like Elements (``Image``, ``RGB``, ``HSV``, etc.) can all be sliced and indexed, as can ``Surface``, because they all have an underlying regular grid of key dimension values:


```python
np.random.seed(0)
extents = (0, 0, 10, 10)
img = hv.Image(np.random.rand(10, 10), bounds=extents)
img_slice = img[1:9,4:5]
box = hv.Bounds((1,4,9,5))
img*box + img_slice
```


```python
img[4.2,4.2], img[4.3,4.2], img[5.0,4.2]
```
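Indexing a regular grid with continuous coordinates amounts to finding the cell that contains the requested point. The sketch below assumes that row 0 of the array is the top of the image and that out-of-range coordinates are clamped to the edge cells; the helper ``coord_to_index`` and the stand-in array are hypothetical.

```python
import numpy as np

def coord_to_index(x, y, bounds, shape):
    """Map continuous (x, y) to the (row, col) of the grid cell containing it."""
    left, bottom, right, top = bounds
    rows, cols = shape
    col = int(np.floor((x - left) / (right - left) * cols))
    row = int(np.floor((top - y) / (top - bottom) * rows))  # row 0 is at the top
    # Clamp so that points on the outer boundary still map to a valid cell
    return min(max(row, 0), rows - 1), min(max(col, 0), cols - 1)

bounds = (0, 0, 10, 10)
grid = np.arange(100).reshape(10, 10)  # stand-in for the image data

cells = [coord_to_index(x, y, bounds, grid.shape)
         for x, y in [(4.2, 4.2), (4.3, 4.2), (5.0, 4.2)]]
```

With unit-sized cells, the first two coordinates land in the same cell and so would return the same value, whereas the third lands in the neighboring column.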

# Tabular indexing and slicing

While most indexing in HoloViews works by selecting the values along a dimension, it is also frequently useful to index and slice using integer row and column indices. For this purpose most HoloViews objects have a ``.iloc`` indexing interface (mirroring the [pandas](http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing) API), which supports all the usual indexing semantics. Supported iloc arguments include:

* An integer e.g. 5

* A list or array of integers [4, 3, 0]

* A slice object with ints 1:7

* A boolean array
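The four argument kinds listed above correspond to the standard positional indexing forms that NumPy also provides. As a point of reference, here they are applied to a plain two-column array holding the same values as the ``Curve`` used in this guide; this is NumPy indexing, shown as an analogy rather than a call to ``.iloc``.

```python
import numpy as np

xs = np.linspace(0, np.pi * 2, 21)
table = np.column_stack([xs, np.sin(xs)])  # 21 rows, columns (x, y)

single_row = table[5]           # an integer: one row
picked_rows = table[[4, 3, 0]]  # a list of integers: rows in the order given
row_range = table[1:7]          # a slice with integer bounds: rows 1 to 6
masked_rows = table[xs > 3]     # a boolean array: one flag per row
```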

#### Indexing

In this way we can, for example, select the x- and y-values in the row at integer index 8 (the ninth row, since positions are zero-based) of our ``Curve``:


```python
xs = np.linspace(0, np.pi*2, 21)
curve = hv.Curve((xs, np.sin(xs)))
print('x: %s, y: %s' % (curve.iloc[8, 0], curve.iloc[8, 1]))
curve * hv.Scatter(curve.iloc[8])
```

#### Slicing

Alternatively we can select every second sample between indices 5 and 16 of a ``Curve``:


```python
curve + curve.iloc[5:16:2]
```

#### Lists of integers and boolean indices

Finally, we may also pass a list of the integer samples to select, or use boolean indices. This mode of indexing can be very useful for randomly sampling an Element or picking a specific set of rows (or columns):


```python
curve.iloc[[0, 5, 10, 15, 20]] + curve.iloc[xs>3]
```
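For random sampling, the list of integer positions can be drawn with NumPy's random generator. This sketch only builds the positions and applies them to the raw x-values; the seed and sample size are arbitrary choices made for illustration.

```python
import numpy as np

xs = np.linspace(0, np.pi * 2, 21)

rng = np.random.default_rng(0)  # fixed seed so the draw is repeatable
# Draw five distinct row positions, then sort them to preserve the x-ordering
rows = np.sort(rng.choice(len(xs), size=5, replace=False))
subset_x = xs[rows]
```

An array of positions such as ``rows`` is the kind of integer-array argument described in the list of supported ``.iloc`` arguments above.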

# Sampling

Sampling is essentially a process of indexing an Element at multiple index locations and collecting the results.  Thus any Element that can be indexed can also be sampled.  Sampling differs from regular indexing in that multiple indices may be supplied at the same time.  Also, indexing will only return the value at that location, whereas the return type from a sampling operation is another ``Element`` type, usually either a ``Table`` or a ``Curve``, to allow both key and value dimensions to be returned.

### Sampling Elements

Samples can be specified either as an explicit list of index locations or as keyword arguments giving an index value for each named dimension.

We'll start by taking a single sample of an Image object, to make clear how sampling and indexing are similar operations yet different in their results:


```python
img_coords = hv.Points(img, extents=extents)
labeled_img = img * img_coords * hv.Points([img.closest([(4.1,4.3)])]).opts(color='r')
img + labeled_img + img.sample([(4.1,4.3)])
```


```python
img[4.1,4.3]
```

Here, the output of the indexing operation is the value (0.20887675609483469) from the location closest to the specified indexes, whereas ``.sample()`` returns a Table that lists both the coordinates *and* the value, and slicing (in the previous section) returns an Element of the same type, not a Table.
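The contrast between the two return shapes can be sketched in plain NumPy: indexing yields a bare value, while sampling yields rows that pair the snapped coordinates with the value. The helpers below are hypothetical, and the sketch assumes row 0 is the top of the grid and that snapped coordinates are cell centers.

```python
import numpy as np

left, bottom, right, top = 0, 0, 10, 10
grid = np.arange(100.0).reshape(10, 10)  # stand-in for the image data
rows, cols = grid.shape
dx, dy = (right - left) / cols, (top - bottom) / rows

def cell(x, y):
    """(row, col) of the cell containing (x, y); row 0 is at the top."""
    return int((top - y) // dy), int((x - left) // dx)

def index(x, y):
    """Indexing: return only the value."""
    return grid[cell(x, y)]

def sample(coords):
    """Sampling: return (x, y, value) rows with coordinates snapped to centers."""
    table = []
    for x, y in coords:
        row, col = cell(x, y)
        table.append((left + (col + 0.5) * dx, top - (row + 0.5) * dy,
                      grid[row, col]))
    return table

value = index(4.1, 4.3)
table = sample([(4.1, 4.3)])
```

Here ``value`` is a single number, while ``table`` holds one row containing the cell center and that same number.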


Next we can try sampling along only one Dimension on our 2D Image, leaving us with a 1D Element (in this case a ``Curve``):


```python
sampled = img.sample(y=5)
labeled_img = img * img_coords * hv.Points(zip(sampled['x'], [img.closest(y=5)]*10))
img + labeled_img + sampled
```

Sampling works on any regularly sampled Element type.  For example, we can select multiple samples along the x-axis of a Curve.


```python
xs = np.arange(10)
samples = [2, 4, 6, 8]
curve = hv.Curve(zip(xs, np.sin(xs)))
curve_samples = hv.Scatter(zip(xs, [0] * 10)) * hv.Scatter(zip(samples, [0]*len(samples))) 
curve + curve_samples + curve.sample(samples)
```

### Sampling HoloMaps

Sampling is often useful when you have more data than you wish to visualize or analyze at one time. First, let's create a HoloMap containing a number of observations of some noisy data.


```python
obs_hmap = hv.HoloMap({i: hv.Image(np.random.randn(10, 10), bounds=extents)
                       for i in range(3)}, kdims='Observation')
```

A `HoloMap` may not be sampled directly. Instead, we can use the `.apply` method to sample each element in the HoloMap and then use the `.collapse` method to produce a single `Dataset`. In this case we'll take 3x3 subsamples of each of the Images:


```python
hv.output(backend='matplotlib', size=120)

sample_style = dict(edgecolors='k', alpha=1)
all_samples = obs_hmap.collapse().to.scatter3d().opts(alpha=0.15, xticks=4)
sampled = obs_hmap.apply.sample((3,3)).collapse()
subsamples = sampled.to.scatter3d().opts(**sample_style)
all_samples * subsamples + hv.Table(sampled)
```

By supplying bounds as a (left, bottom, right, top) tuple we can also sample a subregion of our images:


```python
sampled = obs_hmap.apply.sample((3,3), bounds=(2,5,5,10)).collapse()
subsamples = sampled.to.scatter3d().opts(xticks=4, **sample_style)
all_samples * subsamples + hv.Table(sampled)
```
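The sample-then-collapse pattern can be sketched without HoloViews as a per-observation regular sampling followed by concatenation into one table keyed by observation. The helper ``regular_sample`` is hypothetical; placing samples at the midpoints of an evenly divided region, and treating row 0 as the top of each grid, are assumptions made for the sketch.

```python
import numpy as np

def regular_sample(grid, grid_bounds, shape, bounds=None):
    """Take rows x cols evenly spaced samples of grid, optionally in a subregion."""
    left, bottom, right, top = grid_bounds
    n_rows, n_cols = grid.shape
    s_left, s_bottom, s_right, s_top = bounds or grid_bounds
    rows, cols = shape
    # Sample positions are midpoints of an even division of the region
    x_edges = np.linspace(s_left, s_right, cols + 1)
    y_edges = np.linspace(s_bottom, s_top, rows + 1)
    x_mid = (x_edges[:-1] + x_edges[1:]) / 2
    y_mid = (y_edges[:-1] + y_edges[1:]) / 2
    out = []
    for x in x_mid:
        for y in y_mid:
            col = min(int((x - left) / (right - left) * n_cols), n_cols - 1)
            row = min(int((top - y) / (top - bottom) * n_rows), n_rows - 1)
            # Report the requested position, not the snapped cell center
            out.append((x, y, grid[row, col]))
    return out

rng = np.random.default_rng(0)
extents = (0, 0, 10, 10)
observations = {i: rng.standard_normal((10, 10)) for i in range(3)}

# "Apply" the sampling to every observation, then "collapse" into one table
collapsed = [(key, x, y, z)
             for key, grid in observations.items()
             for x, y, z in regular_sample(grid, extents, (3, 3),
                                           bounds=(2, 5, 5, 10))]
```

Each row of ``collapsed`` carries the observation key alongside the coordinates and value, so three observations sampled 3x3 give 27 rows in total.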

Since this kind of sampling is only well supported for continuous coordinate systems, we can only apply this kind of sampling to Image types for now.

### Sampling Charts

Sampling Chart-type Elements like ``Curve``, ``Scatter``, and ``Histogram`` is only supported by providing an explicit list of samples, since those Elements have no underlying regular grid.


```python
hv.output(backend='bokeh')

xs = np.arange(10)
extents = (0, 0, 2, 10)
curve = hv.HoloMap({(i) : hv.Curve(zip(xs, np.sin(xs)*i))
                    for i in np.linspace(0.5, 1.5, 3)},
                   kdims='Observation')
all_samples = curve.collapse().to.points()
sampled = curve.apply.sample([0, 2, 4, 6, 8]).collapse()
sample_points = sampled.to.points(extents=extents)
sampling = all_samples * sample_points.opts(color='red')
sampling + hv.Table(sampled)
```

These tools should help you index, slice, sample, and select your data with ease.  The [Tabular Data](./08-Tabular_Datasets.ipynb) guide explains how to do other types of operations, such as averaging and other reduction operations.