In addition to the plots available via the plot interface, hvPlot provides a number of more sophisticated statistical plots modelled on ``pandas.plotting``. To explore these, we will load the iris and stocks datasets from Bokeh:


```python
import pandas as pd
import hvplot.pandas  # noqa

from bokeh.sampledata import iris, stocks

# Rebind the name ``iris`` from the sample-data module to its DataFrame
iris = iris.flowers
```

### Scatter Matrix

When working with multi-dimensional data, it is often difficult to understand the relationship between all the different variables. A ``scatter_matrix`` makes it possible to visualize all of the pairwise relationships in a compact format. ``hvplot.scatter_matrix`` is closely modelled on ``pandas.plotting.scatter_matrix``:


```python
hvplot.scatter_matrix(iris, c="species")
```

Compared to a static Seaborn/Matplotlib-based plot, here it is easy to explore the data interactively thanks to Bokeh's linked zooming, linked panning, and linked brushing (using the ``box_select`` and ``lasso_select`` tools).

### Parallel Coordinates

Parallel coordinate plots provide another way of visualizing multivariate data. ``hvplot.parallel_coordinates()`` provides a simple API to create such a plot, modelled on the API of ``pandas.plotting.parallel_coordinates()``:


```python
hvplot.parallel_coordinates(iris, "species")
```

The plot quickly clarifies the relationship between the different variables, highlighting how the "setosa" species differs from the others in the petal width and petal length dimensions.

### Andrews Curves

Another similar approach is to visualize the dimensions using Andrews curves, which are constructed by generating a Fourier series from the features of each observation, making the aggregate differences between classes visible. The ``hvplot.andrews_curves()`` function provides a simple API to generate Andrews curves from a DataFrame, closely matching the API of ``pandas.plotting.andrews_curves()``:


```python
hvplot.andrews_curves(iris, "species")
```

Once again we can see the significant difference of the setosa species. However, unlike the parallel coordinate plot, the Andrews plot does not give any real quantitative insight into the features that drive those differences.
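
To make the Fourier-series construction more concrete, here is a minimal sketch in plain Python of how a single observation can be turned into a curve. It assumes the commonly used Andrews formula, f(t) = x1/√2 + x2·sin(t) + x3·cos(t) + x4·sin(2t) + ..., evaluated for t between -π and π. The helper name ``andrews_curve`` and the feature values are illustrative only, and this is not hvPlot's actual implementation:

```python
import math


def andrews_curve(features, t):
    """Evaluate the Andrews curve of one observation at angle t (radians)."""
    # The first feature contributes a constant term x1 / sqrt(2).
    value = features[0] / math.sqrt(2)
    # The remaining features alternate between sin and cos terms,
    # with the harmonic increasing every two features: 1, 1, 2, 2, ...
    for i, x in enumerate(features[1:]):
        harmonic = i // 2 + 1
        if i % 2 == 0:
            value += x * math.sin(harmonic * t)
        else:
            value += x * math.cos(harmonic * t)
    return value


# Hypothetical observation with four features (illustrative values only).
observation = [5.1, 3.5, 1.4, 0.2]

# Sample the curve at 200 evenly spaced points between -pi and pi.
samples = [-math.pi + 2 * math.pi * k / 199 for k in range(200)]
curve = [andrews_curve(observation, t) for t in samples]
```

Each observation yields one such curve, and curves are colored by class. Because every feature is spread across the whole curve, differences between classes show up as differently shaped bundles, but individual features cannot be read off directly.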

### Lag Plot

Lastly, for time series analysis, hvPlot offers a so-called lag plot, implemented by the ``hvplot.lag_plot()`` function and modelled on the matching ``pandas.plotting.lag_plot()`` function.

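As an example, we will compare the closing stock prices of Apple and IBM from 2000 to 2013 using a lag of 365 observations (the lag is counted in rows of the DataFrame, so here it corresponds to 365 recorded trading days rather than calendar days):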


```python
index = pd.DatetimeIndex(stocks.AAPL['date'])
stock_df = pd.DataFrame({'IBM': stocks.IBM['close'], 'AAPL': stocks.AAPL['close']}, index=index)

hvplot.lag_plot(stock_df, lag=365, alpha=0.3)
```

This plot makes it apparent that Apple was significantly more volatile over the analyzed period. In other words, its price at a particular point in time sometimes differed significantly from its price 365 observations earlier. This also becomes visible in a simple line chart of the same data:


```python
stock_df.hvplot.line()
```
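
For intuition about what a lag plot actually displays, the following plain-Python sketch pairs each value in a series with the value ``lag`` rows later. It assumes the lag is counted in rows, as in ``pandas.plotting.lag_plot()``. The helper ``lag_pairs`` and the price list are hypothetical and are not part of hvPlot:

```python
def lag_pairs(values, lag=1):
    """Return (y(t), y(t + lag)) pairs, the points drawn in a lag plot."""
    if lag != int(lag) or int(lag) <= 0:
        raise ValueError("lag must be a positive integer")
    lag = int(lag)
    # Drop the last `lag` values for the x-axis and the first `lag`
    # values for the y-axis, so the two sequences line up row by row.
    return list(zip(values[:-lag], values[lag:]))


# Hypothetical closing prices (illustrative values only).
prices = [10, 11, 13, 12, 15]
pairs = lag_pairs(prices, lag=2)
print(pairs)  # [(10, 13), (11, 12), (13, 15)]
```

Points close to the diagonal indicate that a value is similar to the one ``lag`` rows earlier, while a wide scatter indicates large changes over that interval.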

These plot types can help you make sense of complex datasets. See [holoviews.org](https://holoviews.org) for many other plots and tools that can be used alongside those from hvPlot for other purposes.