year-vs-climatology / hvplot_docs /14-Data_Pipelines.md
ahuang11's picture
Upload 52 files
b9a0f21 verified
|
raw
history blame
7.17 kB
# Data Processing Pipelines
```python
import pandas as pd
import holoviews as hv
from holoviews import opts
from bokeh.sampledata import stocks
from holoviews.operation.timeseries import rolling, rolling_outlier_std
hv.extension('bokeh')
opts.defaults(opts.Curve(width=600, framewise=True))
```
In the previous guides we discovered how to load and declare [dynamic, live data](./07-Live_Data.ipynb) and how to [transform elements](./11-Transforming_Elements.ipynb) using `dim` expressions and operations. In this guide we will discover how to combine dynamic data with operations to declare lazy and declarative data processing pipelines, which can be used for interactive exploration but can also drive complex dashboards or even bokeh apps.
## Declaring dynamic data
We will begin by declaring a function which loads some data. In this case we will just load some stock data from the bokeh but you could imagine querying this data using REST interface or some other API or even loading some large collection of data from disk or generating the data from some simulation or data processing job.
```python
def load_symbol(symbol, **kwargs):
df = pd.DataFrame(getattr(stocks, symbol))
df['date'] = df.date.astype('datetime64[ns]')
return hv.Curve(df, ('date', 'Date'), ('adj_close', 'Adjusted Close'))
stock_symbols = ['AAPL', 'FB', 'GOOG', 'IBM', 'MSFT']
dmap = hv.DynamicMap(load_symbol, kdims='Symbol').redim.values(Symbol=stock_symbols)
```
We begin by displaying our DynamicMap to see what we are dealing with. Recall that a ``DynamicMap`` is only evaluated when you request the key so the ``load_symbol`` function is only executed when first displaying the ``DynamicMap`` and whenever we change the widget dropdown:
```python
dmap
```
## Processing data
It is very common to want to process some data, for this purpose HoloViews provides so-called ``Operations``, which are described in detail in the [Transforming Elements](./11-Transforming_Elements.ipynb). ``Operations`` are simply parameterized functions, which take HoloViews objects as input, transform them in some way and then return the output.
In combination with [Dimensioned Containers](./05-Dimensioned_Containers.ipynb) such as ``HoloMap`` and ``GridSpace`` they are a powerful way to explore how the parameters of your transform affect the data. We will start with a simple example. HoloViews provides a ``rolling`` function which smoothes timeseries data with a rolling window. We will apply this operation with a ``rolling_window`` of 30, i.e. roughly a month of our daily timeseries data:
```python
smoothed = rolling(dmap, rolling_window=30)
smoothed
```
As you can see the ``rolling`` operation applies directly to our ``DynamicMap``, smoothing each ``Curve`` before it is displayed. Applying an operation to a ``DynamicMap`` keeps the data as a ``DynamicMap``, this means the operation is also applied lazily whenever we display or select a different symbol in the dropdown widget.
### Dynamically evaluating parameters on operations and transforms with ``.apply``
The ``.apply`` method allows us to automatically build a dynamic pipeline given an object and some operation or function along with parameter, stream or widget instances passed in as keyword arguments. Internally it will then build a `Stream` to ensure that whenever one of these changes the plot is updated. To learn more about streams see the [Responding to Events](./12-Responding_to_Events.ipynb).
This mechanism allows us to build powerful pipelines by linking parameters on a user defined class or even an external widget, e.g. here we import an ``IntSlider`` widget from [``panel``](https://pyviz.panel.org):
```python
import panel as pn
slider = pn.widgets.IntSlider(name='rolling_window', start=1, end=100, value=50)
```
Using the ``.apply`` method we could now apply the ``rolling`` operation to the DynamicMap and link the slider to the operation's ``rolling_window`` parameter (which also works for simple functions as will be shown below). However, to further demonstrate the features of `dim` expressions and the `.transform` method, which we first introduced in the [Transforming elements user guide](11-Transforming_Elements.ipynb), we will instead apply the rolling mean using the `.df` namespace accessor on a `dim` expression:
```python
rolled_dmap = dmap.apply.transform(adj_close=hv.dim('adj_close').df.rolling(slider).mean())
rolled_dmap
```
The ``rolled_dmap`` is another DynamicMap that defines a simple two-step pipeline, which calls the original callback when the ``symbol`` changes and reapplies the expression whenever the slider value changes. Since the widget's value is now linked to the plot via a ``Stream`` we can display the widget and watch the plot update:
```python
slider
```
The power of building pipelines is that different visual components can share the same inputs but compute very different things from that data. The part of the pipeline that is shared is only evaluated once making it easy to build efficient data processing code. To illustrate this we will also apply the ``rolling_outlier_std`` operation which computes outliers within the ``rolling_window`` and again we will supply the widget ``value``:
```python
outliers = dmap.apply(rolling_outlier_std, rolling_window=slider.param.value)
rolled_dmap * outliers.opts(color='red', marker='triangle')
```
We can chain operations like this indefinitely and attach parameters or explicit streams to each stage. By chaining we can watch our visualization update whenever we change a stream value anywhere in the pipeline and HoloViews will be smart about which parts of the pipeline are recomputed, which allows us to build complex visualizations very quickly.
The ``.apply`` method is also not limited to operations. We can just as easily apply a simple Python function to each object in the ``DynamicMap``. Here we define a function to compute the residual between the original ``dmap`` and the ``rolled_dmap``.
```python
def residual_fn(overlay):
# Get first and second Element in overlay
el1, el2 = overlay.get(0), overlay.get(1)
# Get x-values and y-values of curves
xvals = el1.dimension_values(0)
yvals = el1.dimension_values(1)
yvals2 = el2.dimension_values(1)
# Return new Element with subtracted y-values
# and new label
return el1.clone((xvals, yvals-yvals2),
vdims='Residual')
```
If we overlay the two DynamicMaps we can then dynamically broadcast this function to each of the overlays, producing a new DynamicMap which responds to both the symbol selector widget and the slider:
```python
residual = (dmap * rolled_dmap).apply(residual_fn)
residual
```
In later guides we will see how we can combine HoloViews plots and Panel widgets into custom layouts allowing us to define complex dashboards. For more information on how to deploy bokeh apps from HoloViews and build dashboards see the [Deploying Bokeh Apps](./Deploying_Bokeh_Apps.ipynb) and [Dashboards](./17-Dashboards.ipynb) guides. To get a quick idea of what this might look like let's compose all the components we have no built: