hvPlot provides one API to explore data of many different types. Previous sections have worked exclusively with tabular data stored in pandas (or pandas-like) DataFrames. The other most common type of data is n-dimensional arrays. hvPlot aims to eventually support different array libraries but, for now, focuses on xarray. xarray provides a convenient and very powerful wrapper that labels the dimensions and coordinates of multi-dimensional (n-D) arrays. This user guide covers how to leverage xarray and hvPlot to visualize and explore data of different dimensionality, ranging from simple 1D data, to 2D image-like data, to multi-dimensional cubes of data.

For these examples we’ll use the North American air temperature dataset:

import xarray as xr
import hvplot.xarray  # noqa

air_ds = xr.tutorial.open_dataset('air_temperature').load()
air = air_ds.air
air_ds
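xarray's labels are what let hvPlot label axes without extra arguments. The sketch below builds a small synthetic stand-in that uses the same variable and dimension names as the tutorial data (air over time, lat, lon), so it runs offline. Its sizes, coordinate spacing, unit label, and random values are assumptions and do not match the real dataset. It shows that selecting by label returns the same values as positional indexing, without you having to look up positions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)    # 15, 20, ..., 75
lon = np.arange(270.0, 300.0, 5.0)  # 270, 275, ..., 295
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
    attrs={"units": "K"},  # assumed unit label
)
air_ds = air.to_dataset()  # wrap the named DataArray in a Dataset, like air_ds above

# Label-based selection...
by_label = air.sel(lat=40, lon=285)

# ...picks the same values as positional indexing, without needing the positions
i_lat = int(np.flatnonzero(lat == 40)[0])
i_lon = int(np.flatnonzero(lon == 285)[0])
by_position = air.isel(lat=i_lat, lon=i_lon)

print(air.dims, air.shape)
print(bool((by_label == by_position).all()))
```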

1D Plots

Selecting the data at a particular lat/lon coordinate we get a 1D dataset of air temperatures over time:

air1d = air.sel(lat=40, lon=285)
air1d.hvplot()
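Passing scalar labels to `.sel` drops the selected dimensions, so the result is 1D and hvPlot plots it against time. The labels must exist exactly in the coordinates. For a location between grid points, xarray's `method="nearest"` picks the closest one. In this sketch the array is a synthetic stand-in with assumed values, and the off-grid location is an arbitrary example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

air1d = air.sel(lat=40, lon=285)
print(air1d.dims)  # ('time',)  lat and lon remain only as scalar coordinates

# An off-grid location snaps to the closest grid point
near = air.sel(lat=41.3, lon=284.1, method="nearest")
print(float(near.lat), float(near.lon))  # 40.0 285.0
```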

Notice how the axes are already appropriately labeled, because xarray stores the required metadata. We can also subselect the data further and use the * operator to overlay plots:

air1d_sel = air1d.sel(time='2013-01')
air1d_sel.hvplot(color='purple') * air1d_sel.hvplot.scatter(marker='o', color='blue', size=15)
air.lat  # inspect the available latitude coordinates
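The selection `sel(time='2013-01')` uses partial datetime-string indexing, which xarray inherits from pandas. A string coarser than the timestamps selects every timestamp within that period. Label-based slices include both endpoints. The sketch below uses a daily synthetic series, which is an assumption: the tutorial data is 6-hourly, so the counts there would differ.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series at a single point (assumed values, NOT the real dataset)
time = pd.date_range("2013-01-01", periods=90, freq="D")
air1d = xr.DataArray(
    250 + 30 * np.random.default_rng(0).random(time.size),
    coords={"time": time},
    dims="time",
    name="air",
)

jan = air1d.sel(time="2013-01")                               # every timestamp in January 2013
mid_jan = air1d.sel(time=slice("2013-01-10", "2013-01-20"))  # both ends included

print(jan.sizes["time"], mid_jan.sizes["time"])  # 31 11
```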

Selecting multiple

If we select multiple coordinates along one dimension and plot a chart type such as a line, the data will automatically be split by that coordinate:

air.sel(lat=[20, 40, 60], lon=285).hvplot.line()
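Selecting a list of labels keeps that dimension (here with three entries) instead of dropping it. The result is therefore 2D (time, lat), and hvPlot draws one line per latitude. Splitting the array by hand, as the sketch below does, yields the same three 1D series. The data is a synthetic stand-in with assumed values, and the dictionary only illustrates the split; it is not how hvPlot works internally.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

sub = air.sel(lat=[20, 40, 60], lon=285)  # list keeps 'lat', scalar drops 'lon'
print(sub.dims)  # ('time', 'lat')

# One 1D series per selected latitude: the pieces that become separate lines
lines = {float(value): sub.sel(lat=value) for value in sub.lat.values}
print(sorted(lines))  # [20.0, 40.0, 60.0]
```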

To plot a different relationship, we can explicitly request that the latitude be displayed along the y-axis and use the by keyword to color each longitude ('lon') differently (note that hvPlot uses by where xarray's own plotting API uses the hue keyword):

air.sel(time='2013-02-01 00:00', lon=[280, 285]).hvplot.line(y='lat', by='lon', legend='top_right')
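One way to picture what y='lat' and by='lon' refer to is the long-form table of the selection, in which every dimension becomes a column. The sketch below is only an illustration, not how hvPlot is implemented. It uses a synthetic stand-in with assumed values and selects a single day with `pd.Timestamp` so the match is exact. Grouping by the lon column gives one group per longitude, which matches one colored curve each.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

snap = air.sel(time=pd.Timestamp("2013-02-01"), lon=[280, 285])
print(snap.dims)  # ('lat', 'lon')

# Long-form view: one row per (lat, lon) pair, with dimensions as columns
df = snap.to_dataframe().reset_index()

# Grouping on 'lon' gives one group per longitude, i.e. one curve per color
for lon_value, group in df.groupby("lon"):
    print(lon_value, len(group))
```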

2D Plots

By default, the DataArray.hvplot() method generates an image if the data is two-dimensional:

air2d = air.sel(time='2013-06-01 12:00')
air2d.hvplot(width=400)

Alternatively, we can plot the same data using the contour and contourf methods, which accept a levels argument to control the number of iso-contours to draw:

air2d.hvplot.contour(width=400, levels=20) + air2d.hvplot.contourf(width=400, levels=8)

n-D Plots

If the data has more than two dimensions and no further hints are given, hvPlot defaults to a histogram:

air.hvplot()
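The defaults described so far depend on the number of dimensions: a line for 1D data (as with air1d), an image for 2D data, and a histogram for anything higher. The hypothetical helper `default_kind` below simply restates that rule so it can be checked against array shapes. It is not hvPlot's dispatch code, which takes more into account than `ndim`. The array is a synthetic stand-in with assumed values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)


def default_kind(da):
    """Hypothetical helper restating this guide's rule for DataArray.hvplot() defaults."""
    if da.ndim == 1:
        return "line"
    if da.ndim == 2:
        return "image"
    return "hist"


print(default_kind(air.sel(lat=40, lon=285)))  # line
print(default_kind(air.isel(time=0)))          # image
print(default_kind(air))                       # hist
```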

However, we can tell it to apply a groupby along a particular dimension, allowing us to explore the data as images along that dimension with a slider:

air.hvplot(groupby='time', width=500)
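With groupby='time', each slider position shows the 2D (lat, lon) slice for one time value. The sketch below builds those slices explicitly with `isel` on a synthetic stand-in (assumed values). It shows what the slider steps through; it does not show how hvPlot builds its widgets.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# One 2D image per time value: the frames a time slider steps through
frames = [air.isel(time=i) for i in range(air.sizes["time"])]
print(len(frames), frames[0].dims)  # 90 ('lat', 'lon')
```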

By default, you'll get a slider for numeric types and a selector for non-numeric types. Use widget_type and widget_location to control the look and placement of the widget. To learn more about customizing widget behavior, see Widgets.

air.hvplot(groupby='time', width=600, widget_type='scrubber', widget_location='bottom')

If we pick a different, lower-dimensional plot type (such as 'line'), hvPlot automatically applies a groupby over the remaining dimensions:

air.hvplot.line(width=600)
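Which dimensions end up as widgets follows from which one is on the x-axis. The sketch below assumes time is on x, as in the rendered example, so lat and lon remain. `widget_dims` is a hypothetical helper, and stacking lat and lon counts the curves that sit behind those widgets. The array is a synthetic stand-in with assumed values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)


def widget_dims(da, x):
    """Hypothetical helper: dimensions left over once `x` is on the x-axis."""
    return [d for d in da.dims if d != x]


remaining = widget_dims(air, "time")
print(remaining)  # ['lat', 'lon']

# Every (lat, lon) combination is one time series behind the widgets
curves = air.stack(point=("lat", "lon"))
print(curves.sizes["point"])  # 78
```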

Statistical plots

Statistical plots such as histograms, kernel density estimates (KDEs), or violin and box-whisker plots aggregate the data across one or more of the coordinate dimensions. For instance, plotting a KDE provides a summary of all the air temperature values, but we can, once again, use the by keyword to view each selected latitude ('lat') separately:

air.sel(lat=[25, 50, 75]).hvplot.kde('air', by='lat', alpha=0.5)
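A KDE places a kernel on each sample and sums the kernels. With by='lat', each latitude pools all of its values over time and longitude into one curve. The function below is a hand-written Gaussian KDE in numpy that uses Silverman's rule-of-thumb bandwidth. The bandwidth rule is an assumption, and the estimator hvPlot relies on may choose a different bandwidth. The data is a synthetic stand-in with assumed values. Each density should integrate to about 1 over the grid.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)


def gaussian_kde(samples, grid, bandwidth=None):
    """Plain Gaussian KDE evaluated on `grid` (bandwidth rule is an assumption)."""
    samples = np.asarray(samples, dtype=float).ravel()
    n = samples.size
    if bandwidth is None:
        # Silverman's rule of thumb for a 1D Gaussian kernel
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


grid = np.linspace(200.0, 330.0, 2001)
step = grid[1] - grid[0]
densities = {}
for value in [25, 50, 75]:
    # Pool every value at this latitude (all times, all longitudes) into one estimate
    densities[value] = gaussian_kde(air.sel(lat=value).values, grid)
    print(value, round(densities[value].sum() * step, 3))
```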

Using the by keyword, we can break down the distribution of the air temperature across one or more variables:

air.hvplot.violin('air', by='lat', color='lat', cmap='Category20')
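A box-whisker plot summarizes each group by its quartiles, and a violin plot adds a density of the same group. To inspect the numbers behind a by='lat' breakdown, you can compute per-latitude quartiles directly with xarray's `quantile`, reducing over time and lon. The sketch uses a synthetic stand-in with assumed values and cross-checks one median against numpy.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# Quartiles per latitude, pooling all times and longitudes
quartiles = air.quantile([0.25, 0.5, 0.75], dim=["time", "lon"])
print(quartiles.dims, quartiles.shape)  # ('quantile', 'lat') (3, 13)

# Cross-check one median against numpy on the same pooled values
median_20 = float(quartiles.sel(quantile=0.5, lat=20))
print(np.isclose(median_20, np.percentile(air.sel(lat=20).values, 50)))
```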

Rasterizing

If you are plotting a large amount of data at once, consider using hvPlot's interface to Datashader, which is enabled simply by setting rasterize=True.

Note that by declaring that the data should not be grouped by another coordinate variable, i.e., by setting groupby=[], we can plot all the data points, showing the spread of air temperatures in the dataset:

air.hvplot.scatter('time', groupby=[], rasterize=True) *\
air.mean(['lat', 'lon']).hvplot.line('time', color='indianred')
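Rasterizing replaces individual points with a fixed grid of bins, each holding an aggregate such as a count. The sketch below uses `np.histogram2d` as a stand-in for Datashader: it counts (time, temperature) points per bin and computes the spatial mean that the overlaid line shows. Several choices here are assumptions: time is converted to day offsets so it is numeric, the bin counts are arbitrary, and Datashader's own aggregation is not reproduced. The data is a synthetic stand-in with assumed values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (assumed sizes and values, NOT the real dataset)
rng = np.random.default_rng(0)
time = pd.date_range("2013-01-01", periods=90, freq="D")
lat = np.arange(15.0, 80.0, 5.0)
lon = np.arange(270.0, 300.0, 5.0)
air = xr.DataArray(
    250 + 30 * rng.random((time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="air",
)

# All points at once (groupby=[] in the text): flatten lat/lon, keep time explicit
stacked = air.stack(point=("lat", "lon")).transpose("time", "point")
x = np.repeat(np.arange(stacked.sizes["time"]), stacked.sizes["point"])  # day offsets
y = stacked.values.ravel()  # row-major, so it lines up with `x`

# Count points per (time bin, temperature bin): a fixed-resolution aggregate
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(30, 40))

# The overlaid line: mean temperature at each time step
mean_line = air.mean(["lat", "lon"])
print(counts.shape, int(counts.sum()), mean_line.dims)
```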

Here we also overlaid a non-datashaded line plot of the average temperature at each time. The line is sent to the browser as raw data, so if you enable the appropriate hover tool, it supports hovering and zooming even in a static export (for example, on a web server or in an email). The rasterized plot, by contrast, is aggregated into fixed bins before it is sent to the browser, so a static export only has the binning that was computed at that time. If you have a live Python process, the raw data is re-aggregated each time you pan or zoom, letting you see the entire dataset regardless of size.