hvPlot provides one API to explore data of many different types. Previous sections have exclusively worked with tabular data stored in pandas (or pandas-like) DataFrames. The other most common type of data is the n-dimensional array. hvPlot aims to eventually support different array libraries but for now focuses on [xarray](https://xarray.pydata.org/en/stable/). xarray provides a convenient and very powerful wrapper to label the axes and coordinates of multi-dimensional (n-D) arrays. This user guide covers how to use ``xarray`` and ``hvplot`` to visualize and explore data of different dimensionality, ranging from simple 1D data, to 2D image-like data, to multi-dimensional cubes of data.
For these examples we’ll use the North American air temperature dataset:
```python
import xarray as xr
import hvplot.xarray # noqa
air_ds = xr.tutorial.open_dataset('air_temperature').load()
air = air_ds.air
air_ds
```
## 1D Plots
Selecting the data at a particular lat/lon coordinate we get a 1D dataset of air temperatures over time:
```python
air1d = air.sel(lat=40, lon=285)
air1d.hvplot()
```
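Exact selection with `sel` only works when the requested labels exist in the coordinate index. As a minimal sketch, assuming a small synthetic array in place of the tutorial dataset (the names `demo`, `exact` and `point` are illustrative only, and the values are random), the following shows how selecting a single `lat`/`lon` pair reduces the data to 1D, and how `method="nearest"` can snap an off-grid location to the closest grid point:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the air temperature data (values are random, not real).
times = pd.date_range("2013-01-01", periods=8, freq="D")
lats = np.array([75.0, 50.0, 40.0, 25.0])
lons = np.array([280.0, 285.0, 290.0])
rng = np.random.default_rng(0)
demo = xr.DataArray(
    270 + 20 * rng.random((len(times), len(lats), len(lons))),
    coords={"time": times, "lat": lats, "lon": lons},
    dims=("time", "lat", "lon"),
    name="air",
)

# Exact label selection drops lat and lon, leaving a 1D series over time.
exact = demo.sel(lat=40, lon=285)

# Off-grid coordinates need method="nearest" to snap to the closest labels.
point = demo.sel(lat=41.3, lon=284.2, method="nearest")

print(exact.dims, point.dims)
print(float(point.lat), float(point.lon))
```

Either result can then be passed to `.hvplot()` in the same way as `air1d` above.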
Notice how the axes are already appropriately labeled, because xarray stores the metadata required. We can also further subselect the data and use `*` to overlay plots:
```python
air1d_sel = air1d.sel(time='2013-01')
air1d_sel.hvplot(color='purple') * air1d_sel.hvplot.scatter(marker='o', color='blue', size=15)
```
```python
air.lat
```
### Selecting multiple
If we select multiple coordinates along one axis and plot a chart type such as a line, the data will automatically be split by that coordinate:
```python
air.sel(lat=[20, 40, 60], lon=285).hvplot.line()
```
To plot a different relationship we can explicitly request to display the latitude along the y-axis and use the ``by`` keyword to color each longitude (or 'lon') differently (note that this differs from the ``hue`` keyword xarray uses):
```python
air.sel(time='2013-02-01 00:00', lon=[280, 285]).hvplot.line(y='lat', by='lon', legend='top_right')
```
## 2D Plots
By default the ``DataArray.hvplot()`` method generates an image if the data is two-dimensional.
```python
air2d = air.sel(time='2013-06-01 12:00')
air2d.hvplot(width=400)
```
Alternatively we can also plot the same data using the ``contour`` and ``contourf`` methods, which provide a ``levels`` argument to control the number of iso-contours to draw:
```python
air2d.hvplot.contour(width=400, levels=20) + air2d.hvplot.contourf(width=400, levels=8)
```
## n-D Plots
If the data has more than two dimensions, ``hvplot()`` defaults to a histogram unless it is given further hints:
```python
air.hvplot()
```
However, we can tell it to apply a ``groupby`` along a particular dimension, allowing us to explore the data as images along that dimension with a slider:
```python
air.hvplot(groupby='time', width=500)
```
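Conceptually, `groupby='time'` splits the cube into one 2D `lat`/`lon` slice per time step, and the slider chooses which slice to display. The sketch below illustrates that idea with plain xarray on a small synthetic array. It is not how hvPlot is implemented internally, and the names `cube` and `frames` are illustrative only.

```python
import numpy as np
import xarray as xr

# Small synthetic cube standing in for the (time, lat, lon) air data.
cube = xr.DataArray(
    np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4),
    coords={
        "time": [0, 1],
        "lat": [10.0, 20.0, 30.0],
        "lon": [0.0, 1.0, 2.0, 3.0],
    },
    dims=("time", "lat", "lon"),
    name="air",
)

# One 2D frame per value of the grouped dimension.
frames = {int(t): cube.sel(time=t) for t in cube.time.values}

for t, frame in frames.items():
    print(t, frame.dims, frame.shape)
```

Each entry in `frames` is a two-dimensional array, which is why the grouped plot can default to an image for every slider position.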
By default, for numeric types you'll get a slider and for non-numeric types you'll get a selector. Use ``widget_type`` and ``widget_location`` to control the look of the widget. To learn more about customizing widget behavior see [Widgets](Widgets.ipynb).
```python
air.hvplot(groupby='time', width=600, widget_type='scrubber', widget_location='bottom')
```
If we pick a different, lower dimensional plot type (such as a 'line') it will automatically apply a groupby over the remaining dimensions:
```python
air.hvplot.line(width=600)
```
## Statistical plots
Statistical plots such as histograms, kernel-density estimates, or violin and box-whisker plots aggregate the data across one or more of the coordinate dimensions. For instance, plotting a KDE provides a summary of all the air temperature values but we can, once again, use the ``by`` keyword to view each selected latitude (or 'lat') separately:
```python
air.sel(lat=[25, 50, 75]).hvplot.kde('air', by='lat', alpha=0.5)
```
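To make the aggregation concrete: with `by='lat'`, each curve is estimated from every value at that latitude, pooled over the remaining `time` and `lon` dimensions. A rough sketch of that pooling with plain xarray, using synthetic values (the names `demo` and `pooled` are illustrative, and this is not hvPlot's internal code path):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
demo = xr.DataArray(
    rng.normal(loc=280.0, scale=5.0, size=(6, 3, 4)),
    coords={
        "time": np.arange(6),
        "lat": [25.0, 50.0, 75.0],
        "lon": [200.0, 240.0, 280.0, 320.0],
    },
    dims=("time", "lat", "lon"),
    name="air",
)

# Collapse time and lon into a single "sample" dimension, keeping lat separate.
pooled = demo.stack(sample=("time", "lon"))

for lat in pooled.lat.values:
    values = pooled.sel(lat=lat).values
    print(f"lat={lat}: {values.size} samples, mean={values.mean():.2f}")
```

Each latitude ends up with its own one-dimensional sample of values, which is the input a density estimate needs.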
Using the ``by`` keyword we can break down the distribution of the air temperature across one or more variables:
```python
air.hvplot.violin('air', by='lat', color='lat', cmap='Category20')
```
## Rasterizing
If you are plotting a large amount of data at once, you can consider using the hvPlot interface to [Datashader](https://datashader.org), which can be enabled simply by setting `rasterize=True`.
Note that by declaring that the data should not be grouped by another coordinate variable, i.e. by setting `groupby=[]`, we can plot all the datapoints, showing us the spread of air temperatures in the dataset:
```python
air.hvplot.scatter('time', groupby=[], rasterize=True) *\
air.mean(['lat', 'lon']).hvplot.line('time', color='indianred')
```
Here we also overlaid a non-datashaded line plot of the average temperature at each time. If you enable the appropriate hover tool, the overlaid line supports hovering and zooming even in a static export, such as on a web server or in an email. The raw-data plot, by contrast, has been aggregated spatially before being sent to the browser, so a static export has only the fixed spatial binning computed at that time. If you have a live Python process, the raw data is re-aggregated each time you pan or zoom, letting you see the entire dataset regardless of size.
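The fixed spatial binning mentioned here can be approximated with NumPy to show what rasterization does in principle: points are counted into a grid of pixel-sized bins, and only that grid needs to reach the browser. This is a conceptual stand-in that assumes a simple count aggregation on synthetic data. It is not Datashader's implementation, and the 200 x 100 grid size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_points = 100_000

# Synthetic scatter data: a time-like x axis and noisy temperature-like y values.
x = rng.uniform(0.0, 365.0, n_points)
y = 280.0 + 10.0 * np.sin(2 * np.pi * x / 365.0) + rng.normal(0.0, 3.0, n_points)

# Count points per cell on a fixed 200 x 100 grid (x bins by y bins).
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(200, 100))

print(counts.shape)       # shape of the grid of per-cell counts
print(int(counts.sum()))  # total number of points that were binned
```

However many points go in, the aggregated grid stays the same size, which is why a rasterized plot remains responsive for large datasets. Zooming in on a static export cannot reveal detail finer than these bins, whereas a live process can recompute the grid for the new view.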