In addition to the two main types of data, namely tabular/columnar and gridded data, HoloViews also provides extensible interfaces to represent path geometry data. Specifically, it has three main element types used to represent different types of geometries. In this section we will cover the HoloViews data model for representing different kinds of geometries.
There are many different ways of representing path geometries, but HoloViews' data model is modeled on GEOS geometry definitions and allows faithfully round-tripping data between its element types and GEOS geometry definitions such as ``LineString``, ``Polygon``, ``MultiLineString`` and ``MultiPolygon`` geometries (even if this is not implemented in HoloViews itself). HoloViews defines a dictionary-based format for the geometries but also supports [spatialpandas](https://github.com/holoviz/spatialpandas), which is a highly optimized implementation similar to [geopandas](https://github.com/geopandas/geopandas/) but without the heavy geo-dependencies such as shapely and fiona. [GeoViews](https://geoviews.org/user_guide/Geometries.html) supports both geopandas and raw shapely geometries directly.
```python
import numpy as np
import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
```
## Representing paths
The ``Path`` element represents a collection of path geometries with optional associated values. Each path geometry may be split into sub-geometries on NaN-values and may be associated with scalar values or array values varying along its length. By analogy with GEOS geometry types, a ``Path`` is a collection of ``LineString`` and ``MultiLineString`` geometries with associated values.
While other formats can be supported through extensible interfaces (e.g. geopandas and shapely objects in GeoViews), natively HoloViews provides support for representing paths as one or more columnar data-structures including arrays, dataframes and dictionaries of column arrays and scalars. A simple path geometry may therefore be drawn using:
```python
hv.Path({'x': [1, 2, 3, 4, 5], 'y': [0, 0, 1, 1, 2]}, ['x', 'y'])
```
Here the dictionary of x- and y-coordinates could also be a NumPy array with two columns or a dataframe with 'x' and 'y' columns.
To draw multiple paths, the data structures can be wrapped in a list. It is also possible to associate a value with each path by declaring it as a value dimension:
```python
p = hv.Path([{'x': [1, 2, 3, 4, 5], 'y': [0, 0, 1, 1, 2], 'value': 0},
             {'x': [5, 4, 3, 2, 1], 'y': [2, 2, 1, 1, 0], 'value': 1}], vdims='value').opts(color='value')
p
```
#### Multi-geometry
Splitting the geometries in this way allows assigning separate values to each geometry. Often, however, multiple geometries share the same value, in which case it may be desirable to represent them as a multi-geometry by combining the coordinates and separating them with a NaN value:
```python
hv.Path([{'x': [1, 2, 3, 4, 5, np.nan, 5, 4, 3, 2, 1],
          'y': [0, 0, 1, 1, 2, np.nan, 2, 2, 1, 1, 0], 'value': 0}],
        vdims='value').opts(color='value')
```
This format is more efficient, particularly when there are very many small geometries with the same value.
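To make the NaN-separator convention concrete, the sketch below recovers the individual sub-geometries from the combined coordinate arrays using only NumPy. The helper `split_on_nan` is hypothetical and written for illustration; it is not part of the HoloViews API and does not claim to mirror how HoloViews splits geometries internally.

```python
import numpy as np

def split_on_nan(xs, ys):
    """Split NaN-separated coordinate arrays into a list of (x, y) pairs."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # A separator is any position where either coordinate is NaN
    is_separator = np.isnan(xs) | np.isnan(ys)
    parts = []
    start = 0
    for idx in np.flatnonzero(is_separator):
        if idx > start:  # skip empty runs between consecutive separators
            parts.append((xs[start:idx], ys[start:idx]))
        start = idx + 1
    if start < len(xs):  # trailing sub-geometry after the last separator
        parts.append((xs[start:], ys[start:]))
    return parts

multi_x = [1, 2, 3, 4, 5, np.nan, 5, 4, 3, 2, 1]
multi_y = [0, 0, 1, 1, 2, np.nan, 2, 2, 1, 1, 0]
parts = split_on_nan(multi_x, multi_y)
print(len(parts))  # 2
```

The two recovered sub-geometries hold the same coordinates as the two separate paths declared earlier, but they share a single 'value'.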
#### Scalar vs. continuously varying value dimensions
Unlike ``Contours``, which is limited to representing iso-contours (isolines), i.e. curves along which a function of two variables has a constant value, a ``Path`` element may also have values that vary continuously along its length. Below we declare a path with a value that varies along its length:
```python
a, b, delta = 3, 5, np.pi/2.
vs = np.linspace(0, np.pi*2, 200)
xs = np.sin(a * vs + delta)
ys = np.sin(b * vs)
hv.Path([{'x': xs, 'y': ys, 'value': vs}], vdims='value').opts(
    color='value', cmap='hsv')
```
Note that since not all data formats allow storing scalar values as actual scalars, 1D arrays matching the length of the coordinates but with only one unique value are also considered scalar. For example, the following is a valid ``Contours`` element even though the value dimension is supplied as an array rather than as a scalar:
```python
hv.Contours([{'x': xs, 'y': ys, 'value': np.ones(200)}], vdims='value').opts(color='value')
```
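The rule described above can be sketched as a small check. The function `is_scalar_like` is a hypothetical helper used only to illustrate the rule; it is an assumption about how such a test could look, not the check HoloViews itself performs.

```python
import numpy as np

def is_scalar_like(values):
    """Return True for a true scalar or a 1D array with one unique value."""
    values = np.asarray(values)
    if values.ndim == 0:
        return True
    return values.ndim == 1 and len(np.unique(values)) == 1

constant = np.ones(200)                    # one unique value: scalar-like
varying = np.linspace(0, np.pi * 2, 200)   # many values: varies along the path

print(is_scalar_like(constant))  # True
print(is_scalar_like(varying))   # False
```

Under this rule the constant array would be acceptable as a ``Contours`` value, while the varying array would only make sense on a ``Path``.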
## Representing Polygons
The ``Polygons`` element represents a collection of polygon geometries with associated scalar values. Each polygon geometry may be split into sub-geometries on NaN-values. By analogy with GEOS geometry types, a ``Polygons`` element is a collection of ``Polygon`` and ``MultiPolygon`` geometries. Polygon geometries are defined as a set of coordinates describing the exterior bounding ring and any number of interior holes.
In summary, ``Polygons`` can be represented in much the same way as ``Path`` elements above, but they have a special reserved key, 'holes', to store the polygon interiors. The holes are stored as a list-of-lists of arrays. This nested format is necessary to unambiguously associate holes with the sub-geometries in a multi-geometry. In the simplest case of a single Polygon geometry the format looks like this:
```python
xs = [1, 2, 3]
ys = [2, 0, 7]
holes = [[[(1.5, 2), (2, 3), (1.6, 1.6)], [(2.1, 4.5), (2.5, 5), (2.3, 3.5)]]]
hv.Polygons([{'x': xs, 'y': ys, 'holes': holes}])
```
The 'x' and 'y' coordinates represent the exterior of the Polygon, and the list-of-lists of holes defines two interior regions inside the polygon.
In a multi-Polygon arrangement where two Polygon geometries are separated by NaNs, the purpose of the nested format becomes a bit clearer. Here the polygon from above still has the two holes but the second polygon does not have any holes, which we declare with an empty list:
```python
xs = [1, 2, 3, np.nan, 6, 7, 3]
ys = [2, 0, 7, np.nan, 7, 5, 2]
holes = [
    [[(1.5, 2), (2, 3), (1.6, 1.6)], [(2.1, 4.5), (2.5, 5), (2.3, 3.5)]],
    []
]
hv.Polygons([{'x': xs, 'y': ys, 'holes': holes}])
```
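The correspondence between sub-geometries and the outer level of the 'holes' list can be sketched with plain NumPy: splitting the exterior coordinates at the NaN row should yield exactly one exterior per entry in 'holes'. This is an illustrative sketch of the data layout, not HoloViews' internal implementation.

```python
import numpy as np

xs = [1, 2, 3, np.nan, 6, 7, 3]
ys = [2, 0, 7, np.nan, 7, 5, 2]
holes = [
    [[(1.5, 2), (2, 3), (1.6, 1.6)], [(2.1, 4.5), (2.5, 5), (2.3, 3.5)]],
    [],
]

coords = np.column_stack([xs, ys]).astype(float)
# Rows containing NaN mark the boundaries between sub-geometries
separators = np.flatnonzero(np.isnan(coords).any(axis=1))
# Split at the separators, then drop the NaN rows from each piece
exteriors = [part[~np.isnan(part).any(axis=1)]
             for part in np.split(coords, separators)]

# One entry in 'holes' per exterior ring
assert len(exteriors) == len(holes)
for i, (exterior, interior) in enumerate(zip(exteriors, holes)):
    print(f"polygon {i}: {len(exterior)} exterior vertices, {len(interior)} holes")
```

Here the first exterior is paired with two holes and the second with an empty list, which is why the outer list must have one entry per sub-geometry even when that entry is empty.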
If a polygon has no holes at all, the 'holes' key may be omitted entirely, as for the second and third polygons below:
```python
hv.Polygons([{'x': xs, 'y': ys, 'holes': holes, 'value': 0},
             {'x': [4, 6, 6], 'y': [0, 2, 1], 'value': 1},
             {'x': [-3, -1, -6], 'y': [3, 2, 1], 'value': 3}], vdims='value')
```
## Accessing the data
To access the underlying data, the geometry elements (``Path``/``Contours``/``Polygons``) implement a ``split`` method. By default it simply returns a list of elements, each containing only one geometry:
```python
poly = hv.Polygons([
    {'x': xs, 'y': ys, 'holes': holes, 'value': 0},
    {'x': [4, 6, 6], 'y': [0, 2, 1], 'value': 1}
], vdims='value')
hv.Layout(poly.split())
```
Using the ``datatype`` argument the data may instead be returned in the desired format, e.g. 'dictionary', 'array' or 'dataframe'. Here we return the 'dictionary' format:
```python
poly.split(datatype='dictionary')
```
Note that this conversion may be lossy if the converted format has no way of representing 'holes' or other data.