issues: 636512559

  • id: 636512559
  • node_id: MDU6SXNzdWU2MzY1MTI1NTk=
  • number: 4143
  • title: [Feature request] Masked operations
  • user: 1200058
  • state: open
  • locked: 0
  • comments: 1
  • created_at: 2020-06-10T20:04:45Z
  • updated_at: 2021-04-22T20:54:03Z
  • author_association: NONE

Xarray already has unstack(sparse=True), which is quite awesome. However, in many cases it is costly to convert a very dense array (existing values >> missing values) to a sparse representation. Also, many calculations require converting the sparse array back into a dense array and manually masking the missing values (e.g. Keras).

Logically, a sparse array is equivalent to a masked dense array; the two differ only in their internal data representation. Therefore, I propose a masked=True option for all operations that can create missing values. These include (among others):

- .unstack([...], masked=True)
- .where(<multi-dimensional array>, masked=True)
- .align([...], masked=True)

This would solve a number of problems:

- No more conversion of int -> float
- Explicit value for missingness
- When stacking data with missing values, the missing values can simply be dropped
- When converting data with missing values to a DataFrame, the missing values can simply be dropped
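To make the first and third points concrete, here is a minimal sketch using plain NumPy rather than xarray. It only illustrates the difference between NaN-based and mask-based missingness; the arrays and variable names are made up for illustration and are not part of any proposed API.

```python
import numpy as np

# Integer data with one extra row that has no observations (illustrative values).
values = np.array([[25, 35], [10, 24], [0, 0]])
missing = np.array([[False, False], [False, False], [True, True]])

# NaN-based missingness: NaN is a float, so the result is promoted to float.
nan_filled = np.where(missing, np.nan, values)

# Mask-based missingness: the integer dtype is preserved and the mask is explicit.
masked = np.ma.masked_array(values, mask=missing, fill_value=0)

# Flattening ("stacking") can simply drop the masked entries.
present = masked.compressed()

print(nan_filled.dtype, masked.dtype, present)
```

The masked variant keeps the original integers and records missingness separately, which is the behaviour the masked=True option is meant to expose.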

MCVE Code Sample

An example would be outer joins with slightly different coordinates (taken from the documentation):

```python
>>> x
<xarray.DataArray (lat: 2, lon: 2)>
array([[25, 35],
       [10, 24]])
Coordinates:
  * lat      (lat) float64 35.0 40.0
  * lon      (lon) float64 100.0 120.0

>>> y
<xarray.DataArray (lat: 2, lon: 2)>
array([[20,  5],
       [ 7, 13]])
Coordinates:
  * lat      (lat) float64 35.0 42.0
  * lon      (lon) float64 100.0 120.0
```

Non-masked outer join:

```python
>>> a, b = xr.align(x, y, join="outer")
>>> a
<xarray.DataArray (lat: 3, lon: 2)>
array([[25., 35.],
       [10., 24.],
       [nan, nan]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
>>> b
<xarray.DataArray (lat: 3, lon: 2)>
array([[20.,  5.],
       [nan, nan],
       [ 7., 13.]])
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```

The masked version:

```python
>>> a, b = xr.align(x, y, join="outer", masked=True)
>>> a
<xarray.DataArray (lat: 3, lon: 2)>
masked_array(data=[[25, 35],
                   [10, 24],
                   [--, --]],
             mask=[[False, False],
                   [False, False],
                   [True, True]],
             fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
>>> b
<xarray.DataArray (lat: 3, lon: 2)>
masked_array(data=[[20, 5],
                   [--, --],
                   [7, 13]],
             mask=[[False, False],
                   [True, True],
                   [False, False]],
             fill_value=0)
Coordinates:
  * lat      (lat) float64 35.0 40.0 42.0
  * lon      (lon) float64 100.0 120.0
```
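Until something like masked=True exists, the behaviour can be approximated with the existing fill_value argument of xr.align. The sketch below is only a workaround under that assumption: align_with_masks is a hypothetical helper name, not an xarray function, and it aligns the data with an integer fill value while aligning a second set of boolean markers to recover the mask.

```python
import numpy as np
import xarray as xr

x = xr.DataArray(
    [[25, 35], [10, 24]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
)
y = xr.DataArray(
    [[20, 5], [7, 13]],
    dims=("lat", "lon"),
    coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
)


def align_with_masks(*arrays, fill_value=0):
    """Hypothetical helper: outer-align arrays and return (data, mask) pairs."""
    # An integer fill value avoids the int -> float promotion caused by NaN.
    aligned = xr.align(*arrays, join="outer", fill_value=fill_value)
    # All-False markers on the same coordinates; cells created by the
    # outer join are filled with True and therefore mark missing data.
    markers = [xr.zeros_like(arr, dtype=bool) for arr in arrays]
    masks = xr.align(*markers, join="outer", fill_value=True)
    return list(zip(aligned, masks))


(a, a_mask), (b, b_mask) = align_with_masks(x, y)

# Combine data and mask into a NumPy masked array outside of xarray.
a_masked = np.ma.masked_array(a.values, mask=a_mask.values.astype(bool), fill_value=0)
print(a_masked)
```

This needs two alignment passes and keeps the mask in a separate object, which is the bookkeeping the proposed option would take care of.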

Related issue: https://github.com/pydata/xarray/issues/3955

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4143/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  • repo: 13221727
  • type: issue
