issues: 1087160635

id: 1087160635
node_id: I_kwDOAMm_X85AzME7
number: 6103
title: reindex multidimensional fill_value skipping
user: 1191149
state: closed
locked: 0
comments: 1
created_at: 2021-12-22T20:14:00Z
updated_at: 2021-12-22T20:17:39Z
closed_at: 2021-12-22T20:17:38Z
author_association: CONTRIBUTOR
body:

What happened:

I started with a DataFrame that represented an identity matrix and called reindex over multiple dimensions with a fill_value. The goal was to produce a Dataset from a sparse DataFrame, and the identity matrix was the simplest example. The documentation defines the fill_value argument of reindex as "Value to use for newly missing values." What I found was that the fill_value was not applied at positions whose ROW and COL labels both already appear among the unique values of the input coordinates. For a pure identity matrix, that means the fill_value was not applied anywhere (all rows are present, all columns are present). When I thin the identity matrix (skipping elements), the problem is more obvious: wherever a row with valid input data crosses a column with valid input data, the off-diagonal cell stays NaN instead of receiving the fill_value.
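The pattern described above can be reproduced with a toy model in plain Python. This is only a sketch of one plausible reading, not xarray's implementation: it assumes that the conversion to a dense grid already creates NaN cells over the labels that are present, and that the reindex step then fills only cells whose label is absent from that grid. The names `dense`, `full`, and `nan_cells` are made up for the illustration.

```python
import math

n, thin = 10, 2
present = list(range(0, n, thin))  # labels that exist in the sparse input

# Step 1 (stand-in for df.to_xarray()): a dense grid over the present labels.
# Only the diagonal carries data; every other cell is NaN already.
dense = {
    (r, c): (1.0 if r == c else math.nan)
    for r in present
    for c in present
}

# Step 2 (stand-in for reindex(..., fill_value=0)): cells missing from the
# dense grid get the fill value; cells that exist are copied unchanged.
full = {
    (r, c): dense.get((r, c), 0.0)
    for r in range(n)
    for c in range(n)
}

# Cells that are still NaN after the "reindex" step.
nan_cells = sorted(key for key, value in full.items() if math.isnan(value))
print(len(nan_cells))
print(nan_cells[:4])
```

Under these assumptions, every remaining NaN sits at an off-diagonal position where both labels were present in the input, which matches the description above.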

What you expected to happen:

I expected all new nan values to be filled with the fill value.
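One possible way to obtain that result is to chain `fillna` after the reindex. This is a minimal sketch, assuming the leftover NaNs are ordinary missing values in the Dataset that `fillna` can replace; the names `filled` and `expected` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd  # df.to_xarray() additionally requires xarray to be installed

n, thin = 10, 2
labels = np.arange(0, n, thin)

# Same thinned identity matrix as in the report.
df = pd.DataFrame(
    [dict(ROW=v, COL=v, LAND=1) for v in labels]
).set_index(["ROW", "COL"])
ds = df.to_xarray()

# Reindex to the full grid, then replace every remaining NaN with 0.
filled = ds.reindex(
    ROW=np.arange(n), COL=np.arange(n), fill_value=0
).fillna(0)

# Reference result: zeros everywhere except the thinned diagonal.
expected = np.zeros((n, n))
expected[labels, labels] = 1

print(bool(filled.LAND.isnull().any()))
print(np.array_equal(filled.LAND.values, expected))
```

The `fillna` call cannot tell NaNs that predate the reindex from ones introduced by it, so this approach only fits data where NaN never carries a meaning of its own.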

Minimal Complete Verifiable Example:

```
import numpy as np
import pandas as pd

n = 10
thin = 2

# Thinned identity matrix: only every `thin`-th diagonal element is present.
df = pd.DataFrame.from_dict([
    dict(ROW=v, COL=v, LAND=1)
    for v in np.arange(0, n, thin)
]).set_index(['ROW', 'COL'])

ds = df.to_xarray()
rds = ds.reindex(ROW=np.arange(n), COL=np.arange(n), fill_value=0)

# A red background makes the NaN cells stand out in the saved plot.
p = rds.LAND.plot()
p.axes.set_facecolor('red')
p.axes.figure.savefig('test.png')

print(rds.LAND[:])
print(rds.LAND[::thin, ::thin])
```

Output:

<xarray.DataArray 'LAND' (ROW: 10, COL: 10)>
array([[ 1.,  0., nan,  0., nan,  0., nan,  0., nan,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [nan,  0.,  1.,  0., nan,  0., nan,  0., nan,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [nan,  0., nan,  0.,  1.,  0., nan,  0., nan,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [nan,  0., nan,  0., nan,  0.,  1.,  0., nan,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [nan,  0., nan,  0., nan,  0., nan,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
Coordinates:
  * ROW      (ROW) int64 0 1 2 3 4 5 6 7 8 9
  * COL      (COL) int64 0 1 2 3 4 5 6 7 8 9

<xarray.DataArray 'LAND' (ROW: 5, COL: 5)>
array([[ 1., nan, nan, nan, nan],
       [nan,  1., nan, nan, nan],
       [nan, nan,  1., nan, nan],
       [nan, nan, nan,  1., nan],
       [nan, nan, nan, nan,  1.]])
Coordinates:
  * ROW      (ROW) int64 0 2 4 6 8
  * COL      (COL) int64 0 2 4 6 8
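A quick diagnostic that may help when reading this output is to count the NaN cells before and after the reindex, and to look separately at the cells that involve a newly added label. The sketch below rebuilds the same data; `nan_before`, `nan_after`, and `new_labels` are names chosen for this illustration, and the interpretation (that the NaNs are already present after `to_xarray()`) is an assumption the counts can confirm or refute.

```python
import numpy as np
import pandas as pd  # df.to_xarray() additionally requires xarray to be installed

n, thin = 10, 2

df = pd.DataFrame(
    [dict(ROW=v, COL=v, LAND=1) for v in np.arange(0, n, thin)]
).set_index(["ROW", "COL"])
ds = df.to_xarray()

# NaN count on the 5 x 5 grid produced by to_xarray(), before any reindex.
nan_before = ds.LAND.isnull().sum().item()

rds = ds.reindex(ROW=np.arange(n), COL=np.arange(n), fill_value=0)

# NaN count on the 10 x 10 grid after the reindex.
nan_after = rds.LAND.isnull().sum().item()

# Labels that did not exist before the reindex.
new_labels = np.arange(1, n, thin)
new_rows_filled = bool((rds.LAND.sel(ROW=new_labels) == 0).all())
new_cols_filled = bool((rds.LAND.sel(COL=new_labels) == 0).all())

print(nan_before, nan_after)
print(new_rows_filled, new_cols_filled)
```

If the two counts agree and every cell on a new row or column holds the fill value, the reindex itself introduced no NaN, and the remaining ones were inherited from the input Dataset.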

Anything else we need to know?:

Environment:

Output of `xr.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.144+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.18.2
pandas: 1.1.5
numpy: 1.19.5
scipy: 1.4.1
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: 3.1.0
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.12.0
distributed: 1.25.3
matplotlib: 3.2.2
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 57.4.0
pip: 21.1.3
conda: None
pytest: 3.6.4
IPython: 5.5.0
sphinx: 1.8.6
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6103/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue
