issues
3 rows where user = 19285200 sorted by updated_at descending

Issue #7868: `open_dataset` with `chunks="auto"` fails when a netCDF4 variable/coordinate is encoded as `NC_STRING`
id 1722417436 · node_id I_kwDOAMm_X85mqgEc · opened by ghiggi (19285200) · state: closed (completed) · 8 comments · created 2023-05-23T16:23:07Z · updated 2023-11-17T15:26:01Z · closed 2023-11-17T15:26:01Z · author_association: NONE · repo: xarray (13221727) · type: issue

What is your issue?

I noticed that `open_dataset` with `chunks="auto"` fails when netCDF4 variables/coordinates are encoded as `NC_STRING`. The reason is that xarray reads netCDF4 `NC_STRING` as object dtype, and dask cannot estimate the size of an object dtype.
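
To see why, here is a minimal sketch (my illustration, not part of the original report) of dask refusing to auto-chunk an object-dtype array, since it cannot estimate how many bytes each element occupies:

```python
import numpy as np
import dask.array as dsa

# An object-dtype array, which is how xarray exposes NC_STRING variables
obj_arr = np.array(["M6", "M3"], dtype=object)

# dask cannot estimate the per-element size of an object dtype,
# so auto-chunking fails (with a NotImplementedError, as in the report below)
dsa.from_array(obj_arr, chunks="auto")
```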

As a workaround, the user currently has to rewrite the netCDF4 file and set the string DataArray's encoding to a fixed-length string dtype (i.e. "S2" if the maximum string length is 2), so that the data are written as NC_CHAR and xarray reads them back as a fixed-length byte-string dtype.

Below I provide a reproducible example:

```python
import xarray as xr
import numpy as np

# Define string DataArray
arr = np.array(["M6", "M3"], dtype=str)
print(arr.dtype)  # <U2
da = xr.DataArray(data=arr, dims="time")
data_vars = {"str_arr": da}

# Create dataset
ds_nc_string = xr.Dataset(data_vars=data_vars)

# Set chunking to see the behaviour at read-time
ds_nc_string["str_arr"] = ds_nc_string["str_arr"].chunk(1)  # chunks ((1, 1),)

# Write dataset with NC_STRING
ds_nc_string["str_arr"].encoding["dtype"] = str
ds_nc_string.to_netcdf("/tmp/nc_string.nc")

# Write dataset with NC_CHAR
ds_nc_char = xr.Dataset(data_vars=data_vars)
ds_nc_char["str_arr"].encoding["dtype"] = "S2"
ds_nc_char.to_netcdf("/tmp/nc_char.nc")

# When strings are saved as NC_STRING, chunks="auto" does not work:
# --> NC_STRING is read as object, and dask cannot estimate the chunk size!
# With chunks={}, the NC_STRING array is read into a single dask chunk!!!
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks="auto")  # NotImplementedError
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks={})      # Works
ds_nc_string.chunks  # chunks (2,)

# With NC_CHAR, chunks={} and chunks="auto" both work and return the same result!
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks={})
ds_nc_char.chunks  # chunks (2,)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks="auto")
ds_nc_char.chunks  # chunks (2,)

# NC_STRING is read back as object
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks=None)
ds_nc_string["str_arr"].dtype  # object

# NC_CHAR is read back as a fixed-length byte-string representation (S2)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks=None)
ds_nc_char["str_arr"].dtype  # S2
ds_nc_char["str_arr"].data.astype(str)  # U2
```
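
For readers hitting the same error, a possible post-read workaround (my sketch, not from the original report; it assumes the `/tmp/nc_string.nc` file written above) is to cast the object-dtype variable to a fixed-width Unicode dtype before chunking:

```python
import xarray as xr

# Read eagerly, then decode the object-dtype strings to fixed-width Unicode
ds = xr.open_dataset("/tmp/nc_string.nc")   # str_arr comes back with dtype object
ds["str_arr"] = ds["str_arr"].astype(str)   # object -> <U2
ds = ds.chunk("auto")                       # dask can now estimate chunk sizes
```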

Questions:
  • Shouldn't `open_dataset` automatically deserialize the `NC_CHAR` fixed-length byte-string representation into a Unicode string?
  • Shouldn't `open_dataset` automatically read `NC_STRING` as a Unicode string (converting object to str)?

Related issues:
  • https://github.com/pydata/xarray/issues/7652
  • https://github.com/pydata/xarray/issues/2059
  • https://github.com/pydata/xarray/pull/7654
  • https://github.com/pydata/xarray/issues/2040

Issue #7014: xarray imshow and pcolormesh behave badly when the array does not contain values larger than the BoundaryNorm vmax
id 1368027148 · node_id I_kwDOAMm_X85RinAM · opened by ghiggi (19285200) · state: closed (completed) · 10 comments · created 2022-09-09T15:59:31Z · updated 2023-03-28T09:18:02Z · closed 2023-03-28T09:18:02Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened?

If cmap.set_over is specified, the array color mapping and the colorbar misbehave when the array does not contain values above norm.vmax.

Let's take an array and apply a colormap and norm (see the code below). Now, if I replace the array values larger than norm.vmax (the 2 bottom-right pixels) with other values inside the norm:
  • using matplotlib, I get the expected results;
  • using xarray, I get this weird behavior.

What did you expect to happen?

The colorbar should not "shift", and the array should be colormapped correctly. This is possibly also related to https://github.com/pydata/xarray/issues/4061.

Minimal Complete Verifiable Example

```python
import matplotlib.colors
import numpy as np
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt

# Define DataArray
arr = np.array([[0, 10, 15, 20],
                [np.nan, 40, 50, 100],
                [150, 158, 160, 161]])
lon = np.arange(arr.shape[1])
lat = np.arange(arr.shape[0])[::-1]
lons, lats = np.meshgrid(lon, lat)
da = xr.DataArray(arr,
                  dims=["y", "x"],
                  coords={"lon": (("y", "x"), lons),
                          "lat": (("y", "x"), lats)})
da

# Define colormap
color_list = ["#9c7e94", "#640064", "#009696", "#C8FF00", "#FF7D00"]
levels = [0.05, 1, 10, 20, 150, 160]
cmap = mpl.colors.LinearSegmentedColormap.from_list("cmap", color_list, len(levels) - 1)
norm = mpl.colors.BoundaryNorm(levels, cmap.N)
cmap.set_over("darkred")   # color for values above 160
cmap.set_under("none")     # color for values below 0.05
cmap.set_bad("gray", 0.2)  # color for NaN

# Define colorbar settings
ticks = levels
cbar_kwargs = {
    "extend": "max",
}

# Correct plot
p = da.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# Remove values larger than the norm.vmax level
da1 = da.copy()
da1.data[da1.data >= norm.vmax] = norm.vmax - 1  # could be replaced with any value inside the norm

# With matplotlib.pcolormesh [OK]
p = plt.pcolormesh(da1["lon"].data, da1["lat"], da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With matplotlib.imshow [OK]
p = plt.imshow(da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With xarray.pcolormesh [BUG] --> the colorbar shifts!
da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# With xarray.imshow [BUG] --> the colorbar shifts!
da1.plot.imshow(cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs, origin="upper")
```
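
One way to make the mismatch concrete (an illustrative sketch reusing `da1`, `cmap`, and `norm` from the example above; that xarray swaps out the supplied norm is my guess at the mechanism, not a statement from the report) is to compare the norm attached to each returned artist:

```python
# Compare the norm actually attached to each artist
p_mpl = plt.pcolormesh(da1["lon"].data, da1["lat"], da1.data, cmap=cmap, norm=norm)
p_xr = da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm)
print(p_mpl.norm.boundaries)  # the boundaries passed in
print(p_xr.norm.boundaries)   # if these differ, xarray rebuilt the norm from the data
```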

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.0
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 7.33.0
sphinx: 5.1.1

/home/ghiggi/anaconda3/envs/gpm_geo/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")
Issue #4607: set_index(..., append=True) acts as append=False with 'Dimensions without coordinates'
id 749924639 · node_id MDU6SXNzdWU3NDk5MjQ2Mzk= · opened by ghiggi (19285200) · state: open · 0 comments · created 2020-11-24T17:59:49Z · updated 2020-11-24T19:37:04Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened:

I ran into this strange behaviour when trying to recreate a stacked (MultiIndex) coordinate using set_index(..., append=True).

Since it is not possible to save a Dataset containing stacked / MultiIndex coordinates to netCDF or Zarr, before writing to disk I use reset_index(<stacked_coordinate>). When reading the data back, I need set_index(..., append=True) to recreate the stacked coordinate.
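
The intended round-trip looks like this (a self-contained sketch of the pattern just described, my illustration rather than code from the report; per the bug below, the final set_index call does not actually restore the MultiIndex):

```python
import numpy as np
import xarray as xr

# Build a small dataset with a MultiIndex coordinate "variables" (level: "variable")
ds = xr.Dataset({"var1": ("nodes", np.arange(3.0)), "var2": ("nodes", np.arange(3.0))})
stacked = ds.to_stacked_array(new_dim="variables", sample_dims=["nodes"]).to_dataset(name="stacked")

# Drop the MultiIndex before writing (netCDF/Zarr cannot store it)
stacked.reset_index("variables").to_netcdf("/tmp/stacked.nc")

# Recreate it after reading back -- the step this issue is about
ds_read = xr.open_dataset("/tmp/stacked.nc")
ds_read = ds_read.set_index(variables=["variable"], append=True)
```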

What you expected to happen:

I would expect set_index(..., append=True) to recreate the MultiIndex stacked coordinate. However, this does not happen when the dimension coordinate specified within set_index() is a 'dimension without coordinates'. In that situation, set_index(..., append=True) behaves like set_index(..., append=False).

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

# Create Datasets
arr1 = np.random.rand(4, 5)
arr2 = np.random.rand(4, 5)
da1 = xr.DataArray(arr1, dims=["nodes", "time"],
                   coords={"time": [1, 2, 3, 4, 5], "nodes": [1, 2, 3, 4]},
                   name="var1")
da2 = xr.DataArray(arr2, dims=["nodes", "time"],
                   coords={"time": [1, 2, 3, 4, 5], "nodes": [1, 2, 3, 4]},
                   name="var2")
ds_unstacked = xr.Dataset({"var1": da1, "var2": da2})
print(ds_unstacked)

# Stack variables across a new dimension
da_stacked = ds_unstacked.to_stacked_array(new_dim="variables",
                                           variable_dim="variable",
                                           sample_dims=["nodes", "time"],
                                           name="Stacked_Variables")
ds_stacked = da_stacked.to_dataset()

# Look at the stacked MultiIndex coordinate 'variables'
print(ds_stacked)
print(da_stacked.variables.indexes)

# Remove the MultiIndex (to save the Dataset to netCDF/Zarr, ...)
ds_stacked_disk = ds_stacked.reset_index("variables")
print(ds_stacked_disk)

# Try to recreate the MultiIndex
print(ds_stacked_disk.set_index(variables=["variable"], append=False))  # GOOD! Replaces the 'variable' coordinate with 'variables'
print(ds_stacked_disk.set_index(variables=["variable"], append=True))   # BUG! Does not create the expected MultiIndex!

# Current workaround to obtain a MultiIndex stacked coordinate
tmp_ds = ds_stacked_disk.assign_coords(variables=np.arange(0, 2))
ds_stacked1 = tmp_ds.set_index(variables=["variable"], append=True)
print(ds_stacked1)  # But with the level 0 coordinate named 'variables_level_0'

# Unstack back
# If the BUG is solved, there is no need to specify the level argument
ds_stacked1["Stacked_Variables"].to_unstacked_dataset(dim="variables", level="variable")
```

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: 7.18.1
sphinx: 3.2.1

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);