issues
2 rows where repo = 13221727 (xarray), state = "closed" and user = 19285200 (ghiggi), sorted by updated_at descending
**Issue 7868 · `open_dataset` with `chunks="auto"` fails when a netCDF4 variable/coordinate is encoded as `NC_STRING`**

id: 1722417436 (`I_kwDOAMm_X85mqgEc`) · user: ghiggi (19285200) · state: closed · locked: 0 · comments: 8 · created_at: 2023-05-23T16:23:07Z · updated_at: 2023-11-17T15:26:01Z · closed_at: 2023-11-17T15:26:01Z · author_association: NONE

**What is your issue?**

I noticed that `open_dataset` with `chunks="auto"` fails when a netCDF4 variable/coordinate is encoded as `NC_STRING`: the variable is read back with `object` dtype, and dask cannot estimate the chunk size. As a workaround, the user must currently rewrite the netCDF4 file and specify that the string DataArray(s) be encoded as `NC_CHAR`. Here below I provide a reproducible example:

```python
import numpy as np
import xarray as xr

# Define string DataArray
arr = np.array(["M6", "M3"], dtype=str)
print(arr.dtype)  # <U2
da = xr.DataArray(data=arr, dims=("time"))
data_vars = {"str_arr": da}

# Create dataset
ds_nc_string = xr.Dataset(data_vars=data_vars)

# Set chunking to see behaviour at read-time
ds_nc_string["str_arr"] = ds_nc_string["str_arr"].chunk(1)  # chunks ((1, 1),)

# Write dataset with NC_STRING
ds_nc_string["str_arr"].encoding["dtype"] = str
ds_nc_string.to_netcdf("/tmp/nc_string.nc")

# Write dataset with NC_CHAR
ds_nc_char = xr.Dataset(data_vars=data_vars)
ds_nc_char["str_arr"].encoding["dtype"] = "S2"
ds_nc_char.to_netcdf("/tmp/nc_char.nc")

# With NC_STRING, chunks="auto" does not work:
# --> NC_STRING is read as object, and dask can not estimate the chunk size!
# If chunks={}, it reads the NC_STRING array in a single dask chunk!
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks="auto")  # NotImplementedError
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks={})      # Works
ds_nc_string.chunks  # (2,)

# With NC_CHAR, chunks={} and chunks="auto" both work and return the same result
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks={})

# NC_STRING is read back as object
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks=None)
ds_nc_string["str_arr"].dtype  # object

# NC_CHAR is read back as a fixed-length byte-string representation (S2)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks=None)
ds_nc_char["str_arr"].dtype  # S2
ds_nc_char["str_arr"].data.astype(str)  # U2
```

Questions:
Related issues:
- https://github.com/pydata/xarray/issues/7652
- https://github.com/pydata/xarray/issues/2059
- https://github.com/pydata/xarray/pull/7654
- https://github.com/pydata/xarray/issues/2040
Reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/7868/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

state_reason: completed · repo: xarray (13221727) · type: issue
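The failure described above stems from dtype metadata: an automatic chunker needs a known bytes-per-element to pick chunk sizes, and `object` dtype (what `NC_STRING` round-trips to in the example) does not provide one. A numpy-only sketch of the difference, as an illustration rather than a trace through dask's actual estimator:

```python
import numpy as np

# A fixed-width string dtype carries its element size, so an automatic
# chunker can compute bytes-per-element and choose chunk shapes.
fixed = np.array(["M6", "M3"], dtype="U2")
print(fixed.dtype, fixed.dtype.itemsize)  # <U2 8  (2 chars * 4 bytes each)

# An object-dtype array only stores references; its itemsize reflects the
# reference, not the string payloads, so there is no basis for "auto"
# chunk-size estimation.
objarr = fixed.astype(object)
print(objarr.dtype)  # object
print(objarr.dtype.itemsize == np.dtype("O").itemsize)  # True
```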
**Issue 7014 · xarray `imshow` and `pcolormesh` behave badly when the array does not contain values larger than the `BoundaryNorm` vmax**

id: 1368027148 (`I_kwDOAMm_X85RinAM`) · user: ghiggi (19285200) · state: closed · locked: 0 · comments: 10 · created_at: 2022-09-09T15:59:31Z · updated_at: 2023-03-28T09:18:02Z · closed_at: 2023-03-28T09:18:02Z · author_association: NONE

**What happened?**

If the array does not contain values larger than the `BoundaryNorm` vmax, the colorbar "shifts" and the array is colormapped incorrectly. Let's take an array and apply a colormap and norm (see code below).
**What did you expect to happen?**

The colorbar should not "shift" and the array should be colormapped correctly. This is possibly related also to https://github.com/pydata/xarray/issues/4061.

**Minimal Complete Verifiable Example**

```python
import matplotlib as mpl
import matplotlib.colors
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Define DataArray
arr = np.array([[0, 10, 15, 20],
                [np.nan, 40, 50, 100],
                [150, 158, 160, 161]])
lon = np.arange(arr.shape[1])
lat = np.arange(arr.shape[0])[::-1]
lons, lats = np.meshgrid(lon, lat)
da = xr.DataArray(arr,
                  dims=["y", "x"],
                  coords={"lon": (("y", "x"), lons),
                          "lat": (("y", "x"), lats)})
da

# Define colormap
color_list = ["#9c7e94", "#640064", "#009696", "#C8FF00", "#FF7D00"]
levels = [0.05, 1, 10, 20, 150, 160]
cmap = mpl.colors.LinearSegmentedColormap.from_list("cmap", color_list, len(levels) - 1)
norm = mpl.colors.BoundaryNorm(levels, cmap.N)
cmap.set_over("darkred")   # color for above 160
cmap.set_under("none")     # color for below 0.05
cmap.set_bad("gray", 0.2)  # color for nan

# Define colorbar settings
ticks = levels
cbar_kwargs = {}

# Correct plot
p = da.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# Remove values larger than the norm.vmax level
da1 = da.copy()
da1.data[da1.data >= norm.vmax] = norm.vmax - 1  # could be replaced with any value inside the norm

# With matplotlib.pcolormesh [OK]
p = plt.pcolormesh(da1["lon"].data, da1["lat"], da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With matplotlib.imshow [OK]
p = plt.imshow(da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With xarray.pcolormesh [BUG] --> the colorbar shifts!
da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# With xarray.imshow [BUG] --> the colorbar shifts!
da1.plot.imshow(cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs, origin="upper")
plt.show()
```

**MVCE confirmation**
**Relevant log output**

No response

**Anything else we need to know?**

No response

**Environment**
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.0
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 7.33.0
sphinx: 5.1.1
/home/ghiggi/anaconda3/envs/gpm_geo/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
Reactions: { "url": "https://api.github.com/repos/pydata/xarray/issues/7014/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }

state_reason: completed · repo: xarray (13221727) · type: issue
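The key property behind the expected behaviour in issue 7014: `BoundaryNorm` maps each value to a color bin using the boundaries alone, independent of the data's own min/max, so clipping values above `norm.vmax` (as `da1` does with `norm.vmax - 1`) must not change how the remaining values are colored. A numpy sketch that mimics this binning; `boundary_bin` is a simplified stand-in for `matplotlib.colors.BoundaryNorm`, not its real implementation (no masking or over/under handling):

```python
import numpy as np

def boundary_bin(values, boundaries, ncolors):
    """Map each value to a discrete color-bin index: the index of the
    boundary interval containing it, clipped to the valid bin range."""
    idx = np.searchsorted(boundaries, values, side="right") - 1
    return np.clip(idx, 0, ncolors - 1)

levels = np.array([0.05, 1, 10, 20, 150, 160])  # boundaries from the MVCE
ncolors = len(levels) - 1                       # 5 discrete colors

sample = np.array([0.5, 5, 15, 155])
print(boundary_bin(sample, levels, ncolors))  # [0 1 2 4]

# Clipping values at/above vmax into the top interval, as da1 does,
# leaves every other value's bin assignment intact.
clipped = np.where(sample >= levels[-1], levels[-1] - 1, sample)
print(boundary_bin(clipped, levels, ncolors))  # [0 1 2 4]
```

Because the binning depends only on `levels`, the colorbar drawn from it should look identical whether or not the data happens to contain values above 160.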
Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```
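The row filter described at the top of the page ("2 rows where repo = 13221727, state = "closed" and user = 19285200, sorted by updated_at descending") maps to a plain SQL query against this schema. A minimal sketch using Python's stdlib sqlite3, with the table trimmed to the queried columns and populated with the two issue rows shown above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the issues schema: only the columns the query touches.
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, number INTEGER, state TEXT, "
    "updated_at TEXT, user INTEGER, repo INTEGER)"
)
# The two rows from this page, reduced to the queried columns.
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1722417436, 7868, "closed", "2023-11-17T15:26:01Z", 19285200, 13221727),
        (1368027148, 7014, "closed", "2023-03-28T09:18:02Z", 19285200, 13221727),
    ],
)
# ISO-8601 timestamps sort correctly as text, so ORDER BY works on the TEXT column.
rows = conn.execute(
    "SELECT number, updated_at FROM issues "
    "WHERE repo = 13221727 AND state = 'closed' AND user = 19285200 "
    "ORDER BY updated_at DESC"
).fetchall()
print(rows)  # [(7868, '2023-11-17T15:26:01Z'), (7014, '2023-03-28T09:18:02Z')]
```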