issues
3 rows where user = 19285200 sorted by updated_at descending

Issue #7868: `open_dataset` with `chunks="auto"` fails when a netCDF4 variable/coordinate is encoded as `NC_STRING`
id 1722417436 · node_id I_kwDOAMm_X85mqgEc · opened by ghiggi (19285200) · state: closed (completed) · 8 comments · created 2023-05-23T16:23:07Z · updated 2023-11-17T15:26:01Z · closed 2023-11-17T15:26:01Z · author_association: NONE · repo: xarray (13221727) · type: issue

What is your issue?

I noticed that `open_dataset` with `chunks="auto"` fails when netCDF4 variables/coordinates are encoded as `NC_STRING`. The reason is that xarray reads netCDF4 `NC_STRING` as object dtype, and dask cannot estimate the size of an object dtype.
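
To see why, here is a minimal sketch (my illustration, not part of the original report) of dask refusing to auto-chunk an object-dtype array, since it cannot estimate how many bytes each element occupies:

```python
import numpy as np
import dask.array as dsa

# An object-dtype array, which is how xarray exposes NC_STRING variables
obj_arr = np.array(["M6", "M3"], dtype=object)

# dask cannot estimate the per-element size of an object dtype,
# so auto-chunking fails (with a NotImplementedError, as in the report below)
dsa.from_array(obj_arr, chunks="auto")
```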

As a workaround, the user currently has to rewrite the netCDF4 file and set the string DataArray's encoding to a fixed-length string dtype (i.e. "S2" if the maximum string length is 2), so that the data are written as NC_CHAR and xarray reads them back as a fixed-length byte-string dtype.

Below I provide a reproducible example:

```python
import xarray as xr
import numpy as np

# Define string DataArray
arr = np.array(["M6", "M3"], dtype=str)
print(arr.dtype)  # <U2
da = xr.DataArray(data=arr, dims="time")
data_vars = {"str_arr": da}

# Create dataset
ds_nc_string = xr.Dataset(data_vars=data_vars)

# Set chunking to see the behaviour at read-time
ds_nc_string["str_arr"] = ds_nc_string["str_arr"].chunk(1)  # chunks ((1, 1),)

# Write dataset with NC_STRING
ds_nc_string["str_arr"].encoding["dtype"] = str
ds_nc_string.to_netcdf("/tmp/nc_string.nc")

# Write dataset with NC_CHAR
ds_nc_char = xr.Dataset(data_vars=data_vars)
ds_nc_char["str_arr"].encoding["dtype"] = "S2"
ds_nc_char.to_netcdf("/tmp/nc_char.nc")

# When strings are saved as NC_STRING, chunks="auto" does not work:
# --> NC_STRING is read as object, and dask cannot estimate the chunk size!
# With chunks={}, the NC_STRING array is read into a single dask chunk!!!
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks="auto")  # NotImplementedError
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks={})      # Works
ds_nc_string.chunks  # chunks (2,)

# With NC_CHAR, chunks={} and chunks="auto" both work and return the same result!
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks={})
ds_nc_char.chunks  # chunks (2,)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks="auto")
ds_nc_char.chunks  # chunks (2,)

# NC_STRING is read back as object
ds_nc_string = xr.open_dataset("/tmp/nc_string.nc", chunks=None)
ds_nc_string["str_arr"].dtype  # object

# NC_CHAR is read back as a fixed-length byte-string representation (S2)
ds_nc_char = xr.open_dataset("/tmp/nc_char.nc", chunks=None)
ds_nc_char["str_arr"].dtype  # S2
ds_nc_char["str_arr"].data.astype(str)  # U2
```
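
For readers hitting the same error, a possible post-read workaround (my sketch, not from the original report; it assumes the `/tmp/nc_string.nc` file written above) is to cast the object-dtype variable to a fixed-width Unicode dtype before chunking:

```python
import xarray as xr

# Read eagerly, then decode the object-dtype strings to fixed-width Unicode
ds = xr.open_dataset("/tmp/nc_string.nc")   # str_arr comes back with dtype object
ds["str_arr"] = ds["str_arr"].astype(str)   # object -> <U2
ds = ds.chunk("auto")                       # dask can now estimate chunk sizes
```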

Questions:
  • Shouldn't `open_dataset` automatically deserialize the `NC_CHAR` fixed-length byte-string representation into a Unicode string?
  • Shouldn't `open_dataset` automatically read `NC_STRING` as a Unicode string (converting object to str)?

Related issues:
  • https://github.com/pydata/xarray/issues/7652
  • https://github.com/pydata/xarray/issues/2059
  • https://github.com/pydata/xarray/pull/7654
  • https://github.com/pydata/xarray/issues/2040

Issue #7014: xarray imshow and pcolormesh behave badly when the array does not contain values larger than the BoundaryNorm vmax
id 1368027148 · node_id I_kwDOAMm_X85RinAM · opened by ghiggi (19285200) · state: closed (completed) · 10 comments · created 2022-09-09T15:59:31Z · updated 2023-03-28T09:18:02Z · closed 2023-03-28T09:18:02Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened?

If cmap.set_over is specified, the array color mapping and the colorbar misbehave when the array does not contain values above norm.vmax.

Let's take an array and apply a colormap and norm (see the code below). Now, if I replace the array values larger than norm.vmax (the 2 bottom-right pixels) with other values inside the norm:
  • using matplotlib, I get the expected results;
  • using xarray, I get this weird behavior.

What did you expect to happen?

The colorbar should not "shift", and the array should be colormapped correctly. This is possibly also related to https://github.com/pydata/xarray/issues/4061.

Minimal Complete Verifiable Example

```python
import matplotlib.colors
import numpy as np
import xarray as xr
import matplotlib as mpl
import matplotlib.pyplot as plt

# Define DataArray
arr = np.array([[0, 10, 15, 20],
                [np.nan, 40, 50, 100],
                [150, 158, 160, 161]])
lon = np.arange(arr.shape[1])
lat = np.arange(arr.shape[0])[::-1]
lons, lats = np.meshgrid(lon, lat)
da = xr.DataArray(arr,
                  dims=["y", "x"],
                  coords={"lon": (("y", "x"), lons),
                          "lat": (("y", "x"), lats)})
da

# Define colormap
color_list = ["#9c7e94", "#640064", "#009696", "#C8FF00", "#FF7D00"]
levels = [0.05, 1, 10, 20, 150, 160]
cmap = mpl.colors.LinearSegmentedColormap.from_list("cmap", color_list, len(levels) - 1)
norm = mpl.colors.BoundaryNorm(levels, cmap.N)
cmap.set_over("darkred")   # color for values above 160
cmap.set_under("none")     # color for values below 0.05
cmap.set_bad("gray", 0.2)  # color for NaN

# Define colorbar settings
ticks = levels
cbar_kwargs = {
    "extend": "max",
}

# Correct plot
p = da.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# Remove values larger than the norm.vmax level
da1 = da.copy()
da1.data[da1.data >= norm.vmax] = norm.vmax - 1  # could be replaced with any value inside the norm

# With matplotlib.pcolormesh [OK]
p = plt.pcolormesh(da1["lon"].data, da1["lat"], da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With matplotlib.imshow [OK]
p = plt.imshow(da1.data, cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With xarray.pcolormesh [BUG] --> the colorbar shifts!
da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# With xarray.imshow [BUG] --> the colorbar shifts!
da1.plot.imshow(cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs, origin="upper")
```
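
One way to make the mismatch concrete (an illustrative sketch reusing `da1`, `cmap`, and `norm` from the example above; that xarray swaps out the supplied norm is my guess at the mechanism, not a statement from the report) is to compare the norm attached to each returned artist:

```python
# Compare the norm actually attached to each artist
p_mpl = plt.pcolormesh(da1["lon"].data, da1["lat"], da1.data, cmap=cmap, norm=norm)
p_xr = da1.plot.pcolormesh(x="lon", y="lat", cmap=cmap, norm=norm)
print(p_mpl.norm.boundaries)  # the boundaries passed in
print(p_xr.norm.boundaries)   # if these differ, xarray rebuilt the norm from the data
```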

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.0
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 7.33.0
sphinx: 5.1.1

/home/ghiggi/anaconda3/envs/gpm_geo/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils. warnings.warn("Setuptools is replacing distutils.")
Issue #4607: set_index(..., append=True) acts as append=False with 'Dimensions without coordinates'
id 749924639 · node_id MDU6SXNzdWU3NDk5MjQ2Mzk= · opened by ghiggi (19285200) · state: open · 0 comments · created 2020-11-24T17:59:49Z · updated 2020-11-24T19:37:04Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened:

I ran into this strange behaviour when trying to recreate a stacked (MultiIndex) coordinate using set_index(..., append=True).

Since it is not possible to save a Dataset containing stacked / MultiIndex coordinates to netCDF or Zarr, before writing to disk I use reset_index(<stacked_coordinate>). When reading the data back, I need set_index(..., append=True) to recreate the stacked coordinate.
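
The intended round-trip looks like this (a self-contained sketch of the pattern just described, my illustration rather than code from the report; per the bug below, the final set_index call does not actually restore the MultiIndex):

```python
import numpy as np
import xarray as xr

# Build a small dataset with a MultiIndex coordinate "variables" (level: "variable")
ds = xr.Dataset({"var1": ("nodes", np.arange(3.0)), "var2": ("nodes", np.arange(3.0))})
stacked = ds.to_stacked_array(new_dim="variables", sample_dims=["nodes"]).to_dataset(name="stacked")

# Drop the MultiIndex before writing (netCDF/Zarr cannot store it)
stacked.reset_index("variables").to_netcdf("/tmp/stacked.nc")

# Recreate it after reading back -- the step this issue is about
ds_read = xr.open_dataset("/tmp/stacked.nc")
ds_read = ds_read.set_index(variables=["variable"], append=True)
```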

What you expected to happen:

I would expect set_index(..., append=True) to recreate the MultiIndex stacked coordinate. However, this does not happen when the dimension coordinate specified within set_index() is a 'dimension without coordinates'. In that situation, set_index(..., append=True) behaves like set_index(..., append=False).

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

# Create Datasets
arr1 = np.random.rand(4, 5)
arr2 = np.random.rand(4, 5)
da1 = xr.DataArray(arr1, dims=["nodes", "time"],
                   coords={"time": [1, 2, 3, 4, 5], "nodes": [1, 2, 3, 4]},
                   name="var1")
da2 = xr.DataArray(arr2, dims=["nodes", "time"],
                   coords={"time": [1, 2, 3, 4, 5], "nodes": [1, 2, 3, 4]},
                   name="var2")
ds_unstacked = xr.Dataset({"var1": da1, "var2": da2})
print(ds_unstacked)

# Stack variables across a new dimension
da_stacked = ds_unstacked.to_stacked_array(new_dim="variables",
                                           variable_dim="variable",
                                           sample_dims=["nodes", "time"],
                                           name="Stacked_Variables")
ds_stacked = da_stacked.to_dataset()

# Look at the stacked MultiIndex coordinate 'variables'
print(ds_stacked)
print(da_stacked.variables.indexes)

# Remove the MultiIndex (to save the Dataset to netCDF/Zarr, ...)
ds_stacked_disk = ds_stacked.reset_index("variables")
print(ds_stacked_disk)

# Try to recreate the MultiIndex
print(ds_stacked_disk.set_index(variables=["variable"], append=False))  # GOOD! Replaces the 'variable' coordinate with 'variables'
print(ds_stacked_disk.set_index(variables=["variable"], append=True))   # BUG! Does not create the expected MultiIndex!

# Current workaround to obtain a MultiIndex stacked coordinate
tmp_ds = ds_stacked_disk.assign_coords(variables=np.arange(0, 2))
ds_stacked1 = tmp_ds.set_index(variables=["variable"], append=True)
print(ds_stacked1)  # But with the level 0 coordinate named 'variables_level_0'

# Unstack back
# If the BUG is solved, there is no need to specify the level argument
ds_stacked1["Stacked_Variables"].to_unstacked_dataset(dim="variables", level="variable")
```

Environment:

Output of `xr.show_versions()`:

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: 7.18.1
sphinx: 3.2.1

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);