
issues


5 rows where state = "closed", type = "issue" and user = 4666753 sorted by updated_at descending

#6401: Unnecessary warning when specifying `chunks` opening dataset with empty dimension

id 1177665302 · node_id I_kwDOAMm_X85GMb8W · user jaicher (4666753) · state closed · locked 0 · comments 0 · created_at 2022-03-23T06:38:25Z · updated_at 2022-04-09T20:27:40Z · closed_at 2022-04-09T20:27:40Z · author_association CONTRIBUTOR

What happened?

I receive unnecessary warnings when opening Zarr datasets that have empty dimensions/arrays using the `chunks` argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the `chunks` parameter is then provided for the other dimensions when loading the Zarr file (based on the chunk sizes that would be expected were the array nonempty), xarray warns about potentially degraded performance from splitting the single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • performance degradation on an empty array should be negligible.
  • we don't always know whether one of the dimensions is empty until loading, but we would still use the `chunks` parameter for the dimensions with consistent chunk sizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty),
# we get a warning :(
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})
```

Relevant log output

```
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by only calling `_check_chunks_compatibility()` when `var` is nonempty (PR forthcoming).
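A minimal sketch of that guard (`var`, `output_chunks`, and `preferred_chunks` are the internals named in the log output above; the exact patch is in the forthcoming PR):

```python
# Sketch of the proposed fix: skip the chunk-compatibility warning for
# zero-size variables, since splitting an empty chunk cannot meaningfully
# degrade performance.
if var.size > 0:
    _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```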

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#6062: Import hangs when matplotlib installed but no display available

id 1076377174 · node_id I_kwDOAMm_X85AKDZW · user jaicher (4666753) · state closed · locked 0 · comments 3 · created_at 2021-12-10T03:12:55Z · updated_at 2021-12-29T07:56:59Z · closed_at 2021-12-29T07:56:59Z · author_association CONTRIBUTOR

What happened: On a device with no display available, importing xarray without setting the matplotlib backend hangs on the import of `matplotlib.pyplot`, since #5794 was merged.

What you expected to happen: I expect to be able to run `import xarray` without needing to mess with environment variables or to import matplotlib and change the default backend.
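For context, a sketch of the workaround the report wants to avoid, assuming any non-interactive backend sidesteps the display probe:

```python
# Workaround sketch (not a fix): force a non-interactive matplotlib
# backend before anything imports matplotlib.pyplot.
import matplotlib

matplotlib.use("Agg")  # assumption: "Agg" never touches a display

import xarray as xr  # no longer hangs on a headless device
```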

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6062/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#5883: Failing parallel writes to_zarr with regions parameter?

id 1033142897 · node_id I_kwDOAMm_X849lIJx · user jaicher (4666753) · state closed · locked 0 · comments 1 · created_at 2021-10-22T03:33:02Z · updated_at 2021-10-22T18:37:06Z · closed_at 2021-10-22T18:37:06Z · author_association CONTRIBUTOR

What happened: Following guidance on how to use the `region` keyword in `xr.Dataset.to_zarr()`, I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.

What you expected to happen: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).

Minimal Complete Verifiable Example:

```python
path = "tmp.zarr"
NTHREADS = 4  # when 1, things work as expected
import multiprocessing.dummy as mp  # threads, instead of processes

import numpy as np
import dask.array as da
import xarray as xr

# dummy values for metadata
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(None, 1)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)

# actual values to save
ds = xr.Dataset(
    {"x": (("a", "b"), np.random.uniform(size=(10, 7)))},
    {"apple": ("a", np.arange(10))},
)

# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
    p.map(
        lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(
            path, mode="r+", region=dict(a=slice(idx, 1 + idx))
        ),
        range(10),
    )

# open what we just saved over multiple threads
ds_roundtrip = xr.open_zarr(path).load()

# perfect match for x on some slices of a, but when NTHREADS > 1,
# x has very different values or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip)  # fails when NTHREADS > 1
```
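For contrast, a serial sketch of the same region writes (per the `NTHREADS = 1` comment above, single-threaded writes behave as expected):

```python
# Same non-overlapping region writes, issued from a single thread:
# with no concurrency, ds_roundtrip matches ds exactly.
for idx in range(10):
    ds.isel(a=slice(idx, 1 + idx)).to_zarr(
        path, mode="r+", region=dict(a=slice(idx, 1 + idx))
    )
```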

Anything else we need to know?:

  • this behavior is the same if the coordinate "apple" (over dimension a) is changed to the coordinate "a" (an index over its own dimension)
  • if the dummy dataset defines "apple" using dask, I observed ds_roundtrip having all the correct values of "apple" (but not "x"). But if "apple" is instead defined as a numpy array, I observed ds_roundtrip having incorrect values of "apple" as well (in addition to "x").

Environment:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#5741: Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError)

id 980549418 · node_id MDU6SXNzdWU5ODA1NDk0MTg= · user jaicher (4666753) · state closed · locked 0 · comments 0 · created_at 2021-08-26T18:57:00Z · updated_at 2021-10-10T00:02:42Z · closed_at 2021-10-10T00:02:42Z · author_association CONTRIBUTOR

What happened: I have an xr.Dataset with a dask-array-valued variable that includes a zero-length dimension (the other variables are non-empty). I tried saving it to zarr, but it fails with a ZeroDivisionError.

What you expected to happen: I expect it to save without any errors.

Minimal Complete Verifiable Example: the following commands fail.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "x": (("a", "b", "c"), np.empty((75, 0, 30))),
        "y": (("a", "c"), np.random.normal(size=(75, 30))),
    },
    {"a": np.arange(75), "b": [], "c": np.arange(30)},
).chunk({})

ds.to_zarr("fails.zarr")  # RAISES ZeroDivisionError
```

Anything else we need to know?: If we load all the empty arrays into numpy, the dataset saves correctly. That is:

```python
ds["x"].load()  # run on all variables that have a zero dimension
ds.to_zarr("works.zarr")  # successfully runs
```

I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr or in a nicer way.
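A generalization of that workaround as a hypothetical helper (the `empty_vars` scan is illustrative, not the content of the PR):

```python
# Hypothetical sketch: load every zero-size variable into numpy before
# writing, mirroring the manual ds["x"].load() workaround above.
empty_vars = [name for name, var in ds.variables.items() if var.size == 0]
for name in empty_vars:
    ds[name] = ds[name].load()
ds.to_zarr("works.zarr")  # saves once the empty arrays are numpy-backed
```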

Environment:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5741/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#3748: `swap_dims()` incorrectly changes underlying index name

id 559841620 · node_id MDU6SXNzdWU1NTk4NDE2MjA= · user jaicher (4666753) · state closed · locked 0 · comments 1 · created_at 2020-02-04T16:41:25Z · updated_at 2020-02-24T22:34:58Z · closed_at 2020-02-24T22:34:58Z · author_association CONTRIBUTOR

MCVE Code Sample

```python
import xarray as xr

# create data array with named dimension and named coordinate
x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# what's our current index? (idx, this is fine)
x.indexes
# prints "idx: Int64Index([2], dtype='int64', name='idx')"

# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='idx')"
```

The dimension name is appropriately swapped, but the pandas index name is incorrect.

Expected Output

```python
# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='y')"
```

Problem Description

This is a problem because running `x.swap_dims({"idx": "y"}).to_dataframe()` gives a dataframe with columns `["x", "idx"]` and index `"idx"`. This gives ambiguous names and drops the original name, while the DataArray string representation gives no indication that this might be happening.
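A short illustration of that ambiguity, reusing `x` from the MCVE above (the column and index names are those described in this report):

```python
# After the swap, the dataframe's index is still named "idx",
# colliding with the "idx" column; the name "y" appears nowhere.
df = x.swap_dims({"idx": "y"}).to_dataframe()
print(df.columns.tolist())  # ["x", "idx"]
print(df.index.name)        # "idx"
```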

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue


Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```