issues

7 rows where repo = 13221727, type = "issue" and user = 4666753 sorted by updated_at descending

#4288 hue argument for xarray.plot.step() for plotting multiple histograms over shared bins

id: 668256331 · node_id: MDU6SXNzdWU2NjgyNTYzMzE= · user: jaicher (4666753) · state: open · locked: 0 · comments: 2 · created_at: 2020-07-30T00:30:37Z · updated_at: 2022-04-17T19:27:28Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

Is your feature request related to a problem? Please describe.

I love how efficiently we can plot line data for different observations using xr.DataArray.plot(hue={hue coordinate name}) over a 2D array, and I have appreciated xr.DataArray.plot.step() for plotting histogram data using interval coordinates. Today, I wanted to plot/compare several histograms over the same set of bins. I figured I could write xr.DataArray.plot.step(hue={...}), but I found out that this functionality is not implemented.

Describe the solution you'd like

I think we should have a hue kwarg for xr.DataArray.plot.step(). When specified, we would be able to plot 2D data in the same way as xr.DataArray.plot(), except that we get a set of step plots instead of a set of line plots.

Describe alternatives you've considered

  • Use xr.DataArray.plot() instead. This is effective for histograms with many bins, but it inaccurately represents histograms with coarse bins.
  • Manually call xr.DataArray.plot.hist() on each 1D subarray for each label on the hue coordinate, adding appropriate labels and a legend (see the sketch below). This is fine and is my current solution, but I think it would be excellent to use the same shorthand that was developed for line plots.
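
A minimal sketch of that manual loop, assuming a 2D DataArray with a hue-like "sample" dimension over shared "bin" coordinates (all names and data below are fabricated for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# fabricated histogram counts: 3 samples over 10 shared bins
bins = np.linspace(0.0, 1.0, 11)
mids = 0.5 * (bins[:-1] + bins[1:])
counts = np.stack([np.histogram(np.random.rand(100), bins)[0] for _ in range(3)])
da = xr.DataArray(counts, {"sample": ["a", "b", "c"], "bin": mids}, ("sample", "bin"))

# what a hue kwarg could do automatically: one step plot per hue label
for label in da["sample"].values:
    da.sel(sample=label).plot.step(where="mid", label=str(label))
plt.legend(title="sample")
plt.show()
```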

Additional context

I didn't evaluate the other plotting functions, but I suspect there are others that could appropriately accept a hue argument and do not yet support one.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4288/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#6401 Unnecessary warning when specifying `chunks` opening dataset with empty dimension

id: 1177665302 · node_id: I_kwDOAMm_X85GMb8W · user: jaicher (4666753) · state: closed · locked: 0 · comments: 0 · created_at: 2022-03-23T06:38:25Z · updated_at: 2022-04-09T20:27:40Z · closed_at: 2022-04-09T20:27:40Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened?

I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the chunks argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on the other dimensions (#5742). If the chunks parameter is then provided for the other dimensions when loading the Zarr file (matching the chunk sizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting that single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • performance degradation on an empty array should be negligible.
  • we don't always know whether one of the dimensions is empty until loading, but we would still like to use the chunks parameter for the dimensions with consistent chunk sizes (e.g., to specify a multiple of what's on disk) -- this is thrown off when another dimension is empty.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})

# we get a warning :(
```

Relevant log output

```
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by only calling _check_chunks_compatibility() when var is nonempty (PR forthcoming); a sketch of the idea follows.
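
A minimal standalone sketch of that guard. The function below is a hypothetical, simplified stand-in for xarray's internal _check_chunks_compatibility(), just to show where the nonempty check would go:

```python
import warnings

import numpy as np

def check_chunks_compatibility(shape, specified_chunks, preferred_chunks):
    """Hypothetical, simplified stand-in for xarray's internal check."""
    if np.prod(shape) == 0:
        # the proposed fix: a zero-size array is stored as a single chunk
        # anyway, and re-chunking zero bytes cannot degrade performance
        return
    for spec, pref in zip(specified_chunks, preferred_chunks):
        if spec < pref:
            # the specified chunks would split each stored chunk of size pref
            warnings.warn(
                f"Specified chunk size {spec} would separate on-disk chunk "
                f"shape {pref}; this could degrade performance."
            )
```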

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3712 [Documentation/API?] {DataArray,Dataset}.sortby is stable sort?

id: 552987067 · node_id: MDU6SXNzdWU1NTI5ODcwNjc= · user: jaicher (4666753) · state: open · locked: 0 · comments: 0 · created_at: 2020-01-21T16:27:37Z · updated_at: 2022-04-09T02:26:34Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

I noticed that {DataArray,Dataset}.sortby() are implemented using np.lexsort(), which is a stable sort. Can we expect these functions to remain stable sorts in the future, even if the implementation is changed for some reason?

The docs do not explicitly state that the sorting will be stable. If these functions are meant to always be stable, I think the documentation should say so explicitly. If not, I think it would be helpful to have an optional argument that guarantees a stable sort in case the implementation changes in the future. A small example where stability is observable is sketched below.
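
To make "stable" concrete, here is a small made-up example where stability is observable because the sort key has ties:

```python
import xarray as xr

# made-up data: duplicate sort keys, so stability is observable
da = xr.DataArray(
    [10, 20, 30, 40],
    coords={"key": ("i", [1, 0, 1, 0])},
    dims="i",
)

# a stable sort keeps the original relative order within each tied key:
# key 0 -> values [20, 40]; key 1 -> values [10, 30]
print(da.sortby("key").values)  # [20 40 10 30]
```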

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3712/reactions",
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 1
}
#6062 Import hangs when matplotlib installed but no display available

id: 1076377174 · node_id: I_kwDOAMm_X85AKDZW · user: jaicher (4666753) · state: closed · locked: 0 · comments: 3 · created_at: 2021-12-10T03:12:55Z · updated_at: 2021-12-29T07:56:59Z · closed_at: 2021-12-29T07:56:59Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: On a device with no display available, importing xarray without setting the matplotlib backend hangs on import of matplotlib.pyplot since #5794 was merged.

What you expected to happen: I expect to be able to run import xarray without needing to mess with environment variables or import matplotlib and change the default backend.
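
For reference, a sketch of exactly the kind of fiddling this report argues should be unnecessary, assuming a headless machine: select a non-interactive backend before anything imports matplotlib.pyplot.

```python
import os

# force a non-interactive backend before matplotlib.pyplot is ever imported
os.environ.setdefault("MPLBACKEND", "Agg")
# or, equivalently:
#   import matplotlib
#   matplotlib.use("Agg")

import xarray as xr  # should now import without hanging
```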

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6062/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#5883 Failing parallel writes to_zarr with regions parameter?

id: 1033142897 · node_id: I_kwDOAMm_X849lIJx · user: jaicher (4666753) · state: closed · locked: 0 · comments: 1 · created_at: 2021-10-22T03:33:02Z · updated_at: 2021-10-22T18:37:06Z · closed_at: 2021-10-22T18:37:06Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: Following the guidance on how to use the region keyword of xr.Dataset.to_zarr(), I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.

What you expected to happen: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).

Minimal Complete Verifiable Example:

```python
path = "tmp.zarr"
NTHREADS = 4  # when 1, things work as expected
import multiprocessing.dummy as mp  # threads, instead of processes

import numpy as np
import dask.array as da
import xarray as xr

# dummy values for metadata
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(None, 1)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)

# actual values to save
ds = xr.Dataset(
    {"x": (("a", "b"), np.random.uniform(size=(10, 7)))},
    {"apple": ("a", np.arange(10))},
)

# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
    p.map(
        lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(
            path, mode="r+", region=dict(a=slice(idx, 1 + idx))
        ),
        range(10),
    )

# open what we just saved over multiple threads
ds_roundtrip = xr.open_zarr(path).load()

# perfect match for x on some slices of a, but when NTHREADS > 1, x has very
# different values or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip)  # fails when NTHREADS > 1
```

Anything else we need to know?:

  • this behavior is the same if coordinate "apple" (over a) is changed to be coordinate "a" (an index over the dimension)
  • if the dummy dataset has "apple" defined using dask, I observed ds_roundtrip having all correct values of "apple" (but not of "x"). But if it is defined as a numpy array, I observed ds_roundtrip having incorrect values of "apple" (in addition to "x"). (A chunk-alignment sketch follows this list.)
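
One guess at the culprit, an assumption on my part rather than a conclusion from this issue: the template stores "x" with a single on-disk chunk along a (chunks=(None, 1)), so single-index region writes along a become concurrent read-modify-write passes over the same stored chunk. A sketch of a template whose on-disk chunks line up with the per-index writes:

```python
import dask.array as da
import xarray as xr

path = "tmp.zarr"

# hedged sketch: chunk "a" by 1 so each region write touches only its own chunks
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(1, None)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)
```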

Environment:

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#5741 Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError)

id: 980549418 · node_id: MDU6SXNzdWU5ODA1NDk0MTg= · user: jaicher (4666753) · state: closed · locked: 0 · comments: 0 · created_at: 2021-08-26T18:57:00Z · updated_at: 2021-10-10T00:02:42Z · closed_at: 2021-10-10T00:02:42Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: I have an xr.Dataset with a dask-array-valued variable that includes a zero-length dimension (the other variables are non-empty). I tried saving it to Zarr, but it fails with a ZeroDivisionError.

What you expected to happen: I expect it to save without any errors.

Minimal Complete Verifiable Example: the following commands fail.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "x": (("a", "b", "c"), np.empty((75, 0, 30))),
        "y": (("a", "c"), np.random.normal(size=(75, 30))),
    },
    {"a": np.arange(75), "b": [], "c": np.arange(30)},
).chunk({})

ds.to_zarr("fails.zarr")  # RAISES ZeroDivisionError
```

Anything else we need to know?: If we load all the empty arrays into numpy, it saves correctly. That is:

```python
ds["x"].load()  # run on all variables that have a zero dimension
ds.to_zarr("works.zarr")  # successfully runs
```

I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr or handled in a nicer way; a generic version of the workaround is sketched below.
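
A hedged generalization of that workaround, with a hypothetical helper name; it computes only the zero-size data variables and leaves everything else lazy:

```python
import xarray as xr

def load_empty_data_vars(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: materialize zero-size data variables as numpy
    arrays so to_zarr never has to chunk an empty dask array."""
    empty = [name for name, var in ds.data_vars.items() if var.size == 0]
    return ds.assign({name: ds[name].load() for name in empty})

# usage (assuming the ds from the example above):
# load_empty_data_vars(ds).to_zarr("works.zarr")
```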

Environment:

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5741/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3748 `swap_dims()` incorrectly changes underlying index name

id: 559841620 · node_id: MDU6SXNzdWU1NTk4NDE2MjA= · user: jaicher (4666753) · state: closed · locked: 0 · comments: 1 · created_at: 2020-02-04T16:41:25Z · updated_at: 2020-02-24T22:34:58Z · closed_at: 2020-02-24T22:34:58Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

MCVE Code Sample

```python
import xarray as xr

# create a data array with a named dimension and a named coordinate
x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# what's our current index? (idx, this is fine)
x.indexes
# prints "idx: Int64Index([2], dtype='int64', name='idx')"

# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='idx')"
```

The dimension name is appropriately swapped, but the pandas index name is incorrect.

Expected Output

```python
# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='y')"
```

Problem Description

This is a problem because running x.swap_dims({"idx": "y"}).to_dataframe() gives a dataframe with columns ["x", "idx"] and an index named "idx". This produces ambiguous names and drops the original coordinate name, while the DataArray string representation gives no indication that this is happening. (A stopgap rename is sketched below.)
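
A hedged stopgap for the dataframe case, assuming the example above: rename the stale pandas index by hand after conversion.

```python
import xarray as xr

x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")
swapped = x.swap_dims({"idx": "y"})

df = swapped.to_dataframe()
df.index = df.index.rename("y")  # the index was incorrectly still named "idx"
```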

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);