issues

7 rows where repo = 13221727, type = "issue" and user = 4666753 sorted by updated_at descending

#4288 hue argument for xarray.plot.step() for plotting multiple histograms over shared bins

id: 668256331 · node_id: MDU6SXNzdWU2NjgyNTYzMzE= · user: jaicher (4666753) · state: open · locked: 0 · comments: 2 · created_at: 2020-07-30T00:30:37Z · updated_at: 2022-04-17T19:27:28Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

Is your feature request related to a problem? Please describe.

I love how efficiently we can plot line data for different observations using xr.DataArray.plot(hue={hue coordinate name}) over a 2D array, and I have appreciated xr.DataArray.plot.step() for plotting histogram data using interval coordinates. Today, I wanted to plot/compare several histograms over the same set of bins. I figured I could write xr.DataArray.plot.step(hue={...}), but I found out that this functionality is not implemented.

Describe the solution you'd like

I think we should have a hue kwarg for xr.DataArray.plot.step(). When specified, we would be able to plot 2D data in the same way as xr.DataArray.plot(), except that we get a set of step plots instead of a set of line plots.

Describe alternatives you've considered

  • Use xr.DataArray.plot() instead. This is effective for histograms with many bins, but it inaccurately represents histograms with coarse bins.
  • Manually call xr.DataArray.plot.hist() on each 1D subarray for each label on the hue coordinate, adding appropriate labels and a legend (see the sketch below). This is fine and is my current solution, but I think it would be excellent to use the same shorthand that was developed for line plots.
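
A minimal sketch of that manual loop, assuming a 2D DataArray with a hue-like "sample" dimension over shared "bin" coordinates (all names and data below are fabricated for illustration):

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# fabricated histogram counts: 3 samples over 10 shared bins
bins = np.linspace(0.0, 1.0, 11)
mids = 0.5 * (bins[:-1] + bins[1:])
counts = np.stack([np.histogram(np.random.rand(100), bins)[0] for _ in range(3)])
da = xr.DataArray(counts, {"sample": ["a", "b", "c"], "bin": mids}, ("sample", "bin"))

# what a hue kwarg could do automatically: one step plot per hue label
for label in da["sample"].values:
    da.sel(sample=label).plot.step(where="mid", label=str(label))
plt.legend(title="sample")
plt.show()
```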

Additional context

I didn't evaluate the other plotting functions, but I suspect there are others that could appropriately accept a hue argument and do not yet support one.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4288/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#6401 Unnecessary warning when specifying `chunks` opening dataset with empty dimension

id: 1177665302 · node_id: I_kwDOAMm_X85GMb8W · user: jaicher (4666753) · state: closed · locked: 0 · comments: 0 · created_at: 2022-03-23T06:38:25Z · updated_at: 2022-04-09T20:27:40Z · closed_at: 2022-04-09T20:27:40Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened?

I receive unnecessary warnings when opening Zarr datasets with empty dimensions/arrays using the chunks argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on the other dimensions (#5742). If the chunks parameter is then provided for the other dimensions when loading the Zarr file (matching the chunk sizes the array would have if it were nonempty), xarray warns about potentially degraded performance from splitting that single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • performance degradation on an empty array should be negligible.
  • we don't always know whether one of the dimensions is empty until loading, but we would still like to use the chunks parameter for the dimensions with consistent chunk sizes (e.g., to specify a multiple of what's on disk) -- this is thrown off when another dimension is empty.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty):
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})

# we get a warning :(
```

Relevant log output

```
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by only calling _check_chunks_compatibility() when var is nonempty (PR forthcoming); a sketch of the idea follows.
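
A minimal standalone sketch of that guard. The function below is a hypothetical, simplified stand-in for xarray's internal _check_chunks_compatibility(), just to show where the nonempty check would go:

```python
import warnings

import numpy as np

def check_chunks_compatibility(shape, specified_chunks, preferred_chunks):
    """Hypothetical, simplified stand-in for xarray's internal check."""
    if np.prod(shape) == 0:
        # the proposed fix: a zero-size array is stored as a single chunk
        # anyway, and re-chunking zero bytes cannot degrade performance
        return
    for spec, pref in zip(specified_chunks, preferred_chunks):
        if spec < pref:
            # the specified chunks would split each stored chunk of size pref
            warnings.warn(
                f"Specified chunk size {spec} would separate on-disk chunk "
                f"shape {pref}; this could degrade performance."
            )
```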

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3712 [Documentation/API?] {DataArray,Dataset}.sortby is stable sort?

id: 552987067 · node_id: MDU6SXNzdWU1NTI5ODcwNjc= · user: jaicher (4666753) · state: open · locked: 0 · comments: 0 · created_at: 2020-01-21T16:27:37Z · updated_at: 2022-04-09T02:26:34Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

I noticed that {DataArray,Dataset}.sortby() are implemented using np.lexsort(), which is a stable sort. Can we expect these functions to remain stable sorts in the future, even if the implementation is changed for some reason?

The docs do not explicitly state that the sorting will be stable. If these functions are meant to always be stable, I think the documentation should say so explicitly. If not, I think it would be helpful to have an optional argument that guarantees a stable sort in case the implementation changes in the future. A small example where stability is observable is sketched below.
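
To make "stable" concrete, here is a small made-up example where stability is observable because the sort key has ties:

```python
import xarray as xr

# made-up data: duplicate sort keys, so stability is observable
da = xr.DataArray(
    [10, 20, 30, 40],
    coords={"key": ("i", [1, 0, 1, 0])},
    dims="i",
)

# a stable sort keeps the original relative order within each tied key:
# key 0 -> values [20, 40]; key 1 -> values [10, 30]
print(da.sortby("key").values)  # [20 40 10 30]
```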

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3712/reactions",
    "total_count": 3,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 1
}
#6062 Import hangs when matplotlib installed but no display available

id: 1076377174 · node_id: I_kwDOAMm_X85AKDZW · user: jaicher (4666753) · state: closed · locked: 0 · comments: 3 · created_at: 2021-12-10T03:12:55Z · updated_at: 2021-12-29T07:56:59Z · closed_at: 2021-12-29T07:56:59Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: On a device with no display available, importing xarray without setting the matplotlib backend hangs on import of matplotlib.pyplot since #5794 was merged.

What you expected to happen: I expect to be able to run import xarray without needing to mess with environment variables or import matplotlib and change the default backend.
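
For reference, a sketch of exactly the kind of fiddling this report argues should be unnecessary, assuming a headless machine: select a non-interactive backend before anything imports matplotlib.pyplot.

```python
import os

# force a non-interactive backend before matplotlib.pyplot is ever imported
os.environ.setdefault("MPLBACKEND", "Agg")
# or, equivalently:
#   import matplotlib
#   matplotlib.use("Agg")

import xarray as xr  # should now import without hanging
```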

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6062/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#5883 Failing parallel writes to_zarr with regions parameter?

id: 1033142897 · node_id: I_kwDOAMm_X849lIJx · user: jaicher (4666753) · state: closed · locked: 0 · comments: 1 · created_at: 2021-10-22T03:33:02Z · updated_at: 2021-10-22T18:37:06Z · closed_at: 2021-10-22T18:37:06Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: Following the guidance on how to use the region keyword of xr.Dataset.to_zarr(), I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.

What you expected to happen: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).

Minimal Complete Verifiable Example:

```python
path = "tmp.zarr"
NTHREADS = 4  # when 1, things work as expected
import multiprocessing.dummy as mp  # threads, instead of processes

import numpy as np
import dask.array as da
import xarray as xr

# dummy values for metadata
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(None, 1)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)

# actual values to save
ds = xr.Dataset(
    {"x": (("a", "b"), np.random.uniform(size=(10, 7)))},
    {"apple": ("a", np.arange(10))},
)

# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
    p.map(
        lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(
            path, mode="r+", region=dict(a=slice(idx, 1 + idx))
        ),
        range(10),
    )

# open what we just saved over multiple threads
ds_roundtrip = xr.open_zarr(path).load()

# perfect match for x on some slices of a, but when NTHREADS > 1, x has very
# different values or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip)  # fails when NTHREADS > 1
```

Anything else we need to know?:

  • this behavior is the same if coordinate "apple" (over a) is changed to be coordinate "a" (an index over the dimension)
  • if the dummy dataset has "apple" defined using dask, I observed ds_roundtrip having all correct values of "apple" (but not of "x"). But if it is defined as a numpy array, I observed ds_roundtrip having incorrect values of "apple" (in addition to "x"). (A chunk-alignment sketch follows this list.)
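
One guess at the culprit, an assumption on my part rather than a conclusion from this issue: the template stores "x" with a single on-disk chunk along a (chunks=(None, 1)), so single-index region writes along a become concurrent read-modify-write passes over the same stored chunk. A sketch of a template whose on-disk chunks line up with the per-index writes:

```python
import dask.array as da
import xarray as xr

path = "tmp.zarr"

# hedged sketch: chunk "a" by 1 so each region write touches only its own chunks
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(1, None)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)
```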

Environment:

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#5741 Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError)

id: 980549418 · node_id: MDU6SXNzdWU5ODA1NDk0MTg= · user: jaicher (4666753) · state: closed · locked: 0 · comments: 0 · created_at: 2021-08-26T18:57:00Z · updated_at: 2021-10-10T00:02:42Z · closed_at: 2021-10-10T00:02:42Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

What happened: I have an xr.Dataset with a dask-array-valued variable that includes a zero-length dimension (the other variables are non-empty). I tried saving it to Zarr, but it fails with a ZeroDivisionError.

What you expected to happen: I expect it to save without any errors.

Minimal Complete Verifiable Example: the following commands fail.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "x": (("a", "b", "c"), np.empty((75, 0, 30))),
        "y": (("a", "c"), np.random.normal(size=(75, 30))),
    },
    {"a": np.arange(75), "b": [], "c": np.arange(30)},
).chunk({})

ds.to_zarr("fails.zarr")  # RAISES ZeroDivisionError
```

Anything else we need to know?: If we load all the empty arrays into numpy, it saves correctly. That is:

```python
ds["x"].load()  # run on all variables that have a zero dimension
ds.to_zarr("works.zarr")  # successfully runs
```

I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr or handled in a nicer way; a generic version of the workaround is sketched below.
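
A hedged generalization of that workaround, with a hypothetical helper name; it computes only the zero-size data variables and leaves everything else lazy:

```python
import xarray as xr

def load_empty_data_vars(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical helper: materialize zero-size data variables as numpy
    arrays so to_zarr never has to chunk an empty dask array."""
    empty = [name for name, var in ds.data_vars.items() if var.size == 0]
    return ds.assign({name: ds[name].load() for name in empty})

# usage (assuming the ds from the example above):
# load_empty_data_vars(ds).to_zarr("works.zarr")
```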

Environment:

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0
xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5741/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3748 `swap_dims()` incorrectly changes underlying index name

id: 559841620 · node_id: MDU6SXNzdWU1NTk4NDE2MjA= · user: jaicher (4666753) · state: closed · locked: 0 · comments: 1 · created_at: 2020-02-04T16:41:25Z · updated_at: 2020-02-24T22:34:58Z · closed_at: 2020-02-24T22:34:58Z · author_association: CONTRIBUTOR · state_reason: completed · repo: xarray (13221727) · type: issue

MCVE Code Sample

```python
import xarray as xr

# create a data array with a named dimension and a named coordinate
x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# what's our current index? (idx, this is fine)
x.indexes
# prints "idx: Int64Index([2], dtype='int64', name='idx')"

# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='idx')"
```

The dimension name is appropriately swapped, but the pandas index name is incorrect.

Expected Output

```python
# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='y')"
```

Problem Description

This is a problem because running x.swap_dims({"idx": "y"}).to_dataframe() gives a dataframe with columns ["x", "idx"] and an index named "idx". This produces ambiguous names and drops the original coordinate name, while the DataArray string representation gives no indication that this is happening. (A stopgap rename is sketched below.)
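
A hedged stopgap for the dataframe case, assuming the example above: rename the stale pandas index by hand after conversion.

```python
import xarray as xr

x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")
swapped = x.swap_dims({"idx": "y"})

df = swapped.to_dataframe()
df.index = df.index.rename("y")  # the index was incorrectly still named "idx"
```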

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);