
issues


5 rows where state = "closed", type = "issue" and user = 4666753 sorted by updated_at descending

#6401: Unnecessary warning when specifying `chunks` opening dataset with empty dimension

id 1177665302 · node_id I_kwDOAMm_X85GMb8W · user jaicher (4666753) · state closed · locked 0 · comments 0 · created_at 2022-03-23T06:38:25Z · updated_at 2022-04-09T20:27:40Z · closed_at 2022-04-09T20:27:40Z · author_association CONTRIBUTOR

What happened?

I receive unnecessary warnings when opening Zarr datasets that have empty dimensions/arrays using the `chunks` argument (for a non-empty dimension).

If an array has zero size (due to an empty dimension), it is saved as a single chunk regardless of Dask chunking on other dimensions (#5742). If the `chunks` parameter is then provided for the other dimensions when loading the Zarr file (based on the chunk sizes that would be expected were the array nonempty), xarray warns about potentially degraded performance from splitting the single chunk.

What did you expect to happen?

I expect no warning to be raised when there is no data:

  • performance degradation on an empty array should be negligible.
  • we don't always know whether one of the dimensions is empty until loading, but we would still use the `chunks` parameter for the dimensions with consistent chunk sizes (to specify a multiple of what's on disk) -- this is thrown off when other dimensions are empty.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

# each a is expected to be chunked separately
ds = xr.Dataset({"x": (("a", "b"), np.empty((4, 0)))}).chunk({"a": 1})

# but when we save it, it gets saved as a single chunk
ds.to_zarr("tmp.zarr")

# so if we open it up with expected chunksizes (not knowing that b is empty),
# we get a warning :(
ds2 = xr.open_zarr("tmp.zarr", chunks={"a": 1})
```

Relevant log output

```
{...}/miniconda3/envs/new-majiq/lib/python3.8/site-packages/xarray/core/dataset.py:410: UserWarning: Specified Dask chunks (1, 1, 1, 1) would separate on disks chunk shape 4 for dimension a. This could degrade performance. (chunks = {'a': (1, 1, 1, 1), 'b': (0,)}, preferred_chunks = {'a': 4, 'b': 1}). Consider rechunking after loading instead.
  _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```

Anything else we need to know?

This can be fixed by only calling `_check_chunks_compatibility()` when `var` is nonempty (PR forthcoming).
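A minimal sketch of that guard (`var`, `output_chunks`, and `preferred_chunks` are the internals named in the log output above; the exact patch is in the forthcoming PR):

```python
# Sketch of the proposed fix: skip the chunk-compatibility warning for
# zero-size variables, since splitting an empty chunk cannot meaningfully
# degrade performance.
if var.size > 0:
    _check_chunks_compatibility(var, output_chunks, preferred_chunks)
```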

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.11.1
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.01.0
distributed: 2022.01.0
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.4
conda: None
pytest: 7.0.1
IPython: 8.1.1
sphinx: 4.4.0
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6401/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#6062: Import hangs when matplotlib installed but no display available

id 1076377174 · node_id I_kwDOAMm_X85AKDZW · user jaicher (4666753) · state closed · locked 0 · comments 3 · created_at 2021-12-10T03:12:55Z · updated_at 2021-12-29T07:56:59Z · closed_at 2021-12-29T07:56:59Z · author_association CONTRIBUTOR

What happened: On a device with no display available, importing xarray without setting the matplotlib backend hangs on the import of `matplotlib.pyplot`, since #5794 was merged.

What you expected to happen: I expect to be able to run `import xarray` without needing to mess with environment variables or to import matplotlib and change the default backend.
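For context, a sketch of the workaround the report wants to avoid, assuming any non-interactive backend sidesteps the display probe:

```python
# Workaround sketch (not a fix): force a non-interactive matplotlib
# backend before anything imports matplotlib.pyplot.
import matplotlib

matplotlib.use("Agg")  # assumption: "Agg" never touches a display

import xarray as xr  # no longer hangs on a headless device
```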

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6062/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#5883: Failing parallel writes to_zarr with regions parameter?

id 1033142897 · node_id I_kwDOAMm_X849lIJx · user jaicher (4666753) · state closed · locked 0 · comments 1 · created_at 2021-10-22T03:33:02Z · updated_at 2021-10-22T18:37:06Z · closed_at 2021-10-22T18:37:06Z · author_association CONTRIBUTOR

What happened: Following guidance on how to use the `region` keyword in `xr.Dataset.to_zarr()`, I wrote a multithreaded program that makes independent writes to each index along an axis. But when I use more than one thread, some of these writes fail.

What you expected to happen: I expect all the writes to take place safely so long as the regions I write to do not overlap (they do not).

Minimal Complete Verifiable Example:

```python
path = "tmp.zarr"
NTHREADS = 4  # when 1, things work as expected
import multiprocessing.dummy as mp  # threads, instead of processes

import numpy as np
import dask.array as da
import xarray as xr

# dummy values for metadata
xr.Dataset(
    {"x": (("a", "b"), -da.ones((10, 7), chunks=(None, 1)))},
    {"apple": ("a", -da.ones(10, dtype=int, chunks=(1,)))},
).to_zarr(path, mode="w", compute=False)

# actual values to save
ds = xr.Dataset(
    {"x": (("a", "b"), np.random.uniform(size=(10, 7)))},
    {"apple": ("a", np.arange(10))},
)

# save them using NTHREADS
with mp.Pool(NTHREADS) as p:
    p.map(
        lambda idx: ds.isel(a=slice(idx, 1 + idx)).to_zarr(
            path, mode="r+", region=dict(a=slice(idx, 1 + idx))
        ),
        range(10),
    )

# open what we just saved over multiple threads
ds_roundtrip = xr.open_zarr(path).load()

# perfect match for x on some slices of a, but when NTHREADS > 1,
# x has very different values or NaN on other slices of a
xr.testing.assert_allclose(ds, ds_roundtrip)  # fails when NTHREADS > 1
```
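For contrast, a serial sketch of the same region writes (per the `NTHREADS = 1` comment above, single-threaded writes behave as expected):

```python
# Same non-overlapping region writes, issued from a single thread:
# with no concurrency, ds_roundtrip matches ds exactly.
for idx in range(10):
    ds.isel(a=slice(idx, 1 + idx)).to_zarr(
        path, mode="r+", region=dict(a=slice(idx, 1 + idx))
    )
```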

Anything else we need to know?:

  • this behavior is the same if the coordinate "apple" (over dimension a) is changed to the coordinate "a" (an index over its own dimension)
  • if the dummy dataset defines "apple" using dask, I observed ds_roundtrip having all the correct values of "apple" (but not "x"). But if "apple" is instead defined as a numpy array, I observed ds_roundtrip having incorrect values of "apple" as well (in addition to "x").

Environment:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.21.2
scipy: 1.7.1
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: 0.11.2
numbagg: None
pint: None
setuptools: 58.2.0
pip: 21.3
conda: None
pytest: None
IPython: 7.28.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5883/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#5741: Dataset.to_zarr fails on dask array with zero-length dimension (ZeroDivisionError)

id 980549418 · node_id MDU6SXNzdWU5ODA1NDk0MTg= · user jaicher (4666753) · state closed · locked 0 · comments 0 · created_at 2021-08-26T18:57:00Z · updated_at 2021-10-10T00:02:42Z · closed_at 2021-10-10T00:02:42Z · author_association CONTRIBUTOR

What happened: I have an xr.Dataset with a dask-array-valued variable that includes a zero-length dimension (the other variables are non-empty). I tried saving it to zarr, but it fails with a ZeroDivisionError.

What you expected to happen: I expect it to save without any errors.

Minimal Complete Verifiable Example: the following commands fail.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {
        "x": (("a", "b", "c"), np.empty((75, 0, 30))),
        "y": (("a", "c"), np.random.normal(size=(75, 30))),
    },
    {"a": np.arange(75), "b": [], "c": np.arange(30)},
).chunk({})

ds.to_zarr("fails.zarr")  # RAISES ZeroDivisionError
```

Anything else we need to know?: If we load all the empty arrays into numpy, the dataset saves correctly. That is:

```python
ds["x"].load()  # run on all variables that have a zero dimension
ds.to_zarr("works.zarr")  # successfully runs
```

I'll make a PR using this solution, but I'm not sure whether this is a deeper bug that should be fixed in zarr or in a nicer way.
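A generalization of that workaround as a hypothetical helper (the `empty_vars` scan is illustrative, not the content of the PR):

```python
# Hypothetical sketch: load every zero-size variable into numpy before
# writing, mirroring the manual ds["x"].load() workaround above.
empty_vars = [name for name, var in ds.variables.items() if var.size == 0]
for name in empty_vars:
    ds[name] = ds[name].load()
ds.to_zarr("works.zarr")  # saves once the empty arrays are numpy-backed
```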

Environment:

Output of `xr.show_versions()`

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.72-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.8.0

xarray: 0.19.0
pandas: 1.2.4
numpy: 1.20.2
scipy: 1.7.1
netCDF4: 1.5.6
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.9.3
cftime: 1.5.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.08.1
distributed: 2021.08.1
matplotlib: 3.4.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20210108
pip: 21.0.1
conda: None
pytest: None
IPython: 7.22.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5741/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue
#3748: `swap_dims()` incorrectly changes underlying index name

id 559841620 · node_id MDU6SXNzdWU1NTk4NDE2MjA= · user jaicher (4666753) · state closed · locked 0 · comments 1 · created_at 2020-02-04T16:41:25Z · updated_at 2020-02-24T22:34:58Z · closed_at 2020-02-24T22:34:58Z · author_association CONTRIBUTOR

MCVE Code Sample

```python
import xarray as xr

# create data array with named dimension and named coordinate
x = xr.DataArray([1], {"idx": [2], "y": ("idx", [3])}, ["idx"], name="x")

# what's our current index? (idx, this is fine)
x.indexes
# prints "idx: Int64Index([2], dtype='int64', name='idx')"

# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='idx')"
```

The dimension name is appropriately swapped, but the pandas index name is incorrect.

Expected Output

```python
# swap dims so that y is our dimension; what's the index now?
x.swap_dims({"idx": "y"}).indexes
# prints "y: Int64Index([3], dtype='int64', name='y')"
```

Problem Description

This is a problem because running `x.swap_dims({"idx": "y"}).to_dataframe()` gives a dataframe with columns `["x", "idx"]` and index `"idx"`. This gives ambiguous names and drops the original name, while the DataArray string representation gives no indication that this might be happening.
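A short illustration of that ambiguity, reusing `x` from the MCVE above (the column and index names are those described in this report):

```python
# After the swap, the dataframe's index is still named "idx",
# colliding with the "idx" column; the name "y" appears nowhere.
df = x.swap_dims({"idx": "y"}).to_dataframe()
print(df.columns.tolist())  # ["x", "idx"]
print(df.index.name)        # "idx"
```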

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.1 | packaged by conda-forge | (default, Jan 29 2020, 15:06:10) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.0
pandas: 0.25.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.10.1
distributed: 2.10.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3748/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason completed · repo xarray (13221727) · type issue


Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```