issues

5 rows where type = "issue" and user = 4753005 sorted by updated_at descending

#7812 · Appending to existing zarr store writes mostly NaN from dask arrays, but not numpy arrays
grahamfindlay · open · 1 comment · created 2023-05-03T19:30:13Z · updated 2023-11-15T18:56:09Z

What is your issue?

I am using xarray to consolidate ~24 pre-existing, moderately large netCDF files into a single zarr store. Each file contains a DataArray with dimensions `(channel, time)`, and no values are NaN. Each file's timeseries picks up right where the previous one left off, making this a perfect use case for out-of-memory file concatenation.

```python
for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
    if i == 0:
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```

This always writes the first file correctly, and every other file appends without warning or error, but when I read the resulting zarr store, ~25% of all timepoints (probably, time chunks) derived from files `i > 0` are NaN.

Admittedly, the above code seems dangerous: there is no guarantee that `da.chunk({'time': 'auto'})` will always return chunks of the same size, even though the files are nearly identical in size, and I don't know what the expected behavior is if the dask chunksizes don't match the chunksizes of the pre-existing zarr store. I checked the docs but didn't find the answer.

Even if the chunksizes always do match, I am not sure what will happen when appending to an existing store. If the last chunk in the store before appending is not a full chunk, will it be "filled in" when new data are appended to the store? Presumably, but this seems like it could cause problems with parallel writing, since the source chunks from a dask array almost certainly won't line up with the new chunks in the zarr store, unless you've been careful to make it so.
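One way to check this concretely (a sketch, not from the original report; it assumes the same `zarr_file` as above) is to inspect the store's on-disk chunking between appends:

```python
import xarray as xr

# Inspect how the zarr store is actually chunked after each append;
# the last chunk along the append dimension may be partial.
existing = xr.open_zarr(zarr_file)
print(existing.chunks)         # mapping of dim name -> tuple of chunk sizes
print(existing.sizes["time"])  # total length along the append dimension
```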

In any case, the following change seems to solve the issue, and the zarr store no longer contains NaN.

```python
for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    if i == 0:
        da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```

I didn't file this as a bug, because I was doing something that was a bad idea, but it does seem like `to_zarr` should have stopped me from doing it in the first place.
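If rechunking every file is still desired, a minimal sketch of a more deterministic variant (assuming the same `files` and `zarr_file` as above; it does not resolve the partial-final-chunk question raised earlier) is to record the chunk size that `'auto'` resolves to for the first file and reuse it for every append:

```python
import xarray as xr
from tqdm import tqdm

for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)
    if i == 0:
        da = da.chunk({'channel': da.channel.size, 'time': 'auto'})
        # Record the concrete chunk size 'auto' chose, so later files
        # are chunked identically instead of re-resolving 'auto'.
        time_chunk = da.chunks[da.dims.index('time')][0]
        da.to_zarr(zarr_file, mode="w")
    else:
        da = da.chunk({'channel': da.channel.size, 'time': time_chunk})
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```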

#7853 · Surprising behavior of DataArray.chunk when using automatic chunksize determination
grahamfindlay · closed as completed · 2 comments · created 2023-05-19T20:31:25Z · closed 2023-08-01T16:27:19Z

What is your issue?

I have a DataArray `da` with dims `(x, y)`, and additional coordinates such as `x_coord` on dim `x`. If I try to chunk this array using `da.chunk(chunks={'x': 'auto'})`, I end up with a situation where:

1. The data themselves are chunked along `x` with chunksize `a`.
2. The `x` coordinate itself is not chunked.
3. The `x_coord` coordinate on dim `x` is chunked, with chunksize `b != a`.

As far as I can tell, what is going on is that `da.chunk(chunks={'x': 'auto'})` determines the chunksize independently for each "thing" (data, variable, coordinate, etc.) on the `x` dimension. What I expected was for it to determine one chunksize based on the data in the array, then apply that chunksize (or no chunking) to each coordinate as well. Maybe there could be an option to yield unified chunks by default.

I discovered this because after chunking, `da.chunksizes` raises a ValueError because of the mismatch between the data and `x_coord`, and the proposed solution -- calling `da.unify_chunks()` -- then results in irregular chunksizes on both the data and `x_coord`. To get the behavior that I expected, I have to call `da.chunk(da.encoding['preferred_chunks'])`, which also, incidentally, seems like what I would have expected from `da.unify_chunks()`.
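A minimal sketch for observing the per-variable `'auto'` resolution described above (shapes and dtypes are made up for illustration; whether the sizes actually diverge depends on dask's auto-chunking heuristics and the array sizes):

```python
import numpy as np
import xarray as xr

# A data variable and a non-index coordinate sharing dim "x".
da = xr.DataArray(
    np.zeros((1_000_000, 50)),                        # ~400 MB of float64 data
    dims=["x", "y"],
    coords={"x_coord": ("x", np.arange(1_000_000))},  # small 1-D coordinate
)

chunked = da.chunk({"x": "auto"})
print(chunked.data.chunks[0])  # chunk sizes chosen for the data along x
print(getattr(chunked.x_coord.data, "chunks", None))  # chunking of the coordinate, if any
# If these disagree, accessing chunked.chunksizes raises the ValueError
# described above.
```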

#7207 · Difficulties with selecting from numpy.datetime64[ns] dimensions
grahamfindlay · closed as completed · 3 comments · created 2022-10-24T17:35:01Z · closed 2022-10-24T22:45:36Z

What is your issue?

I have a DataArray ("spgs") containing time-frequency data, with a time dimension of dtype numpy.datetime64[ns]. I used to be able to select using: ```

Select using datetime strings

spgs.sel(time=slice("2022-10-13T09:00:00", "2022-10-13T21:00:00")

Select using Timestamp objects

rng = tuple(pd.to_datetime(x) for x in ["2022-10-13T09:00:00", "2022-10-13T21:00:00"]) spgs.sel(time=slice(rng)) # Select using numpy.datetime64[ns] objects, such that rng[0].dtype == spgs.time.values.dtype rng = tuple(pd.to_datetime(["2022-10-13T09:00:00", "2022-10-13T21:00:00"]).values) spg.sel(time=slice(rng)) None of these work after upgrading to v2022.10.0. The first method yields: Traceback (most recent call last): File "<string>", line 1, in <module> File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1523, in sel ds = self._to_temp_dataset().sel( File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataset.py", line 2550, in sel query_results = map_index_queries( File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexing.py", line 183, in map_index_queries results.append(index.sel(labels, **options)) # type: ignore[call-arg] File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 434, in sel indexer = _query_slice(self.index, label, coord_name, method, tolerance) File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 210, in _query_slice raise KeyError( KeyError: "cannot represent labeled-based slice indexer for coordinate 'time' with a slice over integer positions; the index is unsorted or non-unique" The second two methods yield: Traceback (most recent call last): File "pandas/_libs/index.pyx", line 545, in pandas._libs.index.DatetimeEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item KeyError: 1665651600000000000 ... KeyError: Timestamp('2022-10-13 09:00:00') Interestingly, this works: start = spgs.time.values.min() stop = spgs.time.values.max() spgs.sel(time=slice(start, stop)) This does not: start = spgs.time.values.min() stop = start + pd.to_timedelta('10s') spgs.sel(time=slice(start, stop)) ```

I filed this as an issue and not a bug, because from reading other issues here and over at pandas, it seems like this may be an unintended consequence of changes to Datetime/Timestamp handling, especially within pandas, rather than a bug with xarray per se. This is supported by the fact that downgrading xarray to 2022.9.0, without touching other dependencies (e.g. pandas), does not restore the old behavior.
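Since the first KeyError complains that the index is "unsorted or non-unique", a quick diagnostic (a sketch assuming the `spgs` array from the report) is to check the underlying pandas index directly:

```python
# Label-based slicing requires a monotonic, unique index, so inspect
# the pandas DatetimeIndex backing the "time" dimension.
time_index = spgs.indexes["time"]
print(time_index.dtype)                    # e.g. datetime64[ns]
print(time_index.is_monotonic_increasing)
print(time_index.is_unique)
```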

#6960 · Unable to import xarray after installing "io" extras in Python 3.10.*
grahamfindlay · closed as completed · 3 comments · created 2022-08-27T02:50:48Z · closed 2022-09-01T10:15:30Z

What happened?

When installed into a Python 3.10 environment with a basic `pip install xarray`, there are no issues importing xarray. But when installing with `pip install xarray[io]`, the following error results upon import:

```
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/__init__.py", line 1, in <module>
    from . import testing, tutorial
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/tutorial.py", line 13, in <module>
    from .backends.api import open_dataset as open_dataset
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/backends/__init__.py", line 14, in <module>
    from .pydap import PydapDataStore
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 20, in <module>
    import pydap.client
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/pydap/client.py", line 50, in <module>
    from .model import DapType
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/pydap/model.py", line 175, in <module>
    from collections import OrderedDict, Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/gfindlay/miniconda3/envs/foo/lib/python3.10/collections/__init__.py)
```

It appears that having the extras installed causes an alternate series of imports within xarray that have not been updated for Python 3.10 (`from collections import Mapping` should be `from collections.abc import Mapping`).
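For context, the standard compatibility pattern for this import (a generic sketch, not pydap's actual patch) looks like:

```python
# On Python 3.10+, Mapping lives only in collections.abc; the fallback
# keeps the import working on very old Pythons.
try:
    from collections.abc import Mapping
except ImportError:
    from collections import Mapping  # Python < 3.3 fallback
```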

What did you expect to happen?

No response

Minimal Complete Verifiable Example

```
mamba create -n foo python=3
mamba activate foo
pip install xarray[io]
python
>>> import xarray as xr
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

N/A
#6826 · Success of DataArray.plot() depends on object's history.
grahamfindlay · closed as completed · 1 comment · created 2022-07-25T23:40:07Z · closed 2022-07-26T22:48:39Z

What happened?

I have a 2D DataArray, `ldda`.

I can select a portion of it like so:

```python
da1 = ldda.sel(component=0)
da1
```

I can get what seems like an equivalent array (equal values, matching dtypes, etc.) in the following way:

```python
da2 = ldda.to_dataset(dim="component")[0]
da2
```

And yet, while `da1.plot()` succeeds, `da2.plot()` results in the following error:

```
AttributeError: 'int' object has no attribute 'startswith'
```

See below for the full traceback and a minimal working example.

What did you expect to happen?

I expected da1 and da2 to be functionally equivalent.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

da = xr.DataArray(
    data=np.asarray([[1, 2], [3, 4], [5, 6]]),
    dims=["x", "y"],
)

da.sel(x=0).plot()  # Succeeds
da.to_dataset(dim='x')[0].plot()  # Fails
```
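The traceback below suggests the failure happens because the selected variable's name is the int `0`, which `label_from_attrs` cannot turn into an axis label. A workaround sketch (not from the report; the name `"x0"` is an arbitrary choice) is to give the array a string name before plotting:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.asarray([[1, 2], [3, 4], [5, 6]]), dims=["x", "y"])

# Renaming the variable to a string lets xarray build an axis label from it.
da2 = da.to_dataset(dim="x")[0].rename("x0")
da2.plot()
```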

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Volumes/scratch/neuropixels/t2_shared_projects/discoflow_v2/discoflow/analysis/ANPIX30/discoflow-day2/get_senzai_ic_loadings.ipynb Cell 18 in <cell line: 1>()
----> 1 da2.plot()

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:866, in _PlotMethods.__call__(self, **kwargs)
    865 def __call__(self, **kwargs):
--> 866     return plot(self._da, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:332, in plot(darray, row, col, col_wrap, ax, hue, rtol, subplot_kws, **kwargs)
    328     plotfunc = hist
    330 kwargs["ax"] = ax
--> 332 return plotfunc(darray, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:436, in line(darray, row, col, figsize, aspect, size, ax, hue, x, y, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim, add_legend, _labels, *args, **kwargs)
    432 xplt_val, yplt_val, x_suffix, y_suffix, kwargs = _resolve_intervals_1dplot(
    433     xplt.to_numpy(), yplt.to_numpy(), kwargs
    434 )
    435 xlabel = label_from_attrs(xplt, extra=x_suffix)
--> 436 ylabel = label_from_attrs(yplt, extra=y_suffix)
    438 _ensure_plottable(xplt_val, yplt_val)
    440 primitive = ax.plot(xplt_val, yplt_val, *args, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/utils.py:491, in label_from_attrs(da, extra)
    488 units = _get_units_from_attrs(da)
...
    493     textwrap.wrap(name + extra + units, 60, break_long_words=False)
    494 )
    495 else:

AttributeError: 'int' object has no attribute 'startswith'
```

Anything else we need to know?

Thank you for one of my favorite packages!

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.21.0
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.0
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 63.2.0
pip: 22.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: None
```
