## pydata/xarray#7812 · Appending to existing zarr store writes mostly NaN from dask arrays, but not numpy arrays

*open · 1 comment · opened 2023-05-03*

### What is your issue?

I am using `xarray` to consolidate ~24 pre-existing, moderately large netCDF files into a single zarr store. Each file contains a `DataArray` with dimensions `(channel, time)`, and no values are `nan`. Each file's timeseries picks up right where the previous one's left off, making this a perfect use case for out-of-memory file concatenation.

```python
import xarray as xr
from tqdm import tqdm

for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
    if i == 0:
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```

This always writes the first file correctly, and every subsequent file appends without warning or error, but when I read the resulting zarr store, ~25% of all timepoints (probably, time chunks) derived from files `i > 0` are `nan`.

Admittedly, the above code seems dangerous, since there is no guarantee that `da.chunk({'time': 'auto'})` will always return chunks of the same size, even though the files are nearly identical in size, and I don't know what the expected behavior is if the dask chunksizes don't match the chunksizes of the pre-existing zarr store. I checked the docs but didn't find the answer. Even if the chunksizes always do match, I am not sure what will happen when appending to an existing store. If the last chunk in the store before appending is not a full chunk, will it be "filled in" when new data are appended to the store? Presumably, but this seems like it could cause problems with parallel writing, since the source chunks from a dask array almost certainly won't line up with the new chunks in the zarr store, unless you've been careful to make it so.

In any case, the following change (chunking only the first file, so that subsequent files are written from numpy-backed arrays) seems to solve the issue, and the zarr store no longer contains `nan`:

```python
for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)  # Open the netCDF file
    if i == 0:
        da = da.chunk({'channel': da.channel.size, 'time': 'auto'})  # Chunk along the time dimension
        da.to_zarr(zarr_file, mode="w")
    else:
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```

I didn't file this as a bug, because I was doing something that was a bad idea, but it does seem like `to_zarr` should have stopped me from doing it in the first place.
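A minimal sketch of one way to remove the per-file `'auto'` variability, assuming `files` and `zarr_file` as above: let dask pick the time chunksize once, from the first file, and request that same chunksize for every later file. Note this still does not guarantee that chunk boundaries align with the store's chunk grid when file lengths are not multiples of the chunksize; it only makes every append ask for one consistent layout.

```python
import xarray as xr
from tqdm import tqdm

time_chunk = None
for i, f in enumerate(tqdm(files)):
    da = xr.open_dataarray(f)
    if i == 0:
        # Let dask choose a time chunksize once, then remember it.
        da = da.chunk({'channel': da.channel.size, 'time': 'auto'})
        time_chunk = da.chunks[da.get_axis_num('time')][0]
        da.to_zarr(zarr_file, mode="w")
    else:
        # Reuse the remembered chunksize instead of a fresh 'auto' guess.
        da = da.chunk({'channel': da.channel.size, 'time': time_chunk})
        da.to_zarr(zarr_file, append_dim='time')
    da.close()
```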
## pydata/xarray#7853 · Surprising behavior of DataArray.chunk when using automatic chunksize determination

*closed (completed) · 2 comments · opened 2023-05-19 · closed 2023-08-01*

### What is your issue?

I have a DataArray `da` with dims `(x, y)`, and additional coordinates such as `x_coord` on dim `x`. If I try to chunk this array using `da.chunk(chunks={'x': 'auto'})`, I end up with a situation where:

1. The data themselves are chunked along `x` with chunksize `a`.
2. The `x` coordinate itself is not chunked.
3. The `x_coord` coordinate on dim `x` is chunked, with chunksize `b != a`.

As far as I can tell, what is going on is that `da.chunk(chunks={'x': 'auto'})` is determining the chunksize automatically, and separately, for each "thing" (data, variable, coordinate, etc.) on the `x` dimension. What I expected was for it to determine one chunksize based on the data in the array, then apply that chunksize (or no chunking) to each coordinate as well. Maybe there could be an option to yield unified chunks by default.

I discovered this because after chunking, `da.chunksizes` raises a ValueError because of the mismatch between the data and `x_coord`, and the proposed solution -- calling `da.unify_chunks()` -- then results in irregular chunksizes on both the data and `x_coord`. To get the behavior that I *expected*, I have to call `da.chunk(da.encoding['preferred_chunks'])`, which also, incidentally, seems like what I would have expected from `da.unify_chunks()`.
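A minimal sketch for inspecting the per-variable chunking described above. The array size here is hypothetical (chosen so that `'auto'` chunking actually splits `x`), and whether the mismatch reproduces depends on dask's auto-chunking heuristics:

```python
import numpy as np
import xarray as xr

n = 10_000_000  # hypothetical size, large enough that 'auto' splits dim x
da = xr.DataArray(
    np.zeros((n, 4)),
    dims=("x", "y"),
    coords={"x": np.arange(n), "x_coord": ("x", np.arange(n, dtype="float64"))},
)
da = da.chunk({"x": "auto"})

# Compare how each variable living on dim "x" was chunked.
print(da.variable.chunks)             # chunks of the data
print(da["x_coord"].variable.chunks)  # chunks of the x_coord coordinate
print(da["x"].variable.chunks)        # the dimension coordinate (may be None)
```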
## pydata/xarray#7207 · Difficulties with selecting from numpy.datetime64[ns] dimensions

*closed (completed) · 3 comments · opened 2022-10-24 · closed 2022-10-24*

### What is your issue?

I have a DataArray (`spgs`) containing time-frequency data, with a `time` dimension of dtype `numpy.datetime64[ns]`. I used to be able to select using:

```python
# Select using datetime strings
spgs.sel(time=slice("2022-10-13T09:00:00", "2022-10-13T21:00:00"))

# Select using Timestamp objects
rng = tuple(pd.to_datetime(x) for x in ["2022-10-13T09:00:00", "2022-10-13T21:00:00"])
spgs.sel(time=slice(*rng))

# Select using numpy.datetime64[ns] objects, such that rng[0].dtype == spgs.time.values.dtype
rng = tuple(pd.to_datetime(["2022-10-13T09:00:00", "2022-10-13T21:00:00"]).values)
spgs.sel(time=slice(*rng))
```

None of these work after upgrading to v2022.10.0. The first method yields:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataarray.py", line 1523, in sel
    ds = self._to_temp_dataset().sel(
  File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/dataset.py", line 2550, in sel
    query_results = map_index_queries(
  File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexing.py", line 183, in map_index_queries
    results.append(index.sel(labels, **options))  # type: ignore[call-arg]
  File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 434, in sel
    indexer = _query_slice(self.index, label, coord_name, method, tolerance)
  File "/home/gfindlay/miniconda3/envs/seahorse/lib/python3.10/site-packages/xarray/core/indexes.py", line 210, in _query_slice
    raise KeyError(
KeyError: "cannot represent labeled-based slice indexer for coordinate 'time' with a slice over integer positions; the index is unsorted or non-unique"
```

The second two methods yield:

```
Traceback (most recent call last):
  File "pandas/_libs/index.pyx", line 545, in pandas._libs.index.DatetimeEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 1665651600000000000
...
KeyError: Timestamp('2022-10-13 09:00:00')
```

Interestingly, this works:

```python
start = spgs.time.values.min()
stop = spgs.time.values.max()
spgs.sel(time=slice(start, stop))
```

This does not:

```python
start = spgs.time.values.min()
stop = start + pd.to_timedelta('10s')
spgs.sel(time=slice(start, stop))
```

I filed this as an issue and not a bug, because from reading other issues here and over at pandas, it seems like this may be an unintended consequence of changes to Datetime/Timestamp handling, especially within pandas, rather than a bug with xarray per se. This is supported by the fact that downgrading xarray to 2022.9.0, without touching other dependencies (e.g. pandas), does not restore the old behavior.
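Since the first KeyError explicitly blames an "unsorted or non-unique" index, a reasonable first diagnostic (a sketch, assuming `spgs` as above) is to check the index directly and, if it turns out to be unsorted, sort before slicing:

```python
import pandas as pd

idx = spgs.indexes["time"]  # the underlying pandas DatetimeIndex
print(idx.is_monotonic_increasing, idx.is_unique)

# If the index is unsorted, sorting it is one workaround that
# restores label-based slicing.
spgs_sorted = spgs.sortby("time")
spgs_sorted.sel(time=slice("2022-10-13T09:00:00", "2022-10-13T21:00:00"))
```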
## pydata/xarray#6960 · Unable to import xarray after installing "io" extras in Python 3.10.*

*closed (completed) · 3 comments · opened 2022-08-27 · closed 2022-09-01*

### What happened?

When installed into a Python 3.10 environment with a basic `pip install xarray`, there are no issues importing xarray. But when installing with `pip install xarray[io]`, the following error results upon import:

```
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:36:39) [GCC 10.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/__init__.py", line 1, in <module>
    from . import testing, tutorial
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/tutorial.py", line 13, in <module>
    from .backends.api import open_dataset as _open_dataset
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/backends/__init__.py", line 14, in <module>
    from .pydap_ import PydapDataStore
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/xarray/backends/pydap_.py", line 20, in <module>
    import pydap.client
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/pydap/client.py", line 50, in <module>
    from .model import DapType
  File "/home/gfindlay/miniconda3/envs/foo/lib/python3.10/site-packages/pydap/model.py", line 175, in <module>
    from collections import OrderedDict, Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/gfindlay/miniconda3/envs/foo/lib/python3.10/collections/__init__.py)
```

It appears that having the extras installed causes an alternate series of imports (here, through `pydap`) that have not been updated for Python 3.10: `from collections import Mapping` should be `from collections.abc import Mapping`.

### What did you expect to happen?

_No response_

### Minimal Complete Verifiable Example

```
$ mamba create -n foo python=3
$ mamba activate foo
$ pip install xarray[io]
$ python
>>> import xarray as xr
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

N/A
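For reference, the Python 3.10-compatible form of the import that the traceback above ends in would look like the following (a sketch of the fix that belongs upstream in `pydap`, not in xarray itself):

```python
# Python 3.3+ provides the abstract base classes in collections.abc;
# Python 3.10 removed the deprecated aliases from collections itself.
from collections import OrderedDict
from collections.abc import Mapping
```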
## pydata/xarray#6826 · Success of DataArray.plot() depends on object's history

*closed (completed) · 1 comment · opened 2022-07-25 · closed 2022-07-26*

### What happened?

I have the following 2D DataArray:

```python
ldda
```

![image](https://user-images.githubusercontent.com/4753005/180890862-b32001c5-bd39-44b7-9219-52300c9c3eca.png)

I can select a portion of it like so:

```python
da1 = ldda.sel(component=0)
da1
```

![image](https://user-images.githubusercontent.com/4753005/180890956-27897555-2711-479e-a5cf-155ff2958929.png)

I can get what *seems* like an equivalent array (equal values, matching dtypes, etc.) in the following way:

```python
da2 = ldda.to_dataset(dim="component")[0]
da2
```

![image](https://user-images.githubusercontent.com/4753005/180891056-b965a138-71b4-4761-b13f-fda9d0a7238b.png)

And yet, while I can successfully plot `da1`...

```python
da1.plot()
```

![image](https://user-images.githubusercontent.com/4753005/180891132-281b0c4d-1364-45ef-846c-8cc6bb71d293.png)

...trying to do the same with `da2` results in the following error:

```python
da2.plot()
```

> AttributeError: 'int' object has no attribute 'startswith'

See below for full traceback and a minimal working example.

### What did you expect to happen?

I expected `da1` and `da2` to be functionally equivalent.

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

da = xr.DataArray(
    data=np.asarray([[1, 2], [3, 4], [5, 6]]),
    dims=["x", "y"],
)

da.sel(x=0).plot()  # Succeeds
da.to_dataset(dim='x')[0].plot()  # Fails
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output

```Python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/Volumes/scratch/neuropixels/t2_shared_projects/discoflow_v2/discoflow/analysis/ANPIX30/discoflow-day2/get_senzai_ic_loadings.ipynb Cell 18 in ()
----> 1 da2.plot()

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:866, in _PlotMethods.__call__(self, **kwargs)
    865 def __call__(self, **kwargs):
--> 866     return plot(self._da, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:332, in plot(darray, row, col, col_wrap, ax, hue, rtol, subplot_kws, **kwargs)
    328     plotfunc = hist
    330 kwargs["ax"] = ax
--> 332 return plotfunc(darray, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/plot.py:436, in line(darray, row, col, figsize, aspect, size, ax, hue, x, y, xincrease, yincrease, xscale, yscale, xticks, yticks, xlim, ylim, add_legend, _labels, *args, **kwargs)
    432 xplt_val, yplt_val, x_suffix, y_suffix, kwargs = _resolve_intervals_1dplot(
    433     xplt.to_numpy(), yplt.to_numpy(), kwargs
    434 )
    435 xlabel = label_from_attrs(xplt, extra=x_suffix)
--> 436 ylabel = label_from_attrs(yplt, extra=y_suffix)
    438 _ensure_plottable(xplt_val, yplt_val)
    440 primitive = ax.plot(xplt_val, yplt_val, *args, **kwargs)

File /Volumes/scratch/neuropixels/t2_shared_envs/discoflow_v2/lib/python3.8/site-packages/xarray/plot/utils.py:491, in label_from_attrs(da, extra)
    488     units = _get_units_from_attrs(da)
...
    493     textwrap.wrap(name + extra + units, 60, break_long_words=False)
    494 )
    495 else:

AttributeError: 'int' object has no attribute 'startswith'
```

### Anything else we need to know?

Thank you for one of my favorite packages!
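The failure appears to come from `label_from_attrs` assuming the array's `name` is a string, while `to_dataset(dim=...)[0]` produces a DataArray whose name is the integer `0`. A minimal workaround sketch, assuming the MVCE above (the string name `"x0"` is hypothetical; any string works):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.asarray([[1, 2], [3, 4], [5, 6]]), dims=["x", "y"])

# Renaming the extracted array to a string name restores plotting.
da2 = da.to_dataset(dim="x")[0].rename("x0")
da2.plot()
```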
### Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None

xarray: 2022.3.0
pandas: 1.4.3
numpy: 1.21.0
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.7.0
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.5.0
cupy: None
pint: None
sparse: None
setuptools: 63.2.0
pip: 22.2
conda: None
pytest: 7.1.2
IPython: 8.4.0
sphinx: None
```