id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1656130602,I_kwDOAMm_X85itowq,7726,open_zarr: PermissionError with multiple processes despite use of ProcessSynchronizer,34257249,open,0,,,0,2023-04-05T18:55:12Z,2023-04-06T01:37:32Z,,CONTRIBUTOR,,,,"### What happened?
Several processes read from and write to an xarray dataset stored in .zarr format on a network.
The write operations write to existing regions. These regions are not aligned to chunks, [therefore](https://zarr.readthedocs.io/en/stable/tutorial.html#parallel-computing-and-synchronization) I use a [ProcessSynchronizer](https://zarr.readthedocs.io/en/stable/api/sync.html#zarr.sync.ProcessSynchronizer).
The ProcessSynchronizer points to a local folder on SSD, separate from the actual stored array.
After several hundred reads/writes I get permission errors like the one below.
So far I have failed to reproduce the error with an MCVE.
The file `0` that raised the permission error is the chunk holding the coordinate values of one dimension, in that dimension's folder `dim_yyy`:
```
dim_yyy
|-- .zarray
|-- .zattrs
`-- 0
```
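For illustration only, a minimal pure-Python sketch of what `not aligned to chunks` means for the setup above. It assumes a chunk length of 10 along each dimension and a region of `slice(0, 5)`, the values used in the example further down. The helper names `touched_chunks` and `is_chunk_aligned` are hypothetical and are not part of zarr or xarray.

```python
def touched_chunks(region: slice, chunk_len: int) -> range:
    '''Indices of the chunks that a 1-D region overlaps (hypothetical helper).'''
    first = region.start // chunk_len
    last = (region.stop - 1) // chunk_len
    return range(first, last + 1)


def is_chunk_aligned(region: slice, chunk_len: int, dim_len: int) -> bool:
    '''True if the region starts and ends on chunk boundaries.

    A region may also end at the end of the dimension, because the last
    chunk is allowed to be shorter than chunk_len.
    '''
    starts_on_boundary = region.start % chunk_len == 0
    ends_on_boundary = region.stop % chunk_len == 0 or region.stop == dim_len
    return starts_on_boundary and ends_on_boundary


# Assumed values: one chunk of length 10 per dimension, region slice(0, 5).
chunk_len, dim_len = 10, 10
region = slice(0, 5)

print(list(touched_chunks(region, chunk_len)))             # [0]
print(is_chunk_aligned(region, chunk_len, dim_len))        # False
print(is_chunk_aligned(slice(0, 10), chunk_len, dim_len))  # True
```

Because the region covers only part of chunk 0, each write has to read, modify and rewrite that whole chunk, which is the situation where concurrent writers need a synchronizer.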
### What did you expect to happen?
No permission error.
### Minimal Complete Verifiable Example
```Python
# I have so far failed to reproduce the error with an MVCE. Here is my attempt.
from pathlib import Path
import dask.array as da
import pandas as pd
import xarray as xr
from dask.distributed import Client
from zarr.sync import ProcessSynchronizer
if __name__ == ""__main__"":
    path_store = Path(aaa)  # aaa: placeholder for the store location
    path_synchronizer = Path(bbb)  # must exist, and not same location as store
    # create and save a dataset to zarr
    s0, s1, s2 = 10, 10, 10
    temperature = da.random.random((s0, s1, s2), chunks=[s0, s1, s2])
    precipitation = da.random.random((s0, s1, s2), chunks=[s0, s1, s2])
    lon = da.random.random((s0, s1))
    lat = da.random.random((s0, s1))
    time = pd.date_range(""2014-09-06"", periods=s2)
    reference_time = pd.Timestamp(""2014-09-05"")
    ds = xr.Dataset(
        data_vars=dict(
            temperature=([""x"", ""y"", ""time""], temperature),
            precipitation=([""x"", ""y"", ""time""], precipitation),
        ),
        coords=dict(
            lon=([""x"", ""y""], lon),
            lat=([""x"", ""y""], lat),
            time=time,
            reference_time=reference_time,
        ),
        attrs=dict(description=""Weather related data.""),
    )
    print(f""{ds=}"")
    ds.to_zarr(path_store, mode=""w"")
    def read_write(path_store: Path):
        """"""Lazily open the dataset, then write into a region. Comment/uncomment to use the synchronizer.""""""
        synchronizer = ProcessSynchronizer(path_synchronizer)
        for _ in range(100):
            # open the saved dataset
            # xr.open_zarr(path_store, synchronizer=synchronizer)
            ds = xr.open_zarr(path_store)
            # process a region
            dst = (
                ds.temperature.isel(x=slice(0, 5), y=slice(0, 5), time=slice(0, 5))
                .to_dataset()
                .load()
            )
            dst[""temperature""] = -dst[""temperature""]
            dst = dst.drop_vars([""time"", ""reference_time""])
            # save the region to the zarr store
            dst.to_zarr(
                path_store,
                region={
                    ""x"": slice(0, 5),
                    ""y"": slice(0, 5),
                    ""time"": slice(0, 5),
                },
                # synchronizer=synchronizer,
            )
    # independent processes that perform read and write operations
    with Client(processes=True) as client:
        # pure=False so that dask does not collapse the identical calls into a single task
        futures = [client.submit(read_write, path_store, pure=False) for _ in range(1000)]
        client.gather(futures)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
- [ ] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
    return xr.open_zarr(path, synchronizer=synchronizer)
  File ""C:\anaconda3\lib\site-packages\xarray\backends\zarr.py"", line 787, in open_zarr
    ds = open_dataset(
  File ""C:\anaconda3\lib\site-packages\xarray\backends\api.py"", line 539, in open_dataset
    backend_ds = backend.open_dataset(
  File ""C:\anaconda3\lib\site-packages\xarray\backends\zarr.py"", line 862, in open_dataset
    ds = store_entrypoint.open_dataset(
  File ""C:\anaconda3\lib\site-packages\xarray\backends\store.py"", line 43, in open_dataset
    ds = Dataset(vars, attrs=attrs)
  File ""C:\anaconda3\lib\site-packages\xarray\core\dataset.py"", line 604, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File ""C:\anaconda3\lib\site-packages\xarray\core\merge.py"", line 575, in merge_data_and_coords
    return merge_core(
  File ""C:\anaconda3\lib\site-packages\xarray\core\merge.py"", line 755, in merge_core
    collected = collect_variables_and_indexes(aligned, indexes=indexes)
  File ""C:\anaconda3\lib\site-packages\xarray\core\merge.py"", line 365, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File ""C:\anaconda3\lib\site-packages\xarray\core\variable.py"", line 168, in as_variable
    obj = obj.to_index_variable()
  File ""C:\anaconda3\lib\site-packages\xarray\core\variable.py"", line 624, in to_index_variable
    return IndexVariable(
  File ""C:\anaconda3\lib\site-packages\xarray\core\variable.py"", line 2844, in __init__
    self._data = PandasIndexingAdapter(self._data)
  File ""C:\anaconda3\lib\site-packages\xarray\core\indexing.py"", line 1420, in __init__
    self.array = safe_cast_to_index(array)
  File ""C:\anaconda3\lib\site-packages\xarray\core\indexes.py"", line 177, in safe_cast_to_index
    index = pd.Index(np.asarray(array), **kwargs)
  File ""C:\anaconda3\lib\site-packages\xarray\core\indexing.py"", line 524, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""C:\anaconda3\lib\site-packages\xarray\backends\zarr.py"", line 68, in __getitem__
    return array[key.tuple]
  File ""C:\anaconda3\lib\site-packages\zarr\core.py"", line 821, in __getitem__
    result = self.get_basic_selection(pure_selection, fields=fields)
  File ""C:\anaconda3\lib\site-packages\zarr\core.py"", line 947, in get_basic_selection
    return self._get_basic_selection_nd(selection=selection, out=out,
  File ""C:\anaconda3\lib\site-packages\zarr\core.py"", line 990, in _get_basic_selection_nd
    return self._get_selection(indexer=indexer, out=out, fields=fields)
  File ""C:\anaconda3\lib\site-packages\zarr\core.py"", line 1285, in _get_selection
    self._chunk_getitem(chunk_coords, chunk_selection, out, out_selection,
  File ""C:\anaconda3\lib\site-packages\zarr\core.py"", line 1994, in _chunk_getitem
    cdata = self.chunk_store[ckey]
  File ""C:\anaconda3\lib\site-packages\zarr\storage.py"", line 1085, in __getitem__
    return self._fromfile(filepath)
  File ""C:\anaconda3\lib\site-packages\zarr\storage.py"", line 1059, in _fromfile
    with open(fn, 'rb') as f:
PermissionError: [Errno 13] Permission denied: 'xxx.zarr\\dim_yyy/0'
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by Anaconda, Inc. | (main, Mar 1 2023, 18:18:15) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.10.6
libnetcdf: None
xarray: 2022.11.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.10.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.14.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.0
distributed: None
matplotlib: 3.7.0
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 23.0.1
conda: 23.1.0
pytest: 7.1.2
IPython: 8.10.0
sphinx: 5.0.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7726/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1584580877,I_kwDOAMm_X85ecskN,7527,DataArray.idxmax converts coordinates into float64 by default,34257249,open,0,,,0,2023-02-14T17:45:07Z,2023-02-14T17:51:33Z,,CONTRIBUTOR,,,,"### What happened?
This uses the same example as in [DataArray.idxmax](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.idxmax.html#xarray.Dataset.idxmax), but reduces over the ""y"" dimension instead.
The starting ""y"" coordinates are of type int: `[-1,0,1]`
The return values of argmax are of type int64: good.
The return values of idxmax are of type float64: bad.
### What did you expect to happen?
If no fill operation is needed, the return values of idxmax should have the same dtype as the input coordinate.
Otherwise, the return dtype may change, depending on the type of the fill value.
### Minimal Complete Verifiable Example
```Python
import numpy as np
import xarray as xr
array = xr.DataArray(
    [
        [2.0, 1.0, 2.0, 0.0, -2.0],
        [-4.0, np.nan, 2.0, np.nan, -2.0],
        [np.nan, np.nan, 1.0, np.nan, np.nan],
    ],
    dims=[""y"", ""x""],
    coords={""y"": [-1, 0, 1], ""x"": np.arange(5.0) ** 2},
)
print(array.argmax(dim=""y"").dtype)
print(array.idxmax(dim=""y"").dtype)
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
In [41]: print(array.argmax(dim=""y"").dtype)
int64
In [42]: print(array.idxmax(dim=""y"").dtype)
float64
```
### Anything else we need to know?
Suggestions:
- change [these two lines](https://github.com/pydata/xarray/blob/main/xarray/core/computation.py#L2086-L2087):
```
if skipna or (skipna is None and array.dtype.kind in na_dtypes):
# Put the NaN values back in after removing them
```
into
```
if (skipna or (skipna is None and array.dtype.kind in na_dtypes)) and allna.any():
# Put the NaN values back in after removing them, if any
```
- or maybe instead, it is a bug from `DataArray.where`: this [`res = res.where(~allna, fill_value)`](https://github.com/pydata/xarray/blob/main/xarray/core/computation.py#L2088) should not change the array type if `not allna.any()`? Actually, it is a known limitation of `where`: #3570
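A NumPy-only sketch of the dtype promotion discussed above, assuming the same data as the example. It does not call xarray: `np.where` stands in for `DataArray.where`, and the variable names `picked`, `filled` and `guarded` are hypothetical. It shows that filling with NaN promotes integer labels to float64 even when nothing is actually filled, and that an extra `allna.any()` check avoids this.

```python
import numpy as np

data = np.array(
    [
        [2.0, 1.0, 2.0, 0.0, -2.0],
        [-4.0, np.nan, 2.0, np.nan, -2.0],
        [np.nan, np.nan, 1.0, np.nan, np.nan],
    ]
)
y = np.array([-1, 0, 1])  # integer coordinate labels along axis 0

# Columns that are entirely NaN would need a fill value; here there are none.
allna = np.isnan(data).all(axis=0)

# Position of the maximum along y, ignoring NaN, mapped to coordinate labels.
picked = y[np.nanargmax(data, axis=0)]

# Unconditional fill: mixing integers with a NaN fill value promotes to float64.
filled = np.where(~allna, picked, np.nan)

# Guarded fill (the suggested change): only fill when some column is all-NaN.
guarded = np.where(~allna, picked, np.nan) if allna.any() else picked

print(picked.dtype.kind, filled.dtype, guarded.dtype.kind)  # i float64 i
```

The guarded variant keeps the integer dtype for this input, but its result dtype then depends on the data, which may or may not be acceptable for idxmax.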
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.10.6
libnetcdf: None
xarray: 0.20.1
pandas: 1.4.4
numpy: 1.24.2
scipy: 1.9.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.13.3
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.0
distributed: 2022.7.0
matplotlib: 3.5.2
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: None
sparse: None
setuptools: 63.4.1
pip: 23.0
conda: 22.9.0
pytest: 7.1.2
IPython: 7.31.1
sphinx: 5.0.2
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7527/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1452123685,I_kwDOAMm_X85WjaYl,7294,DataArray.transpose with transpose_coords=True does not change coords order,34257249,open,0,,,6,2022-11-16T19:02:27Z,2022-11-24T20:40:32Z,,CONTRIBUTOR,,,,"### What happened?
I used [DataArray.transpose](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.transpose.html) with `transpose_coords=True` to change the coords order from
`starting_dims = ""dim_0"", ""dim_1"", ""dim_2""`
to
`reordered_dims = ""dim_2"", ""dim_1"", ""dim_0""`.
The order of dims was correctly transposed but the order of coords remained unchanged.
### What did you expect to happen?
I expected the transposed coords to be in the new order:
`reordered_dims = ""dim_2"", ""dim_1"", ""dim_0""`
### Minimal Complete Verifiable Example
```Python
import numpy as np
import pandas as pd
import xarray as xr
np.random.seed(0)
temperature = np.random.randn(4, 4, 3)
dim_0_values = [1, 2, 3, 4]
dim_1_values = [5, 6, 7, 8]
dim_2_values = pd.date_range(""2014-09-06"", periods=3)
starting_dims = ""dim_0"", ""dim_1"", ""dim_2""
da = xr.DataArray(
    data=temperature,
    dims=starting_dims,
    coords=dict(
        dim_0=dim_0_values,
        dim_1=dim_1_values,
        dim_2=dim_2_values,
    ),
    attrs=dict(
        description=""Ambient temperature."",
        units=""degC"",
    ),
)
print(f""{da.dims=}"")
print(f""{da.coords.keys()=}"")
reordered_dims = ""dim_2"", ""dim_1"", ""dim_0""
print(f""{da.transpose(*reordered_dims).dims=}"")
print(f""{da.transpose(*reordered_dims).coords.keys()=}"")
print(f""{da.transpose(*reordered_dims, transpose_coords=True).coords.keys()=}"")
```
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
```Python
da.dims=('dim_0', 'dim_1', 'dim_2')
da.coords.keys()=KeysView(Coordinates:
* dim_0 (dim_0) int32 1 2 3 4
* dim_1 (dim_1) int32 5 6 7 8
* dim_2 (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)
da.transpose(*reordered_dims).dims=('dim_2', 'dim_1', 'dim_0')
da.transpose(*reordered_dims).coords.keys()=KeysView(Coordinates:
* dim_0 (dim_0) int32 1 2 3 4
* dim_1 (dim_1) int32 5 6 7 8
* dim_2 (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)
da.transpose(*reordered_dims, transpose_coords=True).coords.keys()=KeysView(Coordinates:
* dim_0 (dim_0) int32 1 2 3 4
* dim_1 (dim_1) int32 5 6 7 8
* dim_2 (dim_2) datetime64[ns] 2014-09-06 2014-09-07 2014-09-08)
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 (main, Apr 4 2022, 05:22:27) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
libhdf5: 1.10.6
libnetcdf: None
xarray: 2022.6.0
pandas: 1.4.2
numpy: 1.21.5
scipy: 1.9.3
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.6.0
Nio: None
zarr: 2.13.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.4
dask: 2022.02.1
distributed: 2022.2.1
matplotlib: 3.5.1
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 2022.02.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 61.2.0
pip: 22.3.1
conda: 4.12.0
pytest: 7.1.1
IPython: 8.2.0
sphinx: 4.4.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7294/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue