id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1415430795,I_kwDOAMm_X85UXcKL,7188,efficiently set values in a xarray using dask,10563614,closed,0,,,1,2022-10-19T18:44:44Z,2023-11-06T06:07:08Z,2023-11-06T06:07:08Z,CONTRIBUTOR,,,,"### What is your issue?

I have quite a large dataset (data) with three coords, band=21, y=5000, x=5000, and I want to set the value for a few bands at some points (x, y) given by a boolean dataset. The chunk size is band=1, y=16, x=5000. I have 4 workers with 4 GB of memory each, 1 thread per worker.

The most compact form I found is this one:

    band = dict(band=[17, 18, 19, 20])
    data['somevar'].loc[band] = data['somevar'].loc[band].where(~points, some_complex_calculation)

points and some_complex_calculation are DataArrays with the same shape as data (in fact points is only a DataArray of x, y); they typically have a HighLevelGraph with 106 layers and 142610 keys from all layers. These datasets depend on data. data also has a HighLevelGraph with a hundred layers.

I cannot use ""compute()"", as this blows up the memory; I want to use data.to_zarr directly to exploit the chunks. Unfortunately, this calculation blocks the workers, which end up being killed.

I tried many forms, and I found this one:

    for b in [17, 18, 19, 20]:
        data['somevar'] = data['somevar'].where(~((snow.band == b) & ipoints), some_complex_calculation)

It works, but it is very inefficient and I find it difficult to read.

It seems that my objective is quite simple: set a few values in a large dataset along a given dimension, where this dimension is the outer one and has chunksize=1. It seems very easy from a C / Fortran perspective. Do you have any suggestions on how to perform such operations?
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7188/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,not_planned,13221727,issue 657466413,MDU6SXNzdWU2NTc0NjY0MTM=,4228,to_dataframe: no valid index for a 0-dimensional object,10563614,closed,0,,,5,2020-07-15T15:58:43Z,2020-10-26T08:42:35Z,2020-10-26T08:42:35Z,CONTRIBUTOR,,,,"**What happened**: `xr.DataArray([1], coords=[('onecoord', [2])]).sel(onecoord=2).to_dataframe(name='name')` raise an exception `ValueError: no valid index for a 0-dimensional object` **What you expected to happen**: the same behavior as: `xr.DataArray([1], coords=[('onecoord', [2])]).to_dataframe(name='name')` **Anything else we need to know?**: I see that the array after the selection has no ""dims"" anymore, and this is what cause the error. but it still has one ""coords"", this is confusing. Is there any documentation about this difference ? **Environment**:
INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.19.0-9-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4 xarray: 0.15.1 pandas: 1.0.4 numpy: 1.18.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.3 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.18.1 distributed: 2.18.0 matplotlib: 3.2.1 cartopy: None seaborn: 0.10.1 numbagg: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: 4.8.3 pytest: 5.4.3 IPython: 7.15.0 sphinx: 3.1.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4228/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue