id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 696047530,MDU6SXNzdWU2OTYwNDc1MzA=,4412,Dataset.encode_cf function,6130352,open,0,,,3,2020-09-08T17:22:55Z,2023-05-10T16:06:54Z,,NONE,,,,"I would like to be able to apply CF encoding to an existing DataArray (or multiple in a Dataset) and then store the encoded forms elsewhere. Is this already possible? More specifically, I would like to encode a large array of 32-bit floats as 8-bit ints and then write them to a Zarr store using [rechunker](https://rechunker.readthedocs.io/en/latest/index.html). I'm essentially after this https://github.com/pangeo-data/rechunker/issues/45 (Xarray support in rechunker), but I'm looking for what functionality exists in Xarray to make it possible in the meantime. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4412/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 692238160,MDU6SXNzdWU2OTIyMzgxNjA=,4405,open_zarr: concat_characters has no effect when dtype=U1,6130352,open,0,,,8,2020-09-03T19:22:52Z,2022-04-27T23:48:29Z,,NONE,,,,"**What happened**: It appears that either `to_zarr` or `from_zarr` is incorrectly concatenating the trailing dimension of single byte/character arrays and dropping the last dimension: ```python import xarray as xr import numpy as np xr.set_options(display_style='text') chrs = np.array([ ['A', 'B'], ['C', 'D'], ['E', 'F'], ], dtype='S1') ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs))) ds.x array([[b'A', b'B'], [b'C', b'D'], [b'E', b'F']], dtype='|S1') Dimensions without coordinates: dim0, dim1 ds.to_zarr('/tmp/test.zarr', mode='w') xr.open_zarr('/tmp/test.zarr').x.compute() # The second dimension is lost and the values end up being concatenated array([b'AB', b'CD', 
b'EF'], dtype='|S2') Dimensions without coordinates: dim0 ``` For N columns in a 2D array, you end up with an ""|SN"" 1D array. When using say ""S2"" or any fixed-length greater than 1, it doesn't happen. Interestingly though, it only affects the trailing dimension. I.e. if you use 3 dimensions, you get a 2D result with the 3rd dimension dropped: ```python chrs = np.array([[ ['A', 'B'], ['C', 'D'], ['E', 'F'], ]], dtype='S1') ds = xr.Dataset(dict(x=(('dim0', 'dim1', 'dim2'), chrs))) ds Dimensions: (dim0: 1, dim1: 3, dim2: 2) Dimensions without coordinates: dim0, dim1, dim2 Data variables: x (dim0, dim1, dim2) |S1 b'A' b'B' b'C' b'D' b'E' b'F' ds.to_zarr('/tmp/test.zarr', mode='w') xr.open_zarr('/tmp/test.zarr').x.compute() # `dim2` is gone and the data concatenated to `dim1` array([[b'AB', b'CD', b'EF']], dtype='|S2') Dimensions without coordinates: dim0, dim1 ``` In short, this only affects the ""S1"" data type. ""U1"" is fine as is ""SN"" where N > 1. **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-42-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: None xarray: 0.16.0 pandas: 1.0.5 numpy: 1.19.0 scipy: 1.5.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.3.0 cartopy: None seaborn: 0.10.1 numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: 4.8.2 pytest: 5.4.3 IPython: 7.15.0 sphinx: 3.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4405/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 707571360,MDU6SXNzdWU3MDc1NzEzNjA=,4452,Change default for concat_characters to False in open_* functions,6130352,open,0,,,2,2020-09-23T18:06:07Z,2022-04-09T03:21:43Z,,NONE,,,,"I wanted to propose that concat_characters be False for `open_{dataset,zarr,dataarray}`. I'm not sure how often that affects anyone since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters and the concatenation is destructive because we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base pair location, and the order of the characters matters). I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405) since no other arrays are automatically transformed like this when deserialized. If submit a PR for this, would anybody object? ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4452/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 770006670,MDU6SXNzdWU3NzAwMDY2NzA=,4704,Retries for rare failures,6130352,open,0,,,2,2020-12-17T13:06:51Z,2022-04-09T02:30:16Z,,NONE,,,,"I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large worfklows where enough http calls are made for them to become probable. @martindurant suggested forcing dask to retry tasks that may fail like this with `.compute(... retries=N)` in https://github.com/dask/gcsfs/issues/316, which has worked well. 
However, I also see this in Xarray/Zarr code interacting with gcsfs directly:
Example Traceback ``` Traceback (most recent call last): File ""scripts/convert_phesant_data.py"", line 100, in fire.Fire() File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py"", line 138, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py"", line 463, in _Fire component, remaining_args = _CallAndUpdateTrace( File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py"", line 672, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File ""scripts/convert_phesant_data.py"", line 96, in sort_zarr ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode=""w"") File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py"", line 1652, in to_zarr return to_zarr( File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py"", line 1368, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py"", line 1128, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py"", line 417, in store self.set_variables( File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py"", line 489, in set_variables writer.add(v.data, zarr_array, region=region) File 
""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py"", line 145, in add target[...] = source File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1115, in __setitem__ self.set_basic_selection(selection, value, fields=fields) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1210, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1501, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1550, in _set_selection self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1664, in _chunk_setitem self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py"", line 1729, in _chunk_setitem_nosync self.chunk_store[ckey] = cdata File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py"", line 151, in __setitem__ self.fs.pipe_file(key, maybe_convert(value)) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py"", line 121, in wrapper return maybe_sync(func, self, *args, **kwargs) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py"", line 100, in maybe_sync return 
sync(loop, func, *args, **kwargs) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py"", line 71, in sync raise exc.with_traceback(tb) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py"", line 55, in f result[0] = await future File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py"", line 1007, in _pipe_file return await simple_upload( File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py"", line 1523, in simple_upload j = await fs._call( File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py"", line 525, in _call raise e File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py"", line 507, in _call self.validate_response(status, contents, json, path, headers) File ""/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py"", line 1228, in validate_response raise HttpError(error) gcsfs.utils.HttpError: Required ```
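Absent built-in support, I have been working around failures like the one above with a small generic retry helper (a sketch only; the `with_retries` name, backoff constants, and the exception type in the commented usage are made up for illustration):

```python
import time

def with_retries(fn, retries=3, delay=1.0, exceptions=(Exception,)):
    # Call fn() until it succeeds or the retry budget is exhausted,
    # sleeping with simple exponential backoff between attempts.
    for attempt in range(retries + 1):
        try:
            return fn()
        except exceptions:
            if attempt == retries:
                raise
            time.sleep(delay * 2 ** attempt)

# e.g. wrapping the failing write from the traceback above:
# with_retries(lambda: ds.to_zarr(store, consolidated=True, mode='w'),
#              exceptions=(gcsfs.utils.HttpError,))
```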
Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr, but it seemed more productive to start here at a higher level of abstraction. To be clear, the code for the example failure above typically succeeds, and reproducing this failure is difficult. I have only seen it a couple of times now, in cases like this where the calling code does not involve dask, but it did make me want to know whether there were any plans to tolerate rare failures in Xarray as Dask does. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4704/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 876394165,MDU6SXNzdWU4NzYzOTQxNjU=,5261,Export ufuncs from DataArray API,6130352,open,0,,,3,2021-05-05T12:24:03Z,2021-05-07T13:53:08Z,,NONE,,,,"Have there been discussions on promoting other ufuncs out of `xr.ufuncs` and into the `DataArray` API, like `DataArray.isnull` or `DataArray.notnull`? I can see how those two would be an exception given the pandas semantics for them, as opposed to numpy, but I am curious how to recommend best practices for our users as we build a library for genetics on Xarray. We prefer to avoid anything in our documentation or examples outside of the Xarray API, to make things simple for our users, who would likely be easily confused/frustrated by the intricacies of numpy, dask, and xarray API interactions (as we were, too, not long ago). To that end, we have a number of methods that produce `NaN` and infinite values, but recommending either `ds.my_variable.pipe(xr.ufuncs.isfinite)` or `np.isfinite(ds.my_variable)` to identify those values is not ideal. I would prefer `ds.my_variable.isfinite()`, or maybe even `ds.my_variable.ufuncs.isfinite()`. Is there a sane way to export all of `xr.ufuncs` from `DataArray`? 
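In case it helps frame the discussion, the closest thing we have found so far is a custom accessor (a sketch only; the `um` accessor name and the single forwarded ufunc are made up for illustration):

```python
import numpy as np
import xarray as xr

@xr.register_dataarray_accessor('um')
class UfuncAccessor:
    # Forwards numpy ufuncs so they read fluently, e.g. da.um.isfinite();
    # apply_ufunc preserves dims, coords, and dask chunking.
    def __init__(self, da):
        self._da = da

    def isfinite(self):
        return xr.apply_ufunc(np.isfinite, self._da)

da = xr.DataArray([1.0, np.inf, np.nan], dims='x')
mask = da.um.isfinite()
```

This works, but it pushes the numpy/xarray interaction we want to hide into our own code rather than eliminating it.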
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5261/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 688501399,MDU6SXNzdWU2ODg1MDEzOTk=,4386,Zarr store array dtype incorrect,6130352,open,0,,,2,2020-08-29T09:54:19Z,2021-04-20T01:23:45Z,,NONE,,,,"Writing a boolean array to a zarr store once works, but not twice. The dtype switches to int8 after the second write: ```python import xarray as xr import numpy as np ds = xr.Dataset(dict( x=xr.DataArray(np.random.rand(100) > .5, dims='d1') )) ds.to_zarr('/tmp/ds1.zarr', mode='w') xr.open_zarr('/tmp/ds1.zarr').x.dtype.str # |b1 xr.open_zarr('/tmp/ds1.zarr').to_zarr('/tmp/ds2.zarr', mode='w') xr.open_zarr('/tmp/ds2.zarr').x.dtype.str # |i1 ``` **Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-42-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: None xarray: 0.16.0 pandas: 1.0.5 numpy: 1.19.0 scipy: 1.5.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.3.0 cartopy: None seaborn: 0.10.1 numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: 4.8.2 pytest: 5.4.3 IPython: 7.15.0 sphinx: 3.2.1
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4386/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue