issues
15 rows where repo = 13221727 and user = 6130352 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: comments, state_reason, created_at (date), updated_at (date), closed_at (date)
| id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type | 
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 686608969 | MDU6SXNzdWU2ODY2MDg5Njk= | 4380 | Error when rechunking from Zarr store | eric-czech 6130352 | closed | 0 | 5 | 2020-08-26T20:53:05Z | 2023-11-12T05:50:29Z | 2023-11-12T05:50:29Z | NONE | My assumption for this is that it should be possible to: 
 However I see this behavior instead: ```python import xarray as xr import dask.array as da ds = xr.Dataset(dict( x=xr.DataArray(da.random.random(size=100, chunks=10), dims='d1') )) Write the storeds.to_zarr('/tmp/ds1.zarr', mode='w') Read it out, rechunk it, and attempt to write it againxr.open_zarr('/tmp/ds1.zarr').chunk(chunks=dict(d1=20)).to_zarr('/tmp/ds2.zarr', mode='w') ValueError: Final chunk of Zarr array must be the same size or smaller than the first. 
Specified Zarr chunk encoding['chunks']=(10,), for variable named 'x' but (20, 20, 20, 20, 20) 
in the variable's Dask chunks ((20, 20, 20, 20, 20),) is incompatible with this encoding. 
Consider either rechunking using  Full trace---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-122-e185759d81c5> in <module>
----> 1 xr.open_zarr('/tmp/ds1.zarr').chunk(chunks=dict(d1=20)).to_zarr('/tmp/ds2.zarr', mode='w')
/opt/conda/lib/python3.7/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1656             compute=compute,
   1657             consolidated=consolidated,
-> 1658             append_dim=append_dim,
   1659         )
   1660 
/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute, consolidated, append_dim)
   1351     writer = ArrayWriter()
   1352     # TODO: figure out how to properly handle unlimited_dims
-> 1353     dump_to_store(dataset, zstore, writer, encoding=encoding)
   1354     writes = writer.sync(compute=compute)
   1355 
/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1126         variables, attrs = encoder(variables, attrs)
   1127 
-> 1128     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1129 
   1130 
/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    411         self.set_dimensions(variables_encoded, unlimited_dims=unlimited_dims)
    412         self.set_variables(
--> 413             variables_encoded, check_encoding_set, writer, unlimited_dims=unlimited_dims
    414         )
    415 
/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    466                 # new variable
    467                 encoding = extract_zarr_variable_encoding(
--> 468                     v, raise_on_invalid=check, name=vn
    469                 )
    470                 encoded_attrs = {}
/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in extract_zarr_variable_encoding(variable, raise_on_invalid, name)
    214 
    215     chunks = _determine_zarr_chunks(
--> 216         encoding.get("chunks"), variable.chunks, variable.ndim, name
    217     )
    218     encoding["chunks"] = chunks
/opt/conda/lib/python3.7/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim, name)
    154             if dchunks[-1] > zchunk:
    155                 raise ValueError(
--> 156                     "Final chunk of Zarr array must be the same size or "
    157                     "smaller than the first. "
    158                     f"Specified Zarr chunk encoding['chunks']={enc_chunks_tuple}, "
ValueError: Final chunk of Zarr array must be the same size or smaller than the first. Specified Zarr chunk encoding['chunks']=(10,), for variable named 'x' but (20, 20, 20, 20, 20) in the variable's Dask chunks ((20, 20, 20, 20, 20),) is incompatible with this encoding. Consider either rechunking using `chunk()` or instead deleting or modifying `encoding['chunks']`.
Overwriting chunks on  
 Does  Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-42-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: None xarray: 0.16.0 pandas: 1.0.5 numpy: 1.19.0 scipy: 1.5.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.3.0 cartopy: None seaborn: 0.10.1 numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: 4.8.2 pytest: 5.4.3 IPython: 7.15.0 sphinx: 3.2.1 | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4380/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 598991028 | MDU6SXNzdWU1OTg5OTEwMjg= | 3967 | Support static type analysis | eric-czech 6130352 | closed | 0 | 4 | 2020-04-13T16:34:43Z | 2023-09-17T19:43:32Z | 2023-09-17T19:43:31Z | NONE | As a related discussion to https://github.com/pydata/xarray/issues/3959, I wanted to see what possibilities exist for a user or API developer building on Xarray to enforce Dataset/DataArray structure through static analysis. In my specific scenario, I would like to model several different types of data in my domain as Dataset objects, but I'd like to be able enforce that names and dtypes associated with both data variables and coordinates meet certain constraints. @keewis mentioned an example of this in https://github.com/pydata/xarray/issues/3959#issuecomment-612076605 where it might be possible to use something like a  An example of where this would be useful is in adding extensions through accessors: ```python @xr.register_dataset_accessor('ext') def ExtAccessor: def init(self, ds) self.data = ds ds = xr.Dataset(dict(DATA=xr.DataArray([0.0]))) I'd like to catch that "data" was misspelled as "DATA" and thatthis particular method shouldn't be run against floats prior to runtimeds.ext.is_zero() ``` I probably care more about this as someone looking to build an API on top of Xarray, but I imagine typical users would find a solution to this problem beneficial too. There is a related conversation on doing something like this for Pandas DataFrames at https://github.com/python/typing/issues/28#issuecomment-351284520, so that might be helpful context for possibilities with  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/3967/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | not_planned | xarray 13221727 | issue | ||||||
| 696047530 | MDU6SXNzdWU2OTYwNDc1MzA= | 4412 | Dataset.encode_cf function | eric-czech 6130352 | open | 0 | 3 | 2020-09-08T17:22:55Z | 2023-05-10T16:06:54Z | NONE | I would like to be able to apply CF encoding to an existing DataArray (or multiple in a Dataset) and then store the encoded forms elsewhere. Is this already possible? More specifically, I would like to encode a large array of 32-bit floats as 8-bit ints and then write them to a Zarr store using rechunker. I'm essentially after this https://github.com/pangeo-data/rechunker/issues/45 (Xarray support in rechunker), but I'm looking for what functionality exists in Xarray to make it possible in the meantime. | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4412/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 759709924 | MDU6SXNzdWU3NTk3MDk5MjQ= | 4663 | Fancy indexing a Dataset with dask DataArray triggers multiple computes | eric-czech 6130352 | closed | 0 | 8 | 2020-12-08T19:17:08Z | 2023-03-15T02:48:01Z | 2023-03-15T02:48:01Z | NONE | It appears that boolean arrays (or any slicing array presumably) are evaluated many more times than necessary when applied to multiple variables in a Dataset. Is this intentional? Here is an example that demonstrates this: ```python Use a custom array type to know when data is being evaluatedclass Array(): Control case -- this shows that the print statement is only reached onceda.from_array(Array(np.random.rand(100))).compute(); EvaluatingThis usage somehow results in two evaluations of this one array?ds = xr.Dataset(dict( a=('x', da.from_array(Array(np.random.rand(100)))) )) ds.sel(x=ds.a) EvaluatingEvaluating<xarray.Dataset>Dimensions: (x: 51)Dimensions without coordinates: xData variables:a (x) bool dask.array<chunksize=(51,), meta=np.ndarray>The array is evaluated an extra time for each new variableds = xr.Dataset(dict( a=('x', da.from_array(Array(np.random.rand(100)))), b=(('x', 'y'), da.random.random((100, 10))), c=(('x', 'y'), da.random.random((100, 10))), d=(('x', 'y'), da.random.random((100, 10))), )) ds.sel(x=ds.a) EvaluatingEvaluatingEvaluatingEvaluatingEvaluating<xarray.Dataset>Dimensions: (x: 48, y: 10)Dimensions without coordinates: x, yData variables:a (x) bool dask.array<chunksize=(48,), meta=np.ndarray>b (x, y) float64 dask.array<chunksize=(48, 10), meta=np.ndarray>c (x, y) float64 dask.array<chunksize=(48, 10), meta=np.ndarray>d (x, y) float64 dask.array<chunksize=(48, 10), meta=np.ndarray>``` Given that slicing is already not lazy, why does the same predicate array need to be computed more than once? @tomwhite originally pointed this out in https://github.com/pystatgen/sgkit/issues/299. | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4663/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 692238160 | MDU6SXNzdWU2OTIyMzgxNjA= | 4405 | open_zarr: concat_characters has no effect when dtype=U1 | eric-czech 6130352 | open | 0 | 8 | 2020-09-03T19:22:52Z | 2022-04-27T23:48:29Z | NONE | What happened: It appears that either  ```python import xarray as xr import numpy as np xr.set_options(display_style='text') chrs = np.array([ ['A', 'B'], ['C', 'D'], ['E', 'F'], ], dtype='S1') ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs))) ds.x <xarray.DataArray 'x' (dim0: 3, dim1: 2)> array([[b'A', b'B'], [b'C', b'D'], [b'E', b'F']], dtype='|S1') Dimensions without coordinates: dim0, dim1 ds.to_zarr('/tmp/test.zarr', mode='w') xr.open_zarr('/tmp/test.zarr').x.compute() The second dimension is lost and the values end up being concatenated<xarray.DataArray 'x' (dim0: 3)> array([b'AB', b'CD', b'EF'], dtype='|S2') Dimensions without coordinates: dim0 ``` For N columns in a 2D array, you end up with an "|SN" 1D array. When using say "S2" or any fixed-length greater than 1, it doesn't happen. Interestingly though, it only affects the trailing dimension. I.e. if you use 3 dimensions, you get a 2D result with the 3rd dimension dropped: ```python chrs = np.array([[ ['A', 'B'], ['C', 'D'], ['E', 'F'], ]], dtype='S1') ds = xr.Dataset(dict(x=(('dim0', 'dim1', 'dim2'), chrs))) ds <xarray.Dataset> Dimensions: (dim0: 1, dim1: 3, dim2: 2) Dimensions without coordinates: dim0, dim1, dim2 Data variables: x (dim0, dim1, dim2) |S1 b'A' b'B' b'C' b'D' b'E' b'F' ds.to_zarr('/tmp/test.zarr', mode='w') xr.open_zarr('/tmp/test.zarr').x.compute() 
 | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 707571360 | MDU6SXNzdWU3MDc1NzEzNjA= | 4452 | Change default for concat_characters to False in open_* functions | eric-czech 6130352 | open | 0 | 2 | 2020-09-23T18:06:07Z | 2022-04-09T03:21:43Z | NONE | I wanted to propose that concat_characters be False for  I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405) since no other arrays are automatically transformed like this when deserialized. If submit a PR for this, would anybody object? | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 770006670 | MDU6SXNzdWU3NzAwMDY2NzA= | 4704 | Retries for rare failures | eric-czech 6130352 | open | 0 | 2 | 2020-12-17T13:06:51Z | 2022-04-09T02:30:16Z | NONE | I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large worfklows where enough http calls are made for them to become probable. @martindurant suggested forcing dask to retry tasks that may fail like this with  Example Traceback``` Traceback (most recent call last): File "scripts/convert_phesant_data.py", line 100, in <module> fire.Fire() File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "scripts/convert_phesant_data.py", line 96, in sort_zarr ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode="w") File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr return to_zarr( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1368, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1128, in dump_to_store store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 417, in store self.set_variables( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 489, in set_variables writer.add(v.data, zarr_array, region=region) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py", line 145, in add target[...] = source File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__ self.set_basic_selection(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection return self._set_basic_selection_nd(selection, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd self._set_selection(indexer, value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync self.chunk_store[ckey] = cdata File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py", line 151, in __setitem__ self.fs.pipe_file(key, maybe_convert(value)) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper return maybe_sync(func, self, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync return sync(loop, func, *args, **kwargs) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync raise exc.with_traceback(tb) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f result[0] = await future File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file return await simple_upload( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload j = await fs._call( File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call raise e File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call self.validate_response(status, contents, json, path, headers) File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response raise HttpError(error) gcsfs.utils.HttpError: Required ```Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr but it seemed more productive to start here at a higher level of abstraction. To be clear, the code for the example failure above typically succeeds and reproducing this failure is difficult. I have only seen it a couple times now like this, where the calling code does not include dask, but it did make me want to know if there were any plans to tolerate rare failures in Xarray as Dask does. | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 884209406 | MDU6SXNzdWU4ODQyMDk0MDY= | 5286 | Zarr chunks would overlap multiple dask chunks error | eric-czech 6130352 | closed | 0 | 3 | 2021-05-10T13:20:46Z | 2021-05-12T16:16:05Z | 2021-05-12T16:16:05Z | NONE | Would it be possible to get an explanation on how this situation results in a zarr chunk overlapping multiple dask chunks? This code below is generating an array with 2 chunks, selecting one row from each chunk, and then writing that resulting two row array back to zarr. I don't see how it's possible in this case for one zarr chunk to correspond to different dask chunks. There are clearly two resulting dask chunks, two input zarr chunks, and a correspondence between them that should be 1 to 1 ... what does this error message really mean then? ```python import xarray as xr import dask.array as da ds = xr.Dataset(dict( x=(('a', 'b'), da.ones(shape=(10, 10), chunks=(5, 10))), )).assign(a=list(range(10))) ds <xarray.Dataset>Dimensions: (a: 10, b: 10)Coordinates:* a (a) int64 0 1 2 3 4 5 6 7 8 9Dimensions without coordinates: bData variables:x (a, b) float64 dask.array<chunksize=(5, 10), meta=np.ndarray>Write the dataset out!rm -rf /tmp/test.zarr ds.to_zarr('/tmp/test.zarr') Read it back in, subset to 1 record in two different chunks (two rows total), write back out!rm -rf /tmp/test2.zarr xr.open_zarr('/tmp/test.zarr').sel(a=[0, 11]).to_zarr('/tmp/test2.zarr') NotImplementedError: Specified zarr chunks encoding['chunks']=(5, 10) for variable named 'x' would overlap multiple dask chunks ((1, 1), (10,)). Writing this array in parallel with dask could lead to corrupted data. Consider either rechunking using  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/5286/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 876394165 | MDU6SXNzdWU4NzYzOTQxNjU= | 5261 | Export ufuncs from DataArray API | eric-czech 6130352 | open | 0 | 3 | 2021-05-05T12:24:03Z | 2021-05-07T13:53:08Z | NONE | Have there been discussions on promoting other ufuncs out of  I can see how those two would be an exception given the pandas semantics for them, as opposed to numpy, but I am curious how to recommend best practices for our users as we build a library for genetics on Xarray. We prefer to avoid anything in our documentation or examples outside of the Xarray API to make things simple for our users, who would likely be easily confused/frustrated by the intricacies of numpy, dask, and xarray API interactions (as we were too not long ago).  To that end, we have a number of methods that produce  I would prefer  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/5261/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 869792877 | MDU6SXNzdWU4Njk3OTI4Nzc= | 5229 | Index level naming bug with `concat` | eric-czech 6130352 | closed | 0 | 2 | 2021-04-28T10:29:34Z | 2021-04-28T19:38:26Z | 2021-04-28T19:38:26Z | NONE | There is an inconsistency with how indexes are generated in a concat operation: ```python def transform(df): return ( df.to_xarray() .set_index(index=['id1', 'id2']) .pipe(lambda ds: xr.concat([ ds.isel(index=ds.year == v) for v in ds.year.to_series().unique() ], dim='dates')) ) df1 = pd.DataFrame(dict( id1=[1,2,1,2], id2=[1,2,1,2], data=[1,2,3,4], year=[2019, 2019, 2020, 2020] )) transform(df1) <xarray.Dataset> Dimensions: (dates: 2, index: 2) Coordinates: * index (index) MultiIndex - id1 (index) int64 1 2 - id2 (index) int64 1 2 Dimensions without coordinates: dates Data variables: data (dates, index) int64 1 2 3 4 year (dates, index) int64 2019 2019 2020 2020 df2 = pd.DataFrame(dict( id1=[1,2,1,2], id2=[1,2,1,3], # These don't quite align now data=[1,2,3,4], year=[2019, 2019, 2020, 2020] )) transform(df2) <xarray.Dataset> Dimensions: (dates: 2, index: 3) Coordinates: * index (index) MultiIndex - index_level_0 (index) int64 1 2 2 # These names are now different from id1, id2 - index_level_1 (index) int64 1 2 3 Dimensions without coordinates: dates Data variables: data (dates, index) float64 1.0 2.0 nan 3.0 nan 4.0 year (dates, index) float64 2.019e+03 2.019e+03 ... nan 2.02e+03 ``` It only appears to happen when values in a multiindex for the datasets being concatenated differ. Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 4.19.0-16-cloud-amd64 machine: x86_64 processor: byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.17.0 pandas: 1.1.1 numpy: 1.20.2 scipy: 1.6.2 netCDF4: 1.5.6 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.4.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.30.0 distributed: 2.20.0 matplotlib: 3.3.3 cartopy: None seaborn: 0.11.1 numbagg: None pint: None setuptools: 49.6.0.post20210108 pip: 21.0.1 conda: None pytest: 6.2.3 IPython: 7.22.0 sphinx: None | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/5229/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 688501399 | MDU6SXNzdWU2ODg1MDEzOTk= | 4386 | Zarr store array dtype incorrect | eric-czech 6130352 | open | 0 | 2 | 2020-08-29T09:54:19Z | 2021-04-20T01:23:45Z | NONE | Writing a boolean array to a zarr store once works, but not twice. The dtype switches to int8 after the second write: ```python import xarray as xr import numpy as np ds = xr.Dataset(dict( x=xr.DataArray(np.random.rand(100) > .5, dims='d1') )) ds.to_zarr('/tmp/ds1.zarr', mode='w') xr.open_zarr('/tmp/ds1.zarr').x.dtype.str # |b1 xr.open_zarr('/tmp/ds1.zarr').to_zarr('/tmp/ds2.zarr', mode='w') xr.open_zarr('/tmp/ds2.zarr').x.dtype.str # |i1 ``` Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 5.4.0-42-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: None xarray: 0.16.0 pandas: 1.0.5 numpy: 1.19.0 scipy: 1.5.1 netCDF4: None pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.21.0 distributed: 2.21.0 matplotlib: 3.3.0 cartopy: None seaborn: 0.10.1 numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: 4.8.2 pytest: 5.4.3 IPython: 7.15.0 sphinx: 3.2.1 | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | xarray 13221727 | issue | ||||||||
| 727623263 | MDU6SXNzdWU3Mjc2MjMyNjM= | 4529 | Dataset constructor with DataArray triggers computation | eric-czech 6130352 | closed | 0 | 5 | 2020-10-22T18:27:24Z | 2021-02-19T23:13:57Z | 2021-02-19T23:13:57Z | NONE | Is it intentional that creating a Dataset with a DataArray and dimension names for a single variable causes computation of that variable?  In other words, why does  A longer example: ```python import dask.array as da import xarray as xr x = da.random.randint(1, 10, size=(100, 25)) ds = xr.Dataset(dict(a=xr.DataArray(x, dims=('x', 'y')))) type(ds.a.data) dask.array.core.Array Recreate the dataset with the same array, but also redefine the dimensionsds2 = xr.Dataset(dict(a=(('x', 'y'), ds.a)) type(ds2.a.data) numpy.ndarray ``` | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4529/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 660112216 | MDU6SXNzdWU2NjAxMTIyMTY= | 4238 | Missing return type annotations | eric-czech 6130352 | closed | 0 | 1 | 2020-07-18T12:09:06Z | 2020-08-19T20:32:37Z | 2020-08-19T20:32:37Z | NONE | Dataset.to_dataframe should have a return type hint like DataArray.to_dataframe. Similarly, can concat have a  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/4238/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 597475005 | MDU6SXNzdWU1OTc0NzUwMDU= | 3959 | Extending Xarray for domain-specific toolkits | eric-czech 6130352 | closed | 0 | 10 | 2020-04-09T18:34:34Z | 2020-04-13T16:36:33Z | 2020-04-13T16:36:32Z | NONE | Hi, I have a question about how to design an API over Xarray for a domain-specific use case (in genetics). Having seen the following now: 
 I wanted to reach out and seek some advice on what I'd like to do given that I don't think any of the solutions there are what I'm looking for. More specifically, I would like to model the datasets we work with as xr.Dataset subtypes but I'd like to enforce certain preconditions for those types as well as support conversions between them.  An example would be that I may have a domain-specific type  One API I envision around these models consists of functions that enforce nominal typing on Xarray classes, so in that case I don't actually care if my subtypes are preserved by Xarray when operations are run. It would be nice if that subtyping wasn't lost but I can understand that it's a limitation for now. Here's an example of what I mean: ```python from genetics import api arr1 = ??? # some 3D integer DataArray of allele indices arr2 = ??? # A missing data boolean DataArray arr3 = ??? # Some other domain-specific stuff like variant phasing ds = api.GenotypeDataset(arr1, arr2, arr3) A function that would be in the API would look like:def analyze_haplotype(ds: xr.Dataset) -> xr.Dataset: # Do stuff assuming that the user has supplied a dataset compliant with # the "HaplotypeDataset" constraints pass analyze_haplotype(ds.to_haplotype_dataset()) ``` I like the idea of trying to avoid requiring API-specific data structures for all functionality in favor of conventions over Xarray data structures. I think conveniences like these subtypes would be great for enforcing those conventions (rather than checking at the beginning of each function) as well as making it easier to go between representations, but I'm certainly open to suggestion. I think something akin to structural subtyping that extends to what arrays are contained in the Dataset, how coordinates are named, what datatypes are used, etc. would be great but I have no idea if that's possible. All that said, is it still a bad idea to try to subclass Xarray data structures even if the intent was never to touch any part of the internal APIs?  I noticed Xarray does some stuff like  cc: @alimanfoo - Alistair raised some concerns about trying this to me, so he may have some thoughts here too | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/3959/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | ||||||
| 569176457 | MDU6SXNzdWU1NjkxNzY0NTc= | 3791 | Self joins with non-unique indexes | eric-czech 6130352 | closed | 0 | 5 | 2020-02-21T20:47:35Z | 2020-03-26T17:51:35Z | 2020-03-05T19:32:38Z | NONE | Hi, is there a good way to self join arrays? For example, given a dataset like this: 
 I am not looking for the pandas  
 but rather the  
 I tried using  Output of  | {
    "url": "https://api.github.com/repos/pydata/xarray/issues/3791/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
} | completed | xarray 13221727 | issue | 
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);


