
issues


6 rows where state = "open" and user = 6130352 sorted by updated_at descending




id: 696047530 · node_id: MDU6SXNzdWU2OTYwNDc1MzA= · number: 4412 · title: Dataset.encode_cf function · user: eric-czech (6130352) · state: open · locked: 0 · comments: 3 · created_at: 2020-09-08T17:22:55Z · updated_at: 2023-05-10T16:06:54Z · author_association: NONE · repo: xarray (13221727) · type: issue

I would like to be able to apply CF encoding to an existing DataArray (or multiple in a Dataset) and then store the encoded forms elsewhere. Is this already possible?

More specifically, I would like to encode a large array of 32-bit floats as 8-bit ints and then write them to a Zarr store using rechunker.

I'm essentially after https://github.com/pangeo-data/rechunker/issues/45 (Xarray support in rechunker), but I'm looking for whatever functionality already exists in Xarray to make this possible in the meantime.
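For concreteness, this is roughly what I'm imagining (a sketch only; `encode_cf_variable` lives in `xarray.conventions` and is internal API, so I'm not sure it's the supported route):

```python
import numpy as np
import xarray as xr
from xarray.conventions import encode_cf_variable  # internal API

# A float32 variable we'd like stored as int8 via CF scale/offset encoding.
da = xr.DataArray(np.random.rand(10).astype('float32'), dims='d')
da.encoding = {'dtype': 'int8', 'scale_factor': 1 / 100, '_FillValue': -128}

# Apply the CF encoding eagerly, without going through to_netcdf/to_zarr,
# so the encoded (int8) form can be handed to rechunker or written elsewhere.
encoded = encode_cf_variable(da.variable)
print(encoded.dtype)  # int8
```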

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4412/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 692238160 · node_id: MDU6SXNzdWU2OTIyMzgxNjA= · number: 4405 · title: open_zarr: concat_characters has no effect when dtype=U1 · user: eric-czech (6130352) · state: open · locked: 0 · comments: 8 · created_at: 2020-09-03T19:22:52Z · updated_at: 2022-04-27T23:48:29Z · author_association: NONE · repo: xarray (13221727) · type: issue

What happened:

It appears that either to_zarr or open_zarr is incorrectly concatenating single-byte/character arrays along their trailing dimension and dropping that dimension:

```python
import xarray as xr
import numpy as np
xr.set_options(display_style='text')

chrs = np.array([
    ['A', 'B'],
    ['C', 'D'],
    ['E', 'F'],
], dtype='S1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs)))
ds.x
# <xarray.DataArray 'x' (dim0: 3, dim1: 2)>
# array([[b'A', b'B'],
#        [b'C', b'D'],
#        [b'E', b'F']], dtype='|S1')
# Dimensions without coordinates: dim0, dim1

ds.to_zarr('/tmp/test.zarr', mode='w')
xr.open_zarr('/tmp/test.zarr').x.compute()
```

The second dimension is lost and the values end up concatenated:

```
<xarray.DataArray 'x' (dim0: 3)>
array([b'AB', b'CD', b'EF'], dtype='|S2')
Dimensions without coordinates: dim0
```

For an N-column 2D array, you end up with a 1D array of dtype "|SN". With "S2" or any fixed length greater than 1, this doesn't happen.

Interestingly, only the trailing dimension is affected, i.e. with 3 dimensions you get a 2D result with the third dimension dropped:

```python
chrs = np.array([[
    ['A', 'B'],
    ['C', 'D'],
    ['E', 'F'],
]], dtype='S1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1', 'dim2'), chrs)))
ds
# <xarray.Dataset>
# Dimensions:  (dim0: 1, dim1: 3, dim2: 2)
# Dimensions without coordinates: dim0, dim1, dim2
# Data variables:
#     x        (dim0, dim1, dim2) |S1 b'A' b'B' b'C' b'D' b'E' b'F'

ds.to_zarr('/tmp/test.zarr', mode='w')
xr.open_zarr('/tmp/test.zarr').x.compute()
```

dim2 is gone and the data is concatenated into dim1:

```
<xarray.DataArray 'x' (dim0: 1, dim1: 3)>
array([[b'AB', b'CD', b'EF']], dtype='|S2')
Dimensions without coordinates: dim0, dim1
```

In short, this only affects the "S1" data type; "U1" is fine, as is "SN" for N > 1.
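For what it's worth, the workaround I would reach for is disabling concatenation at read time, though given this report I'm not sure it actually restores the S1 case (a sketch):

```python
import numpy as np
import xarray as xr

chrs = np.array([['A', 'B'], ['C', 'D'], ['E', 'F']], dtype='S1')
ds = xr.Dataset(dict(x=(('dim0', 'dim1'), chrs)))
ds.to_zarr('/tmp/test.zarr', mode='w')

# Ask the CF decoder not to join S1 arrays back into fixed-width strings;
# with the bug above, the second dimension may still come back merged.
restored = xr.open_zarr('/tmp/test.zarr', concat_characters=False)
print(restored.x.shape)  # expected: (3, 2)
```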

Environment:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: None

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: None
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: 4.8.2
pytest: 5.4.3
IPython: 7.15.0
sphinx: 3.2.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4405/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 707571360 · node_id: MDU6SXNzdWU3MDc1NzEzNjA= · number: 4452 · title: Change default for concat_characters to False in open_* functions · user: eric-czech (6130352) · state: open · locked: 0 · comments: 2 · created_at: 2020-09-23T18:06:07Z · updated_at: 2022-04-09T03:21:43Z · author_association: NONE · repo: xarray (13221727) · type: issue

I wanted to propose that concat_characters default to False in open_{dataset,zarr,dataarray}. I'm not sure how often this affects anyone, since working with individual character arrays is probably rare, but it's a particularly bad default in genetics. We often represent individual variations as single characters, and the concatenation is destructive: we can't invert it when one of the characters is an empty string (which often corresponds to a deletion at a base-pair location), and the order of the characters matters.
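To illustrate the lossiness (hypothetical allele values; this mirrors the join that concat_characters performs on the last axis):

```python
import numpy as np

# Two variants, two single-character alleles each; '' marks a deletion.
alleles = np.array([['A', ''], ['', 'A']], dtype='S1')

# Joining along the last axis, as concat_characters does:
joined = np.array([b''.join(row) for row in alleles])
print(joined)  # [b'A' b'A']
# Both rows collapse to the same value, so the original position of the
# empty string cannot be recovered.
```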

I also find it to be confusing behavior (e.g. https://github.com/pydata/xarray/issues/4405), since no other arrays are automatically transformed like this when deserialized.

If I submit a PR for this, would anybody object?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4452/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 770006670 · node_id: MDU6SXNzdWU3NzAwMDY2NzA= · number: 4704 · title: Retries for rare failures · user: eric-czech (6130352) · state: open · locked: 0 · comments: 2 · created_at: 2020-12-17T13:06:51Z · updated_at: 2022-04-09T02:30:16Z · author_association: NONE · repo: xarray (13221727) · type: issue

I recently ran into several issues with gcsfs (https://github.com/dask/gcsfs/issues/316, https://github.com/dask/gcsfs/issues/315, and https://github.com/dask/gcsfs/issues/318) where errors are occasionally thrown, but only in large workflows where enough HTTP calls are made for them to become probable.

@martindurant suggested in https://github.com/dask/gcsfs/issues/316 forcing dask to retry tasks that may fail like this with .compute(..., retries=N), which has worked well. However, I also see these errors raised from Xarray/Zarr code interacting with gcsfs directly:

Example traceback:

```
Traceback (most recent call last):
  File "scripts/convert_phesant_data.py", line 100, in <module>
    fire.Fire()
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 138, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 463, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fire/core.py", line 672, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/convert_phesant_data.py", line 96, in sort_zarr
    ds.to_zarr(fsspec.get_mapper(output_path), consolidated=True, mode="w")
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/core/dataset.py", line 1652, in to_zarr
    return to_zarr(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1368, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/api.py", line 1128, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 417, in store
    self.set_variables(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/zarr.py", line 489, in set_variables
    writer.add(v.data, zarr_array, region=region)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/xarray/backends/common.py", line 145, in add
    target[...] = source
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1115, in __setitem__
    self.set_basic_selection(selection, value, fields=fields)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1210, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1501, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1550, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1664, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value,
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/zarr/core.py", line 1729, in _chunk_setitem_nosync
    self.chunk_store[ckey] = cdata
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/mapping.py", line 151, in __setitem__
    self.fs.pipe_file(key, maybe_convert(value))
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 121, in wrapper
    return maybe_sync(func, self, *args, **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 100, in maybe_sync
    return sync(loop, func, *args, **kwargs)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 71, in sync
    raise exc.with_traceback(tb)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/fsspec/asyn.py", line 55, in f
    result[0] = await future
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1007, in _pipe_file
    return await simple_upload(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1523, in simple_upload
    j = await fs._call(
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 525, in _call
    raise e
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 507, in _call
    self.validate_response(status, contents, json, path, headers)
  File "/home/eczech/repos/ukb-gwas-pipeline-nealelab/.snakemake/conda/90e5c2a1/lib/python3.8/site-packages/gcsfs/core.py", line 1228, in validate_response
    raise HttpError(error)
gcsfs.utils.HttpError: Required
```

Has there already been a discussion about how to address rare errors like this? Arguably, I could file the same issue with Zarr, but it seemed more productive to start here, at a higher level of abstraction.

To be clear, the code behind the example failure above typically succeeds, and reproducing the failure is difficult. I have only seen it a couple of times, in calls like this one where dask is not involved, but it did make me want to know whether there are any plans for Xarray to tolerate rare failures the way Dask does.
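For dask-backed computations the retries are straightforward; for the direct (non-dask) writes above, I'm currently wrapping the call myself. A sketch of both (the wrapper, its backoff constants, and catching bare Exception are my own placeholders, not an existing Xarray API):

```python
import time

# 1. Dask-level retries, per @martindurant's suggestion:
#    result = arr.data.compute(retries=5)

# 2. A crude retry loop around a direct (non-dask) to_zarr call:
def to_zarr_with_retries(ds, store, max_retries=3, base_delay=1.0, **kwargs):
    for attempt in range(max_retries + 1):
        try:
            return ds.to_zarr(store, **kwargs)
        except Exception:  # ideally only gcsfs.utils.HttpError and friends
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```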

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 876394165 · node_id: MDU6SXNzdWU4NzYzOTQxNjU= · number: 5261 · title: Export ufuncs from DataArray API · user: eric-czech (6130352) · state: open · locked: 0 · comments: 3 · created_at: 2021-05-05T12:24:03Z · updated_at: 2021-05-07T13:53:08Z · author_association: NONE · repo: xarray (13221727) · type: issue

Have there been discussions about promoting other ufuncs out of xr.ufuncs and into the DataArray API, like DataArray.isnull or DataArray.notnull?

I can see how those two would be an exception, given that they follow pandas rather than numpy semantics, but I am curious because we need to recommend best practices to our users as we build a library for genetics on Xarray.

We prefer to keep anything outside the Xarray API out of our documentation and examples, to make things simple for our users, who would likely be confused or frustrated by the intricacies of the numpy, dask, and xarray API interactions (as we were too, not long ago). To that end, we have a number of methods that produce NaN and infinite values, but recommending either ds.my_variable.pipe(xr.ufuncs.isfinite) or np.isfinite(ds.my_variable) to identify those values is not ideal.

I would prefer ds.my_variable.isfinite(), or maybe even ds.my_variable.ufuncs.isfinite(). Is there a sane way to export all of xr.ufuncs onto DataArray?
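For context, the closest options I'm aware of today (both return a DataArray, since DataArray implements the numpy ufunc protocol):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(dict(my_variable=(('d',), np.array([1.0, np.nan, np.inf]))))

# numpy ufunc dispatch preserves the xarray metadata...
mask1 = np.isfinite(ds.my_variable)

# ...as does apply_ufunc, which also works with dask-backed variables.
mask2 = xr.apply_ufunc(np.isfinite, ds.my_variable)

print(mask1.equals(mask2))  # True
```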

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5261/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 688501399 · node_id: MDU6SXNzdWU2ODg1MDEzOTk= · number: 4386 · title: Zarr store array dtype incorrect · user: eric-czech (6130352) · state: open · locked: 0 · comments: 2 · created_at: 2020-08-29T09:54:19Z · updated_at: 2021-04-20T01:23:45Z · author_association: NONE · repo: xarray (13221727) · type: issue

Writing a boolean array to a Zarr store works once, but not twice: after reading it back and writing it to a second store, the dtype switches to int8:

```python
import xarray as xr
import numpy as np

ds = xr.Dataset(dict(
    x=xr.DataArray(np.random.rand(100) > .5, dims='d1')
))

ds.to_zarr('/tmp/ds1.zarr', mode='w')
xr.open_zarr('/tmp/ds1.zarr').x.dtype.str   # |b1

xr.open_zarr('/tmp/ds1.zarr').to_zarr('/tmp/ds2.zarr', mode='w')
xr.open_zarr('/tmp/ds2.zarr').x.dtype.str   # |i1
```
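A guess at a workaround, continuing from the example above and assuming the int8 comes from encoding metadata carried over from the first store (speculative; I haven't confirmed the root cause):

```python
ds2 = xr.open_zarr('/tmp/ds1.zarr')
# The re-opened variable already reports bool, so drop any inherited
# dtype encoding before writing to a new store.
ds2.x.encoding.pop('dtype', None)
ds2.to_zarr('/tmp/ds3.zarr', mode='w')
xr.open_zarr('/tmp/ds3.zarr').x.dtype.str   # expected: |b1
```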

Environment:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-42-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: None

xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.3.0
cartopy: None
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: 4.8.2
pytest: 5.4.3
IPython: 7.15.0
sphinx: 3.2.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4386/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


Table schema:

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```