github: issues: 8 rows where state = "closed", type = "issue" and user = 1386642 sorted by updated

8 rows where state = "closed", type = "issue" and user = 1386642 sorted by updated_at descending

id	node_id	number	title	user	state	comments	created_at	updated_at ▲	closed_at	author_association	body	reactions	state_reason	repo	type
856172272	MDU6SXNzdWU4NTYxNzIyNzI=	5144	Add chunks argument to {zeros/ones/empty}_like.	nbren12 1386642	closed	5	2021-04-12T17:01:47Z	2023-10-25T03:18:05Z	2023-10-25T03:18:05Z	CONTRIBUTOR	Describe the solution you'd like We have started using xarray objects as a "schema" for initializing zarrs that will be written to using the `region` argument of `to_zarr`. For example, `output_schema.to_zarr(path, compute=False) for region in regions: output = func(input_data.isel(region)) output.to_zarr(path, region=region)` Currently, xarray's tools for computing the `output_schema` Dataset are lacking since rechunking existing datasets can be slow. `dask.array.zeros_like` takes a chunks argument; can we add one here too? Describe alternatives you've considered `.chunk`	{ "url": "https://api.github.com/repos/pydata/xarray/issues/5144/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
334366223	MDU6SXNzdWUzMzQzNjYyMjM=	2241	Slow performance with isel on stacked coordinates	nbren12 1386642	closed	4	2018-06-21T07:13:32Z	2020-06-20T20:51:48Z	2020-06-20T20:51:48Z	CONTRIBUTOR	Code Sample ```python a = xr.DataArray(np.random.rand(64,64,64), dims=list('xyz')).chunk({'x':8, 'y': 8}) b = a.stack(b=['x', 'y']) %timeit b.isel(b=0).load() 3.81 ms ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) %timeit a.isel(x=0, y=0).load() 822 µs ± 3.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each) np.allclose(b.isel(b=0).values, a.isel(x=0, y=0).values) True ``` Problem description I have noticed some pretty significant slow downs when using dask and stacked indices. As you can see in the example above, selecting the point x=0, y=0 takes about 4 times as long when the x and y dimensions are stacked together. This big difference only appears when `.load` is called. Does this mean it's a dask issue? Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.6.3.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.22.0 numpy: 1.13.3 scipy: 1.0.0 netCDF4: 1.3.0 h5netcdf: 0.4.2 h5py: 2.7.1 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.17.1 distributed: 1.21.1 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.1.0 pip: 9.0.1 conda: None pytest: 3.5.1 IPython: 6.2.1 sphinx: 1.6.5	{ "url": "https://api.github.com/repos/pydata/xarray/issues/2241/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
631940742	MDU6SXNzdWU2MzE5NDA3NDI=	4125	Improving typing of `xr.Dataset.__getitem__`	nbren12 1386642	closed	2	2020-06-05T20:40:39Z	2020-06-15T11:25:53Z	2020-06-15T11:25:53Z	CONTRIBUTOR	First, I'd like to thank the xarray devs for adding type hints to this library; not many libraries have this feature! That said, the indexing notation of `xr.Dataset` does not currently play well with mypy since it returns a Union type. This results in a lot of mypy errors like this: workflows/fine_res_budget/budget/budgets.py:284: error: Argument 6 to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray" workflows/fine_res_budget/budget/budgets.py:285: error: Argument 1 to "storage" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray" workflows/fine_res_budget/budget/budgets.py:286: error: Argument "unresolved_flux" to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray" workflows/fine_res_budget/budget/budgets.py:287: error: Argument "saturation_adjustment" to "compute_recoarsened_budget_field" has incompatible type "Union[DataArray, Dataset]"; expected "DataArray" MCVE Code Sample ``` def func(ds: xr.Dataset): pass ds: xr.Dataset = ... error: this line will give a type error because mypy doesn't know if ds[['a', 'b']] is a Dataset or a DataArray func(ds[['a', 'b']]) ``` Expected Output Mypy should be able to infer that `ds[['a', 'b']]` is a Dataset, and that `ds['a']` is a DataArray. Problem Description This requires any routine with type hints that consumes an output of `xr.Dataset.__getitem__` to accept a `Union[DataArray, Dataset]` even if it really intends to be used with either `DataArray` or `Dataset`. Because `ds[something]` is a ubiquitous syntax, this behavior accounts for approximately 50% of mypy errors in my xarray-heavy code. Versions Output of `xr.show_versions()` In [1]: import xarray as xr xr. 
In [2]: xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.3.0-1020-gcp machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: C.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.7.3 xarray: 0.15.1 pandas: 1.0.1 numpy: 1.18.1 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: 0.8.0 h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.1.2 nc_time_axis: 1.2.0 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.17.2 distributed: 2.17.0 matplotlib: 3.1.3 cartopy: 0.17.0 seaborn: 0.10.1 numbagg: None setuptools: 46.4.0.post20200518 pip: 20.0.2 conda: 4.8.3 pytest: 5.4.2 IPython: 7.13.0 sphinx: None Potential solution I think we can fix this with typing.overload. I am not too familiar with that library, but I think something like the following might work: ``` from typing import overload class Dataset: @overload def __getitem__(self, key: Hashable) -> DataArray: ... @overload def __getitem__(self, key: List[Hashable]) -> "Dataset": ... # actual implementation def __getitem__ ```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4125/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
289837692	MDU6SXNzdWUyODk4Mzc2OTI=	1839	Add simple array creation functions for easier unit testing	nbren12 1386642	closed	3	2018-01-19T01:53:20Z	2020-01-19T04:21:10Z	2020-01-19T04:21:10Z	CONTRIBUTOR	When I am writing unit tests for routines that involve `DataArray` objects, many lines of code are devoted to creating mock objects. Here is an example of a unit test I recently wrote to test some code which computes the fluid derivative of a field given the velocity. ```python def test_material_derivative(): dims = ['x', 'y', 'z', 'time'] coords = {dim: np.arange(10) for dim in dims} shape = [coords[dim].shape[0] for dim in coords] f = xr.Dataset({'f': (dims, np.ones(shape))}, coords=coords) f = f.f one = 0 * f + 1 zero = 0 * f md = material_derivative(zero, one, zero, f.x + 0 * f) np.testing.assert_array_almost_equal(md.values, 0) md = material_derivative(one, zero, zero, f.x + 0 * f) np.testing.assert_array_almost_equal(md.isel(x=slice(1,-1)).values, one.isel(x=slice(1,-1)).values) md = material_derivative(zero, one, zero, f.y + 0 * f) np.testing.assert_array_almost_equal(md.isel(y=slice(1,-1)).values, one.isel(y=slice(1,-1)).values) md = material_derivative(zero, zero, one, f.z + 0 * f) np.testing.assert_array_almost_equal(md.isel(z=slice(1,-1)).values, one.isel(z=slice(1,-1)).values) ``` As you can see, I devote many lines to initializing a 4D data array of all ones, where all the coordinates are `np.arange(10)` objects. It isn't too hard to do this once, but it gets pretty annoying to do many times, especially when I forget how the DataArray and Dataset constructors work. Now, I can do something like `xr.DataArray(np.ones(...))`, but I would still have to initialize the coordinates if I use them. In any case, having some sort of functions like `xr.ones`, `xr.zeros`, and `xr.rand` which initialize the coordinates and data would be very nice.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1839/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
497427114	MDU6SXNzdWU0OTc0MjcxMTQ=	3337	Dataset.groupby reductions give "Dataset does not contain dimensions error" in v0.13	nbren12 1386642	closed	1	2019-09-24T03:01:00Z	2019-10-10T18:23:22Z	2019-10-10T18:23:22Z	CONTRIBUTOR	MCVE Code Sample ```python ds = xr.DataArray(np.ones((4,5)), dims=['z', 'x']).to_dataset(name='a') ds.a.groupby('z').mean() <xarray.DataArray 'a' (z: 4)> array([1., 1., 1., 1.]) Dimensions without coordinates: z ds.groupby('z').mean() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/common.py", line 91, in wrapped_func **kwargs File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 848, in reduce return self.apply(reduce_dataset) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 796, in apply return self._combine(applied) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 800, in _combine applied_example, applied = peek_at(applied) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/utils.py", line 181, in peek_at peek = next(gen) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 795, in <genexpr> applied = (func(ds, *args, **kwargs) for ds in self._iter_grouped()) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/groupby.py", line 846, in reduce_dataset return ds.reduce(func, dim, keep_attrs, **kwargs) File "/Users/noah/miniconda3/envs/broken/lib/python3.7/site-packages/xarray/core/dataset.py", line 3888, in reduce "Dataset does not contain the dimensions: %s" % missing_dimensions ValueError: Dataset does not contain the dimensions: ['z'] ds.dims Frozen(SortedKeysDict({'z': 4, 'x': 5})) ``` Problem Description Groupby reduction operations on `Dataset` objects no longer seem to work in xarray v0.13. 
In the example above, I create an xarray dataset with one DataArray called "a". The same groupby operation fails on this `Dataset`, but succeeds when called directly on "a". Is this a bug or an intended change? In addition, the error message is confusing since `z` is one of the Dataset dimensions. Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.13.0 pandas: 0.25.1 numpy: 1.17.2 scipy: None netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: None distributed: None matplotlib: None cartopy: None seaborn: None numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: None IPython: None sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/3337/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
216215022	MDU6SXNzdWUyMTYyMTUwMjI=	1317	API for reshaping DataArrays as 2D "data matrices" for use in machine learning	nbren12 1386642	closed	9	2017-03-22T21:33:07Z	2019-07-05T00:32:51Z	2019-07-05T00:32:51Z	CONTRIBUTOR	Machine learning and linear algebra problems are often expressed in terms of operations on matrices rather than arrays of arbitrary dimension, and there is currently no convenient way to turn DataArrays (or combinations of DataArrays) into a single "data matrix". As an example, I have needed to use scikit-learn lately with data from DataArray objects. Scikit-learn requires the data to be expressed in terms of simple 2-dimensional matrices. The rows are called samples, and the columns are known as features. It is annoying and error-prone to transpose and reshape a data array by hand to fit into this format. For instance, this GitHub repo for xarray-aware sklearn-like objects devotes many lines of code to massaging data arrays into data matrices. I think that this reshaping workflow might be common enough to warrant some kind of treatment in xarray. I have written some code in this gist that I have found pretty convenient for doing this. This gist has an `XRReshaper` class which can be used for reshaping data to and from a matrix format. 
The basic usage for an EOF analysis of a dataset `A(lat, lon, time)` can be done like this ```python feature_dims = ['lat', 'lon'] rs = XRReshaper(A) data_matrix, _ = rs.to(feature_dims) Some linear algebra or machine learning ,, eofs = svd(data_matrix) eofs_datarray = rs.get(eofs[0], ['mode'] + feature_dims) ``` I am not sure this is the best API, but it seems to work pretty well and I have used it here to implement some xarray-aware sklearn-like objects for PCA, which can be used like `feature_dims = ['lat', 'lon'] pca = XPCA(feature_dims, n_components=10, weight=cos(A.lat)) pca.fit(A) pca.transform(A) eofs = pca.components_` Another syntax which might be helpful is some kind of context manager approach like ```python with XRReshaper(A) as rs, data_matrix: # do some stuff with data_matrix use rs to restore output to a data array. ```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1317/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
291103680	MDU6SXNzdWUyOTExMDM2ODA=	1852	bug: 2D pcolormesh plots are wrong when coordinate is not ascending order	nbren12 1386642	closed	9	2018-01-24T07:01:07Z	2018-02-18T19:06:31Z	2018-02-18T19:06:31Z	CONTRIBUTOR	Code Sample, a copy-pastable example if possible ```python import matplotlib.pyplot as plt import numpy as np import xarray as xr x = np.arange(10) y = np.arange(20) np.random.shuffle(x) x = xr.DataArray(x, dims=['x'], coords={'x': x}) y = xr.DataArray(y, dims=['y'], coords={'y': y}) z = x + y z_sorted = z.isel(x=np.argsort(x.values)) make plot fig, axs = plt.subplots(1, 2, figsize=(6,3)) z_sorted.plot(ax=axs[0]) axs[0].set_title("X is sorted") z.plot(ax=axs[1]) axs[1].set_title("X is not sorted") plt.tight_layout() ``` Problem description Sometimes the coordinates in an xarray dataset are not sorted in ascending order. I recently had an issue where the time coordinate of a 2D dataset was scrambled, so calling `x.plot` gave very strange results. In my opinion, `x.plot` should probably sort the data along the coordinates, or at least provide a warning if the coordinates are unsorted. Expected Output Here is the image generated by the snippet above: The left and right panels should be the same. 
Paste the output here xr.show_versions() here INSTALLED VERSIONS ------------------ commit: None python: 3.6.2.final.0 python-bits: 64 OS: Darwin OS-release: 16.0.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.0+dev50.ga988dc2 pandas: 0.20.3 numpy: 1.13.1 scipy: 0.19.1 netCDF4: 1.3.1 h5netcdf: 0.5.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.15.2 distributed: 1.18.3 matplotlib: 2.0.2 cartopy: None seaborn: 0.8.0 setuptools: 36.5.0.post20170921 pip: 9.0.1 conda: 4.3.29 pytest: 3.2.1 IPython: 6.1.0 sphinx: 1.6.3	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1852/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
258640421	MDU6SXNzdWUyNTg2NDA0MjE=	1577	Potential error in apply_ufunc docstring for input_core_dims	nbren12 1386642	closed	5	2017-09-18T22:28:10Z	2017-10-10T04:42:21Z	2017-10-10T04:42:21Z	CONTRIBUTOR	The documentation for `input_core_dims` reads: ``` input_core_dims : Sequence[Sequence], optional List of the same length as ``args`` giving the list of core dimensions on each input argument that should be broadcast. By default, we assume there are no core dimensions on any input arguments. For example, ``input_core_dims=[[], ['time']]`` indicates that all dimensions on the first argument and all dimensions other than 'time' on the second argument should be broadcast. ``` The first and second paragraphs seem contradictory to me. Shouldn't the first paragraph be changed to: List of the same length as ``args`` giving the list of core dimensions on each input argument that should not be broadcast. By default, we assume there are no core dimensions on any input arguments.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1577/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
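
Issue 4125 in the table above suggests `typing.overload` as a way to tell a type checker that indexing with a single key yields a `DataArray` while indexing with a list of keys yields a `Dataset`. The following is a minimal sketch of that idea, not xarray's actual implementation: the `DataArray` and `Dataset` classes here are hypothetical stand-ins, and how a given mypy version resolves the two overloads is an assumption.

```python
from typing import Dict, Hashable, List, overload


class DataArray:
    """Hypothetical stand-in for xr.DataArray; holds only a name."""

    def __init__(self, name: Hashable) -> None:
        self.name = name


class Dataset:
    """Hypothetical stand-in for xr.Dataset; maps names to DataArrays."""

    def __init__(self, variables: Dict[Hashable, DataArray]) -> None:
        self._variables = dict(variables)

    # A list of keys selects a subset of variables and stays a Dataset.
    @overload
    def __getitem__(self, key: List[Hashable]) -> "Dataset": ...

    # A single hashable key selects one variable.
    @overload
    def __getitem__(self, key: Hashable) -> DataArray: ...

    # Actual implementation; the overloads above only inform the type checker.
    def __getitem__(self, key):
        if isinstance(key, list):
            return Dataset({k: self._variables[k] for k in key})
        return self._variables[key]


ds = Dataset({"a": DataArray("a"), "b": DataArray("b")})
single = ds["a"]         # intended to be inferred as DataArray
subset = ds[["a", "b"]]  # intended to be inferred as Dataset
print(type(single).__name__, type(subset).__name__)
```

At runtime only the final, undecorated `__getitem__` is called; the decorated signatures exist purely for static analysis.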

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
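
The filter in the page heading (closed issues by user 1386642, sorted by `updated_at` descending) maps onto a plain query against the `issues` table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; it assumes a reduced subset of the columns above, the helper name `closed_issues_by_user` is hypothetical, and the last sample row is invented purely to show that non-matching rows are filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the schema above: only the columns the filter needs.
conn.execute(
    """
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        number INTEGER,
        title TEXT,
        user INTEGER,
        state TEXT,
        updated_at TEXT,
        type TEXT
    )
    """
)
conn.execute("CREATE INDEX idx_issues_user ON issues (user)")

sample_rows = [
    (258640421, 1577, "Potential error in apply_ufunc docstring for input_core_dims",
     1386642, "closed", "2017-10-10T04:42:21Z", "issue"),
    (856172272, 5144, "Add chunks argument to {zeros/ones/empty}_like.",
     1386642, "closed", "2023-10-25T03:18:05Z", "issue"),
    (334366223, 2241, "Slow performance with isel on stacked coordinates",
     1386642, "closed", "2020-06-20T20:51:48Z", "issue"),
    # Hypothetical row that should NOT match the filter.
    (1, 1, "Hypothetical open issue from another user",
     42, "open", "2024-01-01T00:00:00Z", "issue"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)", sample_rows)


def closed_issues_by_user(connection, user_id):
    """Return (number, title, updated_at) for closed issues, newest update first."""
    query = """
        SELECT number, title, updated_at
        FROM issues
        WHERE state = 'closed' AND type = 'issue' AND user = ?
        ORDER BY updated_at DESC
    """
    return connection.execute(query, (user_id,)).fetchall()


for number, title, updated_at in closed_issues_by_user(conn, 1386642):
    print(number, updated_at, title)
```

Because the timestamps are stored as ISO 8601 text in UTC, ordering them as strings gives the same result as ordering them chronologically.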
