github: issues: 4 rows where user = 16925278 sorted by updated

4 rows where user = 16925278 sorted by updated_at descending

Search:

descending

id	node_id	number	title	user	state	comments	created_at	updated_at ▲	closed_at	author_association	draft	pull_request	body	reactions	state_reason	repo	type
2132686003	PR_kwDOAMm_X85mxZFT	8744	Update docs on view / copies	ks905383 16925278	closed	4	2024-02-13T16:14:40Z	2024-03-25T20:35:23Z	2024-03-25T20:35:19Z	CONTRIBUTOR	0	pydata/xarray/pulls/8744	Add reference to numpy docs on view / copies in the corresponding section of the xarray docs, to help clarify pydata#8728 . Add note that other xarray operations also return views rather than copies in the Copies vs. Views section of the docs Add note that `da.values()` returns a view in the header for `da.values()`.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8744/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	pull
2132593768	I_kwDOAMm_X85_HMxo	8743	`.reset_index()`/`.reset_coords()` maintain MultiIndex status	ks905383 16925278	closed	6	2024-02-13T15:32:47Z	2024-03-01T16:05:11Z	2024-03-01T16:05:11Z	CONTRIBUTOR			What happened? Trying to save a dataset to NetCDF using `ds.to_netcdf()` will fail when one of the coordinates is a multiindex. The error message suggests using `.reset_index()` to remove the multiindex. However, saving still fails after resetting the index, including after moving the offending coordinates to be data variables instead using `.reset_coords()`. What did you expect to happen? After calling `.reset_index()`, and especially after calling `.reset_coords()`, the save should be successful. As shown in the example below, a dataset that asserts identical to the dataset that throws the error saves without a problem. (this also points to a current workaround - to recreate the Dataset from scratch). Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np Create random dataset ds = xr.Dataset({'test':(['lat','lon'],np.random.rand(2,3))}, coords = {'lat':(['lat'],[0,1]), 'lon':(['lon'],[0,1,2])}) Create multiindex by stacking ds = ds.stack(locv=('lat','lon')) The index shows up as a MultiIndex print(ds.indexes) Try to export (this fails as expected, since multiindex) ds.to_netcdf('test.nc') Now, get rid of multiindex by resetting coords (i.e., turning coordinates into data variables) ds = ds.reset_index('locv').reset_coords() The index is no longer a MultiIndex print(ds.indexes) Try to export - this also fails! ds.to_netcdf('test.nc') A reference comparison dataset that is successfully asserted as identical ds_compare = xr.Dataset({k:(['locv'],ds[k].values) for k in ds}) xr.testing.assert_identical(ds_compare,ds) Try exporting (this succeeds) ds_compare.to_netcdf('test.nc') ``` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies. Relevant log output ```Python NotImplementedError Traceback (most recent call last) Cell In[109], line 1 ----> 1 ds.to_netcdf('test.nc') File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/core/dataset.py:2303, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf) 2300 encoding = {} 2301 from xarray.backends.api import to_netcdf -> 2303 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:( 2304 self, 2305 path, 2306 mode=mode, 2307 format=format, 2308 group=group, 2309 engine=engine, 2310 encoding=encoding, 2311 unlimited_dims=unlimited_dims, 2312 compute=compute, 2313 multifile=False, 2314 invalid_netcdf=invalid_netcdf, 2315 ) File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1315, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf) 1310 # TODO: figure out how to refactor this logic (here and in save_mfdataset) 1311 # to avoid this mess of conditionals 1312 try: 1313 # TODO: allow this work (setting up the file for writing array data) 1314 # to be parallelized with dask -> 1315 dump_to_store( 1316 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims 1317 ) 1318 if autoclose: 1319 store.close() File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/api.py:1362, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims) 1359 if encoder: 1360 variables, attrs = encoder(variables, attrs) -> 1362 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims) File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:352, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims) 349 if writer is None: 350 writer = ArrayWriter() --> 352 variables, attributes = self.encode(variables, attributes) 354 self.set_attributes(attributes) 355 self.set_dimensions(variables, unlimited_dims=unlimited_dims) File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/backends/common.py:441, in WritableCFDataStore.encode(self, variables, attributes) 438 def encode(self, variables, attributes): 439 # All NetCDF files get CF encoded by default, without this attempting 440 # to write times, for example, would fail. --> 441 variables, attributes = cf_encoder(variables, attributes) 442 variables = {k: self.encode_variable(v) for k, v in variables.items()} 443 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()} File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:791, in cf_encoder(variables, attributes) 788 # add encoding for time bounds variables if present. 789 _update_bounds_encoding(variables) --> 791 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()} 793 # Remove attrs from bounds variables (issue #2921) 794 for var in new_vars.values(): File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:179, in encode_cf_variable(var, needs_copy, name) 157 def encode_cf_variable( 158 var: Variable, needs_copy: bool = True, name: T_Name = None 159 ) -> Variable: 160 """ 161 Converts a Variable into a Variable which follows some 162 of the CF conventions: (...) 177 A variable which has been encoded as described above. 178 """ --> 179 ensure_not_multiindex(var, name=name) 181 for coder in [ 182 times.CFDatetimeCoder(), 183 times.CFTimedeltaCoder(), (...) 190 variables.BooleanCoder(), 191 ]: 192 var = coder.encode(var, name=name) File ~/opt/anaconda3/envs/xagg_test2/lib/python3.12/site-packages/xarray/conventions.py:88, in ensure_not_multiindex(var, name) 86 def ensure_not_multiindex(var: Variable, name: T_Name = None) -> None: 87 if isinstance(var._data, indexing.PandasMultiIndexingAdapter): ---> 88 raise NotImplementedError( 89 f"variable {name!r} is a MultiIndex, which cannot yet be " 90 "serialized. Instead, either use reset_index() " 91 "to convert MultiIndex levels into coordinate variables instead " 92 "or use https://cf-xarray.readthedocs.io/en/latest/coding.html." 93 ) NotImplementedError: variable 'lat' is a MultiIndex, which cannot yet be serialized. Instead, either use reset_index() to convert MultiIndex levels into coordinate variables instead or use https://cf-xarray.readthedocs.io/en/latest/coding.html. ``` Anything else we need to know? This is a recent error that came up in some automated tests - an older version of it is still working; so `xarray v2023.1.0` does not have this issue. Given that saving works with a dataset that `xr.testing.assert_identical()` asserts is identical to the dataset that fails, and that `ds.indexes()` no longer shows a MultiIndex on the dataset that fails, perhaps the issue is in the error itself - i.e., in `xarray.conventions.ensure_not_multiindex` ? Looks like it was added recently https://github.com/pydata/xarray/commit/f9f4c730254073f0f5a8fce65f4bbaa0eefec5fd to address another bug. Environment INSTALLED VERSIONS ------------------ commit: None python: 3.12.1 \| packaged by conda-forge \| (main, Dec 23 2023, 08:05:03) [Clang 16.0.6 ] python-bits: 64 OS: Darwin OS-release: 22.6.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: None libnetcdf: None xarray: 2024.1.1 pandas: 2.2.0 numpy: 1.26.3 scipy: 1.12.0 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None iris: None bottleneck: None dask: None distributed: None matplotlib: 3.8.2 cartopy: None seaborn: None numbagg: None fsspec: None cupy: None pint: None sparse: 0.15.1 flox: None numpy_groupies: None setuptools: 69.0.3 pip: 24.0 conda: None pytest: 7.4.0 mypy: None IPython: 8.21.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8743/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
2127671156	I_kwDOAMm_X85-0a90	8728	Lingering memory connections when extracting underlying `np.arrays` from datasets	ks905383 16925278	open	6	2024-02-09T18:39:34Z	2024-02-26T06:02:15Z		CONTRIBUTOR			What is your issue? I know that generally, `ds2 = ds` connects the two objects in memory, and changes in one will also cause changes in the other. However, I generally assume that certain operations should break this connection, for example: - extracting the underlying `np.array` from a dataset (changing its type and destroying a lot of the xarray-specific information: index, dimensions, etc.) - using the underlying `np.array` into a new dataset In other words, I would expect that using `ds['var'].values` would be similar to `copy.deepcopy(ds['var'].values)`. Here's an example that illustrates how in these cases, the objects are still linked in memory: (apologies for the somewhat hokey example) ``` import xarray as xr import numpy as np Create a dataset ds = xr.Dataset(coords = {'lon':(['lon'],np.array([178.2,179.2,-179.8, -178.8,-177.8,-176.8]))}) print('\nds: ') print(ds) Create a new dataset that uses the values of the first dataset ds2 = xr.Dataset({'lon1':(['lon'],ds.lon.values)}, coords = {'lon':(['lon'],ds.lon.values)}) print('\nds2: ') print(ds2) Change ds2's 'lon1' variable ds2['lon1'][ds2['lon1']<0] = 360 + ds2['lon1'][ds2['lon1']<0] `ds2` is changed as expected print('\nds2 (should be modified): ') print(ds2) `ds` is changed, which is not expected print('\nds (should not be modified): ') print(ds) ``` The question is - am I right (from a UX perspective) to expect these kinds of operations to disconnect the objects in memory? If so, I might try to update the docs to be a bit clearer on this. (or, alternatively, if these kinds of operations should disconnect the objects in memory, maybe it's better to have `.values` also call `.copy(deep=True).values`) Appreciate y'all's thoughts on this!	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8728/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1842072960	I_kwDOAMm_X85ty82A	8058	`ds.interp()` breaks if (non-interpolating) dimension is not numeric	ks905383 16925278	open	1	2023-08-08T20:43:08Z	2023-08-08T20:52:15Z		CONTRIBUTOR			What happened? I'm running `ds.interp()` using multi-dimensional new coordinates, using `xarray`'s broadcasting to expand the original dataset to new dimensions. In this case, I'm only interpolating on one dimension, but broadcasting out to others. If the dimensions are all numeric (or, presumably, able to be forced to numeric), then this works without an issue. However, if one of the other dimensions is, e.g., populated with string indices (weather station names, model run ids, etc.), then this process fails, even if the dimension on which the interpolating is conducted is purely numeric. What did you expect to happen? Here is an example with only numeric dimensions that works as expected: ``` import xarray as xr import numpy as np da1 = xr.DataArray(np.reshape(np.arange(0,12),(3,4)), coords = {'dim0':np.arange(0,3), 'dim1':np.arange(0,4)}) da2 = xr.DataArray(np.random.normal(loc=1,size=(2,4),scale=0.5), coords = {'dim2':np.arange(0,2), 'dim1':np.arange(0,4)}) da1.interp(dim0=da2) ``` this produces something like: as expected. Minimal Complete Verifiable Example ```Python import xarray as xr import numpy as np da1 = xr.DataArray(np.reshape(np.arange(0,12),(3,4)), coords = {'dim0':np.arange(0,3), 'dim1':np.arange(0,4).astype(str)}) da2 = xr.DataArray(np.random.normal(loc=1,size=(2,4),scale=0.5), coords = {'dim2':np.arange(0,2), 'dim1':np.arange(0,4).astype(str)}) da1.interp(dim0=da2) ``` MVCE confirmation [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [X] Complete example — the example is self-contained, including all data and the text of any traceback. [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [X] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output ```Python TypeError Traceback (most recent call last) Cell In[48], line 9 1 da1 = xr.DataArray(np.reshape(np.arange(0,12),(3,4)), 2 coords = {'dim0':np.arange(0,3), 3 'dim1':np.arange(0,4).astype(str)}) 5 da2 = xr.DataArray(np.random.normal(loc=1,size=(2,4),scale=0.5), 6 coords = {'dim2':np.arange(0,2), 7 'dim1':np.arange(0,4).astype(str)}) ----> 9 da1.interp(dim0=da2) File ~/.conda/envs/climate/lib/python3.10/site-packages/xarray/core/dataarray.py:2204, in DataArray.interp(self, coords, method, assume_sorted, kwargs, coords_kwargs) 2199 if self.dtype.kind not in "uifc": 2200 raise TypeError( 2201 "interp only works for a numeric type array. " 2202 "Given {}.".format(self.dtype) 2203 ) -> 2204 ds = self._to_temp_dataset().interp( 2205 coords, 2206 method=method, 2207 kwargs=kwargs, 2208 assume_sorted=assume_sorted, 2209 coords_kwargs, 2210 ) 2211 return self._from_temp_dataset(ds) File ~/.conda/envs/climate/lib/python3.10/site-packages/xarray/core/dataset.py:3666, in Dataset.interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, coords_kwargs) 3664 if method in ["linear", "nearest"]: 3665 for k, v in validated_indexers.items(): -> 3666 obj, newidx = missing._localize(obj, {k: v}) 3667 validated_indexers[k] = newidx[k] 3669 # optimization: create dask coordinate arrays once per Dataset 3670 # rather than once per Variable when dask.array.unify_chunks is called later 3671 # GH4739 File ~/.conda/envs/climate/lib/python3.10/site-packages/xarray/core/missing.py:562, in _localize(var, indexes_coords) 560 indexes = {} 561 for dim, [x, new_x] in indexes_coords.items(): --> 562 minval = np.nanmin(new_x.values) 563 maxval = np.nanmax(new_x.values) 564 index = x.to_index() File <array_function** internals>:5, in nanmin(args, kwargs) File ~/.conda/envs/climate/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:319, in nanmin(a, axis, out, keepdims) 315 kwargs['keepdims'] = keepdims 316 if type(a) is np.ndarray and a.dtype != np.object_: 317 # Fast, but not safe for subclasses of ndarray, or object arrays, 318 # which do not implement isnan (gh-9009), or fmin correctly (gh-8975) --> 319 res = np.fmin.reduce(a, axis=axis, out=out, *kwargs) 320 if np.isnan(res).any(): 321 warnings.warn("All-NaN slice encountered", RuntimeWarning, 322 stacklevel=3) TypeError: cannot perform reduce with flexible type ``` Anything else we need to know? I'm pretty sure the issue is in this optimization step. It calls `_localize()` from missing.py, which calls `np.nanmin()` and `np.nanmax()` on all the coordinates, including the ones that aren't used in the interpolation, but only in the broadcasting. Perhaps a way to fix this would be to have a test in localize for numeric indices, and then only subset the numeric dimensions? (I could see generalizing `_localize()` to other data types may be more trouble than it's worth, especially for unsorted string dimensions...) Or only subset the dimensions used in the interpolation itself? Or, alternatively, having a way to turn off optimizations like this? Environment INSTALLED VERSIONS ------------------ commit: None python: 3.10.8 \| packaged by conda-forge \| (main, Nov 22 2022, 08:23:14) [GCC 10.4.0] python-bits: 64 OS: Linux OS-release: 3.10.0-1160.76.1.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: (None, None) libhdf5: 1.12.1 libnetcdf: 4.8.1 xarray: 2023.7.0 pandas: 1.4.1 numpy: 1.21.6 scipy: 1.11.1 netCDF4: 1.5.8 pydap: None h5netcdf: None h5py: None Nio: 1.5.5 zarr: 2.13.2 cftime: 1.6.2 nc_time_axis: 1.4.1 PseudoNetCDF: None iris: None bottleneck: 1.3.7 dask: 2023.3.0 distributed: 2023.3.0 matplotlib: 3.5.1 cartopy: 0.20.2 seaborn: 0.11.2 numbagg: None fsspec: 2022.5.0 cupy: None pint: 0.22 sparse: 0.14.0 flox: None numpy_groupies: None setuptools: 68.0.0 pip: 23.2.1 conda: None pytest: 7.0.1 mypy: None IPython: 8.14.0 sphinx: None	{ "url": "https://api.github.com/repos/pydata/xarray/issues/8058/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);

issues

4 rows where user = 16925278 sorted by updated_at descending

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

Create random dataset

Create multiindex by stacking

The index shows up as a MultiIndex

Try to export (this fails as expected, since multiindex)

ds.to_netcdf('test.nc')

Now, get rid of multiindex by resetting coords (i.e.,

turning coordinates into data variables)

The index is no longer a MultiIndex

Try to export - this also fails!

ds.to_netcdf('test.nc')

A reference comparison dataset that is successfully asserted

as identical

Try exporting (this succeeds)

MVCE confirmation

Relevant log output

```Python

Anything else we need to know?

Environment

What is your issue?

Create a dataset

Create a new dataset that uses the values of the first dataset

Change ds2's 'lon1' variable

`ds2` is changed as expected

`ds` is changed, which is not expected

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

MVCE confirmation

Relevant log output

```Python

Anything else we need to know?

Environment

Advanced export

issues

4 rows where user = 16925278 sorted by updated_at descending

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

Create random dataset

Create multiindex by stacking

The index shows up as a MultiIndex

Try to export (this fails as expected, since multiindex)

ds.to_netcdf('test.nc')

Now, get rid of multiindex by resetting coords (i.e.,

turning coordinates into data variables)

The index is no longer a MultiIndex

Try to export - this also fails!

ds.to_netcdf('test.nc')

A reference comparison dataset that is successfully asserted

as identical

Try exporting (this succeeds)

MVCE confirmation

Relevant log output

```Python

Anything else we need to know?

Environment

What is your issue?

Create a dataset

Create a new dataset that uses the values of the first dataset

Change ds2's 'lon1' variable

ds2 is changed as expected

ds is changed, which is not expected

What happened?

What did you expect to happen?

Minimal Complete Verifiable Example

MVCE confirmation

Relevant log output

```Python

Anything else we need to know?

Environment

Advanced export

`ds2` is changed as expected

`ds` is changed, which is not expected