issues

19 rows where comments = 6 and user = 1217238 sorted by updated_at descending

type (2 values)

  • issue 14
  • pull 5

state (2 values)

  • closed 15
  • open 4

repo (1 value)

  • xarray 19
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at ▲, closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
842436143 MDU6SXNzdWU4NDI0MzYxNDM= 5081 Lazy indexing arrays as a stand-alone package shoyer 1217238 open 0     6 2021-03-27T07:06:03Z 2023-12-15T13:20:03Z   MEMBER      

From @rabernat on Twitter:

"Xarray has some secret private classes for lazily indexing / wrapping arrays that are so useful I think they should be broken out into a standalone package. https://github.com/pydata/xarray/blob/master/xarray/core/indexing.py#L516"

The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing.

Desired features:

  • Lazy indexing
  • Lazy transposes
  • Lazy concatenation (#4628) and stacking
  • Lazy vectorized operations (e.g., unary and binary arithmetic)
    • needed for decoding variables from disk (xarray.encoding) and
    • building lazy multi-dimensional coordinate arrays corresponding to map projections (#3620)
  • Maybe: lazy reshapes (#4113)

A common feature of these operations is that they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea.

Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, mean() probably should not be lazy, because that could involve computation of a very large number of elements that one might want to cache.
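As a rough illustration of the fused lazy indexing described above (not xarray's actual implementation, just a minimal sketch), a wrapper can defer elementwise operations and replay them only on the elements actually selected:

```python
import numpy as np

class LazyArray:
    """Minimal sketch: defer elementwise ops until values are materialized."""

    def __init__(self, source, ops=None):
        self.source = source          # underlying on-disk/duck array
        self.ops = ops or []          # queued elementwise functions

    def __getitem__(self, key):
        # Indexing is applied to the source first, so only O(N) elements
        # are read; queued ops are then replayed on that small selection.
        selected = self.source[key]
        for op in self.ops:
            selected = op(selected)
        return selected

    def map(self, func):
        # Queue a lazy elementwise operation (e.g., mask/scale decoding).
        return LazyArray(self.source, self.ops + [func])

# usage: lazily "decode" a scaled variable, then pull out one small slice
raw = np.arange(12).reshape(3, 4)              # stand-in for an on-disk array
scaled = LazyArray(raw).map(lambda x: 0.5 * x + 10.0)
print(scaled[0, :2])                           # only 2 elements are computed
```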

This is valuable functionality for Xarray for two reasons:

  1. It allows for "previewing" small bits of data loaded from disk or remote storage, even if that data needs some form of cheap "decoding" from its form on disk.
  2. It allows for xarray to decode data in a lazy fashion that is compatible with full-featured systems for lazy computation (e.g., Dask), without requiring the user to choose dask when reading the data.

Related issues:

  • [Proposal] Expose Variable without Pandas dependency #3981
  • Lazy concatenation of arrays #4628
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5081/reactions",
    "total_count": 6,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 6,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
253395960 MDU6SXNzdWUyNTMzOTU5NjA= 1533 Index variables loaded from dask can be computed twice shoyer 1217238 closed 0     6 2017-08-28T17:18:27Z 2023-04-06T04:15:46Z 2023-04-06T04:15:46Z MEMBER      

as reported by @crusaderky in #1522

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1533/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
948890466 MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy 5624 Make typing-extensions optional shoyer 1217238 closed 0     6 2021-07-20T17:43:22Z 2021-07-22T23:30:49Z 2021-07-22T23:02:03Z MEMBER   0 pydata/xarray/pulls/5624

Type checking may be a little worse if typing-extensions is not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use of TypeGuard.

Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame...
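For context, the pattern under discussion is roughly the following optional-import fallback (a sketch, not the PR's actual code; the names DictGuard and is_dict_like are illustrative). The sticking point is that type checkers may reject a type alias defined conditionally under try/except:

```python
from typing import Any

try:
    from typing_extensions import TypeGuard
    DictGuard = TypeGuard[dict]       # conditional type alias (upsets mypy/pylance)
except ImportError:
    DictGuard = bool                  # degrade gracefully without typing-extensions

def is_dict_like(value: Any) -> "DictGuard":
    # With TypeGuard available, type checkers can narrow `value` to dict at
    # call sites; without it, the annotation is effectively just a bool.
    return hasattr(value, "keys") and hasattr(value, "__getitem__")

print(is_dict_like({"a": 1}), is_dict_like([1, 2]))   # True False
```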

  • [x] Closes #5495
  • [x] Passes pre-commit run --all-files
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5624/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
715374721 MDU6SXNzdWU3MTUzNzQ3MjE= 4490 Group together decoding options into a single argument shoyer 1217238 open 0     6 2020-10-06T06:15:18Z 2020-10-29T04:07:46Z   MEMBER      

Is your feature request related to a problem? Please describe.

open_dataset() currently has a very long function signature. This makes it hard to keep track of everything it can do, and is particularly problematic for the authors of new backends (e.g., see https://github.com/pydata/xarray/pull/4477), who might need to know how to handle all these arguments.

Describe the solution you'd like

To simplify the interface, I propose to group together all the decoding options into a new DecodingOptions class. I'm thinking of something like:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional, List

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None

    @classmethod
    def disabled(cls):
        return cls(mask=False, scale=False, datetime=False, timedelta=False,
                   concat_characters=False, coords=False)

    def non_defaults(self):
        return {k: v for k, v in asdict(self).items() if v is not None}

    # add another method for creating default VariableCoder() objects,
    # e.g., those listed in encode_cf_variable()
```

The signature of open_dataset would then become:

```python
def open_dataset(
    filename_or_obj,
    group=None,
    *,
    engine=None,
    chunks=None,
    lock=None,
    cache=None,
    backend_kwargs=None,
    decode: Union[DecodingOptions, bool] = None,
    **deprecated_kwargs
):
    if decode is None:
        decode = DecodingOptions()
    if decode is False:
        decode = DecodingOptions.disabled()
    # handle deprecated_kwargs...
    ...
```

Question: are decode and DecodingOptions the right names? Maybe these should still include the name "CF", e.g., decode_cf and CFDecodingOptions, given that these are specific to CF conventions?

Note: the current signature is open_dataset(filename_or_obj, group=None, decode_cf=True, mask_and_scale=None, decode_times=True, autoclose=None, concat_characters=True, decode_coords=True, engine=None, chunks=None, lock=None, cache=None, drop_variables=None, backend_kwargs=None, use_cftime=None, decode_timedelta=None)

Usage with the new interface would look like xr.open_dataset(filename, decode=False) or xr.open_dataset(filename, decode=xr.DecodingOptions(mask=False, scale=False)).

This requires a little bit more typing than what we currently have, but it has a few advantages:

  1. It's easier to understand the role of different arguments. Now there is a function with ~8 arguments and a class with ~8 arguments rather than a function with ~15 arguments.
  2. It's easier to add new decoding arguments (e.g., for more advanced CF conventions), because they don't clutter the open_dataset interface. For example, I separated out mask and scale arguments, versus the current mask_and_scale argument.
  3. If a new backend plugin for open_dataset() needs to handle every option supported by open_dataset(), this makes that task significantly easier. The only decoding options they need to worry about are non-default options that were explicitly set, i.e., those exposed by the non_defaults() method. If another decoding option wasn't explicitly set and isn't recognized by the backend, they can just ignore it.
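To illustrate the third point, a backend could consume only the explicitly-set options along these lines (a hypothetical sketch building on the DecodingOptions class above; my_backend_open is not a real function):

```python
def my_backend_open(filename, decode):
    # Hypothetical backend entry point: only decoding options that were
    # explicitly set by the user need to be considered.
    requested = decode.non_defaults()            # e.g. {'mask': False}
    supported = {"mask", "scale", "datetime"}    # whatever this backend handles
    unsupported = set(requested) - supported
    if unsupported:
        raise ValueError(f"backend cannot handle decoding options: {unsupported}")
    print(f"opening {filename!r} with {requested}")
    # ... open the file and apply the requested decoding ...

my_backend_open("example.nc", DecodingOptions(mask=False))
```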

Describe alternatives you've considered

For the overall approach:

  1. We could keep the current design, with separate keyword arguments for decoding options, and just be very careful about passing around these arguments. This seems pretty painful for the backend refactor, though.
  2. We could keep the current design only for the user facing open_dataset() interface, and then internally convert into the DecodingOptions() struct for passing to backend constructors. This would provide much needed flexibility for backend authors, but most users wouldn't benefit from the new interface. Perhaps this would make sense as an intermediate step?
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4490/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
702372014 MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz 4426 Fix for h5py deepcopy issues shoyer 1217238 closed 0     6 2020-09-16T01:11:00Z 2020-09-18T22:31:13Z 2020-09-18T22:31:09Z MEMBER   0 pydata/xarray/pulls/4426
  • [x] Closes #4425
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4426/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
417542619 MDU6SXNzdWU0MTc1NDI2MTk= 2803 Test failure with TestValidateAttrs.test_validating_attrs shoyer 1217238 closed 0     6 2019-03-05T23:03:02Z 2020-08-25T14:29:19Z 2019-03-14T15:59:13Z MEMBER      

This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog

E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2

```
=================================== FAILURES ===================================
_________________ TestValidateAttrs.test_validating_attrs ______________________
self = <xarray.tests.test_backends.TestValidateAttrs object at 0x00000096BE5FAFD0>
def test_validating_attrs(self):
    def new_dataset():
        return Dataset({'data': ('y', np.arange(10.0))}, {'y': np.arange(10)})

    def new_dataset_and_dataset_attrs():
        ds = new_dataset()
        return ds, ds.attrs

    def new_dataset_and_data_attrs():
        ds = new_dataset()
        return ds, ds.data.attrs

    def new_dataset_and_coord_attrs():
        ds = new_dataset()
        return ds, ds.coords['y'].attrs

    for new_dataset_and_attrs in [new_dataset_and_dataset_attrs,
                                  new_dataset_and_data_attrs,
                                  new_dataset_and_coord_attrs]:
        ds, attrs = new_dataset_and_attrs()

        attrs[123] = 'test'
        with raises_regex(TypeError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs[MiscObject()] = 'test'
        with raises_regex(TypeError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs[''] = 'test'
        with raises_regex(ValueError, 'Invalid name for attr'):
            ds.to_netcdf('test.nc')

        # This one should work
        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 'test'
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = {'a': 5}
        with raises_regex(TypeError, 'Invalid value for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = MiscObject()
        with raises_regex(TypeError, 'Invalid value for attr'):
            ds.to_netcdf('test.nc')

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 5
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = 3.14
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = [1, 2, 3, 4]
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = (1.9, 2.5)
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = np.arange(5)
        with create_tmp_file() as tmp_file:
            ds.to_netcdf(tmp_file)

        ds, attrs = new_dataset_and_attrs()
        attrs['test'] = np.arange(12).reshape(3, 4)
        with create_tmp_file() as tmp_file:
          ds.to_netcdf(tmp_file)

xarray\tests\test_backends.py:3450:

xarray\core\dataset.py:1323: in to_netcdf
    compute=compute)
xarray\backends\api.py:767: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray\backends\api.py:810: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray\backends\common.py:262: in store
    self.set_attributes(attributes)
xarray\backends\common.py:278: in set_attributes
    self.set_attribute(k, v)
xarray\backends\netCDF4_.py:418: in set_attribute
    _set_nc_attribute(self.ds, key, value)
xarray\backends\netCDF4_.py:294: in _set_nc_attribute
    obj.setncattr(key, value)
netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr
    ???

???
E   ValueError: multi-dimensional array attributes not supported

netCDF4\_netCDF4.pyx:1514: ValueError
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2803/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
398107776 MDU6SXNzdWUzOTgxMDc3NzY= 2666 Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data shoyer 1217238 open 0     6 2019-01-11T02:45:49Z 2019-12-30T22:58:23Z   MEMBER      

This appears with the development version of pandas; see https://github.com/pandas-dev/pandas/issues/24716 for details.

Example:

```
In [16]: df = pd.DataFrame({"A": pd.date_range('2000', periods=12, tz='US/Central')})

In [17]: df.to_xarray()
/Users/taugspurger/Envs/pandas-dev/lib/python3.7/site-packages/xarray/core/dataset.py:3111: FutureWarning:
Converting timezone-aware DatetimeArray to timezone-naive ndarray with 'datetime64[ns]' dtype. In the future,
this will return an ndarray with 'object' dtype where each element is a 'pandas.Timestamp' with the correct 'tz'.
To accept the future behavior, pass 'dtype=object'. To keep the old behavior, pass 'dtype="datetime64[ns]"'.
  data = np.asarray(series).reshape(shape)
Out[17]:
<xarray.Dataset>
Dimensions:  (index: 12)
Coordinates:
  * index    (index) int64 0 1 2 3 4 5 6 7 8 9 10 11
Data variables:
    A        (index) datetime64[ns] 2000-01-01T06:00:00 ... 2000-01-12T06:00:00
```
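Not part of the issue, but one user-side way to sidestep the warning is to strip the timezone (here by converting to UTC) before calling to_xarray:

```python
import pandas as pd

df = pd.DataFrame({"A": pd.date_range("2000", periods=12, tz="US/Central")})

# Convert to UTC and drop the timezone so the column is plain datetime64[ns];
# this avoids the FutureWarning at the cost of losing the tz information.
naive = df.assign(A=df["A"].dt.tz_convert("UTC").dt.tz_localize(None))
ds = naive.to_xarray()
print(ds["A"].dtype)    # datetime64[ns]
```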

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
269348789 MDU6SXNzdWUyNjkzNDg3ODk= 1668 Remove use of allow_cleanup_failure in test_backends.py shoyer 1217238 open 0     6 2017-10-28T20:47:31Z 2019-09-29T20:07:03Z   MEMBER      

This exists for the benefit of Windows, on which trying to delete an open file results in an error. But really, it would be nice to have a test suite that doesn't leave any temporary files hanging around.

The main culprit is tests like this, where opening a file triggers an error:

```python
with raises_regex(TypeError, 'pip install netcdf4'):
    open_dataset(tmp_file, engine='scipy')
```

The way to fix this is to use mocking of some sort, to intercept calls to backend file objects and close them afterwards.
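A rough sketch of that mocking approach (hypothetical, not the actual test-suite code) could wrap whatever callable opens the backend file, so that any file objects created during a failing test are recorded and closed afterwards:

```python
import contextlib
from unittest import mock

@contextlib.contextmanager
def track_and_close(module, name):
    """Wrap `module.name` (a callable that opens file-like objects) so that
    everything it returns is closed when the context exits, even if the test
    body raised before it could clean up."""
    original = getattr(module, name)
    opened = []

    def wrapper(*args, **kwargs):
        obj = original(*args, **kwargs)
        opened.append(obj)
        return obj

    with mock.patch.object(module, name, wrapper):
        try:
            yield opened
        finally:
            for obj in opened:
                obj.close()

# hypothetical usage in a test:
# import scipy.io
# with track_and_close(scipy.io, "netcdf_file"):
#     with raises_regex(TypeError, 'pip install netcdf4'):
#         open_dataset(tmp_file, engine='scipy')
```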

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1668/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
278713328 MDU6SXNzdWUyNzg3MTMzMjg= 1756 Deprecate inplace methods shoyer 1217238 closed 0   0.11 2856429 6 2017-12-02T20:09:00Z 2019-03-25T19:19:10Z 2018-11-03T21:24:13Z MEMBER      

The following methods have an inplace argument:

  • DataArray.reset_coords
  • DataArray.set_index
  • DataArray.reset_index
  • DataArray.reorder_levels
  • Dataset.set_coords
  • Dataset.reset_coords
  • Dataset.rename
  • Dataset.swap_dims
  • Dataset.set_index
  • Dataset.reset_index
  • Dataset.reorder_levels
  • Dataset.update
  • Dataset.merge

As proposed in https://github.com/pydata/xarray/issues/1755#issuecomment-348682403, let's deprecate all of these at the next major release (v0.11). They add unnecessary complexity to methods and promote confusion about xarray's data model.

Practically, we would change all of the default values to inplace=None and issue either a DeprecationWarning or FutureWarning (see PEP 565 for more details on that choice).
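Concretely, the deprecation shim could look something like this sketch (illustrative only; _check_inplace is not necessarily the name used in xarray):

```python
import warnings

def _check_inplace(inplace, default=False):
    # `inplace=None` means the user did not pass the argument; any explicit
    # value triggers the deprecation warning before the argument is removed.
    if inplace is None:
        return default
    warnings.warn(
        "the `inplace` argument is deprecated and will be removed in a "
        "future version of xarray",
        FutureWarning,
        stacklevel=3,
    )
    return inplace

# inside a method such as Dataset.set_coords:
# def set_coords(self, names, inplace=None):
#     inplace = _check_inplace(inplace)
#     ...
```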

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1756/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
395332265 MDExOlB1bGxSZXF1ZXN0MjQxODExMjc4 2642 Use pycodestyle for lint checks. shoyer 1217238 closed 0     6 2019-01-02T18:11:38Z 2019-03-14T06:27:20Z 2019-01-03T18:10:13Z MEMBER   0 pydata/xarray/pulls/2642

flake8 includes a few more useful checks, but it's annoying to only see its output in Travis-CI results.

This keeps Travis-CI and pep8speaks in sync.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2642/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
369310993 MDU6SXNzdWUzNjkzMTA5OTM= 2480 test_apply_dask_new_output_dimension is broken on master with dask-dev shoyer 1217238 closed 0     6 2018-10-11T21:24:33Z 2018-10-12T16:26:17Z 2018-10-12T16:26:17Z MEMBER      

Example build failure: https://travis-ci.org/pydata/xarray/jobs/439949937

```
=================================== FAILURES ===================================
____________________ test_apply_dask_new_output_dimension _____________________
@requires_dask
def test_apply_dask_new_output_dimension():
    import dask.array as da

    array = da.ones((2, 2), chunks=(1, 1))
    data_array = xr.DataArray(array, dims=('x', 'y'))

    def stack_negative(obj):
        def func(x):
            return np.stack([x, -x], axis=-1)
        return apply_ufunc(func, obj, output_core_dims=[['sign']],
                           dask='parallelized', output_dtypes=[obj.dtype],
                           output_sizes={'sign': 2})

    expected = stack_negative(data_array.compute())

    actual = stack_negative(data_array)
    assert actual.dims == ('x', 'y', 'sign')
    assert actual.shape == (2, 2, 2)
    assert isinstance(actual.data, da.Array)
  assert_identical(expected, actual)

xarray/tests/test_computation.py:737:


xarray/tests/test_computation.py:24: in assert_identical
    assert a.identical(b), msg
xarray/core/dataarray.py:1923: in identical
    self._all_compat(other, 'identical'))
xarray/core/dataarray.py:1875: in _all_compat
    compat(self, other))
xarray/core/dataarray.py:1872: in compat
    return getattr(x.variable, compat_str)(y.variable)
xarray/core/variable.py:1461: in identical
    self.equals(other))
xarray/core/variable.py:1439: in equals
    equiv(self.data, other.data)))
xarray/core/duck_array_ops.py:144: in array_equiv
    arr1, arr2 = as_like_arrays(arr1, arr2)
xarray/core/duck_array_ops.py:128: in as_like_arrays
    return tuple(np.asarray(d) for d in data)
xarray/core/duck_array_ops.py:128: in <genexpr>
    return tuple(np.asarray(d) for d in data)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/numpy/core/numeric.py:501: in asarray
    return array(a, dtype, copy=False, order=order)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/core.py:1118: in __array__
    x = self.compute()
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:156: in compute
    (result,) = compute(self, traverse=False, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:390: in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in collections_to_dsk
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in <listcomp>
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/optimization.py:41: in optimize
    dsk = ensure_dict(dsk)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/utils.py:830: in ensure_dict
    result.update(dd)
../../../miniconda/envs/test_env/lib/python3.6/_collections_abc.py:720: in __iter__
    yield from self._mapping
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:168: in __iter__
    return iter(self._dict)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:160: in _dict
    concatenate=self.concatenate
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: in top
    keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))

.0 = <tuple_iterator object at 0x7f606ba84fd0>

    keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))
E   KeyError: '.0'

../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: KeyError
```

My guess is that this is somehow related to @mrocklin's recent refactor of dask.array.atop: https://github.com/dask/dask/pull/3998

If the cause isn't obvious, I'll try to come up with a simple dask only example that reproduces it.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2480/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
361915770 MDU6SXNzdWUzNjE5MTU3NzA= 2424 0.10.9 release shoyer 1217238 closed 0     6 2018-09-19T20:31:29Z 2018-09-26T01:05:09Z 2018-09-22T15:14:48Z MEMBER      

It's now been two months since the 0.10.8 release, so we really ought to issue a new minor release.

I was initially thinking of skipping straight to 0.11.0 if we include https://github.com/pydata/xarray/pull/2261 (xarray.backends refactor), but it seems that will take a bit longer to review/test so it's probably worth issuing a 0.10.9 release first.

@pydata/xarray -- are there any PRs / bug-fixes in particular we should wait for before issuing the release?

I suppose it would be good to sort out https://github.com/pydata/xarray/issues/2422 (Plot2D no longer sorts coordinates before plotting)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2424/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
302153432 MDExOlB1bGxSZXF1ZXN0MTcyNzYxNTAw 1962 Support __array_ufunc__ for xarray objects. shoyer 1217238 closed 0     6 2018-03-05T02:36:20Z 2018-03-12T20:31:07Z 2018-03-12T20:31:07Z MEMBER   0 pydata/xarray/pulls/1962

This means NumPy ufuncs are now supported directly on xarray.Dataset objects, and opens the door to supporting computation on new data types, such as sparse arrays or arrays with units.
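For example, with this change a NumPy ufunc applied to an xarray object dispatches back to xarray (a small usage sketch):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": ("x", [0.0, np.pi / 2, np.pi])})

# np.sin dispatches through __array_ufunc__ and returns an xarray.Dataset
result = np.sin(ds)
print(type(result).__name__)    # Dataset
print(result["a"].values)       # [0.  1.  1.2246468e-16]
```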

  • [x] Closes #1617 (remove if there is no corresponding issue, which should only be the case for minor changes)
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1962/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
195579837 MDU6SXNzdWUxOTU1Nzk4Mzc= 1164 Don't warn when doing comparisons or arithmetic with NaN shoyer 1217238 closed 0     6 2016-12-14T16:33:05Z 2018-02-27T19:35:25Z 2018-02-27T16:03:43Z MEMBER      

Pandas used to unilaterally disable NumPy's warnings for doing comparisons with NaN, but now it doesn't: https://github.com/pandas-dev/pandas/issues/13109

See also http://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570
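For reference, the warning in question comes straight from NumPy and can be silenced locally with np.errstate (a minimal sketch, not xarray code):

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

# On NumPy versions from around the time of this issue, the comparison below
# emitted "RuntimeWarning: invalid value encountered in greater".  Wrapping
# it in np.errstate silences the warning for just this block:
with np.errstate(invalid="ignore"):
    print(arr > 2.0)    # [False False  True]
```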

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1164/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
171828347 MDU6SXNzdWUxNzE4MjgzNDc= 974 Indexing with alignment and broadcasting shoyer 1217238 closed 0   1.0 741199 6 2016-08-18T06:39:27Z 2018-02-04T23:30:12Z 2018-02-04T23:30:11Z MEMBER      

I think we can bring all of NumPy's advanced indexing to xarray in a very consistent way, with only very minor breaks in backwards compatibility.

For boolean indexing:

  • da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(da), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy.
  • da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three-argument form of where).
  • da[key_0, ..., key_n] where all of the key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write da[key_0 & ... & key_n].

For vectorized indexing (by integer or index value):

  • da[key_0, ..., key_n] where all of the key_i are integer labelled arrays with any number of dimensions gets handled like NumPy, except instead of broadcasting numpy-style we do broadcasting xarray-style:
    • If any of the key_i are unlabelled, 1D arrays (e.g., numpy arrays), we convert them into an xarray.Variable along the respective dimension. 0D arrays remain scalars. This ensures that the result of broadcasting them (in the next step) will be consistent with our current "outer indexing" behavior. Unlabelled higher dimensional arrays trigger an IndexingError.
    • We ensure all keys have the same dimensions/coordinates by mapping to da[*broadcast(key_0, ..., key_n)] (note that broadcast now includes automatic alignment).
    • The result's dimensions and coordinates are copied from the broadcast keys.
    • The result's values are taken by mapping each set of integer locations specified by the broadcast version of key_i to the integer position on the corresponding ith axis on da.
  • Labeled indexing like ds.loc[key_0, ..., key_n] works exactly as above, except instead of doing integer lookup, we look up label values in the corresponding index instead.
  • Indexing with .isel and .sel/.reindex works like the two previous cases, except we look up axes by dimension name instead of axis position.
  • I haven't fully thought through the implications for assignment (da[key] = value or da.loc[key] = value), but I think it works in a straightforwardly similar fashion.

All of these methods should also work for indexing on Dataset by looping over Dataset variables in the usual way.

This framework neatly subsumes most of the major limitations with xarray's existing indexing:

  • Boolean indexing on multi-dimensional arrays works in an intuitive way, for both selection and assignment.
  • No more need for specialized methods (sel_points/isel_points) for pointwise indexing. If you want to select along the diagonal of an array, you simply need to supply indexers that use a new dimension. Instead of arr.sel_points(lat=stations.lat, lon=stations.lon, dim='station'), you would simply write arr.sel(lat=stations.lat, lon=stations.lon) -- the station dimension is taken automatically from the indexer.
  • Other use cases for NumPy's advanced indexing that currently are impossible in xarray also automatically work. For example, nearest neighbor interpolation to a completely different grid is now as simple as ds.reindex(lon=grid.lon, lat=grid.lat, method='nearest', tolerance=0.5) or ds.reindex_like(grid, method='nearest', tolerance=0.5).
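To make the pointwise-indexing case concrete, a small sketch of the proposed behavior (this is essentially what later became xarray's vectorized indexing):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(12).reshape(3, 4),
    dims=("lat", "lon"),
    coords={"lat": [10, 20, 30], "lon": [100, 110, 120, 130]},
)

# Indexers sharing a new "station" dimension select pointwise, not outer-product:
stations_lat = xr.DataArray([10, 30], dims="station")
stations_lon = xr.DataArray([110, 130], dims="station")
picked = arr.sel(lat=stations_lat, lon=stations_lon)
print(picked.dims, picked.values)   # ('station',) [ 1 11]
```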

Questions to consider:

  • How does this interact with @benbovy's enhancements for MultiIndex indexing? (#802 and #947)
  • How do we handle mixed slice and array indexing? In NumPy, this is a major source of confusion, because slicing is done before broadcasting and the order of slices in the result is handled separately from broadcast indices. I think we may be able to resolve this by mapping slices in this case to 1D arrays along their respective axes, and using our normal broadcasting rules.
  • Should we deprecate non-boolean indexing with [] and .loc[] and non-labelled arrays when some but not all dimensions are provided? Instead, we would require explicitly indexing like [key, ...] (yes, writing ...), which indicates "all trailing axes" like NumPy. This behavior has been suggested for new indexers in NumPy because it precludes a class of bugs where the array has an unexpected number of dimensions. On the other hand, it's not so necessary for us when we have explicit indexing by dimension name with .sel.

xref these comments from @MaximilianR and myself

Note: I would certainly welcome help making this happen from a contributor other than myself, though you should probably wait until I finish #964, first, which lays important groundwork.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/974/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
274308380 MDU6SXNzdWUyNzQzMDgzODA= 1720 Possible regression with PyNIO data not being lazily loaded shoyer 1217238 closed 0     6 2017-11-15T21:20:41Z 2017-11-17T17:33:13Z 2017-11-17T16:44:40Z MEMBER      

@weathergod reports on the mailing list:

I just tried [0.10.0 rc2] out in combination with the pynio engine (v1.5.0 from conda-forge), and doing a print on a dataset object causes all of the data to get loaded into memory.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1720/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
207587161 MDU6SXNzdWUyMDc1ODcxNjE= 1269 GroupBy like API for resample shoyer 1217238 closed 0     6 2017-02-14T17:46:02Z 2017-09-22T16:27:35Z 2017-09-22T16:27:35Z MEMBER      

Since we wrote resample in xarray, pandas updated resample to have a groupby-like API (e.g., df.resample('24H').mean() vs. the old df.resample('24H') that uses the mean by default).

It would be nice to redo the xarray resample API to match, e.g., ds.resample(time='24H').mean() vs ds.resample('time', '24H'). This would solve a few use cases, including grouped-resample arithmetic and iterating over groups, and would (mostly) take care of the need for pd.TimeGrouper support (https://github.com/pydata/xarray/issues/364). If we use **kwargs for matching dimension names, this could be done with a minimally painful deprecation cycle.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1269/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
88075523 MDU6SXNzdWU4ODA3NTUyMw== 432 Tools for converting between xray.Dataset and nested dictionaries/JSON shoyer 1217238 closed 0     6 2015-06-13T22:25:28Z 2016-08-11T21:54:51Z 2016-08-11T21:54:51Z MEMBER      

This came up in discussion with @freeman-lab -- xray does not have direct support for converting datasets to or from nested dictionaries (i.e., as could be serialized in JSON).

This is quite straightforward to implement oneself, of course, but there's something to be said for making this more obvious. I'm thinking of a serialization format that looks something like this:

{
    'variables': {
        'temperature': {
            'dimensions': ['x'],
            'data': [1, 2, 3],
            'attributes': {}
        },
        ...
    },
    'attributes': {
        'title': 'My example dataset',
        ...
    }
}

The solution here would be to either:

  1. add a few examples to the IO documentation of how to roll this oneself, or
  2. create a few helper methods/functions to make this even easier: xray.Dataset.to_dict/xray.read_dict.
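A roll-your-own version of option 1 might look roughly like the following sketch (using the modern xarray import; dataset_to_dict is an illustrative name, predating the to_dict/from_dict helpers that were eventually added):

```python
import xarray as xr

def dataset_to_dict(ds):
    # Serialize a Dataset into plain Python containers, roughly following
    # the format sketched above (variables + attributes).
    return {
        "variables": {
            name: {
                "dimensions": list(var.dims),
                "data": var.values.tolist(),
                "attributes": dict(var.attrs),
            }
            for name, var in ds.variables.items()
        },
        "attributes": dict(ds.attrs),
    }

ds = xr.Dataset({"temperature": ("x", [1, 2, 3])}, attrs={"title": "My example dataset"})
print(dataset_to_dict(ds))
```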

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/432/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
104768781 MDExOlB1bGxSZXF1ZXN0NDQxNDkyOTQ= 559 Fix pcolormesh plots with cartopy shoyer 1217238 closed 0   0.6.1 1307323 6 2015-09-03T19:50:22Z 2015-11-15T21:49:11Z 2015-09-14T20:33:36Z MEMBER   0 pydata/xarray/pulls/559

```python
proj = ccrs.Orthographic(central_longitude=230, central_latitude=5)
fig, ax = plt.subplots(figsize=(20, 8), subplot_kw=dict(projection=proj))
x.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree())
ax.coastlines()
```

Before and after screenshots of the pcolormesh plot were attached (images not included in this export).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/559/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);