
issues


20 rows where type = "issue" and user = 4295853 sorted by updated_at descending


Facets: state (closed 18, open 2) · type (issue 20) · repo (xarray 20)
Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at (sorted, descending) · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
178200674 · MDU6SXNzdWUxNzgyMDA2NzQ= · #1013 · Groupby exclude dimension · pwolfram 4295853 · open · 8 comments · created 2016-09-20T22:48:46Z · updated 2020-10-04T16:05:06Z · CONTRIBUTOR

Is there some way to do a groupby operation where some dimension is excluded from the operation, e.g., a vectorized version of something like this:

```python
vertlevels = [ds.vel.sel(nVertLevels=i).groupby('y').mean() for i in ds.nVertLevels]
xavgvel = xr.concat(vertlevels, 'nVertLevels')
```

The application here is to average 3D data in the x coordinate to the unique y coordinates, but not the vertical coordinate.

Thus, we are basically looking for something that allows a coordinate to be excluded from the groupby operation, e.g., in this case the vertical coordinate. Ideally this would also be possible within the context of the groupby_bins operation.
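A minimal sketch of one possible vectorized spelling, using synthetic stand-in data (dims x and nVertLevels, with y a coordinate along x): current xarray lets the grouped reduction take an explicit dimension, so reducing only over x leaves the vertical dimension untouched.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the 3D velocity data: y labels points along x.
da = xr.DataArray(
    np.random.rand(6, 4),
    dims=("x", "nVertLevels"),
    coords={"y": ("x", [0, 0, 1, 1, 2, 2])},
)

# Reduce only over 'x' within each y group; nVertLevels survives.
xavgvel = da.groupby("y").mean(dim="x")
print(xavgvel.dims)  # ('y', 'nVertLevels')
```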

Any ideas on how this should work, or a pointer on how to implement this type of operation more cleanly with existing infrastructure, would be greatly appreciated. This appears to be related to #324 and especially (perhaps identically) #924.

cc @vanroekel, @shoyer, @jhamman, @rabernat, @MaximilianR

Reactions: none · repo: xarray 13221727 · type: issue
217584777 · MDU6SXNzdWUyMTc1ODQ3Nzc= · #1335 · `cumsum` providing correct behavior for non-coordinate DataArrays? · pwolfram 4295853 · open · 5 comments · created 2017-03-28T14:45:12Z · updated 2019-03-31T05:41:09Z · CONTRIBUTOR

In the case of a DataArray without coordinates, should cumsum work without specifying an axis, i.e., should da.cumsum() be valid? This is not the current behavior.

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: da = xr.DataArray(np.arange(10))

In [4]: da
Out[4]:
<xarray.DataArray (dim_0: 10)>
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Dimensions without coordinates: dim_0

In [5]: da.cumsum(axis=0)
Out[5]:
<xarray.DataArray (dim_0: 10)>
array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
Dimensions without coordinates: dim_0

In [6]: da.cumsum()
ValueError                                Traceback (most recent call last)
<ipython-input-5-4ff0efc782ee> in <module>()
----> 1 da.cumsum()

/Users/pwolfram/src/xarray/xarray/core/common.pyc in wrapped_func(self, dim, axis, skipna, keep_attrs, **kwargs)
     17                              keep_attrs=False, **kwargs):
     18                 return self.reduce(func, dim, axis, keep_attrs=keep_attrs,
---> 19                                    skipna=skipna, allow_lazy=True, **kwargs)
     20         else:
     21             def wrapped_func(self, dim=None, axis=None, keep_attrs=False,

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in reduce(self, func, dim, axis, keep_attrs, **kwargs)
   1146             summarized data and the indicated dimension(s) removed.
   1147         """
-> 1148         var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
   1149         return self._replace_maybe_drop_dims(var)
   1150

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in reduce(self, func, dim, axis, keep_attrs, allow_lazy, **kwargs)
    898         if dim is None and axis is None:
    899             raise ValueError("must supply either single 'dim' or 'axis' "
--> 900                              "argument to %s" % (func.__name__))
    901
    902         if dim is not None:

ValueError: must supply either single 'dim' or 'axis' argument to cumsum
```
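For reference, naming the dimension explicitly does work, and an axis-free default could be emulated by cumsum-ing over each dimension in turn; a minimal sketch (the helper below is hypothetical, not xarray API):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10))
print(da.cumsum(dim="dim_0").values)  # works: dim supplied explicitly

def cumsum_all_dims(arr):
    # Hypothetical helper: one plausible meaning for an axis-free cumsum.
    for d in arr.dims:
        arr = arr.cumsum(dim=d)
    return arr
```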

Reactions: none · repo: xarray 13221727 · type: issue
181017850 · MDU6SXNzdWUxODEwMTc4NTA= · #1037 · attrs empty for open_mfdataset vs population for open_dataset · pwolfram 4295853 · closed · 4 comments · created 2016-10-04T22:08:54Z · updated 2019-02-02T06:30:20Z · closed 2019-02-02T06:30:20Z · CONTRIBUTOR

Previously, a dataset would store attrs corresponding to netCDF global attributes. For some reason, this behavior does not appear to be supported anymore. Using this dataset: https://github.com/pydata/xarray-data/raw/master/rasm.nc

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('rasm.nc')
/Users/pwolfram/src/xarray/xarray/conventions.py:386: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title: /workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc
    institution: U.W.
    source: RACM R1002RBRxaaa01a
    output_frequency: daily
    output_mode: averaged
    convention: CF-1.4
    references: Based on the initial model of Liang et al., 1994, JGR, 99, 14,415-14,429.
    comment: Output from the Variable Infiltration Capacity (VIC) model.
    nco_openmp_thread_number: 1
    NCO: 4.3.7
    history: history deleted for brevity

In [4]: ds = xr.open_mfdataset('rasm.nc')

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
```

The attributes are missing for open_mfdataset, and I do not believe this was the case in previous versions of xarray: one of my scripts is now failing because it does not obtain the attributes when the dataset is opened via open_mfdataset.

@shoyer and @jhamman, is this the expected behavior, and was the prior behavior simply an unspecified side effect of the code rather than a design decision? My preference would be to keep as many attributes as possible when using open_mfdataset, so as to best preserve the provenance of the dataset, i.e., ds.attrs should not be empty following initialization.
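As an interim workaround, the global attributes can be copied over from the first input file by hand; a minimal sketch (the file list is a placeholder):

```python
import xarray as xr

files = ["rasm.nc"]  # placeholder for the real file list
ds = xr.open_mfdataset(files)
with xr.open_dataset(files[0]) as first:
    ds.attrs.update(first.attrs)  # restore the global attributes
```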

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
180729538 · MDU6SXNzdWUxODA3Mjk1Mzg= · #1033 · Extra arguments in templated doc strings are not being replaced properly · pwolfram 4295853 · closed · 3 comments · created 2016-10-03T19:48:32Z · updated 2019-01-26T15:08:30Z · closed 2019-01-26T15:08:30Z · CONTRIBUTOR

For example, at http://xarray.pydata.org/en/stable/generated/xarray.Dataset.prod.html?highlight=prod, the placeholder func should actually read prod.
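For context, the pattern at issue is roughly the following (a hypothetical factory, not xarray's actual code): generated reduction methods fill a docstring template with the method name, so a leftover func placeholder in the rendered docs means a substitution was skipped.

```python
_REDUCE_DOC_TEMPLATE = (
    "Reduce this object's data by applying `{name}` along some dimension(s)."
)

def attach_reduction(cls, func, name):
    # If this format call is missed (or uses the wrong key), readers see
    # the literal placeholder instead of the method name.
    func.__doc__ = _REDUCE_DOC_TEMPLATE.format(name=name)
    setattr(cls, name, func)
```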

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144957100 · MDU6SXNzdWUxNDQ5NTcxMDA= · #813 · Load fails following squeeze · pwolfram 4295853 · closed · 2 comments · created 2016-03-31T16:57:13Z · updated 2019-01-23T00:58:00Z · closed 2019-01-23T00:58:00Z · CONTRIBUTOR

A load that follows a squeeze returns an error whereas a squeeze following a load does not.

For example,

```python
test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
test = test.squeeze('Nr')
test.load()
```

produces the error

```
ValueError                                Traceback (most recent call last)
<ipython-input-66-2a98e96bc20c> in <module>()
      1 test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
      2 test = test.squeeze('Nr')
----> 3 test.load()
      4 test = test.squeeze('Nr')

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.pyc in load(self)
    355
    356         for k, data in zip(lazy_data, evaluated_data):
--> 357             self.variables[k].data = data
    358
    359         return self

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/variable.pyc in data(self, data)
    247         if data.shape != self.shape:
    248             raise ValueError(
--> 249                 "replacement data must match the Variable's shape")
    250         self._data = data
    251

ValueError: replacement data must match the Variable's shape
```

whereas

```python
test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
test.load()
test = test.squeeze('Nr')
test.load()
```

works without error.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
142498006 · MDU6SXNzdWUxNDI0OTgwMDY= · #798 · Integration with dask/distributed (xarray backend design) · pwolfram 4295853 · closed · 59 comments · created 2016-03-21T23:18:02Z · updated 2019-01-13T04:12:32Z · closed 2019-01-13T04:12:32Z · CONTRIBUTOR

Dask (https://github.com/dask/dask) currently provides on-node parallelism for medium-size data problems. However, analyzing large climate data sets will require multiple-node parallelism, because they constitute a big data problem. A likely solution is integration of distributed (https://github.com/dask/distributed) with dask. Distributed is now integrated with dask and its benefits are already starting to be realized, e.g., see http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3.

Thus, this issue is designed to identify, at a high level, the steps needed to perform this integration. As stated by @shoyer, it will

> definitely require some refactoring of the xarray backend system to make this work cleanly, but that's OK -- the xarray backend system is indicated as experimental/internal API precisely because we hadn't figured out all the use cases yet.
>
> To be honest, I've never been entirely happy with the design we took there (we use inheritance rather than composition for backend classes), but we did get it to work for our use cases. Some refactoring with an eye towards compatibility with dask distributed seems like a very worthwhile endeavor. We do have the benefit of a pretty large test suite covering existing use cases.

Thus, we have the chance to make xarray big-data capable as well as provide improvements to the backend.

To this end, I'm starting this issue to help begin the design process following the xarray mailing list discussion some of us have been having (@shoyer, @mrocklin, @rabernat).

Task To Do List:
- [x] Verify asynchronous access error for to_netcdf output is resolved (e.g., https://github.com/pydata/xarray/issues/793)
- [x] LRU-cached file IO supporting serialization, to robustly support HDF/NetCDF reads (a sketch of one possible design follows below)
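A sketch of what the LRU-cached file IO item could look like (an assumed design for discussion, not xarray's implementation): handles are cached by path and evicted least-recently-used, and pickling carries only the cache size so workers reopen files lazily.

```python
from collections import OrderedDict

import netCDF4

class LRUFileCache:
    """Cache open netCDF handles by path, bounded in size, safe to pickle."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def open(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        if len(self._cache) >= self.maxsize:
            _, stale = self._cache.popitem(last=False)
            stale.close()  # evict the least recently used handle
        handle = netCDF4.Dataset(path, "r")
        self._cache[path] = handle
        return handle

    def __getstate__(self):
        # Handles cannot cross process boundaries; send only the size.
        return {"maxsize": self.maxsize}

    def __setstate__(self, state):
        self.__init__(state["maxsize"])
```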

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
387892184 · MDU6SXNzdWUzODc4OTIxODQ= · #2592 · Deprecated autoclose option · pwolfram 4295853 · closed · 4 comments · created 2018-12-05T18:41:38Z · updated 2018-12-05T18:54:28Z · closed 2018-12-05T18:54:28Z · CONTRIBUTOR

In updated versions of xarray we are getting a deprecation error for autoclose, e.g., at https://github.com/MPAS-Dev/MPAS-Analysis/pull/501/.

A look through the issue tracker does not make the reason for this clear, so this issue is meant to collect the high-level information on the change.

Is there an alternative use that should be considered instead?
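For reference, a sketch of the replacement in recent xarray (the file pattern is a placeholder): the autoclose keyword is simply dropped, and the pool of open handles is bounded globally instead.

```python
import xarray as xr

# Instead of xr.open_mfdataset(paths, autoclose=True):
xr.set_options(file_cache_maxsize=128)  # bound the number of open files
ds = xr.open_mfdataset("timeSeriesStats*.nc")  # placeholder pattern
```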

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
148771214 · MDU6SXNzdWUxNDg3NzEyMTQ= · #826 · Storing history of xarray operations · pwolfram 4295853 · closed · 5 comments · created 2016-04-15T21:15:10Z · updated 2018-09-26T16:28:08Z · closed 2016-06-23T14:27:21Z · CONTRIBUTOR

It may be useful to keep track of operations applied to DataArrays and Datasets in order to enhance provenance of output netcdf datasets, particularly for scientific applications to enhance reproducibility. However, this essentially would require keeping track of all the operations that were used to produce a given DataArray or Dataset.

Ideally, we would want this to eventually result in appending data to the 'history' attribute for calls to *.to_netcdf(...). This would keep track of data manipulation similar to nco/ncks/etc operations.
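A minimal sketch of the proposed bookkeeping (the helper is hypothetical, not an xarray API): append a timestamped entry to the 'history' attribute before writing, mimicking the NCO convention.

```python
import datetime

def append_history(ds, message):
    # Prepend a timestamped entry, newest first, as the nco tools do.
    stamp = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Y")
    prior = ds.attrs.get("history", "")
    ds.attrs["history"] = f"{stamp}: {message}" + ("\n" + prior if prior else "")
    return ds

# usage: append_history(ds, "mean over time").to_netcdf("out.nc")
```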

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
217385961 · MDU6SXNzdWUyMTczODU5NjE= · #1332 · Shape preserving `diff` via new keywords · pwolfram 4295853 · closed · 10 comments · created 2017-03-27T21:49:52Z · updated 2018-09-21T20:02:43Z · closed 2018-09-21T20:02:43Z · CONTRIBUTOR

Currently, an operation such as ds.diff('x') will result in a smaller size dimension, e.g.,

```python
In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 1 2 3

In [4]: ds.diff('x')
Out[4]:
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 2 3
Data variables:
    foo      (x) int64 1 1
```

However, there are cases where keeping the same size would be beneficial, so that you would get:

```python
In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds.diff('x', preserve_shape=True, empty_value=0)
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 0 1 1
```

Is there interest in adding a preserve_shape=True keyword that produces this shape-preserving behavior? I'm proposing it could be used with label='upper' and label='lower'.

empty_value could be a value or empty_index could be an index for the fill value. If empty_value=None and empty_index=None, it would produce a nan.

The reason I'm asking the community is that this is at least the second time I've encountered an application where this behavior would be helpful, e.g., computing ocean layer thicknesses from bottom depths. A previous application was computing a time step from time-slice output for use in an approximated integral, e.g., `y*diff(t, label='lower', preserve_shape=True)` where y and t are both of size n, which is effectively a left-sided Riemann sum.
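For what it's worth, the proposed semantics can already be spelled with existing operations; a minimal sketch using shift plus fillna, which reproduces the example output above (promoted to float):

```python
import xarray as xr

ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

# shift vacates the first slot (NaN); fillna plays the role of empty_value.
same_shape_diff = (ds['foo'] - ds['foo'].shift(x=1)).fillna(0)
print(same_shape_diff.values)  # [0. 1. 1.]
```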

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
139956689 · MDU6SXNzdWUxMzk5NTY2ODk= · #789 · Time limitation (between years 1678 and 2262) restrictive to climate community · pwolfram 4295853 · closed · 13 comments · created 2016-03-10T17:21:17Z · updated 2018-05-14T22:42:09Z · closed 2018-05-14T22:42:09Z · CONTRIBUTOR

The restriction of

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these bounds, dates will be returned as arrays of netcdftime.datetime objects.

is a potential roadblock inhibiting easy adoption of this library in the climate community.
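For readers arriving later: subsequent xarray releases (together with the cftime package) added an explicit escape hatch, sketched below with a hypothetical file; out-of-range dates are decoded to cftime objects rather than datetime64[ns].

```python
import xarray as xr

# 'paleo_run.nc' is a placeholder for a file with out-of-range dates.
ds = xr.open_dataset("paleo_run.nc", use_cftime=True)
print(type(ds["time"].values[0]))  # a cftime datetime subclass
```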

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
235278888 · MDU6SXNzdWUyMzUyNzg4ODg= · #1450 · Should an `apply` method exist for `DataArray` similar to the definition for `Dataset`? · pwolfram 4295853 · closed · 3 comments · created 2017-06-12T15:51:52Z · updated 2017-06-13T14:14:27Z · closed 2017-06-13T00:35:40Z · CONTRIBUTOR

The method apply is defined for Dataset. Is there a design reason why it is not defined for DataArray? In general I think it would be good for calculation methods to apply to both Dataset and DataArray where possible, but I suspect I'm missing a key design element here.
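Two workarounds cover much of the ground in the meantime; a minimal sketch (apply_ufunc arrived in later xarray releases):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0))

squared = da.pipe(lambda a: a ** 2)           # function of the whole object
also_squared = xr.apply_ufunc(np.square, da)  # elementwise variant
```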

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
219043002 · MDU6SXNzdWUyMTkwNDMwMDI= · #1350 · where(..., drop=True) error · pwolfram 4295853 · closed · milestone: v0.9.3 (2444330) · 4 comments · created 2017-04-03T19:53:33Z · updated 2017-04-14T03:50:53Z · closed 2017-04-14T03:50:53Z · CONTRIBUTOR

These results appear to be incorrect unless I'm missing something:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: array = xr.DataArray(np.zeros((1,2,3)), dims=['time','x','y'], coords={'x':np.arange(2)})

In [4]: array[0,1,1] = 1

In [5]: array.where(array != 0, drop=True)
Out[5]:
<xarray.DataArray (time: 1, x: 1, y: 1)>
array([[[ 0.]]])
Coordinates:
  * x        (x) int64 1
Dimensions without coordinates: time, y

In [5]: array.where(array != 0, drop=True).values
Out[5]: array([[[ 0.]]])

In [7]: array.values[array.values != 0]
Out[7]: array([ 1.])
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
218277814 · MDU6SXNzdWUyMTgyNzc4MTQ= · #1341 · where(..., drop=True) failure for empty mask on python 2.7 · pwolfram 4295853 · closed · 4 comments · created 2017-03-30T17:55:38Z · updated 2017-04-02T22:43:53Z · closed 2017-04-02T22:43:53Z · CONTRIBUTOR

The following fails on 2.7 but not 3.5 (reproducible script at https://gist.github.com/89bd5bd62a475510b2611cbff8d5c67a):

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: da = xr.DataArray(np.random.rand(100,10), dims=['nCells','nVertLevels'])

In [4]: mask = xr.DataArray(np.zeros((100,), dtype='bool'), dims='nCells')

In [5]: da.where(mask, drop=True)
ValueError                                Traceback (most recent call last)
<ipython-input-5-ca5cd9c083a9> in <module>()
----> 1 da.where(mask, drop=True)

/Users/pwolfram/src/xarray/xarray/core/common.pyc in where(self, cond, other, drop)
    681             outcond = cond.isel(**clip)
    682             indexers = {dim: outcond.get_index(dim) for dim in outcond.dims}
--> 683             outobj = self.sel(**indexers)
    684         else:
    685             outobj = self

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in sel(self, method, tolerance, drop, **indexers)
    670             self, indexers, method=method, tolerance=tolerance
    671         )
--> 672         result = self.isel(drop=drop, **pos_indexers)
    673         return result._replace_indexes(new_indexes)
    674

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in isel(self, drop, **indexers)
    655         DataArray.sel
    656         """
--> 657         ds = self._to_temp_dataset().isel(drop=drop, **indexers)
    658         return self._from_temp_dataset(ds)
    659

/Users/pwolfram/src/xarray/xarray/core/dataset.pyc in isel(self, drop, **indexers)
   1115         for name, var in iteritems(self._variables):
   1116             var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1117             new_var = var.isel(**var_indexers)
   1118             if not (drop and name in var_indexers):
   1119                 variables[name] = new_var

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in isel(self, **indexers)
    545             if dim in indexers:
    546                 key[i] = indexers[dim]
--> 547         return self[tuple(key)]
    548
    549     def squeeze(self, dim=None):

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in __getitem__(self, key)
    375         dims = tuple(dim for k, dim in zip(key, self.dims)
    376                      if not isinstance(k, (int, np.integer)))
--> 377         values = self._indexable_data[key]
    378         # orthogonal indexing should ensure the dimensionality is consistent
    379         if hasattr(values, 'ndim'):

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in __getitem__(self, key)
    465
    466     def __getitem__(self, key):
--> 467         key = self._convert_key(key)
    468         return self._ensure_ndarray(self.array[key])
    469

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in _convert_key(self, key)
    452         if any(not isinstance(k, (int, np.integer, slice)) for k in key):
    453             # key would trigger fancy indexing
--> 454             key = orthogonal_indexer(key, self.shape)
    455         return key
    456

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in orthogonal_indexer(key, shape)
     77     """
     78     # replace Ellipsis objects with slices
---> 79     key = list(canonicalize_indexer(key, len(shape)))
     80     # replace 1d arrays and slices with broadcast compatible arrays
     81     # note: we treat integers separately (instead of turning them into 1d

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in canonicalize_indexer(key, ndim)
     65         return indexer
     66
---> 67     return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
     68
     69

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in <genexpr>((k,))
     65         return indexer
     66
---> 67     return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
     68
     69

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in canonicalize(indexer)
     62                                  'array indexing; all subkeys must be '
     63                                  'slices, integers or sequences of '
---> 64                                  'integers or Booleans' % indexer)
     65         return indexer
     66

ValueError: invalid subkey array([], dtype=object) for integer based array indexing; all subkeys must be slices, integers or sequences of integers or Booleans
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
218013400 · MDU6SXNzdWUyMTgwMTM0MDA= · #1338 · Chunking and dask memory errors · pwolfram 4295853 · closed · 2 comments · created 2017-03-29T21:22:49Z · updated 2017-03-29T22:56:45Z · closed 2017-03-29T22:56:45Z · CONTRIBUTOR

What is the standard way of sub-chunking to prevent dask memory errors? For large dataset files there could be a dimension, say nCells, that is large enough to fill RAM. If this occurs, is there an automatic mechanism in dask to prevent out-of-memory errors, or is it the user's responsibility to specify maximum chunk sizes on their own?
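A sketch of the manual control available today (the file name and chunk size are placeholders): capping chunk sizes at open time keeps any single dask task from holding the full nCells dimension in memory.

```python
import xarray as xr

ds = xr.open_dataset("big_mpas_output.nc", chunks={"nCells": 100000})
print(ds.chunks)  # bounded chunks along nCells
```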

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
140214928 · MDU6SXNzdWUxNDAyMTQ5Mjg= · #791 · Adding cumsum / cumprod reduction operators · pwolfram 4295853 · closed · 12 comments · created 2016-03-11T15:36:41Z · updated 2016-10-04T22:16:26Z · closed 2016-10-04T22:16:26Z · CONTRIBUTOR

It would be useful to have the cumsum / cumprod reduction operators for DataArray and Dataset, analogous to http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sum.html?highlight=sum#xarray.DataArray.sum and http://xarray.pydata.org/en/stable/generated/xarray.Dataset.sum.html?highlight=sum#xarray.Dataset.sum

I notice this is on the TODO at https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L54 and am assuming there is something subtle here about the implementation. I believe the issue was probably with dask, but the issue / PR at https://github.com/dask/dask/issues/923 & https://github.com/dask/dask/pull/925 may have removed the roadblock.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
178111738 · MDU6SXNzdWUxNzgxMTE3Mzg= · #1010 · py.test fails on master · pwolfram 4295853 · closed · 5 comments · created 2016-09-20T16:36:51Z · updated 2016-09-20T17:13:14Z · closed 2016-09-20T17:12:09Z · CONTRIBUTOR

Following creation of a new conda environment:

```bash
cd /tmp
conda create -n test_xarray27 python=2.7 -y
source activate test_xarray27
conda install matplotlib dask bottleneck pytest -y
git clone git@github.com:pydata/xarray.git
cd xarray
git co master
python setup.py develop
py.test
```

returns

```bash
┌─[pwolfram][shapiro][/tmp/xarray][10:34][±][master ✓]
└─▪ py.test
================================ test session starts =================================
platform darwin -- Python 2.7.10 -- py-1.4.27 -- pytest-2.7.1
rootdir: /private/tmp/xarray, inifile: setup.cfg
collected 1028 items

xarray/test/test_backends.py ............................................................................................................................................................................................sssssssssssssssssssssssssssssssssssss.........sssssssssssssssssssssssss......
xarray/test/test_combine.py ..............
xarray/test/test_conventions.py .............................................s............
xarray/test/test_dask.py ..........F....................
xarray/test/test_dataarray.py ...........................................................s...................................................s.............
xarray/test/test_dataset.py .............................................................................................................................................
xarray/test/test_extensions.py ....
xarray/test/test_formatting.py .........
xarray/test/test_groupby.py ...
xarray/test/test_indexing.py .........
xarray/test/test_merge.py ..............
xarray/test/test_ops.py .............
xarray/test/test_plot.py ..............................................................................................................................................................................................
xarray/test/test_tutorial.py s
xarray/test/test_ufuncs.py ....
xarray/test/test_utils.py ...................
xarray/test/test_variable.py ...............................................................................................................................
xarray/test/test_xray.py .

====================================== FAILURES ======================================
_____________________________ TestVariable.test_reduce ______________________________

self = <xarray.test.test_dask.TestVariable testMethod=test_reduce>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
  self.assertLazyAndAllClose(u.argmax(dim='x'), v.argmax(dim='x'))

xarray/test/test_dask.py:145:


xarray/core/common.py:16: in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
xarray/core/variable.py:899: in reduce
    axis=axis, **kwargs)
xarray/core/ops.py:308: in f
    return func(values, axis=axis, **kwargs)
xarray/core/ops.py:64: in f
    return getattr(module, name)(*args, **kwargs)
/Users/pwolfram/anaconda/lib/python2.7/site-packages/dask/array/reductions.py:542: in _
    return arg_reduction(x, chunk, combine, agg, axis, split_every)

x = dask.array<from-ar..., shape=(4, 6), dtype=float64, chunksize=(2, 2)>
chunk = <functools.partial object at 0x115f3ef70>, combine = <functools.partial object at 0x115f3efc8>, agg = <functools.partial object at 0x115f4b050>
axis = (0,), split_every = None

def arg_reduction(x, chunk, combine, agg, axis=None, split_every=None):
    """Generic function for argreduction.

    Parameters
    ----------
    x : Array
    chunk : callable
        Partialed ``arg_chunk``.
    combine : callable
        Partialed ``arg_combine``.
    agg : callable
        Partialed ``arg_agg``.
    axis : int, optional
    split_every : int or dict, optional
    """
    if axis is None:
        axis = tuple(range(x.ndim))
        ravel = True
    elif isinstance(axis, int):
        if axis < 0:
            axis += x.ndim
        if axis < 0 or axis >= x.ndim:
            raise ValueError("axis entry is out of bounds")
        axis = (axis,)
        ravel = x.ndim == 1
    else:
        raise TypeError("axis must be either `None` or int, "
                        "got '{0}'".format(axis))

    # Map chunk across all blocks
    name = 'arg-reduce-chunk-{0}'.format(tokenize(chunk, axis))
    old = x.name
    keys = list(product(*map(range, x.numblocks)))
    offsets = list(product(*(accumulate(operator.add, bd[:-1], 0)
                           for bd in x.chunks)))

E TypeError: type object argument after * must be a sequence, not generator

/Users/pwolfram/anaconda/lib/python2.7/site-packages/dask/array/reductions.py:510: TypeError
===================== 1 failed, 961 passed, 66 skipped in 52.49 seconds ======================

```

cc @shoyer

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144683276 · MDU6SXNzdWUxNDQ2ODMyNzY= · #811 · Selection based on boolean DataArray · pwolfram 4295853 · closed · 17 comments · created 2016-03-30T18:38:34Z · updated 2016-04-15T20:30:03Z · closed 2016-04-15T20:30:03Z · CONTRIBUTOR

Should xarray indexing account for boolean values without resorting to a call to np.where? For example, acase.sel(Np=np.where(idx)[0]) works but acase.sel(Np=idx) does not.
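A minimal sketch of both spellings on synthetic data (boolean isel landed in later xarray versions; the flatnonzero form matches the np.where workaround above):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5), dims="Np")
idx = da.values % 2 == 0  # boolean mask

via_positions = da.isel(Np=np.flatnonzero(idx))  # np.where-style workaround
via_bool = da.isel(Np=idx)                       # direct boolean indexing
```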

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144037264 · MDU6SXNzdWUxNDQwMzcyNjQ= · #807 · cumprod returns errors · pwolfram 4295853 · closed · 3 comments · created 2016-03-28T17:50:13Z · updated 2016-03-31T23:39:55Z · closed 2016-03-31T23:39:55Z · CONTRIBUTOR

The xarray implementation of cumprod returns an assertion error, presumably because of bottleneck, e.g., https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L333. The error is

```
└─▪ ./test_cumprod.py
[ 0.8841785   0.54181236  0.29075258  0.28883015  0.1137352   0.09909713
  0.03570122  0.0304542   0.01578143  0.01496195  0.01442681  0.00980845]
Traceback (most recent call last):
  File "./test_cumprod.py", line 13, in <module>
    foo.cumprod()
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/common.py", line 16, in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py", line 991, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 871, in reduce
    axis=axis, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/ops.py", line 346, in f
    assert using_numpy_nan_func
AssertionError
```

If bottleneck is uninstalled, then a ValueError is returned instead:

```
└─▪ ./test_cumprod.py
[  2.99508768e-01   2.80142920e-01   1.56389242e-01   1.10791301e-01
   4.58372649e-02   4.10865622e-02   9.91362500e-03   6.76033435e-03
   3.83574249e-03   9.54972340e-04   1.56846616e-04   6.44088547e-05]
Traceback (most recent call last):
  File "./test_cumprod.py", line 13, in <module>
    foo.cumprod()
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/common.py", line 16, in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py", line 991, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 880, in reduce
    return Variable(dims, data, attrs=attrs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 213, in __init__
    self._dims = self._parse_dimensions(dims)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 321, in _parse_dimensions
    % (dims, self.ndim))
ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1
```

No error occurs if the data array is converted to a numpy array prior to use of cumprod.

This can easily be reproduced with https://gist.github.com/c32f231b773ecc4b0ccf, excerpted below:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(4, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
print foo.values.cumprod()
foo.cumprod()
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
140291221 · MDU6SXNzdWUxNDAyOTEyMjE= · #793 · dask.async.RuntimeError: NetCDF: HDF error on xarray to_netcdf · pwolfram 4295853 · closed · 21 comments · created 2016-03-11T21:04:36Z · updated 2016-03-24T02:49:26Z · closed 2016-03-24T02:49:13Z · CONTRIBUTOR

Dask appears to be failing on serialization following a ds.to_netcdf() with a NetCDF: HDF error. Excerpted error below:

```
Traceback (most recent call last):
  File "reduce_dispersion_file.py", line 40, in <module>
    if __name__ == "__main__":
  File "reduce_dispersion_file.py", line 36, in reduce_dispersion_file
    with timeit_context('output to disk'):
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.py", line 791, in to_netcdf
    engine=engine, encoding=encoding)
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/api.py", line 356, in to_netcdf
    dataset.dump_to_store(store, sync=sync, encoding=encoding)
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.py", line 739, in dump_to_store
    store.sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 283, in sync
    super(NetCDF4DataStore, self).sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/common.py", line 186, in sync
    self.writer.sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/common.py", line 165, in sync
    da.store(self.sources, self.targets)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/array/core.py", line 712, in store
    Array._get(dsk, keys, **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/base.py", line 43, in _get
    return get(dsk2, keys, **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/threaded.py", line 57, in get
    **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 481, in get_async
    raise(remote_exception(res, tb))
dask.async.RuntimeError: NetCDF: HDF error

Traceback
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 264, in execute_task
    result = _execute_task(task, data)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 246, in _execute_task
    return func(*args2)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/array/core.py", line 1954, in store
    out[index] = np.asanyarray(x)
  File "netCDF4/_netCDF4.pyx", line 3678, in netCDF4._netCDF4.Variable.__setitem__ (netCDF4/_netCDF4.c:37215)
  File "netCDF4/_netCDF4.pyx", line 3887, in netCDF4._netCDF4.Variable._put (netCDF4/_netCDF4.c:38907)
```

Script used: https://gist.github.com/98acaa31a4533b490f78
Full output: https://gist.github.com/248efce774ad08cb1dd6
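One debugging step worth noting (a sketch with the modern dask API and synthetic data; not a confirmed fix): forcing a single-threaded scheduler for the write rules HDF5 thread-unsafety in or out as the cause.

```python
import dask
import numpy as np
import xarray as xr

ds = xr.Dataset({"v": (("t",), np.arange(100))}).chunk({"t": 10})

# Serialize the write; if the HDF error disappears, threading is implicated.
with dask.config.set(scheduler="single-threaded"):
    ds.to_netcdf("out.nc")
```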

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
138332032 · MDU6SXNzdWUxMzgzMzIwMzI= · #783 · Array size changes following loading of numpy array · pwolfram 4295853 · closed · 19 comments · created 2016-03-03T23:44:39Z · updated 2016-03-08T23:41:38Z · closed 2016-03-08T23:37:16Z · CONTRIBUTOR

The issue in a nutshell is that

```
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].shape
(30, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].values.shape
(29, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].data.shape
(30, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].data
dask.array<getitem..., shape=(30, 1012000), dtype=float64, chunksize=(23, 1012000)>
```

It seems that, for some reason, when the array is loaded via values it is no longer the same size. The dask shape appears to be correct.

I previously do a filter on time via rlzns = rlzns.isel(Time=np.where(reset > 0)[0]) and do some commands like np.reshape(rlzns.Time[rnum*Ntr:(rnum+1)*Ntr].values,(1,Ntr)),axis=1) but it seems unlikely that this would be causing the problem.

Has anyone had an issue like this? Any ideas on what could be causing the problem would be greatly appreciated because this behavior is very strange.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);