
issues


20 rows where type = "issue" and user = 4295853 sorted by updated_at descending


Facets: state (closed 18, open 2) · type (issue 20) · repo (xarray 20)
Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at (sorted, descending) · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
178200674 · MDU6SXNzdWUxNzgyMDA2NzQ= · #1013 · Groupby exclude dimension · pwolfram 4295853 · open · 8 comments · created 2016-09-20T22:48:46Z · updated 2020-10-04T16:05:06Z · CONTRIBUTOR

Is there some way to do a groupby operation where some dimension is excluded from the operation, e.g., a vectorized version of something like this:

```python
vertlevels = [ds.vel.sel(nVertLevels=i).groupby('y').mean() for i in ds.nVertLevels]
xavgvel = xr.concat(vertlevels, 'nVertLevels')
```

The application here is to average 3D data in the x coordinate to the unique y coordinates, but not the vertical coordinate.

Thus, we are basically looking for something that allows a coordinate to be excluded from the groupby operation, e.g., in this case the vertical coordinate. Ideally this would also be possible within the context of the groupby_bins operation.
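A minimal sketch of one possible vectorized spelling, using synthetic stand-in data (dims x and nVertLevels, with y a coordinate along x): current xarray lets the grouped reduction take an explicit dimension, so reducing only over x leaves the vertical dimension untouched.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the 3D velocity data: y labels points along x.
da = xr.DataArray(
    np.random.rand(6, 4),
    dims=("x", "nVertLevels"),
    coords={"y": ("x", [0, 0, 1, 1, 2, 2])},
)

# Reduce only over 'x' within each y group; nVertLevels survives.
xavgvel = da.groupby("y").mean(dim="x")
print(xavgvel.dims)  # ('y', 'nVertLevels')
```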

Any ideas on how this should work, or a pointer on how to implement this type of operation more cleanly with existing infrastructure, would be greatly appreciated. This appears to be related to #324 and especially (perhaps identically) #924.

cc @vanroekel, @shoyer, @jhamman, @rabernat, @MaximilianR

Reactions: none · repo: xarray 13221727 · type: issue
217584777 · MDU6SXNzdWUyMTc1ODQ3Nzc= · #1335 · `cumsum` providing correct behavior for non-coordinate DataArrays? · pwolfram 4295853 · open · 5 comments · created 2017-03-28T14:45:12Z · updated 2019-03-31T05:41:09Z · CONTRIBUTOR

In the case of a DataArray without coordinates, should cumsum work without specifying an axis, i.e., should da.cumsum() be valid? This is not the current behavior.

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: da = xr.DataArray(np.arange(10))

In [4]: da
Out[4]:
<xarray.DataArray (dim_0: 10)>
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Dimensions without coordinates: dim_0

In [5]: da.cumsum(axis=0)
Out[5]:
<xarray.DataArray (dim_0: 10)>
array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
Dimensions without coordinates: dim_0

In [6]: da.cumsum()
ValueError                                Traceback (most recent call last)
<ipython-input-5-4ff0efc782ee> in <module>()
----> 1 da.cumsum()

/Users/pwolfram/src/xarray/xarray/core/common.pyc in wrapped_func(self, dim, axis, skipna, keep_attrs, **kwargs)
     17                              keep_attrs=False, **kwargs):
     18                 return self.reduce(func, dim, axis, keep_attrs=keep_attrs,
---> 19                                    skipna=skipna, allow_lazy=True, **kwargs)
     20         else:
     21             def wrapped_func(self, dim=None, axis=None, keep_attrs=False,

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in reduce(self, func, dim, axis, keep_attrs, **kwargs)
   1146             summarized data and the indicated dimension(s) removed.
   1147         """
-> 1148         var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
   1149         return self._replace_maybe_drop_dims(var)
   1150

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in reduce(self, func, dim, axis, keep_attrs, allow_lazy, **kwargs)
    898         if dim is None and axis is None:
    899             raise ValueError("must supply either single 'dim' or 'axis' "
--> 900                              "argument to %s" % (func.__name__))
    901
    902         if dim is not None:

ValueError: must supply either single 'dim' or 'axis' argument to cumsum
```
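For reference, naming the dimension explicitly does work, and an axis-free default could be emulated by cumsum-ing over each dimension in turn; a minimal sketch (the helper below is hypothetical, not xarray API):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10))
print(da.cumsum(dim="dim_0").values)  # works: dim supplied explicitly

def cumsum_all_dims(arr):
    # Hypothetical helper: one plausible meaning for an axis-free cumsum.
    for d in arr.dims:
        arr = arr.cumsum(dim=d)
    return arr
```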

Reactions: none · repo: xarray 13221727 · type: issue
181017850 · MDU6SXNzdWUxODEwMTc4NTA= · #1037 · attrs empty for open_mfdataset vs population for open_dataset · pwolfram 4295853 · closed · 4 comments · created 2016-10-04T22:08:54Z · updated 2019-02-02T06:30:20Z · closed 2019-02-02T06:30:20Z · CONTRIBUTOR

Previously, a dataset would store attrs corresponding to netCDF global attributes. For some reason, this behavior does not appear to be supported anymore. Using this dataset: https://github.com/pydata/xarray-data/raw/master/rasm.nc

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('rasm.nc')
/Users/pwolfram/src/xarray/xarray/conventions.py:386: RuntimeWarning: Unable to decode time axis into full numpy.datetime64 objects, continuing using dummy netCDF4.datetime objects instead, reason: dates out of range
  result = decode_cf_datetime(example_value, units, calendar)

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
Attributes:
    title: /workspace/jhamman/processed/R1002RBRxaaa01a/lnd/temp/R1002RBRxaaa01a.vic.ha.1979-09-01.nc
    institution: U.W.
    source: RACM R1002RBRxaaa01a
    output_frequency: daily
    output_mode: averaged
    convention: CF-1.4
    references: Based on the initial model of Liang et al., 1994, JGR, 99, 14,415-14,429.
    comment: Output from the Variable Infiltration Capacity (VIC) model.
    nco_openmp_thread_number: 1
    NCO: 4.3.7
    history: history deleted for brevity

In [4]: ds = xr.open_mfdataset('rasm.nc')

In [5]: ds
Out[5]:
<xarray.Dataset>
Dimensions:  (time: 36, x: 275, y: 205)
Coordinates:
  * time     (time) object 1980-09-16T12:00:00 1980-10-17 ...
    yc       (y, x) float64 16.53 16.78 17.02 17.27 17.51 17.76 18.0 18.25 ...
    xc       (y, x) float64 189.2 189.4 189.6 189.7 189.9 190.1 190.2 190.4 ...
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
Data variables:
    Tair     (time, y, x) float64 nan nan nan nan nan nan nan nan nan nan ...
```

The attributes are missing for open_mfdataset, and I do not believe this was the case in previous versions of xarray: one of my scripts is now failing because it does not obtain the attributes when the dataset is opened via open_mfdataset.

@shoyer and @jhamman, is this the expected behavior, and was the prior behavior simply an unspecified side effect of the code rather than a design decision? My preference would be to keep as many attributes as possible when using open_mfdataset, so as to best preserve the provenance of the dataset, i.e., ds.attrs should not be empty following initialization.
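As an interim workaround, the global attributes can be copied over from the first input file by hand; a minimal sketch (the file list is a placeholder):

```python
import xarray as xr

files = ["rasm.nc"]  # placeholder for the real file list
ds = xr.open_mfdataset(files)
with xr.open_dataset(files[0]) as first:
    ds.attrs.update(first.attrs)  # restore the global attributes
```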

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
180729538 · MDU6SXNzdWUxODA3Mjk1Mzg= · #1033 · Extra arguments in templated doc strings are not being replaced properly · pwolfram 4295853 · closed · 3 comments · created 2016-10-03T19:48:32Z · updated 2019-01-26T15:08:30Z · closed 2019-01-26T15:08:30Z · CONTRIBUTOR

For example, at http://xarray.pydata.org/en/stable/generated/xarray.Dataset.prod.html?highlight=prod, the placeholder func should actually read prod.
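For context, the pattern at issue is roughly the following (a hypothetical factory, not xarray's actual code): generated reduction methods fill a docstring template with the method name, so a leftover func placeholder in the rendered docs means a substitution was skipped.

```python
_REDUCE_DOC_TEMPLATE = (
    "Reduce this object's data by applying `{name}` along some dimension(s)."
)

def attach_reduction(cls, func, name):
    # If this format call is missed (or uses the wrong key), readers see
    # the literal placeholder instead of the method name.
    func.__doc__ = _REDUCE_DOC_TEMPLATE.format(name=name)
    setattr(cls, name, func)
```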

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144957100 · MDU6SXNzdWUxNDQ5NTcxMDA= · #813 · Load fails following squeeze · pwolfram 4295853 · closed · 2 comments · created 2016-03-31T16:57:13Z · updated 2019-01-23T00:58:00Z · closed 2019-01-23T00:58:00Z · CONTRIBUTOR

A load that follows a squeeze returns an error whereas a squeeze following a load does not.

For example,

```python
test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
test = test.squeeze('Nr')
test.load()
```

produces the error

```
ValueError                                Traceback (most recent call last)
<ipython-input-66-2a98e96bc20c> in <module>()
      1 test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
      2 test = test.squeeze('Nr')
----> 3 test.load()
      4 test = test.squeeze('Nr')

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.pyc in load(self)
    355
    356         for k, data in zip(lazy_data, evaluated_data):
--> 357             self.variables[k].data = data
    358
    359         return self

/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/variable.pyc in data(self, data)
    247         if data.shape != self.shape:
    248             raise ValueError(
--> 249                 "replacement data must match the Variable's shape")
    250         self._data = data
    251

ValueError: replacement data must match the Variable's shape
```

whereas

```python
test = acase.isel(Nb=layernum).sel(Np=np.where(idx)[1])
test.load()
test = test.squeeze('Nr')
test.load()
```

works without error.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
142498006 · MDU6SXNzdWUxNDI0OTgwMDY= · #798 · Integration with dask/distributed (xarray backend design) · pwolfram 4295853 · closed · 59 comments · created 2016-03-21T23:18:02Z · updated 2019-01-13T04:12:32Z · closed 2019-01-13T04:12:32Z · CONTRIBUTOR

Dask (https://github.com/dask/dask) currently provides on-node parallelism for medium-size data problems. However, analyzing large climate data sets will require multiple-node parallelism, because they constitute a big data problem. A likely solution is integration of distributed (https://github.com/dask/distributed) with dask. Distributed is now integrated with dask and its benefits are already starting to be realized, e.g., see http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3.

Thus, this issue is designed to identify, at a high level, the steps needed to perform this integration. As stated by @shoyer, it will

> definitely require some refactoring of the xarray backend system to make this work cleanly, but that's OK -- the xarray backend system is indicated as experimental/internal API precisely because we hadn't figured out all the use cases yet.
>
> To be honest, I've never been entirely happy with the design we took there (we use inheritance rather than composition for backend classes), but we did get it to work for our use cases. Some refactoring with an eye towards compatibility with dask distributed seems like a very worthwhile endeavor. We do have the benefit of a pretty large test suite covering existing use cases.

Thus, we have the chance to make xarray big-data capable as well as provide improvements to the backend.

To this end, I'm starting this issue to help begin the design process following the xarray mailing list discussion some of us have been having (@shoyer, @mrocklin, @rabernat).

Task To Do List:
- [x] Verify asynchronous access error for to_netcdf output is resolved (e.g., https://github.com/pydata/xarray/issues/793)
- [x] LRU-cached file IO supporting serialization, to robustly support HDF/NetCDF reads (a sketch of one possible design follows below)
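A sketch of what the LRU-cached file IO item could look like (an assumed design for discussion, not xarray's implementation): handles are cached by path and evicted least-recently-used, and pickling carries only the cache size so workers reopen files lazily.

```python
from collections import OrderedDict

import netCDF4

class LRUFileCache:
    """Cache open netCDF handles by path, bounded in size, safe to pickle."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def open(self, path):
        if path in self._cache:
            self._cache.move_to_end(path)  # mark as most recently used
            return self._cache[path]
        if len(self._cache) >= self.maxsize:
            _, stale = self._cache.popitem(last=False)
            stale.close()  # evict the least recently used handle
        handle = netCDF4.Dataset(path, "r")
        self._cache[path] = handle
        return handle

    def __getstate__(self):
        # Handles cannot cross process boundaries; send only the size.
        return {"maxsize": self.maxsize}

    def __setstate__(self, state):
        self.__init__(state["maxsize"])
```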

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
387892184 · MDU6SXNzdWUzODc4OTIxODQ= · #2592 · Deprecated autoclose option · pwolfram 4295853 · closed · 4 comments · created 2018-12-05T18:41:38Z · updated 2018-12-05T18:54:28Z · closed 2018-12-05T18:54:28Z · CONTRIBUTOR

In updated versions of xarray we are getting a deprecation error for autoclose, e.g., at https://github.com/MPAS-Dev/MPAS-Analysis/pull/501/.

A look through the issue tracker does not make the reason for this clear, so this issue is meant to collect the high-level information on the change.

Is there an alternative use that should be considered instead?
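For reference, a sketch of the replacement in recent xarray (the file pattern is a placeholder): the autoclose keyword is simply dropped, and the pool of open handles is bounded globally instead.

```python
import xarray as xr

# Instead of xr.open_mfdataset(paths, autoclose=True):
xr.set_options(file_cache_maxsize=128)  # bound the number of open files
ds = xr.open_mfdataset("timeSeriesStats*.nc")  # placeholder pattern
```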

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
148771214 · MDU6SXNzdWUxNDg3NzEyMTQ= · #826 · Storing history of xarray operations · pwolfram 4295853 · closed · 5 comments · created 2016-04-15T21:15:10Z · updated 2018-09-26T16:28:08Z · closed 2016-06-23T14:27:21Z · CONTRIBUTOR

It may be useful to keep track of operations applied to DataArrays and Datasets in order to enhance provenance of output netcdf datasets, particularly for scientific applications to enhance reproducibility. However, this essentially would require keeping track of all the operations that were used to produce a given DataArray or Dataset.

Ideally, we would want this to eventually result in appending data to the 'history' attribute for calls to *.to_netcdf(...). This would keep track of data manipulation similar to nco/ncks/etc operations.
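A minimal sketch of the proposed bookkeeping (the helper is hypothetical, not an xarray API): append a timestamped entry to the 'history' attribute before writing, mimicking the NCO convention.

```python
import datetime

def append_history(ds, message):
    # Prepend a timestamped entry, newest first, as the nco tools do.
    stamp = datetime.datetime.utcnow().strftime("%a %b %d %H:%M:%S %Y")
    prior = ds.attrs.get("history", "")
    ds.attrs["history"] = f"{stamp}: {message}" + ("\n" + prior if prior else "")
    return ds

# usage: append_history(ds, "mean over time").to_netcdf("out.nc")
```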

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
217385961 · MDU6SXNzdWUyMTczODU5NjE= · #1332 · Shape preserving `diff` via new keywords · pwolfram 4295853 · closed · 10 comments · created 2017-03-27T21:49:52Z · updated 2018-09-21T20:02:43Z · closed 2018-09-21T20:02:43Z · CONTRIBUTOR

Currently, an operation such as ds.diff('x') will result in a smaller size dimension, e.g.,

```python
In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 1 2 3

In [4]: ds.diff('x')
Out[4]:
<xarray.Dataset>
Dimensions:  (x: 2)
Coordinates:
  * x        (x) int64 2 3
Data variables:
    foo      (x) int64 1 1
```

However, there are cases where keeping the same size would be beneficial, so that you would get:

```python
In [1]: import xarray as xr

In [2]: ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

In [3]: ds.diff('x', preserve_shape=True, empty_value=0)
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    foo      (x) int64 0 1 1
```

Is there interest in adding a preserve_shape=True keyword that produces this shape-preserving behavior? I'm proposing it could be used with label='upper' and label='lower'.

empty_value could be a value or empty_index could be an index for the fill value. If empty_value=None and empty_index=None, it would produce a nan.

The reason I'm asking the community is that this is at least the second time I've encountered an application where this behavior would be helpful, e.g., computing ocean layer thicknesses from bottom depths. A previous application was computing a time step from time-slice output for use in an approximated integral, e.g., `y*diff(t, label='lower', preserve_shape=True)` where y and t are both of size n, which is effectively a left-sided Riemann sum.
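For what it's worth, the proposed semantics can already be spelled with existing operations; a minimal sketch using shift plus fillna, which reproduces the example output above (promoted to float):

```python
import xarray as xr

ds = xr.Dataset({'foo': (('x',), [1, 2, 3])}, {'x': [1, 2, 3]})

# shift vacates the first slot (NaN); fillna plays the role of empty_value.
same_shape_diff = (ds['foo'] - ds['foo'].shift(x=1)).fillna(0)
print(same_shape_diff.values)  # [0. 1. 1.]
```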

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
139956689 · MDU6SXNzdWUxMzk5NTY2ODk= · #789 · Time limitation (between years 1678 and 2262) restrictive to climate community · pwolfram 4295853 · closed · 13 comments · created 2016-03-10T17:21:17Z · updated 2018-05-14T22:42:09Z · closed 2018-05-14T22:42:09Z · CONTRIBUTOR

The restriction of

One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these bounds, dates will be returned as arrays of netcdftime.datetime objects.

is a potential roadblock inhibiting easy adoption of this library in the climate community.
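For readers arriving later: subsequent xarray releases (together with the cftime package) added an explicit escape hatch, sketched below with a hypothetical file; out-of-range dates are decoded to cftime objects rather than datetime64[ns].

```python
import xarray as xr

# 'paleo_run.nc' is a placeholder for a file with out-of-range dates.
ds = xr.open_dataset("paleo_run.nc", use_cftime=True)
print(type(ds["time"].values[0]))  # a cftime datetime subclass
```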

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
235278888 · MDU6SXNzdWUyMzUyNzg4ODg= · #1450 · Should an `apply` method exist for `DataArray` similar to the definition for `Dataset`? · pwolfram 4295853 · closed · 3 comments · created 2017-06-12T15:51:52Z · updated 2017-06-13T14:14:27Z · closed 2017-06-13T00:35:40Z · CONTRIBUTOR

The method apply is defined for Dataset. Is there a design reason why it is not defined for DataArray? In general I think it would be good for calculation methods to apply to both Dataset and DataArray where possible, but I suspect I'm missing a key design element here.
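Two workarounds cover much of the ground in the meantime; a minimal sketch (apply_ufunc arrived in later xarray releases):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0))

squared = da.pipe(lambda a: a ** 2)           # function of the whole object
also_squared = xr.apply_ufunc(np.square, da)  # elementwise variant
```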

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
219043002 · MDU6SXNzdWUyMTkwNDMwMDI= · #1350 · where(..., drop=True) error · pwolfram 4295853 · closed · milestone: v0.9.3 (2444330) · 4 comments · created 2017-04-03T19:53:33Z · updated 2017-04-14T03:50:53Z · closed 2017-04-14T03:50:53Z · CONTRIBUTOR

These results appear to be incorrect unless I'm missing something:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: array = xr.DataArray(np.zeros((1,2,3)), dims=['time','x','y'], coords={'x':np.arange(2)})

In [4]: array[0,1,1] = 1

In [5]: array.where(array != 0, drop=True)
Out[5]:
<xarray.DataArray (time: 1, x: 1, y: 1)>
array([[[ 0.]]])
Coordinates:
  * x        (x) int64 1
Dimensions without coordinates: time, y

In [5]: array.where(array != 0, drop=True).values
Out[5]: array([[[ 0.]]])

In [7]: array.values[array.values != 0]
Out[7]: array([ 1.])
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
218277814 · MDU6SXNzdWUyMTgyNzc4MTQ= · #1341 · where(..., drop=True) failure for empty mask on python 2.7 · pwolfram 4295853 · closed · 4 comments · created 2017-03-30T17:55:38Z · updated 2017-04-02T22:43:53Z · closed 2017-04-02T22:43:53Z · CONTRIBUTOR

The following fails on 2.7 but not 3.5 (reproducible script at https://gist.github.com/89bd5bd62a475510b2611cbff8d5c67a):

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: da = xr.DataArray(np.random.rand(100,10), dims=['nCells','nVertLevels'])

In [4]: mask = xr.DataArray(np.zeros((100,), dtype='bool'), dims='nCells')

In [5]: da.where(mask, drop=True)
ValueError                                Traceback (most recent call last)
<ipython-input-5-ca5cd9c083a9> in <module>()
----> 1 da.where(mask, drop=True)

/Users/pwolfram/src/xarray/xarray/core/common.pyc in where(self, cond, other, drop)
    681             outcond = cond.isel(**clip)
    682             indexers = {dim: outcond.get_index(dim) for dim in outcond.dims}
--> 683             outobj = self.sel(**indexers)
    684         else:
    685             outobj = self

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in sel(self, method, tolerance, drop, **indexers)
    670             self, indexers, method=method, tolerance=tolerance
    671         )
--> 672         result = self.isel(drop=drop, **pos_indexers)
    673         return result._replace_indexes(new_indexes)
    674

/Users/pwolfram/src/xarray/xarray/core/dataarray.pyc in isel(self, drop, **indexers)
    655         DataArray.sel
    656         """
--> 657         ds = self._to_temp_dataset().isel(drop=drop, **indexers)
    658         return self._from_temp_dataset(ds)
    659

/Users/pwolfram/src/xarray/xarray/core/dataset.pyc in isel(self, drop, **indexers)
   1115         for name, var in iteritems(self._variables):
   1116             var_indexers = dict((k, v) for k, v in indexers if k in var.dims)
-> 1117             new_var = var.isel(**var_indexers)
   1118             if not (drop and name in var_indexers):
   1119                 variables[name] = new_var

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in isel(self, **indexers)
    545             if dim in indexers:
    546                 key[i] = indexers[dim]
--> 547         return self[tuple(key)]
    548
    549     def squeeze(self, dim=None):

/Users/pwolfram/src/xarray/xarray/core/variable.pyc in __getitem__(self, key)
    375         dims = tuple(dim for k, dim in zip(key, self.dims)
    376                      if not isinstance(k, (int, np.integer)))
--> 377         values = self._indexable_data[key]
    378         # orthogonal indexing should ensure the dimensionality is consistent
    379         if hasattr(values, 'ndim'):

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in __getitem__(self, key)
    465
    466     def __getitem__(self, key):
--> 467         key = self._convert_key(key)
    468         return self._ensure_ndarray(self.array[key])
    469

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in _convert_key(self, key)
    452         if any(not isinstance(k, (int, np.integer, slice)) for k in key):
    453             # key would trigger fancy indexing
--> 454             key = orthogonal_indexer(key, self.shape)
    455         return key
    456

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in orthogonal_indexer(key, shape)
     77     """
     78     # replace Ellipsis objects with slices
---> 79     key = list(canonicalize_indexer(key, len(shape)))
     80     # replace 1d arrays and slices with broadcast compatible arrays
     81     # note: we treat integers separately (instead of turning them into 1d

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in canonicalize_indexer(key, ndim)
     65         return indexer
     66
---> 67     return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
     68
     69

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in <genexpr>((k,))
     65         return indexer
     66
---> 67     return tuple(canonicalize(k) for k in expanded_indexer(key, ndim))
     68
     69

/Users/pwolfram/src/xarray/xarray/core/indexing.pyc in canonicalize(indexer)
     62                                  'array indexing; all subkeys must be '
     63                                  'slices, integers or sequences of '
---> 64                                  'integers or Booleans' % indexer)
     65         return indexer
     66

ValueError: invalid subkey array([], dtype=object) for integer based array indexing; all subkeys must be slices, integers or sequences of integers or Booleans
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
218013400 · MDU6SXNzdWUyMTgwMTM0MDA= · #1338 · Chunking and dask memory errors · pwolfram 4295853 · closed · 2 comments · created 2017-03-29T21:22:49Z · updated 2017-03-29T22:56:45Z · closed 2017-03-29T22:56:45Z · CONTRIBUTOR

What is the standard way of sub-chunking to prevent dask memory errors? For large dataset files there could be a dimension, say nCells, that is large enough to fill RAM. If this occurs, is there an automatic mechanism in dask to prevent out-of-memory errors, or is it the user's responsibility to specify maximum chunk sizes on their own?
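A sketch of the manual control available today (the file name and chunk size are placeholders): capping chunk sizes at open time keeps any single dask task from holding the full nCells dimension in memory.

```python
import xarray as xr

ds = xr.open_dataset("big_mpas_output.nc", chunks={"nCells": 100000})
print(ds.chunks)  # bounded chunks along nCells
```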

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
140214928 · MDU6SXNzdWUxNDAyMTQ5Mjg= · #791 · Adding cumsum / cumprod reduction operators · pwolfram 4295853 · closed · 12 comments · created 2016-03-11T15:36:41Z · updated 2016-10-04T22:16:26Z · closed 2016-10-04T22:16:26Z · CONTRIBUTOR

It would be useful to have the cumsum / cumprod reduction operators for DataArray and Dataset, analogous to http://xarray.pydata.org/en/stable/generated/xarray.DataArray.sum.html?highlight=sum#xarray.DataArray.sum and http://xarray.pydata.org/en/stable/generated/xarray.Dataset.sum.html?highlight=sum#xarray.Dataset.sum

I notice this is on the TODO at https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L54 and am assuming there is something subtle here about the implementation. I believe the issue was probably with dask, but the issue / PR at https://github.com/dask/dask/issues/923 & https://github.com/dask/dask/pull/925 may have removed the roadblock.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
178111738 · MDU6SXNzdWUxNzgxMTE3Mzg= · #1010 · py.test fails on master · pwolfram 4295853 · closed · 5 comments · created 2016-09-20T16:36:51Z · updated 2016-09-20T17:13:14Z · closed 2016-09-20T17:12:09Z · CONTRIBUTOR

Following creation of a new conda environment:

```bash
cd /tmp
conda create -n test_xarray27 python=2.7 -y
source activate test_xarray27
conda install matplotlib dask bottleneck pytest -y
git clone git@github.com:pydata/xarray.git
cd xarray
git co master
python setup.py develop
py.test
```

returns

```bash
┌─[pwolfram][shapiro][/tmp/xarray][10:34][±][master ✓]
└─▪ py.test
================================ test session starts =================================
platform darwin -- Python 2.7.10 -- py-1.4.27 -- pytest-2.7.1
rootdir: /private/tmp/xarray, inifile: setup.cfg
collected 1028 items

xarray/test/test_backends.py ............................................................................................................................................................................................sssssssssssssssssssssssssssssssssssss.........sssssssssssssssssssssssss......
xarray/test/test_combine.py ..............
xarray/test/test_conventions.py .............................................s............
xarray/test/test_dask.py ..........F....................
xarray/test/test_dataarray.py ...........................................................s...................................................s.............
xarray/test/test_dataset.py .............................................................................................................................................
xarray/test/test_extensions.py ....
xarray/test/test_formatting.py .........
xarray/test/test_groupby.py ...
xarray/test/test_indexing.py .........
xarray/test/test_merge.py ..............
xarray/test/test_ops.py .............
xarray/test/test_plot.py ..............................................................................................................................................................................................
xarray/test/test_tutorial.py s
xarray/test/test_ufuncs.py ....
xarray/test/test_utils.py ...................
xarray/test/test_variable.py ...............................................................................................................................
xarray/test/test_xray.py .

====================================== FAILURES ======================================
_____________________________ TestVariable.test_reduce ______________________________

self = <xarray.test.test_dask.TestVariable testMethod=test_reduce>

def test_reduce(self):
    u = self.eager_var
    v = self.lazy_var
    self.assertLazyAndAllClose(u.mean(), v.mean())
    self.assertLazyAndAllClose(u.std(), v.std())
  self.assertLazyAndAllClose(u.argmax(dim='x'), v.argmax(dim='x'))

xarray/test/test_dask.py:145:


xarray/core/common.py:16: in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
xarray/core/variable.py:899: in reduce
    axis=axis, **kwargs)
xarray/core/ops.py:308: in f
    return func(values, axis=axis, **kwargs)
xarray/core/ops.py:64: in f
    return getattr(module, name)(*args, **kwargs)
/Users/pwolfram/anaconda/lib/python2.7/site-packages/dask/array/reductions.py:542: in _
    return arg_reduction(x, chunk, combine, agg, axis, split_every)

x = dask.array<from-ar..., shape=(4, 6), dtype=float64, chunksize=(2, 2)>
chunk = <functools.partial object at 0x115f3ef70>, combine = <functools.partial object at 0x115f3efc8>, agg = <functools.partial object at 0x115f4b050>
axis = (0,), split_every = None

def arg_reduction(x, chunk, combine, agg, axis=None, split_every=None):
    """Generic function for argreduction.

    Parameters
    ----------
    x : Array
    chunk : callable
        Partialed ``arg_chunk``.
    combine : callable
        Partialed ``arg_combine``.
    agg : callable
        Partialed ``arg_agg``.
    axis : int, optional
    split_every : int or dict, optional
    """
    if axis is None:
        axis = tuple(range(x.ndim))
        ravel = True
    elif isinstance(axis, int):
        if axis < 0:
            axis += x.ndim
        if axis < 0 or axis >= x.ndim:
            raise ValueError("axis entry is out of bounds")
        axis = (axis,)
        ravel = x.ndim == 1
    else:
        raise TypeError("axis must be either `None` or int, "
                        "got '{0}'".format(axis))

    # Map chunk across all blocks
    name = 'arg-reduce-chunk-{0}'.format(tokenize(chunk, axis))
    old = x.name
    keys = list(product(*map(range, x.numblocks)))
    offsets = list(product(*(accumulate(operator.add, bd[:-1], 0)
                           for bd in x.chunks)))

E TypeError: type object argument after * must be a sequence, not generator

/Users/pwolfram/anaconda/lib/python2.7/site-packages/dask/array/reductions.py:510: TypeError
===================== 1 failed, 961 passed, 66 skipped in 52.49 seconds ======================

```

cc @shoyer

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144683276 · MDU6SXNzdWUxNDQ2ODMyNzY= · #811 · Selection based on boolean DataArray · pwolfram 4295853 · closed · 17 comments · created 2016-03-30T18:38:34Z · updated 2016-04-15T20:30:03Z · closed 2016-04-15T20:30:03Z · CONTRIBUTOR

Should xarray indexing account for boolean values without resorting to a call to np.where? For example, acase.sel(Np=np.where(idx)[0]) works but acase.sel(Np=idx) does not.
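A minimal sketch of both spellings on synthetic data (boolean isel landed in later xarray versions; the flatnonzero form matches the np.where workaround above):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5), dims="Np")
idx = da.values % 2 == 0  # boolean mask

via_positions = da.isel(Np=np.flatnonzero(idx))  # np.where-style workaround
via_bool = da.isel(Np=idx)                       # direct boolean indexing
```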

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
144037264 · MDU6SXNzdWUxNDQwMzcyNjQ= · #807 · cumprod returns errors · pwolfram 4295853 · closed · 3 comments · created 2016-03-28T17:50:13Z · updated 2016-03-31T23:39:55Z · closed 2016-03-31T23:39:55Z · CONTRIBUTOR

The xarray implementation of cumprod returns an assertion error, presumably because of bottleneck, e.g., https://github.com/pydata/xarray/blob/master/xarray/core/ops.py#L333. The error is

```
└─▪ ./test_cumprod.py
[ 0.8841785   0.54181236  0.29075258  0.28883015  0.1137352   0.09909713
  0.03570122  0.0304542   0.01578143  0.01496195  0.01442681  0.00980845]
Traceback (most recent call last):
  File "./test_cumprod.py", line 13, in <module>
    foo.cumprod()
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/common.py", line 16, in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py", line 991, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 871, in reduce
    axis=axis, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/ops.py", line 346, in f
    assert using_numpy_nan_func
AssertionError
```

If bottleneck is uninstalled, then a ValueError is returned instead:

```
└─▪ ./test_cumprod.py
[  2.99508768e-01   2.80142920e-01   1.56389242e-01   1.10791301e-01
   4.58372649e-02   4.10865622e-02   9.91362500e-03   6.76033435e-03
   3.83574249e-03   9.54972340e-04   1.56846616e-04   6.44088547e-05]
Traceback (most recent call last):
  File "./test_cumprod.py", line 13, in <module>
    foo.cumprod()
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/common.py", line 16, in wrapped_func
    skipna=skipna, allow_lazy=True, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/dataarray.py", line 991, in reduce
    var = self.variable.reduce(func, dim, axis, keep_attrs, **kwargs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 880, in reduce
    return Variable(dims, data, attrs=attrs)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 213, in __init__
    self._dims = self._parse_dimensions(dims)
  File "/Users/pwolfram/anaconda/lib/python2.7/site-packages/xarray/core/variable.py", line 321, in _parse_dimensions
    % (dims, self.ndim))
ValueError: dimensions () must have the same length as the number of data dimensions, ndim=1
```

No error occurs if the data array is converted to a numpy array prior to use of cumprod.

This can easily be reproduced with https://gist.github.com/c32f231b773ecc4b0ccf, excerpted below:

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.random.rand(4, 3)
locs = ['IA', 'IL', 'IN']
times = pd.date_range('2000-01-01', periods=4)
foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
print foo.values.cumprod()
foo.cumprod()
```

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
140291221 · MDU6SXNzdWUxNDAyOTEyMjE= · #793 · dask.async.RuntimeError: NetCDF: HDF error on xarray to_netcdf · pwolfram 4295853 · closed · 21 comments · created 2016-03-11T21:04:36Z · updated 2016-03-24T02:49:26Z · closed 2016-03-24T02:49:13Z · CONTRIBUTOR

Dask appears to be failing on serialization following a ds.to_netcdf() with a NetCDF: HDF error. Excerpted error below:

```
Traceback (most recent call last):
  File "reduce_dispersion_file.py", line 40, in <module>
    if __name__ == "__main__":
  File "reduce_dispersion_file.py", line 36, in reduce_dispersion_file
    with timeit_context('output to disk'):
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.py", line 791, in to_netcdf
    engine=engine, encoding=encoding)
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/api.py", line 356, in to_netcdf
    dataset.dump_to_store(store, sync=sync, encoding=encoding)
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/core/dataset.py", line 739, in dump_to_store
    store.sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 283, in sync
    super(NetCDF4DataStore, self).sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/common.py", line 186, in sync
    self.writer.sync()
  File "/users/pwolfram/envs/LIGHT_analysis/lib/python2.7/site-packages/xarray/backends/common.py", line 165, in sync
    da.store(self.sources, self.targets)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/array/core.py", line 712, in store
    Array._get(dsk, keys, **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/base.py", line 43, in _get
    return get(dsk2, keys, **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/threaded.py", line 57, in get
    **kwargs)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 481, in get_async
    raise(remote_exception(res, tb))
dask.async.RuntimeError: NetCDF: HDF error

Traceback
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 264, in execute_task
    result = _execute_task(task, data)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/async.py", line 246, in _execute_task
    return func(*args2)
  File "/users/pwolfram/lib/python2.7/site-packages/dask/array/core.py", line 1954, in store
    out[index] = np.asanyarray(x)
  File "netCDF4/_netCDF4.pyx", line 3678, in netCDF4._netCDF4.Variable.__setitem__ (netCDF4/_netCDF4.c:37215)
  File "netCDF4/_netCDF4.pyx", line 3887, in netCDF4._netCDF4.Variable._put (netCDF4/_netCDF4.c:38907)
```

Script used: https://gist.github.com/98acaa31a4533b490f78
Full output: https://gist.github.com/248efce774ad08cb1dd6
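One debugging step worth noting (a sketch with the modern dask API and synthetic data; not a confirmed fix): forcing a single-threaded scheduler for the write rules HDF5 thread-unsafety in or out as the cause.

```python
import dask
import numpy as np
import xarray as xr

ds = xr.Dataset({"v": (("t",), np.arange(100))}).chunk({"t": 10})

# Serialize the write; if the HDF error disappears, threading is implicated.
with dask.config.set(scheduler="single-threaded"):
    ds.to_netcdf("out.nc")
```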

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue
138332032 · MDU6SXNzdWUxMzgzMzIwMzI= · #783 · Array size changes following loading of numpy array · pwolfram 4295853 · closed · 19 comments · created 2016-03-03T23:44:39Z · updated 2016-03-08T23:41:38Z · closed 2016-03-08T23:37:16Z · CONTRIBUTOR

The issue in a nutshell is that

```
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].shape
(30, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].values.shape
(29, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].data.shape
(30, 1012000)
(Pdb) rlzns.xParticle[rnum*Ntr:(rnum+1)*Ntr,:].data
dask.array<getitem..., shape=(30, 1012000), dtype=float64, chunksize=(23, 1012000)>
```

It seems that, for some reason, when the array is loaded via values it is no longer the same size. The dask shape appears to be correct.

I previously do a filter on time via rlzns = rlzns.isel(Time=np.where(reset > 0)[0]) and do some commands like np.reshape(rlzns.Time[rnum*Ntr:(rnum+1)*Ntr].values,(1,Ntr)),axis=1) but it seems unlikely that this would be causing the problem.

Has anyone had an issue like this? Any ideas on what could be causing the problem would be greatly appreciated because this behavior is very strange.

Reactions: none · state_reason: completed · repo: xarray 13221727 · type: issue

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);