html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7280#issuecomment-1311919228,https://api.github.com/repos/pydata/xarray/issues/7280,1311919228,IC_kwDOAMm_X85OMkx8,743508,2022-11-11T16:27:57Z,2022-11-11T16:27:57Z,CONTRIBUTOR,"@keewis using your solution things seem to more or less work, except that every operation of course 'loses' the `__array_namespace__` attr, so anything like slicing only half works; on top of that, a lot of indexing operations are not implemented on scipy sparse arrays.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1445486904
https://github.com/pydata/xarray/issues/7280#issuecomment-1311902588,https://api.github.com/repos/pydata/xarray/issues/7280,1311902588,IC_kwDOAMm_X85OMgt8,743508,2022-11-11T16:14:12Z,2022-11-11T16:14:12Z,CONTRIBUTOR,"OK, I had assumed that scipy would have implemented the array interface directly; I will see if there is already an issue open there. Then we can slowly see what else does/doesn't work.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1445486904
https://github.com/pydata/xarray/issues/4380#issuecomment-795114188,https://api.github.com/repos/pydata/xarray/issues/4380,795114188,MDEyOklzc3VlQ29tbWVudDc5NTExNDE4OA==,743508,2021-03-10T09:00:48Z,2021-03-10T09:00:48Z,CONTRIBUTOR,"Running into the same issue when I:
1. Load input from a Zarr data source
2. Queue some processing (delayed dask ufuncs)
3. Re-chunk using `chunk()` to get the dask task size I want
4. Use `to_zarr` to trigger the calculation (dask distributed backend) and save to a new file on disk
I get the chunk-size mismatch error, which I work around by manually overwriting the `encoding['chunks']` value - this seems unintuitive to me. Since I'm going from zarr to zarr, I assumed that calling `chunk()` would set the chunk size for both the dask arrays and the zarr output, since calling `to_zarr` on a dask array will only work if the dask and zarr encoding chunk sizes match.
I didn't realize the `overwrite_encoded_chunks` option existed, but it's also a bit confusing that to get the right chunk size on the *output* I need to set the overwrite option on the *input*.
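For reference, a minimal sketch of the workaround I mean (the names and `queue_processing` are hypothetical stand-ins):
```python
import xarray as xr

ds = xr.open_zarr('input.zarr')      # 1. load from a Zarr source
ds = queue_processing(ds)            # 2. queue delayed dask ufuncs (hypothetical)
ds = ds.chunk({'time': 1000})        # 3. re-chunk the dask arrays

# drop the chunk encoding inherited from the input store, otherwise
# to_zarr raises the chunk-size mismatch error
for var in ds.variables:
    ds[var].encoding.pop('chunks', None)

ds.to_zarr('output.zarr')            # 4. compute and write
```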
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,686608969
https://github.com/pydata/xarray/issues/1378#issuecomment-602795869,https://api.github.com/repos/pydata/xarray/issues/1378,602795869,MDEyOklzc3VlQ29tbWVudDYwMjc5NTg2OQ==,743508,2020-03-23T19:02:26Z,2020-03-23T19:02:26Z,CONTRIBUTOR,"Just wondering what the status of this is. I've been running into bugs trying to model symmetric distance matrices using the same dimension. Interestingly, it does work very well for selecting, e.g. if I use `.sel(nodes=node_list)` on a square matrix I correctly get a square matrix subset 👍 But unfortunately a lot of other things seem to break, e.g. concatenating fails with
`ValueError: axes don't match array` :( What would need to happen to make this work?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,222676855
https://github.com/pydata/xarray/issues/2049#issuecomment-584701023,https://api.github.com/repos/pydata/xarray/issues/2049,584701023,MDEyOklzc3VlQ29tbWVudDU4NDcwMTAyMw==,743508,2020-02-11T15:47:28Z,2020-02-11T15:48:08Z,CONTRIBUTOR,"Just ran into this issue; still present in 0.15, and it also does not respect the option `keep_attrs=True`","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,313010564
https://github.com/pydata/xarray/issues/3761#issuecomment-583488834,https://api.github.com/repos/pydata/xarray/issues/3761,583488834,MDEyOklzc3VlQ29tbWVudDU4MzQ4ODgzNA==,743508,2020-02-07T16:37:05Z,2020-02-07T16:37:05Z,CONTRIBUTOR,"I think it makes sense to support the conversion. Perhaps a better example is with a dataset:
```python
import numpy as np
import xarray as xr

x = np.arange(10)
y = np.arange(10)
data = np.zeros((len(x), len(y)))
ds = xr.Dataset({k: xr.DataArray(data, coords=[x, y], dims=['x', 'y']) for k in ['a', 'b', 'c']})
ds.sel(x=1, y=1)
>>>
<xarray.Dataset>
Dimensions:  ()
Coordinates:
    x        int64 1
    y        int64 1
Data variables:
    a        float64 0.0
    b        float64 0.0
    c        float64 0.0
```
The output is a dataset of scalars, which converts fairly intuitively to a single-row dataframe. But the following throws the same error.
```python
ds.sel(x=1, y=1).to_dataframe()
```
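By contrast, the same selection with length-1 lists converts fine:
```python
ds.sel(x=[1], y=[1]).to_dataframe()
```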
Or think of it another way - isn't it very un-intuitive that converting a single-item dataset to a dataframe works *only if* the item was selected using a length-1 list? To me that seems like a very arbitrary restriction. Following that logic, it also makes sense to have consistent behaviour between Datasets and DataArrays (even if you end up producing a single-element table).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,561539035
https://github.com/pydata/xarray/issues/2531#issuecomment-460174589,https://api.github.com/repos/pydata/xarray/issues/2531,460174589,MDEyOklzc3VlQ29tbWVudDQ2MDE3NDU4OQ==,743508,2019-02-04T09:06:14Z,2019-02-04T09:06:43Z,CONTRIBUTOR,"Perhaps related - I was running into MemoryErrors with a large array and also noticed that chunk sizes were not respected (basically xarray tried to process the array in one go) - but it turned out that I'd forgotten to install both `bottleneck` and `numexpr`, and after installing both (just installing bottleneck was not enough), everything worked as expected.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,376154741
https://github.com/pydata/xarray/issues/1467#issuecomment-311621960,https://api.github.com/repos/pydata/xarray/issues/1467,311621960,MDEyOklzc3VlQ29tbWVudDMxMTYyMTk2MA==,743508,2017-06-28T10:33:33Z,2017-06-28T10:33:33Z,CONTRIBUTOR,"I think I do mean 'years' in the CF convention sense; in this case the time dimension is:
```
double time(time=145);
  :standard_name = ""time"";
  :units = ""years since 1860-1-1 12:00:00"";
  :calendar = ""proleptic_gregorian"";
```
This is correctly interpreted by the NASA Panoply NetCDF file viewer. From glancing at the `xarray` code, it seems to depend on the pandas Timedelta object, which in turn doesn't support years as deltas (although date ranges can be generated at year intervals, so it should be possible to implement).
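e.g. a sketch of the kind of index that would be needed (not how xarray would actually implement it):
```python
import pandas as pd

# 145 yearly steps from 1860-1-1 12:00:00; pandas Timedelta itself has no 'years' unit,
# but anchored year-start frequencies work (integer years only)
times = pd.date_range('1860-01-01 12:00:00', periods=145, freq='AS')
```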
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,238990919
https://github.com/pydata/xarray/issues/1424#issuecomment-303857073,https://api.github.com/repos/pydata/xarray/issues/1424,303857073,MDEyOklzc3VlQ29tbWVudDMwMzg1NzA3Mw==,743508,2017-05-24T21:28:44Z,2017-05-24T21:28:44Z,CONTRIBUTOR,"Dataset isn't chunked, and yes I am using cartopy to draw coastlines following the example in the docs:
```python
# v_both: symmetric colour limit computed earlier; crs: cartopy.crs
p = heatwaves_pop.plot(x='longitude', y='latitude', col='time',
                       col_wrap=3, cmap='RdBu_r', vmin=-v_both, vmax=v_both,
                       size=2,
                       subplot_kws=dict(projection=crs.PlateCarree())
                       )
for ax in p.axes.flat:
    ax.coastlines()
```
where `heatwaves_pop` is calculated from a bunch of other xarray datasets. What surprised me is that they should all already have been loaded into memory, so I did not expect a further increase in memory use.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,231061878
https://github.com/pydata/xarray/issues/1424#issuecomment-303748239,https://api.github.com/repos/pydata/xarray/issues/1424,303748239,MDEyOklzc3VlQ29tbWVudDMwMzc0ODIzOQ==,743508,2017-05-24T14:51:06Z,2017-05-24T14:51:06Z,CONTRIBUTOR,"16 maps, although like you say, I'm not sure if this is coming from xarray or matplotlib","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,231061878
https://github.com/pydata/xarray/issues/1301#issuecomment-285052725,https://api.github.com/repos/pydata/xarray/issues/1301,285052725,MDEyOklzc3VlQ29tbWVudDI4NTA1MjcyNQ==,743508,2017-03-08T14:20:30Z,2017-03-08T14:20:30Z,CONTRIBUTOR,"My 2 cents - I've found that with big files any `%prun` tends to show `method 'acquire' of '_thread.lock'` as one of the highest times, but it's not necessarily indicative of where the perf issue comes from, because it's effectively just waiting for IO, which is always slow. One thing that helps get a better profile is setting the `dask` backend to the non-parallel `sync` option, which gives cleaner profiles.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,212561278
https://github.com/pydata/xarray/pull/1162#issuecomment-274602298,https://api.github.com/repos/pydata/xarray/issues/1162,274602298,MDEyOklzc3VlQ29tbWVudDI3NDYwMjI5OA==,743508,2017-01-23T20:09:24Z,2017-01-23T20:09:24Z,CONTRIBUTOR,"Crikey. Fixed the merge, hopefully it works (I hate merge conflicts)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-274567523,https://api.github.com/repos/pydata/xarray/issues/1162,274567523,MDEyOklzc3VlQ29tbWVudDI3NDU2NzUyMw==,743508,2017-01-23T18:04:09Z,2017-01-23T18:04:09Z,CONTRIBUTOR,OK added a performance improvements section to the docs,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-274564256,https://api.github.com/repos/pydata/xarray/issues/1162,274564256,MDEyOklzc3VlQ29tbWVudDI3NDU2NDI1Ng==,743508,2017-01-23T17:52:33Z,2017-01-23T17:52:33Z,CONTRIBUTOR,"Note - waiting for 0.9.0 to be released before updating What's New; don't want to end up with conflicts in the docs","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-272844516,https://api.github.com/repos/pydata/xarray/issues/1162,272844516,MDEyOklzc3VlQ29tbWVudDI3Mjg0NDUxNg==,743508,2017-01-16T11:59:01Z,2017-01-16T11:59:01Z,CONTRIBUTOR,Ok will wait for 0.9.0 to be released,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-272715240,https://api.github.com/repos/pydata/xarray/issues/1162,272715240,MDEyOklzc3VlQ29tbWVudDI3MjcxNTI0MA==,743508,2017-01-15T18:53:26Z,2017-01-15T18:53:26Z,CONTRIBUTOR,"Completed changes based on recommendations and cleaned up old code and comments.
As for benchmarks, I don't have anything rigorous, but I do have the following example: `dataset` is weather data from the CFSR dataset, 7 variables at hourly resolution, collected in one netCDF3 file per variable per month. In this particular case the difference is striking!
```python
%%time
data = dataset.isel_points(time=np.arange(0,1000), lat=np.ones(1000, dtype=int), lon=np.ones(1000, dtype=int))
data.load()
```
Results:
```
xarray 0.8.2
CPU times: user 1min 21s, sys: 41.5 s, total: 2min 2s
Wall time: 47.8 s
master
CPU times: user 385 ms, sys: 238 ms, total: 623 ms
Wall time: 288 ms
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-269093854,https://api.github.com/repos/pydata/xarray/issues/1162,269093854,MDEyOklzc3VlQ29tbWVudDI2OTA5Mzg1NA==,743508,2016-12-24T17:49:10Z,2016-12-24T17:49:10Z,CONTRIBUTOR,"@shoyer Tidied up based on recommendations - now everything is done in a single loop (still need to make the distinction between variables and coordinates for the output, but it's a lot neater)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-269026887,https://api.github.com/repos/pydata/xarray/issues/1162,269026887,MDEyOklzc3VlQ29tbWVudDI2OTAyNjg4Nw==,743508,2016-12-23T18:13:52Z,2016-12-23T18:25:03Z,CONTRIBUTOR,"OK I adjusted for the new behaviour and all tests pass locally, hopefully travis agrees...
Edit: Looks like it's green","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-268927305,https://api.github.com/repos/pydata/xarray/issues/1162,268927305,MDEyOklzc3VlQ29tbWVudDI2ODkyNzMwNQ==,743508,2016-12-23T01:42:03Z,2016-12-23T01:42:03Z,CONTRIBUTOR,"@shoyer I'm down to one test failing locally in `sel_points`, but I'm not sure what the desired behaviour is. I get:
```
Dimensions:  (points: 3)
Coordinates:
  * points   (points) int64 0 1 2
Data variables:
    foo      (points) int64 0 4 8
```
instead of
```
AssertionError:
Dimensions:  (points: 3)
Coordinates:
  o points   (points) -
Data variables:
    foo      (points) int64 0 4 8
```
But here I'm not sure if my code is wrong or the test. It seems that the test requires `sel_points` NOT to generate new coordinate values for points - however, I'm pretty sure `isel_points` does require this (it passes in any case). I don't really see a way in my code to generate subsets without having a matching coordinate array (I don't know how to use the Dataset constructors without one, for instance - though maybe something like the sketch below works).
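A guess at what the expected output would require (a sketch - I may be missing something):
```python
# a dataset with dimension 'points' but no points coordinate
subset = xr.Dataset({'foo': ('points', [0, 4, 8])})
```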
I've updated the test according to how I think it should be working, but please correct me if I misunderstood.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/pull/1162#issuecomment-266995169,https://api.github.com/repos/pydata/xarray/issues/1162,266995169,MDEyOklzc3VlQ29tbWVudDI2Njk5NTE2OQ==,743508,2016-12-14T10:10:11Z,2016-12-14T10:10:36Z,CONTRIBUTOR,"So it seems to work fine in the Dask case, but I don't have a deep understanding of how DataArrays are constructed from arrays and dims, so it fails in the non-dask case. Also, I'm not sure how you feel about making a special case for the dask backend here (since up till now it was all backend-agnostic).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195125296
https://github.com/pydata/xarray/issues/1161#issuecomment-266598007,https://api.github.com/repos/pydata/xarray/issues/1161,266598007,MDEyOklzc3VlQ29tbWVudDI2NjU5ODAwNw==,743508,2016-12-13T00:29:16Z,2016-12-13T00:29:16Z,CONTRIBUTOR,Seems to run a lot faster for me too...,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195050684
https://github.com/pydata/xarray/issues/1161#issuecomment-266596464,https://api.github.com/repos/pydata/xarray/issues/1161,266596464,MDEyOklzc3VlQ29tbWVudDI2NjU5NjQ2NA==,743508,2016-12-13T00:20:12Z,2016-12-13T00:20:12Z,CONTRIBUTOR,Done with PR #1162 ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195050684
https://github.com/pydata/xarray/issues/1161#issuecomment-266587849,https://api.github.com/repos/pydata/xarray/issues/1161,266587849,MDEyOklzc3VlQ29tbWVudDI2NjU4Nzg0OQ==,743508,2016-12-12T23:32:19Z,2016-12-12T23:33:03Z,CONTRIBUTOR,"Thanks, I've been looking around and I think I'm getting close; however, I'm not sure of the best way to turn the array slice I get from vindex into a DataArray variable. I'm thinking I might put together a draft PR for comments. This is what I have so far:
```python
def isel_points(self, dim='points', **indexers):
    """"""Returns a new dataset with each array indexed pointwise along the
    specified dimension(s).

    This method selects pointwise values from each array and is akin to
    the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this
    method does not require knowing the order of each array's dimensions.

    Parameters
    ----------
    dim : str or DataArray or pandas.Index or other list-like object, optional
        Name of the dimension to concatenate along. If dim is provided as a
        string, it must be a new dimension name, in which case it is added
        along axis=0. If dim is provided as a DataArray or Index or
        list-like object, its name, which must not be present in the
        dataset, is used as the dimension to concatenate along and the
        values are added as a coordinate.
    **indexers : {dim: indexer, ...}
        Keyword arguments with names matching dimensions and values given
        by array-like objects. All indexers must be the same length and
        1 dimensional.

    Returns
    -------
    obj : Dataset
        A new Dataset with the same contents as this dataset, except each
        array and dimension is indexed by the appropriate indexers. With
        pointwise indexing, the new Dataset will always be a copy of the
        original.

    See Also
    --------
    Dataset.sel
    Dataset.isel
    Dataset.sel_points
    DataArray.isel_points
    """"""
    from .dataarray import DataArray

    indexer_dims = set(indexers)

    def relevant_keys(mapping):
        return [k for k, v in mapping.items()
                if any(d in indexer_dims for d in v.dims)]

    data_vars = relevant_keys(self.data_vars)
    coords = relevant_keys(self.coords)

    # all the indexers should be iterables
    keys = indexers.keys()
    indexers = [(k, np.asarray(v)) for k, v in iteritems(indexers)]

    # Check that indexers are valid dims, integers, and 1D
    for k, v in indexers:
        if k not in self.dims:
            raise ValueError(""dimension %s does not exist"" % k)
        if v.dtype.kind != 'i':
            raise TypeError('Indexers must be integers')
        if v.ndim != 1:
            raise ValueError('Indexers must be 1 dimensional')

    # all the indexers should have the same length
    lengths = set(len(v) for k, v in indexers)
    if len(lengths) > 1:
        raise ValueError('All indexers must be the same length')

    # Existing dimensions are not valid choices for the dim argument
    if isinstance(dim, basestring):
        if dim in self.dims:
            # dim is an invalid string
            raise ValueError('Existing dimension names are not valid '
                             'choices for the dim argument in sel_points')
    elif hasattr(dim, 'dims'):
        # dim is a DataArray or Coordinate
        if dim.name in self.dims:
            # dim already exists
            raise ValueError('Existing dimensions are not valid choices '
                             'for the dim argument in sel_points')

    if not utils.is_scalar(dim) and not isinstance(dim, DataArray):
        dim = as_variable(dim, name='points')

    variables = OrderedDict()
    indexers_dict = dict(indexers)
    non_indexed = list(set(self.dims) - indexer_dims)

    # TODO need to figure out how to make sure we get the indexed vs
    # non-indexed dimensions in the right order
    for name, var in self.variables.items():
        slc = []
        for k in var.dims:
            if k in indexers_dict:
                slc.append(indexers_dict[k])
            else:
                slc.append(slice(None, None))
        if hasattr(var.data, 'vindex'):
            variables[name] = DataArray(var.data.vindex[tuple(slc)], name=name)
        else:
            variables[name] = var[tuple(slc)]

    points_len = lengths.pop()

    new_variables = OrderedDict()
    for name, var in variables.items():
        if name not in self.dims:
            coords = [variables[k] for k in non_indexed]
            new_variables[name] = DataArray(var, coords=[np.arange(points_len)] + coords,
                                            dims=[dim] + non_indexed)

    return xr.merge([v for k, v in new_variables.items() if k not in selection.dims])

    # TODO: This would be sped up with vectorized indexing. This will
    # require dask to support pointwise indexing as well.
    # return concat([self.isel(**d) for d in
    #                [dict(zip(keys, inds)) for inds in
    #                 zip(*[v for k, v in indexers])]],
    #               dim=dim, coords=coords, data_vars=data_vars)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195050684
https://github.com/pydata/xarray/issues/1161#issuecomment-266519121,https://api.github.com/repos/pydata/xarray/issues/1161,266519121,MDEyOklzc3VlQ29tbWVudDI2NjUxOTEyMQ==,743508,2016-12-12T18:59:15Z,2016-12-12T18:59:15Z,CONTRIBUTOR,"OK, I will have a look - where is this implemented? (I always seem to have trouble pinpointing the dask-specific bits in the codebase :S)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,195050684
https://github.com/pydata/xarray/pull/1128#issuecomment-265966887,https://api.github.com/repos/pydata/xarray/issues/1128,265966887,MDEyOklzc3VlQ29tbWVudDI2NTk2Njg4Nw==,743508,2016-12-09T09:08:48Z,2016-12-09T09:08:48Z,CONTRIBUTOR,"@shoyer thanks - with a little testing it seems `lock=False` is fine (so you don't automatically need dask dev for `lock=dask.utils.SerializableLock()`). Using a spawning pool is necessary; it just doesn't work without one. It also looks like the dask distributed IPython backend works fine (similar to a spawn pool in that the worker engines aren't forked but kinda live in their own little world) - this is really nice because IPython in turn has good support for HPC systems (SGE batch scheduling + MPI for process handling).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/pull/1128#issuecomment-265875012,https://api.github.com/repos/pydata/xarray/issues/1128,265875012,MDEyOklzc3VlQ29tbWVudDI2NTg3NTAxMg==,743508,2016-12-08T22:28:25Z,2016-12-08T22:28:25Z,CONTRIBUTOR,"I'm trying out the latest code to subset a set of netcdf4 files with dask.multiprocessing using `set_options(get=dask.multiprocessing.get)`, but I'm still getting `TypeError: can't pickle _thread.lock objects` - is this expected, or is there something specific I need to do to make it work?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,189817033
https://github.com/pydata/xarray/issues/894#issuecomment-230289863,https://api.github.com/repos/pydata/xarray/issues/894,230289863,MDEyOklzc3VlQ29tbWVudDIzMDI4OTg2Mw==,743508,2016-07-04T13:23:53Z,2016-07-04T13:23:53Z,CONTRIBUTOR,"I think this is also a bug if you load a multifile dataset: when you rename it you get a new dataset, but when you trigger a read it goes back to the original files, which haven't been renamed on disk.
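i.e. a scenario like this (hypothetical names):
```python
import xarray as xr

ds = xr.open_mfdataset('data_*.nc')
ds2 = ds.rename({'old_var': 'new_var'})
# lazily-loaded values are read back from the original files,
# which of course still use 'old_var'
values = ds2['new_var'].values
```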
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,163414759
https://github.com/pydata/xarray/issues/463#issuecomment-223918870,https://api.github.com/repos/pydata/xarray/issues/463,223918870,MDEyOklzc3VlQ29tbWVudDIyMzkxODg3MA==,743508,2016-06-06T10:09:48Z,2016-06-06T10:09:48Z,CONTRIBUTOR,"So, using a cleaner minimal example, it does appear that the files _are_ closed after the dataset is closed. However, they are _all_ open _during_ dataset loading - this is what blows past the OSX default max-open-files limit.
I think this could be a real issue when using Xarray to handle too-big-for-RAM datasets - you could easily be trying to access thousands of files (especially with weather data), so Xarray should limit the number it holds open at any one time during data load. Not being familiar with the internals, I'm not sure if this is an issue in Xarray itself or in the Dask backend.
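As a stopgap I can raise the process's open-file limit from Python - a workaround, not a fix for the underlying behaviour:
```python
import resource

# lift the soft limit on open file descriptors (assuming the hard limit allows it)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))
```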
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223905394,https://api.github.com/repos/pydata/xarray/issues/463,223905394,MDEyOklzc3VlQ29tbWVudDIyMzkwNTM5NA==,743508,2016-06-06T09:06:33Z,2016-06-06T09:06:33Z,CONTRIBUTOR,"@shoyer thanks - here's how I'm using mfdataset - not using any options. I'm going to try using the `h5netcdf` backend to see if I get the same results. I'm still not 100% confident that I'm tracking open files correctly with `lsof`, so I'm going to try to make a minimal example to investigate.
``` python
def weather_dataset(root_path: Path, *, start_date: datetime = None, end_date: datetime = None):
    flat_files_paths = get_dset_file_paths(root_path, start_date=start_date, end_date=end_date)
    # Convert Paths to list of strings for xarray
    dataset = xr.open_mfdataset([str(f) for f in flat_files_paths])
    return dataset


def cfsr_weather_loader(db, site_lookup_fn=None, dset_start=None, dset_end=None, site_conf=None):
    # Pull values out of the
    dt_conf = site_conf if site_conf else WEATHER_CFSR
    dset_start = dset_start if dset_start else dt_conf['start_dt']
    dset_end = dset_end if dset_end else dt_conf['end_dt']
    if site_lookup_fn is None:
        site_lookup_fn = site_lookup_postcode_district

    def weather_loader(site_id, start_date, end_date, resample=None):
        # using the tuple because always getting mixed up with lon/lat
        geo_lookup = site_lookup_fn(site_id, db)
        # With statement should ensure dset is closed after loading.
        with weather_dataset(WEATHER_CFSR['path'],
                             start_date=dset_start,
                             end_date=dset_end) as weather:
            data = weighted_regional_timeseries(weather, start_date, end_date,
                                                lon=geo_lookup.lon,
                                                lat=geo_lookup.lat,
                                                weights=geo_lookup.weights)
        # RENAME from CFSR standard
        data = data.rename(columns=WEATHER_RENAME)
        if resample is not None:
            data = data.resample(resample).mean()
        data.irradiance /= 1000.0  # convert irradiance to kW
        return data

    return weather_loader
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223837612,https://api.github.com/repos/pydata/xarray/issues/463,223837612,MDEyOklzc3VlQ29tbWVudDIyMzgzNzYxMg==,743508,2016-06-05T21:05:40Z,2016-06-05T21:05:40Z,CONTRIBUTOR,"So on investigation, even though my dataset creation is wrapped in a `with` block, using lsof to check the file handles held by my IPython kernel suggests that all the input files are still open. Are you certain that the backend correctly closes files in a multifile dataset? Is there a way to explicitly force this to happen?
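Roughly, I'm checking with something like this (a sketch; assumes `lsof` is on the PATH):
```python
import os
import subprocess

# count the .nc handles currently held by this (kernel) process
out = subprocess.check_output(['lsof', '-p', str(os.getpid())])
print(out.decode().count('.nc'))
```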
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223810723,https://api.github.com/repos/pydata/xarray/issues/463,223810723,MDEyOklzc3VlQ29tbWVudDIyMzgxMDcyMw==,743508,2016-06-05T12:34:11Z,2016-06-05T12:34:11Z,CONTRIBUTOR,"I still hit this issue after wrapping my open_mfdataset in a with statement. I suspect it's an OSX problem: macOS has a very low default max-open-files limit for applications started from the shell (something like 256). It's not yet clear to me whether my datasets are being correctly closed; investigating...
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223687053,https://api.github.com/repos/pydata/xarray/issues/463,223687053,MDEyOklzc3VlQ29tbWVudDIyMzY4NzA1Mw==,743508,2016-06-03T20:31:56Z,2016-06-03T20:31:56Z,CONTRIBUTOR,"It seems to happen even with a freshly restarted notebook, but I'll try a with statement to see if it helps.
On 3 Jun 2016 19:53, ""Stephan Hoyer"" notifications@github.com wrote:

> I suspect you hit this in IPython after rerunning cells, because file
> handles are only automatically closed when programs exit. You might find it
> a good idea to explicitly close files by calling .close() (or using a
> ""with"" statement) on Datasets opened with open_mfdataset.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223651454,https://api.github.com/repos/pydata/xarray/issues/463,223651454,MDEyOklzc3VlQ29tbWVudDIyMzY1MTQ1NA==,743508,2016-06-03T18:08:24Z,2016-06-03T18:08:24Z,CONTRIBUTOR,"I'm also running into this error - but strangely it only happens when using IPython interactive backend. I have some tests which work fine, but doing the same in IPython fails.
I'm opening a few hundred files (about 10Mb each, one per month across a few variables). I'm using the default NetCDF backend.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/864#issuecomment-222995827,https://api.github.com/repos/pydata/xarray/issues/864,222995827,MDEyOklzc3VlQ29tbWVudDIyMjk5NTgyNw==,743508,2016-06-01T13:42:21Z,2016-06-01T13:42:59Z,CONTRIBUTOR,"On further investigation, it appears the problem is that the dataset contains a mix of string and float data - the strings are redundant representations of the timestamp, so they don't appear in the index query. When I tried to convert to an array, numpy choked on the mixed types. Explicitly selecting the desired data variable solves this:
`selection = cfsr_new.TMP_L103.sel(lon=lon_sel, lat=lat_sel, time=time_sel)`
I think a clearer error message may be needed: when you do `sel` without indexing on certain dimensions, those are included in the resulting selection. It's possible for those to be of mixed incompatible types. Clearly to do `to_array` you need a numpy-friendly uniform type. The error should make this clearer.
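A toy version of the situation (with a hypothetical stand-in for the redundant string column):
```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'TMP_L103': ('time', np.zeros(3)),
                 'time_str': ('time', np.array(['t0', 't1', 't2']))})

ds.to_array()      # has to unify float and string dtypes - this is what chokes
ds['TMP_L103']     # explicitly selecting the data variable sidesteps it
```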
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,157886730