html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/463#issuecomment-347157526,https://api.github.com/repos/pydata/xarray/issues/463,347157526,MDEyOklzc3VlQ29tbWVudDM0NzE1NzUyNg==,1217238,2017-11-27T11:40:35Z,2017-11-27T11:40:35Z,MEMBER,"Using autoclose=True should also fix this.
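For reference, a minimal sketch of what that looks like ('data_*.nc' is a placeholder pattern):

``` python
import xarray as xr

# autoclose=True tells xarray to close each netCDF file after reading
# from it, so open_mfdataset no longer holds one OS file handle per
# input file
ds = xr.open_mfdataset('data_*.nc', autoclose=True)
```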
On Mon, Nov 27, 2017 at 10:26 AM Sebastian Hahn wrote:
> Ok, I found my problem. I had to increase ulimit -n
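(For anyone finding this later: the same limit can also be raised from inside Python on Unix, assuming the hard limit permits it; a minimal sketch:)

``` python
import resource

# equivalent of `ulimit -n 4096`: raise the soft limit on the number of
# open file descriptors; the hard limit stays unchanged
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))
```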
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288832922,https://api.github.com/repos/pydata/xarray/issues/463,288832922,MDEyOklzc3VlQ29tbWVudDI4ODgzMjkyMg==,1217238,2017-03-23T19:22:43Z,2017-03-23T19:22:43Z,MEMBER,"OK, I'm closing this issue as ""Fixed"" by #1198. Feel free to open a new issue for any follow-up concerns.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263734251,https://api.github.com/repos/pydata/xarray/issues/463,263734251,MDEyOklzc3VlQ29tbWVudDI2MzczNDI1MQ==,1217238,2016-11-29T23:30:02Z,2016-11-29T23:30:02Z,MEMBER,"> if I understand correctly, the best approach as you see it is to build on opener via #1128, recognizing this will be essentially ""upgraded"" sometime in the future, right?
Yes, exactly. I plan to merge that PR very shortly, after a few fixes for the failing tests on Windows (less than an hour of work).","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263706346,https://api.github.com/repos/pydata/xarray/issues/463,263706346,MDEyOklzc3VlQ29tbWVudDI2MzcwNjM0Ng==,1217238,2016-11-29T21:35:06Z,2016-11-29T21:35:06Z,MEMBER,"@pwolfram NcML is just an XML specification for how variables in a set of NetCDF files can be combined into a single virtual NetCDF file. This would be useful because it would allow building a version of `open_mfdataset` that doesn't need to inspect every single file. So this is definitely independent of the other options.
I suspect that even the LRU cache approach would build on `opener` from #1128. From a design perspective in the DataStore subclasses, I would guess that both the LRU cache and my latest suggestion should look pretty similar: the appropriate methods on DataStore and the data store Array subclasses will need to use something like a `with self._ensure_open():` block to guard all access to underlying file objects.
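A rough sketch of that pattern (the names here are illustrative, not the actual xarray internals):

``` python
import contextlib

class ReopeningDataStore(object):
    def __init__(self, opener):
        self._opener = opener  # callable returning an open netCDF4.Dataset
        self.ds = self._opener()

    @contextlib.contextmanager
    def _ensure_open(self):
        # reopen the underlying file if it has been closed, then yield
        # (Dataset.isopen requires a reasonably recent netCDF4-python)
        if not self.ds.isopen():
            self.ds = self._opener()
        yield

    def get_attrs(self):
        with self._ensure_open():
            return dict(self.ds.__dict__)
```
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498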
https://github.com/pydata/xarray/issues/463#issuecomment-263652409,https://api.github.com/repos/pydata/xarray/issues/463,263652409,MDEyOklzc3VlQ29tbWVudDI2MzY1MjQwOQ==,1217238,2016-11-29T18:17:17Z,2016-11-29T18:17:17Z,MEMBER,"> @shoyer is it ever feasible to read the first NetCDF file in a sequence and assume that they are all the same except to increment a datetime dimension by increasing days?
Sure. This should probably be a different wrapper function than `open_mfdataset`, though, one that can make stronger assumptions. For example, one might make a wrapper function for handling [NcML](http://www.unidata.ucar.edu/software/thredds/current/netcdf-java/ncml/Aggregation.html).
@kmpaul thanks for sharing! This is useful background.
There is at least one other option worth considering. Instead of using the open file LRU cache, a simpler option could be to add an optional argument to xarray backends (building on `opener` from https://github.com/pydata/xarray/pull/1128) that switches them to open/close files every time data is accessed.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-263437709,https://api.github.com/repos/pydata/xarray/issues/463,263437709,MDEyOklzc3VlQ29tbWVudDI2MzQzNzcwOQ==,1217238,2016-11-29T00:19:53Z,2016-11-29T00:19:53Z,MEMBER,"> if I understand correctly, incorporation of the LRU cache could help with this problem assuming time series were sliced into small chunks for access, correct? We would still run into problems, however, if there were say 10^6 files and we wanted to get a time-series spanning these files, right?
The LRU cache solution proposed in https://github.com/pydata/xarray/issues/798 would work in either case. It just would have poor performance when accessing a small piece of each of 10^6 files, both to build the graph (because xarray needs to open each file to read the metadata) and to do the actual computation (again, because of the need to open so many files). If you only need a small amount of data from many files, you probably want to reshape your data to minimize the amount of necessary file access no matter what, whether you do that reshaping with PyReshaper or xarray/dask.array/dask-distributed.
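For concreteness, the LRU idea could look roughly like this (a sketch, not the actual design from #798):

``` python
import collections

class FileCache(object):
    # keep at most `maxsize` files open; close the least recently used
    def __init__(self, opener, maxsize=128):
        self._opener = opener  # callable: path -> open file object
        self._maxsize = maxsize
        self._cache = collections.OrderedDict()

    def __getitem__(self, path):
        if path in self._cache:
            # mark as most recently used
            self._cache.move_to_end(path)
        else:
            if len(self._cache) >= self._maxsize:
                # evict and close the least recently used file
                _, stale = self._cache.popitem(last=False)
                stale.close()
            self._cache[path] = self._opener(path)
        return self._cache[path]
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498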
https://github.com/pydata/xarray/issues/463#issuecomment-223838593,https://api.github.com/repos/pydata/xarray/issues/463,223838593,MDEyOklzc3VlQ29tbWVudDIyMzgzODU5Mw==,1217238,2016-06-05T21:23:41Z,2016-06-05T21:23:41Z,MEMBER,"@mangecoeur I can take a look. Can you share an example of how you use the `with` block? Are you using any special options to `open_mfdataset`?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-223663026,https://api.github.com/repos/pydata/xarray/issues/463,223663026,MDEyOklzc3VlQ29tbWVudDIyMzY2MzAyNg==,1217238,2016-06-03T18:53:22Z,2016-06-03T18:53:22Z,MEMBER,"I suspect you hit this in IPython after rerunning cells, because file handles are only automatically closed when programs exit. You might find it a good idea to explicitly close files by calling .close() (or using a ""with"" statement) on Datasets opened with open_mfdataset.
On Fri, Jun 3, 2016 at 11:08 AM, mangecoeur notifications@github.com wrote:
> I'm also running into this error - but strangely it only happens when
> using IPython interactive backend. I have some tests which work fine, but
> doing the same in IPython fails.
>
> I'm opening a few hundred files (about 10Mb each, one per month across a
> few variables). I'm using the default NetCDF backend.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143382040,https://api.github.com/repos/pydata/xarray/issues/463,143382040,MDEyOklzc3VlQ29tbWVudDE0MzM4MjA0MA==,1217238,2015-09-26T00:22:51Z,2015-09-26T00:22:51Z,MEMBER,"OK, I think you could also just add an ensure_open() call to the __repr__() method. Right now that class is inheriting it from NDArrayMixin.
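i.e. something along these lines on the wrapper class (an untested sketch, using the store names from earlier in this thread):

``` python
class NetCDF4ArrayWrapper(NDArrayMixin):
    # ...

    def __repr__(self):
        # guard the repr the same way as data access, so printing a
        # variable from a closed file reopens it first
        with self.store.opened():
            return super(NetCDF4ArrayWrapper, self).__repr__()
```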
On Fri, Sep 25, 2015 at 5:11 PM, Christoph Paulik notifications@github.com wrote:
> OK, I'll try. Thanks.
> But I originally tested if netCDF4 can work with a closed/reopened variable like this:
>
> ``` python
> In [1]: import netCDF4
> In [2]: a = netCDF4.Dataset(""temp.nc"", mode=""w"")
> In [3]: a.createDimension(""lon"")
> Out[3]: <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0
> In [4]: a.createVariable(""lon"", ""f8"", dimensions=(""lon""))
> Out[4]:
> <class 'netCDF4._netCDF4.Variable'>
> float64 lon(lon)
> unlimited dimensions: lon
> current shape = (0,)
> filling on, default _FillValue of 9.969209968386869e+36 used
> In [5]: v = a.variables['lon']
> In [6]: v
> Out[6]:
> <class 'netCDF4._netCDF4.Variable'>
> float64 lon(lon)
> unlimited dimensions: lon
> current shape = (0,)
> filling on, default _FillValue of 9.969209968386869e+36 used
> In [7]: a.close()
> In [8]: v
> Out[8]: ---------------------------------------------------------------------------
> RuntimeError Traceback (most recent call last)
> /home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
> 695 type_pprinters=self.type_printers,
> 696 deferred_pprinters=self.deferred_printers)
> --> 697 printer.pretty(obj)
> 698 printer.flush()
> 699 return stream.getvalue()
> /home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
> 381 if callable(meth):
> 382 return meth(obj, self, cycle)
> --> 383 return _default_pprint(obj, self, cycle)
> 384 finally:
> 385 self.end_group()
> /home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
> 501 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
> 502 # A user-provided repr. Find newlines and replace them with p.break_()
> --> 503 _repr_pprint(obj, p, cycle)
> 504 return
> 505 p.begin_group(1, '<')
> /home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
> 683 """"""A pprint that just redirects to the normal repr function.""""""
> 684 # Find newlines and replace them with p.break_()
> --> 685 output = repr(obj)
> 686 for idx,output_line in enumerate(output.splitlines()):
> 687 if idx:
> netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25045)()
> netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)()
> netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)()
> netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._getdims (netCDF4/_netCDF4.c:26297)()
> RuntimeError: NetCDF: Not a valid ID
> In [9]: a = netCDF4.Dataset(""temp.nc"")
> In [10]: v
> Out[10]:
> <class 'netCDF4._netCDF4.Variable'>
> float64 lon(lon)
> unlimited dimensions: lon
> current shape = (0,)
> filling on, default _FillValue of 9.969209968386869e+36 used
> ```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143347373,https://api.github.com/repos/pydata/xarray/issues/463,143347373,MDEyOklzc3VlQ29tbWVudDE0MzM0NzM3Mw==,1217238,2015-09-25T20:35:38Z,2015-09-25T20:35:38Z,MEMBER,"OK, so the problem is that `self.array` on `NetCDF4ArrayWrapper` is retaining a reference to `netCDF4.Variable` object on the closed dataset. It's not enough to merely ensure that a netCDF4 dataset is opened -- you also need to ensure that no references to variables on the old dataset are still around. So `get_variables`/`open_store_variable` may need a refactor to deal with this.
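Concretely, the wrapper probably needs to re-resolve the variable from the (re)opened dataset instead of caching it, roughly like this (names are illustrative, not a tested patch):

``` python
def get_array(self):
    # look the variable up again on the freshly (re)opened dataset;
    # a cached netCDF4.Variable dies when its Dataset is closed
    self.store.ensure_open()
    return self.store.ds.variables[self.variable_name]
```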
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143325053,https://api.github.com/repos/pydata/xarray/issues/463,143325053,MDEyOklzc3VlQ29tbWVudDE0MzMyNTA1Mw==,1217238,2015-09-25T19:06:51Z,2015-09-25T19:06:51Z,MEMBER,"@cpaulik I wonder if the issue is this section in your `__getitem__` method:
``` python
data = getitem(self.array, key)
try:
    self.store.ensure_open()
    data = getitem(self.array, key)
except RuntimeError as e:
    raise e
    pass
if self.ndim == 0:
    # work around for netCDF4-python's broken handling of 0-d
    # arrays (slicing them always returns a 1-dimensional array):
    # https://github.com/Unidata/netcdf4-python/pull/220
    data = np.asscalar(data)
self.store.close()
return data
```
I would put `self.store.close()` in a `finally` clause following the `getitem` call.
Actually, you probably want to put this in a context manager that automatically closes the file, something like:
``` python
with self.store.opened():
data = getitem(self.array, key)
```
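where `opened()` on the store could be as simple as (untested sketch):

``` python
import contextlib

@contextlib.contextmanager
def opened(self):
    # open the underlying file on entry and always close it on exit,
    # even if reading the data raises
    self.ensure_open()
    try:
        yield
    finally:
        self.close()
```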
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-142675701,https://api.github.com/repos/pydata/xarray/issues/463,142675701,MDEyOklzc3VlQ29tbWVudDE0MjY3NTcwMQ==,1217238,2015-09-23T17:41:49Z,2015-09-23T17:41:49Z,MEMBER,"I think we can actually read in all the variable metadata (shape and dtype) when we open the file -- we already do that for reading in attributes. Something like this prototype, which would also be useful for reading compressed netCDF4 files with multiprocessing: https://github.com/blaze/dask/pull/457#issuecomment-123512166
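The eager-metadata idea might look something like this (a sketch; the real change would live in the DataStore/Variable wrappers):

``` python
class EagerMetadataWrapper(object):
    # capture shape and dtype while the file is open, so that repr and
    # dask graph construction never need to touch the file handle again
    def __init__(self, variable):
        self.shape = variable.shape
        self.dtype = variable.dtype
```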
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-120666380,https://api.github.com/repos/pydata/xarray/issues/463,120666380,MDEyOklzc3VlQ29tbWVudDEyMDY2NjM4MA==,1217238,2015-07-11T22:36:30Z,2015-07-11T22:36:30Z,MEMBER,"Hmm. How big are each of your netCDF files?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-120448308,https://api.github.com/repos/pydata/xarray/issues/463,120448308,MDEyOklzc3VlQ29tbWVudDEyMDQ0ODMwOA==,1217238,2015-07-10T16:12:52Z,2015-07-10T16:12:52Z,MEMBER,"Sure, you could do this on the scipy backend -- the logic will be essentially the same on both backends.
I believe your issue with the netCDF4 backend is the same as this one: https://github.com/xray/xray/issues/444. This will be fixed in the next release.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-120443929,https://api.github.com/repos/pydata/xarray/issues/463,120443929,MDEyOklzc3VlQ29tbWVudDEyMDQ0MzkyOQ==,1217238,2015-07-10T15:58:41Z,2015-07-10T15:58:41Z,MEMBER,"Yes, this is a known issue, and I agree that it is annoying. We could work around this by opening up (and closing) netCDF files inside the `__getitem__` call. If you're interested in possibly working on this, take a look at the netCDF4 backend for xray: https://github.com/xray/xray/blob/master/xray/backends/netCDF4_.py
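The workaround would look something like this (an illustrative sketch, not the final API):

``` python
import netCDF4

class OnDemandArray(object):
    # store only the filename and variable name; open the file for the
    # duration of each indexing operation and close it again
    def __init__(self, filename, variable_name):
        self.filename = filename
        self.variable_name = variable_name

    def __getitem__(self, key):
        ds = netCDF4.Dataset(self.filename)
        try:
            return ds.variables[self.variable_name][key]
        finally:
            ds.close()
```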
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498