html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/463#issuecomment-347165242,https://api.github.com/repos/pydata/xarray/issues/463,347165242,MDEyOklzc3VlQ29tbWVudDM0NzE2NTI0Mg==,5929935,2017-11-27T12:17:17Z,2017-11-27T12:17:17Z,NONE,"Thanks, I'll test it!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-347140117,https://api.github.com/repos/pydata/xarray/issues/463,347140117,MDEyOklzc3VlQ29tbWVudDM0NzE0MDExNw==,5929935,2017-11-27T10:26:56Z,2017-11-27T10:26:56Z,NONE,"OK, I found my problem: I had to increase `ulimit -n`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-347126256,https://api.github.com/repos/pydata/xarray/issues/463,347126256,MDEyOklzc3VlQ29tbWVudDM0NzEyNjI1Ng==,5929935,2017-11-27T09:33:29Z,2017-11-27T09:33:29Z,NONE,"@shoyer I just ran into this issue again (with 8000 files, each 50 kB). I'm using xarray 0.9.6 and working on some performance tests. Is there an upper limit on the number of files?
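For context, the per-process limit on open files can be inspected and raised from within Python itself (this uses only the standard-library `resource` module, on Unix; it is the same knob that `ulimit -n` adjusts in the shell, not an xarray feature):
``` python
import resource

# Current soft and hard limits on open file descriptors (Unix only).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# Raise the soft limit up to the hard limit for this process; the same
# effect as running `ulimit -n` in the launching shell.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```
The traceback I get is: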
```
File ""/home/shahn/.pyenv/versions/warp_conda/envs/pyraster_env/lib/python2.7/site-packages/xarray/backends/api.py"", line 505, in open_mfdataset
File ""/home/shahn/.pyenv/versions/warp_conda/envs/pyraster_env/lib/python2.7/site-packages/xarray/backends/api.py"", line 282, in open_dataset
File ""/home/shahn/.pyenv/versions/warp_conda/envs/pyraster_env/lib/python2.7/site-packages/xarray/backends/netCDF4_.py"", line 210, in __init__
File ""/home/shahn/.pyenv/versions/warp_conda/envs/pyraster_env/lib/python2.7/site-packages/xarray/backends/netCDF4_.py"", line 185, in _open_netcdf4_group
File ""netCDF4/_netCDF4.pyx"", line 1811, in netCDF4._netCDF4.Dataset.__init__ (netCDF4/_netCDF4.c:13231)
IOError: Too many open files
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288868053,https://api.github.com/repos/pydata/xarray/issues/463,288868053,MDEyOklzc3VlQ29tbWVudDI4ODg2ODA1Mw==,2615433,2017-03-23T21:37:19Z,2017-03-23T21:37:19Z,NONE,Yessir @pwolfram we are in business!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288835940,https://api.github.com/repos/pydata/xarray/issues/463,288835940,MDEyOklzc3VlQ29tbWVudDI4ODgzNTk0MA==,2615433,2017-03-23T19:34:33Z,2017-03-23T19:34:33Z,NONE,Thanks @pwolfram ... shot you a follow-up email at your Gmail.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-288829145,https://api.github.com/repos/pydata/xarray/issues/463,288829145,MDEyOklzc3VlQ29tbWVudDI4ODgyOTE0NQ==,2615433,2017-03-23T19:08:37Z,2017-03-23T19:08:37Z,NONE,"Not sure how useful this feedback is, but I wanted to report an additional case from my end that triggers this ""too many files"" problem:
NOTE: I have the latest xarray package.
I have about 365 NetCDF files of 1.7 MB each that I am trying to read using open_mfdataset(), and it consistently gives me the ""too many files"" error and hangs Jupyter notebooks to the point where I have to Ctrl+C out of it. Note that each NetCDF file contains a Dataset that is 195x195x1, so this is obviously not a file-size issue; I am not dealing with multiple gigabytes of data. Should I increase the OS X open-file limit, or will that not solve anything in my case?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-224049602,https://api.github.com/repos/pydata/xarray/issues/463,224049602,MDEyOklzc3VlQ29tbWVudDIyNDA0OTYwMg==,4992424,2016-06-06T18:42:06Z,2016-06-06T18:42:06Z,NONE,"@mangecoeur, although it's not an xarray-based solution, I've found that by far the best solution to this problem is to transform your dataset from the ""timeslice"" format (which is convenient for models to write out - all the data at a given point in time, often in separate files for each time step) to ""timeseries"" format - a continuous format, where you have all the data for a single variable in a single (or much smaller collection of) files.
NCAR published a great utility for converting batches of NetCDF output from timeslice to timeseries format [here](https://github.com/NCAR/PyReshaper); it's significantly faster than any shell-script/CDO/NCO solution I've ever encountered, and it parallelizes extremely easily.
Adding a simple post-processing step to convert my simulation output to timeseries format dramatically reduced my overall work time. Before, I had a separate handler which re-implemented open_mfdataset(), performed an intermediate reduction (usually extracting a variable), and then concatenated within xarray. This could get around the open file limit, but it wasn't fast. My pre-processed data is often still big - barely fitting within memory - but it's far easier to handle, and you can throw dask at it no problem to get huge speedups in analysis.
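If it helps, a minimal xarray-only sketch of the conversion looks roughly like the following (the file paths and variable names are hypothetical, and PyReshaper does the same job in parallel and far faster). Opening each timeslice file one at a time means only a single descriptor is ever held:
``` python
import glob

import xarray as xr

def extract_variable(path, name):
    # Open one timeslice file at a time so only a single file
    # descriptor is held, and pull the variable into memory.
    with xr.open_dataset(path) as ds:
        return ds[name].load()

slice_files = sorted(glob.glob('output/slice_*.nc'))  # hypothetical paths
for name in ['temperature', 'salinity']:  # hypothetical variable names
    series = xr.concat([extract_variable(f, name) for f in slice_files],
                       dim='time')
    series.to_netcdf(name + '_timeseries.nc')
```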
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143373357,https://api.github.com/repos/pydata/xarray/issues/463,143373357,MDEyOklzc3VlQ29tbWVudDE0MzM3MzM1Nw==,380927,2015-09-25T23:11:39Z,2015-09-25T23:11:39Z,NONE,"OK, I'll try. Thanks.
But I originally tested whether netCDF4 can work with a closed/reopened variable like this:
``` python
In [1]: import netCDF4
In [2]: a = netCDF4.Dataset(""temp.nc"", mode=""w"")
In [3]: a.createDimension(""lon"")
Out[3]: <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0
In [4]: a.createVariable(""lon"", ""f8"", dimensions=(""lon""))
Out[4]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
In [5]: v = a.variables['lon']
In [6]: v
Out[6]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
In [7]: a.close()
In [8]: v
Out[8]: ---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
695 type_pprinters=self.type_printers,
696 deferred_pprinters=self.deferred_printers)
--> 697 printer.pretty(obj)
698 printer.flush()
699 return stream.getvalue()
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
381 if callable(meth):
382 return meth(obj, self, cycle)
--> 383 return _default_pprint(obj, self, cycle)
384 finally:
385 self.end_group()
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
501 if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
502 # A user-provided repr. Find newlines and replace them with p.break_()
--> 503 _repr_pprint(obj, p, cycle)
504 return
505 p.begin_group(1, '<')
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
683 """"""A pprint that just redirects to the normal repr function.""""""
684 # Find newlines and replace them with p.break_()
--> 685 output = repr(obj)
686 for idx,output_line in enumerate(output.splitlines()):
687 if idx:
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25045)()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._getdims (netCDF4/_netCDF4.c:26297)()
RuntimeError: NetCDF: Not a valid ID
In [9]: a = netCDF4.Dataset(""temp.nc"")
In [10]: v
Out[10]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143338384,https://api.github.com/repos/pydata/xarray/issues/463,143338384,MDEyOklzc3VlQ29tbWVudDE0MzMzODM4NA==,380927,2015-09-25T20:02:42Z,2015-09-25T20:02:42Z,NONE,"I've only put the try/except there to conditionally set the breakpoint. How does it make a difference whether `self.store.close` is called? If it is not called, the dataset remains open, which should not cause the weird behaviour reported above, should it?
Nevertheless, I have updated my branch to use a context manager because it is a better solution, but I still see this strange behaviour where merely printing the variable alters the test outcome.
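For reference, the context manager is roughly of this shape (a simplified sketch; the actual names in my branch differ):
``` python
import contextlib

@contextlib.contextmanager
def ensure_store_closed(store):
    # Hand the store to the with-block and guarantee it is closed on
    # the way out, even if the block raises.
    try:
        yield store
    finally:
        store.close()
```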
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-143222580,https://api.github.com/repos/pydata/xarray/issues/463,143222580,MDEyOklzc3VlQ29tbWVudDE0MzIyMjU4MA==,380927,2015-09-25T13:27:59Z,2015-09-25T13:27:59Z,NONE,"I've pushed a few commits trying this out to https://github.com/cpaulik/xray/tree/closing_netcdf_backend . I can open a WIP PR if this would be easier to discuss there.
There are, however, a few tests that keep failing, and I cannot figure out why.
e.g.: `test_backends.py::NetCDF4ViaDaskDataTest::test_compression_encoding`:
If I set a breakpoint at [line 941 of dataset.py](https://github.com/cpaulik/xray/blob/closing_netcdf_backend/xray/core/dataset.py#L941) and just continue the test fails.
If, however, I evaluate `self.variables.items()` or even `self.variables` at the breakpoint, I get the correct output, and the test passes when continued. I cannot really see the difference between evaluating this myself in `ipdb` and the code that is on that line.
The error I get when running the test without interference is:
``` shell
test_backends.py::NetCDF4ViaDaskDataTest::test_compression_encoding FAILED
====================================================== FAILURES =======================================================
__________________________________ NetCDF4ViaDaskDataTest.test_compression_encoding ___________________________________
self =
def test_compression_encoding(self):
data = create_test_data()
data['var2'].encoding.update({'zlib': True,
'chunksizes': (5, 5),
'fletcher32': True})
> with self.roundtrip(data) as actual:
test_backends.py:502:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib/python2.7/contextlib.py:17: in __enter__
return self.gen.next()
test_backends.py:596: in roundtrip
yield ds.chunk()
../core/dataset.py:942: in chunk
for k, v in self.variables.items()])
../core/dataset.py:935: in maybe_chunk
token2 = tokenize(name, token if token else var._data)
/home/cpa/.virtualenvs/xray/local/lib/python2.7/site-packages/dask/base.py:152: in tokenize
return md5(str(tuple(map(normalize_token, args))).encode()).hexdigest()
../core/indexing.py:301: in __repr__
(type(self).__name__, self.array, self.key))
../core/utils.py:377: in __repr__
return '%s(array=%r)' % (type(self).__name__, self.array)
../core/indexing.py:301: in __repr__
(type(self).__name__, self.array, self.key))
../core/utils.py:377: in __repr__
return '%s(array=%r)' % (type(self).__name__, self.array)
netCDF4/_netCDF4.pyx:2931: in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25068)
???
netCDF4/_netCDF4.pyx:2938: in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)
???
netCDF4/_netCDF4.pyx:3059: in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)
???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> ???
E RuntimeError: NetCDF: Not a valid ID
netCDF4/_netCDF4.pyx:2994: RuntimeError
============================================== 1 failed in 0.50 seconds ===============================================
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498
https://github.com/pydata/xarray/issues/463#issuecomment-142637232,https://api.github.com/repos/pydata/xarray/issues/463,142637232,MDEyOklzc3VlQ29tbWVudDE0MjYzNzIzMg==,380927,2015-09-23T15:19:36Z,2015-09-23T15:19:36Z,NONE,"I've run into the same problem and have been looking at the netCDF backend. A solution does not seem to be as simple as opening and closing the file in the `__getitem__` method, since that closes the file for any other access as well, e.g. attributes like `shape` or `dtype`.
Short of decorating all the functions of the netCDF4 package, I cannot think of a workable solution to this. But maybe I'm overlooking something fundamental.
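To make the difficulty concrete, a workaround would have to cache metadata eagerly and reopen the file per read, roughly like this hypothetical sketch (not actual backend code); extending it to everything netCDF4 exposes is exactly the decorating problem above:
``` python
import netCDF4

class ReopeningVariable(object):
    # Hypothetical sketch: cache metadata eagerly, reopen per data read.
    def __init__(self, path, varname):
        self.path = path
        self.varname = varname
        # Cache cheap metadata up front so `shape`/`dtype` access does
        # not require the file to stay open.
        with netCDF4.Dataset(path) as ds:
            var = ds.variables[varname]
            self.shape = var.shape
            self.dtype = var.dtype

    def __getitem__(self, key):
        # Reopen the file only for actual data reads and close it right
        # after, so no descriptor is held between accesses.
        with netCDF4.Dataset(self.path) as ds:
            return ds.variables[self.varname][key]
```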
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,94328498