issue_comments
11 rows where author_association = "NONE" and issue = 94328498 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
347165242 | https://github.com/pydata/xarray/issues/463#issuecomment-347165242 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDM0NzE2NTI0Mg== | sebhahn 5929935 | 2017-11-27T12:17:17Z | 2017-11-27T12:17:17Z | NONE | Thanks, I'll test it! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
347140117 | https://github.com/pydata/xarray/issues/463#issuecomment-347140117 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDM0NzE0MDExNw== | sebhahn 5929935 | 2017-11-27T10:26:56Z | 2017-11-27T10:26:56Z | NONE | Ok, I found my problem. I had to increase the open file limit. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
347126256 | https://github.com/pydata/xarray/issues/463#issuecomment-347126256 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDM0NzEyNjI1Ng== | sebhahn 5929935 | 2017-11-27T09:33:29Z | 2017-11-27T09:33:29Z | NONE | @shoyer I just ran into this issue again (with 8000 files, each 50 kB). I'm using xarray 0.9.6 and working on some performance tests. Is there an upper limit on the number of files? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
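The ceiling sebhahn is asking about is not in xarray itself but in the operating system's per-process limit on open file descriptors. As a minimal sketch (Unix only; the target of 10000 is an arbitrary illustrative value, not a recommendation), the limit can be inspected and raised from within Python using the standard-library `resource` module:

``` python
# Sketch: inspect and, if possible, raise the per-process open-file limit.
# Unix only; the target value below is purely illustrative.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)

# The soft limit can be raised up to the hard limit without extra privileges.
if hard == resource.RLIM_INFINITY or hard >= 10000:
    resource.setrlimit(resource.RLIMIT_NOFILE, (10000, hard))
```

Raising the soft limit only moves the ceiling; with tens of thousands of files it is usually better to avoid holding them all open at once, as later comments in this thread discuss.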
288868053 | https://github.com/pydata/xarray/issues/463#issuecomment-288868053 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDI4ODg2ODA1Mw== | ajoros 2615433 | 2017-03-23T21:37:19Z | 2017-03-23T21:37:19Z | NONE | Yessir @pwolfram we are in business! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
288835940 | https://github.com/pydata/xarray/issues/463#issuecomment-288835940 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDI4ODgzNTk0MA== | ajoros 2615433 | 2017-03-23T19:34:33Z | 2017-03-23T19:34:33Z | NONE | Thanks @pwolfram ... shot you a follow-up email at your Gmail... |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
288829145 | https://github.com/pydata/xarray/issues/463#issuecomment-288829145 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDI4ODgyOTE0NQ== | ajoros 2615433 | 2017-03-23T19:08:37Z | 2017-03-23T19:08:37Z | NONE | Not sure this is good feedback at all, but I just wanted to provide an additional problematic case from my end that is hitting this "too many files" problem. NOTE: I have the latest xarray package. I have about 365 NetCDF files, 1.7 MB each, that I am trying to read using open_mfdataset(), and it continuously gives me the "too many files" error and completely hangs Jupyter notebooks, to the point where I have to Ctrl+C out of it. Note that each NetCDF file contains a Dataset that is 195x195x1. Obviously it's not a file-size issue, as I'm not dealing with multiple gigs worth of data. Should I increase the OSX open max file limit, or will that not solve anything in my case? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
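One way to sidestep the limit for a case like this (hundreds of small files), foreshadowing the workflow described in the next comment, is to open each file individually, load it into memory, let it close, and concatenate at the end. A rough sketch, where the glob pattern and the `time` concatenation dimension are assumptions about the data layout:

``` python
# Sketch of a manual alternative to open_mfdataset() that holds at most
# one file open at a time. Assumes the files concatenate along "time".
import glob
import xarray as xr

paths = sorted(glob.glob("output/*.nc"))  # hypothetical file location
pieces = []
for path in paths:
    with xr.open_dataset(path) as ds:
        pieces.append(ds.load())  # pull data into memory before the file closes
combined = xr.concat(pieces, dim="time")
```

This trades the file-handle problem for a memory cost, which is acceptable here since the files are small.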
224049602 | https://github.com/pydata/xarray/issues/463#issuecomment-224049602 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDIyNDA0OTYwMg== | darothen 4992424 | 2016-06-06T18:42:06Z | 2016-06-06T18:42:06Z | NONE | @mangecoeur, although it's not an xarray-based solution, I've found that by far the best solution to this problem is to transform your dataset from the "timeslice" format (which is convenient for models to write out - all the data at a given point in time, often in separate files for each time step) to "timeseries" format - a continuous format, where you have all the data for a single variable in a single (or much smaller collection of) files. NCAR published a great utility for converting batches of NetCDF output from timeslice to timeseries format here; it's significantly faster than any shell-script/CDO/NCO solution I've ever encountered, and it parallelizes extremely easily. Adding a simple post-processing step to convert my simulation output to timeseries format dramatically reduced my overall work time. Before, I had a separate handler which re-implemented open_mfdataset(), performed an intermediate reduction (usually extracting a variable), and then concatenated within xarray. This could get around the open file limit, but it wasn't fast. My pre-processed data is often still big - barely fitting within memory - but it's far easier to handle, and you can throw dask at it no problem to get huge speedups in analysis. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
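For modest data volumes, the timeslice-to-timeseries split darothen describes can be approximated directly in xarray once a combined dataset exists. A sketch under the assumption that the combined data fits in memory (file names are illustrative; at scale, NCAR's dedicated utility is the better tool):

``` python
# Sketch: write one "timeseries" NetCDF file per variable from a
# combined dataset. File names here are purely illustrative.
import xarray as xr

combined = xr.open_dataset("combined.nc")  # hypothetical combined file
for name in combined.data_vars:
    # Each data variable becomes its own single-variable dataset/file.
    combined[name].to_dataset().to_netcdf("timeseries_{}.nc".format(name))
```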
143373357 | https://github.com/pydata/xarray/issues/463#issuecomment-143373357 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDE0MzM3MzM1Nw== | cpaulik 380927 | 2015-09-25T23:11:39Z | 2015-09-25T23:11:39Z | NONE | OK, I'll try. Thanks. But I originally tested if netCDF4 can work with a closed/reopened variable like this:

``` python
In [1]: import netCDF4

In [2]: a = netCDF4.Dataset("temp.nc", mode="w")

In [3]: a.createDimension("lon")
Out[3]: <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0

In [4]: a.createVariable("lon", "f8", dimensions=("lon"))
Out[4]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used

In [5]: v = a.variables['lon']

In [6]: v
Out[6]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used

In [7]: a.close()

In [8]: v
Out[8]:
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    695                 type_pprinters=self.type_printers,
    696                 deferred_pprinters=self.deferred_printers)
--> 697             printer.pretty(obj)
    698             printer.flush()
    699             return stream.getvalue()

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    381                 if callable(meth):
    382                     return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    501     if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502         # A user-provided repr. Find newlines and replace them with p.break_()
--> 503         _repr_pprint(obj, p, cycle)
    504         return
    505     p.begin_group(1, '<')

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    683     """A pprint that just redirects to the normal repr function."""
    684     # Find newlines and replace them with p.break_()
--> 685     output = repr(obj)
    686     for idx,output_line in enumerate(output.splitlines()):
    687         if idx:

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25045)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._getdims (netCDF4/_netCDF4.c:26297)()

RuntimeError: NetCDF: Not a valid ID

In [9]: a = netCDF4.Dataset("temp.nc")

In [10]: v
Out[10]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
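The session above captures the core hazard: a cached `netCDF4.Variable` handle raises `RuntimeError: NetCDF: Not a valid ID` once its dataset is closed, and the fact that it prints again at In [10] is an accident of the underlying NetCDF ID becoming valid again when the file is reopened. A safer pattern, sketched here, is to re-fetch variable handles from the reopened `Dataset` instead of reusing stale ones:

``` python
# Sketch: never reuse a Variable handle across a close/reopen cycle;
# fetch a fresh handle from the reopened Dataset instead.
import netCDF4

ds = netCDF4.Dataset("temp.nc")
lon = ds.variables["lon"]   # valid only while ds stays open
ds.close()

ds = netCDF4.Dataset("temp.nc")
lon = ds.variables["lon"]   # re-fetch after reopening
print(ds.isopen())          # Dataset.isopen() reports the open state
```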
143338384 | https://github.com/pydata/xarray/issues/463#issuecomment-143338384 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDE0MzMzODM4NA== | cpaulik 380927 | 2015-09-25T20:02:42Z | 2015-09-25T20:02:42Z | NONE | I've only put the try/except there to conditionally set the breakpoint. How does it make a difference if self.store.close is called? If it is not called, the dataset remains open, which should not cause the weird behaviour reported above. Nevertheless, I have updated my branch to use a context manager because it is a better solution, but I still have this strange behaviour where merely printing the variable alters the test outcome. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
143222580 | https://github.com/pydata/xarray/issues/463#issuecomment-143222580 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDE0MzIyMjU4MA== | cpaulik 380927 | 2015-09-25T13:27:59Z | 2015-09-25T13:27:59Z | NONE | I've pushed a few commits trying this out to https://github.com/cpaulik/xray/tree/closing_netcdf_backend . I can open a WIP PR if this would be easier to discuss there. There are, however, a few tests that keep failing, and I cannot figure out why. E.g.: if I set a breakpoint at line 941 of dataset.py and just continue, the test fails. If I however evaluate

The error I get when running the test without interference is:

``` shell
test_backends.py::NetCDF4ViaDaskDataTest::test_compression_encoding FAILED

====================================================== FAILURES =======================================================
_____________________________________ NetCDF4ViaDaskDataTest.test_compression_encoding ________________________________

self = <xray.test.test_backends.NetCDF4ViaDaskDataTest testMethod=test_compression_encoding>

/usr/lib/python2.7/contextlib.py:17: in __enter__
    return self.gen.next()
test_backends.py:596: in roundtrip
    yield ds.chunk()
../core/dataset.py:942: in chunk
    for k, v in self.variables.items()])
../core/dataset.py:935: in maybe_chunk
    token2 = tokenize(name, token if token else var._data)
/home/cpa/.virtualenvs/xray/local/lib/python2.7/site-packages/dask/base.py:152: in tokenize
    return md5(str(tuple(map(normalize_token, args))).encode()).hexdigest()
../core/indexing.py:301: in __repr__
    (type(self).__name__, self.array, self.key))
../core/utils.py:377: in __repr__
    return '%s(array=%r)' % (type(self).__name__, self.array)
../core/indexing.py:301: in __repr__
    (type(self).__name__, self.array, self.key))
../core/utils.py:377: in __repr__
    return '%s(array=%r)' % (type(self).__name__, self.array)
netCDF4/_netCDF4.pyx:2931: in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25068)
    ???
netCDF4/_netCDF4.pyx:2938: in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)
    ???
netCDF4/_netCDF4.pyx:3059: in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)
    ???
netCDF4/_netCDF4.pyx:2994: RuntimeError
=============================================== 1 failed in 0.50 seconds ===============================================
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 | |
142637232 | https://github.com/pydata/xarray/issues/463#issuecomment-142637232 | https://api.github.com/repos/pydata/xarray/issues/463 | MDEyOklzc3VlQ29tbWVudDE0MjYzNzIzMg== | cpaulik 380927 | 2015-09-23T15:19:36Z | 2015-09-23T15:19:36Z | NONE | I've run into the same problem and have been looking at the netCDF backend. A solution does not seem to be as easy as just opening and closing the file in the backend. Short of decorating all the functions of the netCDF4 package, I cannot think of a workable solution to this. But maybe I'm overlooking something fundamental. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset too many files 94328498 |
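The "decorate all the functions" idea cpaulik floats here can be reduced to a small wrapper that reopens the file around every read, so no descriptor stays open between accesses. This is only a sketch of that direction with hypothetical names, not xarray's actual backend machinery:

``` python
# Sketch of the reopen-on-access idea: store only a path and variable
# name, and open the netCDF file briefly for each read.
# All names here are hypothetical.
import netCDF4


class LazyNetCDFVariable:
    """Reads slices of a variable by reopening its file on every access."""

    def __init__(self, path, varname):
        self.path = path
        self.varname = varname

    def __getitem__(self, key):
        ds = netCDF4.Dataset(self.path)
        try:
            return ds.variables[self.varname][key]
        finally:
            ds.close()  # no file descriptor survives between reads


# Usage: thousands of these can coexist without exhausting file handles,
# at the cost of an open/close per read.
# lon = LazyNetCDFVariable("temp.nc", "lon")
# values = lon[:10]
```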
``` sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
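For reference, the listing on this page corresponds to a straightforward query against the schema above. A sketch using Python's sqlite3 module, assuming the database has been exported locally (the filename `github.db` is hypothetical):

``` python
# Sketch: reproduce this page's row selection with sqlite3.
# "github.db" is a hypothetical local copy of the database.
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, body
    FROM issue_comments
    WHERE author_association = 'NONE' AND issue = ?
    ORDER BY updated_at DESC
    """,
    (94328498,),
).fetchall()
for comment_id, user, created_at, body in rows:
    print(comment_id, created_at)
```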