issue_comments

15 rows where author_association = "MEMBER", issue = 94328498 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
347157526 https://github.com/pydata/xarray/issues/463#issuecomment-347157526 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDM0NzE1NzUyNg== shoyer 1217238 2017-11-27T11:40:35Z 2017-11-27T11:40:35Z MEMBER

Using autoclose=True should also fix this.

On Mon, Nov 27, 2017 at 10:26 AM Sebastian Hahn notifications@github.com wrote:

Ok, I found my problem. I had to increase ulimit -n
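
The per-process cap on open files that ulimit -n reports can also be inspected from Python. The following is a minimal sketch, assuming a Unix-like system where the standard-library resource module is available. The helper names open_file_limits and raise_soft_limit are made up for illustration and are not part of xarray.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Raise the soft limit toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Raising the limit only postpones the error when the number of files keeps growing, which is presumably why closing files automatically is suggested as the alternative fix.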

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
288832922 https://github.com/pydata/xarray/issues/463#issuecomment-288832922 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDI4ODgzMjkyMg== shoyer 1217238 2017-03-23T19:22:43Z 2017-03-23T19:22:43Z MEMBER

OK, I'm closing this issue as "Fixed" by #1198. Feel free to open a new issue for any follow-up concerns.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
263734251 https://github.com/pydata/xarray/issues/463#issuecomment-263734251 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDI2MzczNDI1MQ== shoyer 1217238 2016-11-29T23:30:02Z 2016-11-29T23:30:02Z MEMBER

If I understand correctly, the best approach as you see it is to build on opener via #1128, recognizing this will essentially be "upgraded" sometime in the future, right?

Yes, exactly. I plan to merge that PR very shortly, after a few fixes for the failing tests on Windows (less than an hour of work).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
263706346 https://github.com/pydata/xarray/issues/463#issuecomment-263706346 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDI2MzcwNjM0Ng== shoyer 1217238 2016-11-29T21:35:06Z 2016-11-29T21:35:06Z MEMBER

@pwolfram NcML is just an XML specification for how variables in a set of NetCDF files can be combined into a single virtual NetCDF file. This would be useful because it would allow building a version of open_mfdataset that doesn't need to inspect every single file. So this is definitely independent of the other options.

I suspect that even the LRU cache approach would build on opener from #1128. From a design perspective in the DataStore subclasses, I would guess that both the LRU cache and my latest suggestion should look pretty similar: the appropriate methods on DataStore and the data store Array subclasses will need to use something like a with self._ensure_open(): block to guard all access to underlying file objects.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
263652409 https://github.com/pydata/xarray/issues/463#issuecomment-263652409 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDI2MzY1MjQwOQ== shoyer 1217238 2016-11-29T18:17:17Z 2016-11-29T18:17:17Z MEMBER

@shoyer is it ever feasible to read the first NetCDF file in a sequence and assume that they are all the same except to increment a datetime dimension by increasing days?

Sure. This should probably be a different wrapper function than open_mfdataset, though, one that can make stronger assumptions. For example, one might make a wrapper function for handling NcML.

@kmpaul thanks for sharing! This is useful background.

There is at least one other option worth considering. Instead of using the open file LRU cache, a simpler option could be to add an optional argument to xarray backends (building on opener from https://github.com/pydata/xarray/pull/1128) that switches them to open/close files every time data is accessed.
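
The open/close-on-every-access option can be sketched with plain files. This is only an illustration under stated assumptions: ByteFileStore is a hypothetical stand-in for a backend data store, not xarray's API, and the flag is called autoclose here simply to match the keyword mentioned in the later comment.

```python
from contextlib import contextmanager


class ByteFileStore:
    """Hypothetical stand-in for a backend data store (not xarray's API)."""

    def __init__(self, path, autoclose=False):
        self.path = path
        self.autoclose = autoclose
        # In autoclose mode the file stays closed until data is requested.
        self._file = None if autoclose else open(path, "rb")

    @property
    def is_open(self):
        return self._file is not None

    @contextmanager
    def ensure_open(self):
        """Open the file if needed; close it again only in autoclose mode."""
        if self._file is None:
            self._file = open(self.path, "rb")
        try:
            yield self._file
        finally:
            if self.autoclose:
                self.close()

    def close(self):
        if self._file is not None:
            self._file.close()
            self._file = None

    def read(self, start, stop):
        """Return bytes [start, stop), guarding the access with ensure_open."""
        with self.ensure_open() as f:
            f.seek(start)
            return f.read(stop - start)
```

With autoclose=True no file handle is held between reads, so the number of simultaneously open files stays small at the cost of one open/close pair per access.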

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
263437709 https://github.com/pydata/xarray/issues/463#issuecomment-263437709 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDI2MzQzNzcwOQ== shoyer 1217238 2016-11-29T00:19:53Z 2016-11-29T00:19:53Z MEMBER

if I understand correctly, incorporation of the LRU cache could help with this problem assuming time series were sliced into small chunks for access, correct? We would still run into problems, however, if there were say 10^6 files and we wanted to get a time-series spanning these files, right?

The LRU cache solution proposed in https://github.com/pydata/xarray/issues/798 would work in either case. It just would have poor performance when accessing a small piece of each of 10^6 files, both to build the graph (because xarray needs to open each file to read the metadata) and to do the actual computation (again, because of the need to open so many files). If you only need a small amount of data from many files, you probably want to reshape your data to minimize the amount of necessary file access no matter what, whether you do that reshaping with PyReshaper or xarray/dask.array/dask-distributed.
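
For readers unfamiliar with the idea, an LRU cache of open files can be sketched as follows. This is not the design from the linked issue, just a minimal standard-library illustration; the class name FileHandleLRU and its methods are hypothetical.

```python
from collections import OrderedDict


class FileHandleLRU:
    """Keep at most `maxsize` files open, closing the least recently used."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._handles = OrderedDict()  # path -> open file object

    def get(self, path):
        """Return an open handle for `path`, reopening the file if needed."""
        if path in self._handles:
            self._handles.move_to_end(path)  # mark as most recently used
            return self._handles[path]
        handle = open(path, "rb")
        self._handles[path] = handle
        if len(self._handles) > self.maxsize:
            # Evict the entry that has gone unused the longest.
            _, oldest = self._handles.popitem(last=False)
            oldest.close()
        return handle

    def close_all(self):
        while self._handles:
            _, handle = self._handles.popitem()
            handle.close()
```

The sketch also makes the performance concern concrete: when far more files are touched than the cache can hold, nearly every get call misses and pays for a fresh open.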

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223838593 https://github.com/pydata/xarray/issues/463#issuecomment-223838593 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzgzODU5Mw== shoyer 1217238 2016-06-05T21:23:41Z 2016-06-05T21:23:41Z MEMBER

@mangecoeur I can take a look. Can you share an example of how you use the with block? Are you using any special options to open_mfdataset?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
223663026 https://github.com/pydata/xarray/issues/463#issuecomment-223663026 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDIyMzY2MzAyNg== shoyer 1217238 2016-06-03T18:53:22Z 2016-06-03T18:53:22Z MEMBER

I suspect you hit this in IPython after rerunning cells, because file handles are only automatically closed when programs exit. You might find it a good idea to explicitly close files by calling .close() (or using a "with" statement) on Datasets opened with open_mfdataset.
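
The same principle can be shown with plain files as an analogy; this sketch does not use xarray. It assumes only the standard-library contextlib.ExitStack, and total_size is a hypothetical helper name.

```python
from contextlib import ExitStack


def total_size(paths):
    """Read every file in `paths`, closing all of them before returning."""
    with ExitStack() as stack:
        # Every file entered on the stack is closed when the block exits,
        # even if one of the reads raises.
        files = [stack.enter_context(open(p, "rb")) for p in paths]
        return sum(len(f.read()) for f in files)
```

Because the handles are released when the with block ends, rerunning such code in an interactive session does not accumulate open files.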

On Fri, Jun 3, 2016 at 11:08 AM, mangecoeur notifications@github.com wrote:

I'm also running into this error - but strangely it only happens when using IPython interactive backend. I have some tests which work fine, but doing the same in IPython fails.

I'm opening a few hundred files (about 10Mb each, one per month across a few variables). I'm using the default NetCDF backend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
143382040 https://github.com/pydata/xarray/issues/463#issuecomment-143382040 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDE0MzM4MjA0MA== shoyer 1217238 2015-09-26T00:22:51Z 2015-09-26T00:22:51Z MEMBER

OK, I think you could also just add an ensured_open() to the repr() method. Right now that class is inheriting it from NDArrayMixin.

On Fri, Sep 25, 2015 at 5:11 PM, Christoph Paulik notifications@github.com wrote:

OK, I'll try. Thanks. But I originally tested if netCDF4 can work with a closed/reopened variable like this:

``` python
In [1]: import netCDF4

In [2]: a = netCDF4.Dataset("temp.nc", mode="w")

In [3]: a.createDimension("lon")
Out[3]: <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'lon', size = 0

In [4]: a.createVariable("lon", "f8", dimensions=("lon"))
Out[4]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used

In [5]: v = a.variables['lon']

In [6]: v
Out[6]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used

In [7]: a.close()

In [8]: v
Out[8]: ---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    695                 type_pprinters=self.type_printers,
    696                 deferred_pprinters=self.deferred_printers)
--> 697             printer.pretty(obj)
    698             printer.flush()
    699             return stream.getvalue()

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    381                 if callable(meth):
    382                     return meth(obj, self, cycle)
--> 383             return _default_pprint(obj, self, cycle)
    384         finally:
    385             self.end_group()

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _default_pprint(obj, p, cycle)
    501         if _safe_getattr(klass, '__repr__', None) not in _baseclass_reprs:
    502             # A user-provided repr. Find newlines and replace them with p.break_()
--> 503             _repr_pprint(obj, p, cycle)
    504             return
    505         p.begin_group(1, '<')

/home/cp/.pyenv/versions/miniconda3-3.16.0/envs/xray-3.5.0/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    683     """A pprint that just redirects to the normal repr function."""
    684     # Find newlines and replace them with p.break_()
--> 685     output = repr(obj)
    686     for idx,output_line in enumerate(output.splitlines()):
    687         if idx:

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__repr__ (netCDF4/_netCDF4.c:25045)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.__unicode__ (netCDF4/_netCDF4.c:25243)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable.dimensions.__get__ (netCDF4/_netCDF4.c:27486)()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Variable._getdims (netCDF4/_netCDF4.c:26297)()

RuntimeError: NetCDF: Not a valid ID

In [9]: a = netCDF4.Dataset("temp.nc")

In [10]: v
Out[10]:
<class 'netCDF4._netCDF4.Variable'>
float64 lon(lon)
unlimited dimensions: lon
current shape = (0,)
filling on, default _FillValue of 9.969209968386869e+36 used
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
143347373 https://github.com/pydata/xarray/issues/463#issuecomment-143347373 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDE0MzM0NzM3Mw== shoyer 1217238 2015-09-25T20:35:38Z 2015-09-25T20:35:38Z MEMBER

OK, so the problem is that self.array on NetCDF4ArrayWrapper is retaining a reference to a netCDF4.Variable object on the closed dataset. It's not enough to merely ensure that a netCDF4 dataset is open -- you also need to ensure that no references to variables on the old dataset are still around. So get_variables/open_store_variable may need a refactor to deal with this.
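
One way to avoid stale references is for the array wrapper to remember the variable's name and look the variable up again on every access. The sketch below simulates this without netCDF4: FakeDataset, FakeVariable, Store and LazyArrayWrapper are all hypothetical classes written for illustration, and the real refactor in xarray may look different.

```python
class FakeVariable:
    """Hypothetical variable handle that dies with its dataset."""

    def __init__(self, dataset, name):
        self._dataset = dataset
        self._name = name

    def __getitem__(self, key):
        if not self._dataset.isopen:
            raise RuntimeError("NetCDF: Not a valid ID")
        return self._dataset._data[self._name][key]


class FakeDataset:
    """Hypothetical stand-in for a dataset whose variables are handles."""

    def __init__(self, data):
        self._data = data
        self.isopen = True
        self.variables = {name: FakeVariable(self, name) for name in data}

    def close(self):
        self.isopen = False


class Store:
    def __init__(self, data):
        self._data = data
        self.ds = FakeDataset(data)

    def ensure_open(self):
        if not self.ds.isopen:
            # Reopening creates a new dataset with new variable objects.
            self.ds = FakeDataset(self._data)

    def close(self):
        self.ds.close()


class LazyArrayWrapper:
    """Holds a variable *name*, never a variable object."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def __getitem__(self, key):
        self.store.ensure_open()
        # Fresh lookup, so the handle always belongs to the open dataset.
        return self.store.ds.variables[self.name][key]
```

A variable object captured before close() keeps failing after the store is reopened, while the wrapper keeps working because it never caches the handle.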

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
143325053 https://github.com/pydata/xarray/issues/463#issuecomment-143325053 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDE0MzMyNTA1Mw== shoyer 1217238 2015-09-25T19:06:51Z 2015-09-25T19:06:51Z MEMBER

@cpaulik I wonder if the issue is this section in your __getitem__ method:

    data = getitem(self.array, key)
    try:
        self.store.ensure_open()
        data = getitem(self.array, key)
    except RuntimeError as e:
        raise e
        pass
    if self.ndim == 0:
        # work around for netCDF4-python's broken handling of 0-d
        # arrays (slicing them always returns a 1-dimensional array):
        # https://github.com/Unidata/netcdf4-python/pull/220
        data = np.asscalar(data)
    self.store.close()
    return data

I would put self.store.close() in a finally clause following the getitem clause.

Actually, you probably want to put this in a context manager that automatically closes the file, something like:

    with self.store.opened():
        data = getitem(self.array, key)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
142675701 https://github.com/pydata/xarray/issues/463#issuecomment-142675701 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDE0MjY3NTcwMQ== shoyer 1217238 2015-09-23T17:41:49Z 2015-09-23T17:41:49Z MEMBER

I think we can actually read in all the variable metadata (shape and dtype) when we open the file -- we already do that for reading in attributes. Something like this prototype, which would also be useful for reading compressed netCDF4 files with multiprocessing: https://github.com/blaze/dask/pull/457#issuecomment-123512166
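
The idea of capturing metadata at open time can be illustrated with a flat binary file. This is not the linked prototype, only a sketch under assumptions: CachedMetadataArray is a hypothetical class, and the file is assumed to hold native 8-byte floats.

```python
import os
from array import array


class CachedMetadataArray:
    """Hypothetical lazy array over a flat file of native float64 values."""

    itemsize = 8
    dtype = "float64"

    def __init__(self, path):
        self.path = path
        self.open_count = 0  # counts how often the file is actually opened
        # Read the metadata once, up front, then let go of the file.
        with self._open() as f:
            f.seek(0, os.SEEK_END)
            self.shape = (f.tell() // self.itemsize,)

    def _open(self):
        self.open_count += 1
        return open(self.path, "rb")

    def __getitem__(self, index):
        if not 0 <= index < self.shape[0]:
            raise IndexError(index)
        # Only data access touches the file again.
        with self._open() as f:
            f.seek(index * self.itemsize)
            values = array("d")
            values.frombytes(f.read(self.itemsize))
            return values[0]
```

Since shape and dtype are plain attributes, inspecting them (for example to build a task graph) needs no open file, and such an object could be passed to worker processes without carrying a file handle.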

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
120666380 https://github.com/pydata/xarray/issues/463#issuecomment-120666380 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDY2NjM4MA== shoyer 1217238 2015-07-11T22:36:30Z 2015-07-11T22:36:30Z MEMBER

Hmm. How big is each of your netCDF files?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
120448308 https://github.com/pydata/xarray/issues/463#issuecomment-120448308 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDQ0ODMwOA== shoyer 1217238 2015-07-10T16:12:52Z 2015-07-10T16:12:52Z MEMBER

Sure, you could do this on the scipy backend -- the logic will be essentially the same on both backends.

I believe your issue with the netCDF4 backend is the same as this one: https://github.com/xray/xray/issues/444. This will be fixed in the next release.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498
120443929 https://github.com/pydata/xarray/issues/463#issuecomment-120443929 https://api.github.com/repos/pydata/xarray/issues/463 MDEyOklzc3VlQ29tbWVudDEyMDQ0MzkyOQ== shoyer 1217238 2015-07-10T15:58:41Z 2015-07-10T15:58:41Z MEMBER

Yes, this is a known issue, and I agree that it is annoying. We could work around this by opening up (and closing) netCDF files inside the __getitem__ call. If you're interested in possibly working on this, take a look at the netCDF4 backend for xray: https://github.com/xray/xray/blob/master/xray/backends/netCDF4_.py

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset too many files 94328498

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 236.057ms · About: xarray-datasette