issue_comments


12 rows where author_association = "MEMBER" and issue = 277538485 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
356390513 https://github.com/pydata/xarray/issues/1745#issuecomment-356390513 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1NjM5MDUxMw== shoyer 1217238 2018-01-09T19:36:10Z 2018-01-09T19:36:10Z MEMBER

Both the warning message and the upstream anaconda issue seem like good ideas to me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
352152392 https://github.com/pydata/xarray/issues/1745#issuecomment-352152392 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MjE1MjM5Mg== shoyer 1217238 2017-12-16T01:58:02Z 2017-12-16T01:58:02Z MEMBER

If upgrading to a newer version of netcdf4-python isn't an option, we might need to figure out a workaround for xarray.

It seems that anaconda is still distributing netCDF4 1.2.4, which doesn't help here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
351788352 https://github.com/pydata/xarray/issues/1745#issuecomment-351788352 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTc4ODM1Mg== shoyer 1217238 2017-12-14T17:58:05Z 2017-12-14T17:58:05Z MEMBER

Can you reproduce this just using netCDF4-python?

Try:
```python
import netCDF4

ds = netCDF4.Dataset(path)
print(ds)
print(ds.filepath())
```

If so, it would be good to file a bug upstream.

Actually, it looks like this might be https://github.com/Unidata/netcdf4-python/issues/506

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
351783850 https://github.com/pydata/xarray/issues/1745#issuecomment-351783850 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTc4Mzg1MA== shoyer 1217238 2017-12-14T17:41:05Z 2017-12-14T17:41:11Z MEMBER

I think there is probably a bug buried inside the netCDF4.Dataset.filepath() method somewhere. For example, on netCDF4-python 1.2.4, this would crash if you have any non-ASCII characters in the path. But that doesn't seem to be the issue here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
351780487 https://github.com/pydata/xarray/issues/1745#issuecomment-351780487 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTc4MDQ4Nw== shoyer 1217238 2017-12-14T17:28:37Z 2017-12-14T17:28:37Z MEMBER

@braaannigan can you try adding print(repr(path)) to is_remote_uri() so we can see exactly what these offending strings look like?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
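A minimal sketch of the instrumented helper shoyer is asking for, assuming is_remote_uri matches the one-line regex check quoted later in this thread:

```python
import re

def is_remote_uri(path):
    # Debugging aid suggested above: show exactly what string reaches the
    # check, including any hidden characters or surprising types.
    print(repr(path))
    return bool(re.search(r'^https?\://', path))
```

Running the failing open_mfdataset call with this version in place would print each offending path verbatim.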
351779445 https://github.com/pydata/xarray/issues/1745#issuecomment-351779445 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTc3OTQ0NQ== shoyer 1217238 2017-12-14T17:24:40Z 2017-12-14T17:24:40Z MEMBER

re.match(pattern, string) is equivalent to re.search('^' + pattern, string), so arguably this is a cleaner solution anyway. But ideally I'd like to understand why this is a problem for you, so we can fix the underlying cause and not do it again.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
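The equivalence shoyer states can be checked directly; a quick illustrative snippet (not from the thread):

```python
import re

pattern = r'https?\://'
cases = ['http://example.com/data.nc', 'https://host/file.nc',
         '/local/path.nc', 'not-a-url http://embedded']

for s in cases:
    # re.match anchors at the start of the string, so it behaves like
    # re.search with '^' prepended to the pattern.
    assert bool(re.match(pattern, s)) == bool(re.search('^' + pattern, s))
```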
351765967 https://github.com/pydata/xarray/issues/1745#issuecomment-351765967 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTc2NTk2Nw== shoyer 1217238 2017-12-14T16:41:19Z 2017-12-14T16:41:19Z MEMBER

@braaannigan what about replacing re.search('^https?\://', path) with re.match('https?\://', path)? Can you share the output of running python -c 'import sys; print(sys.getfilesystemencoding())' at the command line? Also, please try engine='scipy' or engine='h5netcdf' with open_dataset. The output of xarray.show_versions() would also be helpful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
351470450 https://github.com/pydata/xarray/issues/1745#issuecomment-351470450 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM1MTQ3MDQ1MA== shoyer 1217238 2017-12-13T17:54:54Z 2017-12-13T17:54:54Z MEMBER

@braaannigan Can you share the name of your problematic file?

One possibility is that re.search() is not thread-safe, even though I don't think we call is_remote_uri from multiple threads. We can test that by adding a lock and seeing if that resolves the issue. Try replacing is_remote_uri with:
```python
import threading

LOCK = threading.Lock()

def is_remote_uri(path):
    with LOCK:
        return bool(re.search('^https?\://', path))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
347856861 https://github.com/pydata/xarray/issues/1745#issuecomment-347856861 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM0Nzg1Njg2MQ== crusaderky 6213168 2017-11-29T13:15:29Z 2017-11-29T13:15:29Z MEMBER

Only if the coords are three-dimensional.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
347819491 https://github.com/pydata/xarray/issues/1745#issuecomment-347819491 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM0NzgxOTQ5MQ== shoyer 1217238 2017-11-29T10:34:25Z 2017-11-29T10:34:25Z MEMBER

(405*282*37)*20*8 bytes = 676 MB, so running out of memory here seems plausible to me.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
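shoyer's arithmetic can be reproduced directly; the 405×282×37 grid shape comes from the issue, and 20 variables of float64 (8 bytes per value) are assumed:

```python
# 20 float64 variables on a 405 x 282 x 37 grid, 8 bytes per value.
nbytes = (405 * 282 * 37) * 20 * 8
print(round(nbytes / 1e6))  # prints 676 (megabytes)
```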
347815737 https://github.com/pydata/xarray/issues/1745#issuecomment-347815737 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM0NzgxNTczNw== crusaderky 6213168 2017-11-29T10:19:52Z 2017-11-29T10:33:15Z MEMBER

It sounds weird. Even if all 20 of the variables he's dropping were coords on the longest dims, and the code was loading them into memory and then dropping them (that would be wrong, but I haven't checked the code yet to verify whether that's the case), we're talking about 405*20*73 ≈ 590k points. That's about 5 MB of RAM if they're float64.

@njweber2 how large are these files? Is it feasible to upload them somewhere? If not, could you write a script that generates equivalent dummy data and reproduce the problem with that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
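A sketch of the dummy-data script crusaderky asks for, assuming numpy and xarray (with a netCDF backend) are available; the default shape matches the 405×282×37 grid discussed above, while the variable names, variable count, and file names are made up:

```python
import numpy as np
import xarray as xr

def make_dummy_files(n_files=3, shape=(405, 282, 37), n_vars=5, prefix='dummy'):
    """Write netCDF files of random data shaped like the problematic ones."""
    paths = []
    for i in range(n_files):
        # Each file gets n_vars random float64 variables on the same grid.
        ds = xr.Dataset(
            {f'var{k}': (('x', 'y', 'z'), np.random.rand(*shape))
             for k in range(n_vars)}
        )
        path = f'{prefix}_{i}.nc'
        ds.to_netcdf(path)
        paths.append(path)
    return paths
```

The resulting files could then be fed to open_mfdataset to try to reproduce the error without sharing the original data.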
347811473 https://github.com/pydata/xarray/issues/1745#issuecomment-347811473 https://api.github.com/repos/pydata/xarray/issues/1745 MDEyOklzc3VlQ29tbWVudDM0NzgxMTQ3Mw== shoyer 1217238 2017-11-29T10:03:51Z 2017-11-29T10:03:51Z MEMBER

I think this was introduced by https://github.com/pydata/xarray/pull/1551, where we started loading coordinates that are compared for equality into memory. This speeds up open_mfdataset, but does increase memory usage.

We might consider adding an option for reduced memory usage at the price of speed. @crusaderky @jhamman @rabernat any thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() memory error in v0.10 277538485
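Until such an option exists, one workaround (a sketch using xarray's preprocess hook and the modern drop_vars API, not the option shoyer proposes) is to drop unneeded variables before open_mfdataset ever compares or concatenates them:

```python
import xarray as xr

# Hypothetical names for the variables the user does not need.
UNNEEDED = ['var3', 'var4']

def drop_unneeded(ds):
    # Dropping inside `preprocess` keeps these variables from being
    # loaded into memory for the equality comparison.
    return ds.drop_vars([v for v in UNNEEDED if v in ds.variables])

# ds = xr.open_mfdataset('files_*.nc', preprocess=drop_unneeded)
```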

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 30.794ms · About: xarray-datasette