issue_comments


15 rows where issue = 142498006 and user = 4295853 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
288414396 https://github.com/pydata/xarray/issues/798#issuecomment-288414396 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDI4ODQxNDM5Ng== pwolfram 4295853 2017-03-22T14:23:45Z 2017-03-22T14:23:45Z CONTRIBUTOR

@mrocklin and @shoyer, we now have dask.distributed and xarray support. Should this issue be closed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
255192875 https://github.com/pydata/xarray/issues/798#issuecomment-255192875 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDI1NTE5Mjg3NQ== pwolfram 4295853 2016-10-20T18:44:03Z 2016-10-20T18:44:03Z CONTRIBUTOR

@mrocklin, I would be happy to chat because I am interested in seeing this happen (e.g., eventually contributing code). The question is whether we need additional expertise from @shoyer, @jhamman, @rabernat, etc., who likely have a more in-depth understanding of xarray than I do. Perhaps this warrants an email to the wider list?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
255188697 https://github.com/pydata/xarray/issues/798#issuecomment-255188697 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDI1NTE4ODY5Nw== pwolfram 4295853 2016-10-20T18:28:24Z 2016-10-20T18:28:24Z CONTRIBUTOR

@kynan, I'm still interested in this but have not had time to advance this further. Are you interested in contributing to this too?

I view this as a key component of future climate analysis workflows. This may also be something that is addressed at the upcoming hackathon at Columbia with @rabernat early next month.

Also, I suspect that both @mrocklin and @shoyer would be willing to continue to provide key advice because this appears to be aligned with their interests too (please correct me if I'm wrong in this assessment).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
205481557 https://github.com/pydata/xarray/issues/798#issuecomment-205481557 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwNTQ4MTU1Nw== pwolfram 4295853 2016-04-04T20:32:23Z 2016-04-04T20:32:23Z CONTRIBUTOR

@shoyer, if we are happy to open all netCDF files and read out the metadata from a master process, that would imply that we open a file, read the metadata, and then close it, correct?

Array access should then follow something like @mrocklin's netcdf_Dataset approach, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
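A minimal sketch of the open/read-metadata/close pattern asked about in the comment above, assuming plain netCDF4 on the master process; the function name and returned structure are illustrative, not from the thread:

    import netCDF4

    def read_metadata(filename):
        # Open, pull out only the metadata, and close again so the master
        # process never holds long-lived file handles.
        ds = netCDF4.Dataset(filename, mode="r")
        try:
            dims = {name: len(dim) for name, dim in ds.dimensions.items()}
            variables = {name: (var.dimensions, str(var.dtype))
                         for name, var in ds.variables.items()}
            return dims, variables
        finally:
            ds.close()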
205481269 https://github.com/pydata/xarray/issues/798#issuecomment-205481269 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwNTQ4MTI2OQ== pwolfram 4295853 2016-04-04T20:31:24Z 2016-04-04T20:31:24Z CONTRIBUTOR

@fmaussion, on your two questions:

1. The LRU cache should be used serially for the read initially, but something more like @mrocklin's netcdf_Dataset appears to be needed, as @shoyer points out. I need to think about this more.
2. I was thinking we would keep track of the file name outside the LRU and only use the filename to open datasets inside the LRU if they aren't already open. Agreed that "if file in LRU" should designate whether the file is open.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
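One reading of the filename-keyed cache described above is a small LRU of open handles that closes whatever it evicts. This is a sketch assuming netCDF4 as the reader; FileLRU and its size limit are made-up names, not the eventual xarray implementation:

    import collections
    import netCDF4

    class FileLRU:
        """Keep at most maxsize files open; close handles as they are evicted."""

        def __init__(self, maxsize=128):
            self.maxsize = maxsize
            self._open = collections.OrderedDict()  # filename -> open Dataset

        def open(self, filename):
            if filename in self._open:
                self._open.move_to_end(filename)    # mark as most recently used
                return self._open[filename]
            if len(self._open) >= self.maxsize:
                _, stale = self._open.popitem(last=False)
                stale.close()                       # evict the stalest handle
            ds = netCDF4.Dataset(filename, mode="r")
            self._open[filename] = ds
            return ds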
205478991 https://github.com/pydata/xarray/issues/798#issuecomment-205478991 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwNTQ3ODk5MQ== pwolfram 4295853 2016-04-04T20:24:41Z 2016-04-04T20:24:41Z CONTRIBUTOR

Just to be clear, we are talking about this https://github.com/mrocklin/hdf5lazy/blob/master/hdf5lazy/core.py#L83 for @mrocklin's netcdf_Dataset, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
205133433 https://github.com/pydata/xarray/issues/798#issuecomment-205133433 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwNTEzMzQzMw== pwolfram 4295853 2016-04-04T04:35:09Z 2016-04-04T04:35:09Z CONTRIBUTOR

Thanks @mrocklin! This has been really helpful and was what I needed to get going.

A prelim design I'm seeing is to modify the NetCDF4DataStore class https://github.com/pydata/xarray/blob/master/xarray/backends/netCDF4_.py#L170 to meet these requirements:

1. At __init__, try to open the file via the LRU cache. I think the LRU dict has to be a global, because the file restriction is an attribute of the system, correct?
2. For each read from a file, ensure it hasn't been closed, via a @ds.getter property method. If it has, reopen it via the LRU cache. This is OK because for a read the file is essentially read-only. The LRU closes out stale entries to prevent the "too many open files" errors. Checking this should be fast.
3. sync is only for a write, but it seems like it should follow the above approach.

A clean way to do this is just to make sure that each time self.ds is called, it is re-validated via the LRU cache. This can be implemented via property getter methods (https://docs.python.org/2/library/functions.html#property).

Unless I'm missing something big, I don't think this change will require a large refactor, but it is quite possible I overlooked something important. @shoyer and @mrocklin, do you see any obvious pitfalls in this scope? If not, it shouldn't be too hard to implement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
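The property-getter idea in point 2 of the comment above might look like the following sketch, reusing the hypothetical FileLRU from the earlier sketch; GLOBAL_CACHE and ReopeningStore are illustrative names, not xarray's actual NetCDF4DataStore code:

    GLOBAL_CACHE = FileLRU(maxsize=128)  # global, since the open-file limit is system-wide

    class ReopeningStore:
        def __init__(self, filename):
            self._filename = filename

        @property
        def ds(self):
            # Every access re-validates through the LRU: if the handle was
            # evicted and closed, this transparently reopens the read-only file.
            return GLOBAL_CACHE.open(self._filename)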
204770198 https://github.com/pydata/xarray/issues/798#issuecomment-204770198 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwNDc3MDE5OA== pwolfram 4295853 2016-04-02T18:25:18Z 2016-04-02T18:25:18Z CONTRIBUTOR

Another note in support of this PR, especially "robustly support HDF/NetCDF reads": I am having problems with the NetCDF: HDF error previously reported by @rabernat in https://github.com/pydata/xarray/issues/463. Thus, a solution here will save time, and may arguably be on the critical path of some workflows, because fewer jobs will fail and require baby-sitting/restarts, especially when running multiple jobs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
202696169 https://github.com/pydata/xarray/issues/798#issuecomment-202696169 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwMjY5NjE2OQ== pwolfram 4295853 2016-03-29T03:49:11Z 2016-03-29T14:24:02Z CONTRIBUTOR

Thanks @shoyer. If you can provide some guidance on bounds for the reorganization, that would be really great. I want your and @jhamman's feedback on this before I try a solution. The trick is just to make the time, as always, and I may have some time this coming weekend.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
200878845 https://github.com/pydata/xarray/issues/798#issuecomment-200878845 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwMDg3ODg0NQ== pwolfram 4295853 2016-03-24T15:09:42Z 2016-03-24T15:13:18Z CONTRIBUTOR

This issue of connecting to dask/distributed may also be connected with https://github.com/pydata/xarray/issues/463, https://github.com/pydata/xarray/issues/591, and https://github.com/pydata/xarray/pull/524.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
200633312 https://github.com/pydata/xarray/issues/798#issuecomment-200633312 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwMDYzMzMxMg== pwolfram 4295853 2016-03-24T03:04:25Z 2016-03-24T03:04:25Z CONTRIBUTOR

Repeating @mrocklin:

Dask.array writes data to any object that supports numpy-style setitem syntax like the following:

dataset[my_slice] = my_numpy_array

Objects like h5py.Dataset and netcdf objects support this syntax.

So dask.array would work today without modification if we had such an object that represented many netcdf files at once and supported numpy-style setitem syntax, placing the numpy array properly across the right files. This work could happen easily without deep knowledge of either project.

Alternatively, we could make the dask.array.store function optionally lazy so that users (or xarray) could call store many times before triggering execution.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
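For concreteness, here is what the setitem-based write described above looks like with today's dask and h5py; the file and dataset names are placeholders, and the compute=False keyword is the "optionally lazy" store that dask.array later gained:

    import dask.array as da
    import h5py

    x = da.random.random((1000, 1000), chunks=(250, 250))

    with h5py.File("out.h5", "w") as f:
        dset = f.create_dataset("x", shape=x.shape, dtype=x.dtype)
        # h5py.Dataset supports numpy-style setitem, so it is a valid target
        da.store(x, dset)
        # the lazy variant: build up the write, trigger it explicitly later
        lazy = da.store(x, dset, compute=False)
        lazy.compute()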
200632521 https://github.com/pydata/xarray/issues/798#issuecomment-200632521 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDIwMDYzMjUyMQ== pwolfram 4295853 2016-03-24T03:02:19Z 2016-03-24T03:02:19Z CONTRIBUTOR

@shoyer and @mrocklin, I've updated the summary above in the PR description with a to-do list. Do either of you see any obvious tasks I missed on the list in the PR description? If so, can you please update the to-do list so that I can see what needs to be done to modify the backend for the dask/distributed integration?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
199547374 https://github.com/pydata/xarray/issues/798#issuecomment-199547374 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDE5OTU0NzM3NA== pwolfram 4295853 2016-03-22T00:01:55Z 2016-03-22T00:02:11Z CONTRIBUTOR

Here is an example of a use case for a nanmean over ensembles in collaboration with @mrocklin and following http://matthewrocklin.com/blog/work/2016/02/26/dask-distributed-part-3: https://gist.github.com/mrocklin/566a8d5c3f6721abf36f

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
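A toy stand-in for the gist's workflow, doing a nanmean across ensemble members with dask.array; the shapes, chunking, and member count are invented for illustration:

    import dask.array as da

    # eight fake ensemble members; on a cluster these would come from files,
    # with a dask.distributed Client attached before computing
    members = [da.random.random((360, 180), chunks=(90, 90)) for _ in range(8)]
    ensemble = da.stack(members, axis=0)        # shape (8, 360, 180)
    ens_mean = da.nanmean(ensemble, axis=0)     # lazy nanmean over the members
    result = ens_mean.compute()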
199544731 https://github.com/pydata/xarray/issues/798#issuecomment-199544731 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDE5OTU0NDczMQ== pwolfram 4295853 2016-03-21T23:57:27Z 2016-03-21T23:57:27Z CONTRIBUTOR

See also https://github.com/dask/dask/issues/922

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006
199532452 https://github.com/pydata/xarray/issues/798#issuecomment-199532452 https://api.github.com/repos/pydata/xarray/issues/798 MDEyOklzc3VlQ29tbWVudDE5OTUzMjQ1Mg== pwolfram 4295853 2016-03-21T23:21:07Z 2016-03-21T23:21:07Z CONTRIBUTOR

The full mailing list discussion is at https://groups.google.com/d/msgid/xarray/CAJ8oX-E7Xx6NT4F6J8B4__Q-kBazoob9_qe_oFLi5hany9-%3DKQ%40mail.gmail.com?utm_medium=email&utm_source=footer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Integration with dask/distributed (xarray backend design) 142498006

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
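The row filter at the top of this page (issue = 142498006 and user = 4295853, sorted by updated_at descending) can be reproduced against this schema with Python's sqlite3; the database filename here is an assumption:

    import sqlite3

    conn = sqlite3.connect("github.db")  # assumed name of the Datasette database
    rows = conn.execute(
        "select id, created_at, body from issue_comments "
        "where issue = ? and [user] = ? order by updated_at desc",
        (142498006, 4295853),
    ).fetchall()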