issue_comments


16 rows where author_association = "MEMBER", issue = 283388962 and user = 2443309 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
371710575 https://github.com/pydata/xarray/pull/1793#issuecomment-371710575 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM3MTcxMDU3NQ== jhamman 2443309 2018-03-09T04:31:05Z 2018-03-09T04:31:05Z MEMBER

Any final comments on this? If not, I'll probably merge this in the next day or two.

  fix distributed writes 283388962
371345709 https://github.com/pydata/xarray/pull/1793#issuecomment-371345709 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM3MTM0NTcwOQ== jhamman 2443309 2018-03-08T01:26:27Z 2018-03-08T01:26:27Z MEMBER

All the tests are passing here. I would appreciate another round of reviews.

@shoyer - all of your previous comments have been addressed.

  fix distributed writes 283388962
369078817 https://github.com/pydata/xarray/pull/1793#issuecomment-369078817 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2OTA3ODgxNw== jhamman 2443309 2018-02-28T00:38:59Z 2018-02-28T00:38:59Z MEMBER

I've added some additional tests and cleaned up the implementation a bit. I'd like to get reviews from a few folks and hopefully get this merged later this week.

  fix distributed writes 283388962
367493976 https://github.com/pydata/xarray/pull/1793#issuecomment-367493976 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2NzQ5Mzk3Ng== jhamman 2443309 2018-02-21T22:15:09Z 2018-02-21T22:15:09Z MEMBER

Thanks all for the comments. I will clean this up a bit and request a full review later this week.

A few things to note:

  1. I have not tested save_mfdataset yet. In theory it should work now, but it will require some testing. I'll save that for another PR.
  2. I will raise an informative error message when either h5netcdf or scipy is used to write files with distributed (a sketch of such a guard follows below).
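
(A minimal sketch of the guard described in point 2, assuming an active distributed client can be detected via distributed's default_client; the function name is illustrative, not xarray's actual code.)

# Hypothetical guard: refuse distributed writes for engines that can't support them.
def check_engine_supports_distributed(engine):
    try:
        from distributed.client import default_client
        default_client()  # raises ValueError when no client is active
    except (ImportError, ValueError):
        return  # not running under distributed; nothing to check
    if engine in ("scipy", "h5netcdf"):
        raise NotImplementedError(
            "writing netCDF files with engine=%r is not supported with "
            "dask's distributed scheduler" % engine
        )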
  fix distributed writes 283388962
367232132 https://github.com/pydata/xarray/pull/1793#issuecomment-367232132 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2NzIzMjEzMg== jhamman 2443309 2018-02-21T07:02:30Z 2018-02-21T07:02:30Z MEMBER

The battle of inches continues. Turning off HDF5's file locking fixes all the tests for netCDF4 (🎉). Scipy is still not working, and h5netcdf doesn't support autoclose, so it isn't expected to work.

@shoyer - I don't totally understand the scipy constraints on incremental writes, but could that be a factor here?
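
("Turning off HDF5's file locking" refers to HDF5 1.10's environment switch. A minimal sketch, assuming the variable is set before any HDF5-backed library is first imported:)

import os

# Disable HDF5 1.10's file locking; must happen before the HDF5 library loads.
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

import netCDF4  # imported only after the environment variable is set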

  fix distributed writes 283388962
366605287 https://github.com/pydata/xarray/pull/1793#issuecomment-366605287 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2NjYwNTI4Nw== jhamman 2443309 2018-02-19T07:11:37Z 2018-02-19T07:11:37Z MEMBER

I've got this down to 4 test failures:

test_dask_distributed_netcdf_integration_test[NETCDF3_CLASSIC-True-scipy]
test_dask_distributed_netcdf_integration_test[NETCDF3_CLASSIC-False-scipy]
test_dask_distributed_netcdf_integration_test[NETCDF4_CLASSIC-False-netcdf4]
test_dask_distributed_netcdf_integration_test[NETCDF4-False-netcdf4]

I think I'm ready for an initial review. I've made some changes to autoclose and sync so I'd like to get feedback on my approach before I spend too much time sorting out the last few failures.

  fix distributed writes 283388962
366585598 https://github.com/pydata/xarray/pull/1793#issuecomment-366585598 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2NjU4NTU5OA== jhamman 2443309 2018-02-19T04:21:37Z 2018-02-19T04:21:37Z MEMBER

This is mostly working now. I'm getting a test failure from open_dataset + distributed + autoclose so there is something to sort out there.

  fix distributed writes 283388962
366557470 https://github.com/pydata/xarray/pull/1793#issuecomment-366557470 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2NjU1NzQ3MA== jhamman 2443309 2018-02-18T23:18:28Z 2018-02-18T23:18:28Z MEMBER

@shoyer - I have this working with the netcdf4 backend for the NETCDF3_CLASSIC file format. I'm still having some locking issues with the HDF5 library and I'm not sure why.

  fix distributed writes 283388962
363273602 https://github.com/pydata/xarray/pull/1793#issuecomment-363273602 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MzI3MzYwMg== jhamman 2443309 2018-02-06T00:57:05Z 2018-02-06T00:57:05Z MEMBER

I think we're getting close. We're currently failing during the sync step and I'm hypothesizing that it is due to the file not being closed after the setup steps. That said, I wasn't able to pinpoint why/where we're missing a close. I think this traceback is pretty informative:

Traceback (most recent call last):
  File "/Users/jhamman/anaconda/envs/xarray36/lib/python3.6/site-packages/distributed/worker.py", line 1255, in add_task
    self.tasks[key] = _deserialize(function, args, kwargs, task)
  File "/Users/jhamman/anaconda/envs/xarray36/lib/python3.6/site-packages/distributed/worker.py", line 641, in _deserialize
    args = pickle.loads(args)
  File "/Users/jhamman/anaconda/envs/xarray36/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/Users/jhamman/Dropbox/src/xarray/xarray/backends/common.py", line 445, in __setstate__
    self.ds = self._opener(mode=self._mode)
  File "/Users/jhamman/Dropbox/src/xarray/xarray/backends/netCDF4_.py", line 204, in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2015, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1636, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'/var/folders/v0/qnh7jvgx5gnglpxfztxdlhk00000gn/T/tmpn_mo662_/temp-0.nc'
distributed.scheduler - ERROR - error from worker tcp://127.0.0.1:63248: [Errno -101] NetCDF: HDF error: b'/var/folders/v0/qnh7jvgx5gnglpxfztxdlhk00000gn/T/tmpn_mo662_/temp-0.nc'
HDF5-DIAG: Error detected in HDF5 (1.10.1) thread 140736093991744:
  #000: H5F.c line 586 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1305 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1839 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 940 in H5FD_sec2_lock(): unable to lock file, errno = 35, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
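
(The frames in xarray/backends/common.py and netCDF4_.py show the mechanism at work: the datastore pickles an opener rather than the open file handle, and re-opens the file on the worker. A simplified sketch of that pattern, not the exact xarray code:)

import functools
import netCDF4 as nc4

class PicklableNetCDF4Store:
    def __init__(self, filename, mode="r"):
        self._opener = functools.partial(nc4.Dataset, filename)
        self._mode = mode
        self.ds = self._opener(mode=mode)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["ds"]  # open file handles can't be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Re-open on unpickle; this is the step that fails above when
        # HDF5 cannot acquire the file lock on the worker.
        self.ds = self._opener(mode=self._mode)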

  fix distributed writes 283388962
362721418 https://github.com/pydata/xarray/pull/1793#issuecomment-362721418 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MjcyMTQxOA== jhamman 2443309 2018-02-02T22:05:57Z 2018-02-02T22:05:57Z MEMBER

@mrocklin - What is the preferred method for determining which scheduler is being used?
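
(For reference, today's dask API, which postdates this thread, can answer this via dask.base.get_scheduler; the check below is a sketch, not necessarily the approach the PR settled on.)

import dask.base

def using_distributed_scheduler():
    # get_scheduler() returns the active scheduler callable; with a
    # distributed Client active it is the bound method Client.get.
    actual = dask.base.get_scheduler()
    try:
        from distributed import Client
    except ImportError:
        return False
    return isinstance(getattr(actual, "__self__", None), Client)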

  fix distributed writes 283388962
362657475 https://github.com/pydata/xarray/pull/1793#issuecomment-362657475 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MjY1NzQ3NQ== jhamman 2443309 2018-02-02T17:56:05Z 2018-02-02T17:56:05Z MEMBER

The test failure indicates that the netcdf4/h5netcdf libraries cannot open the file in write/append mode, and it seems that is because the file is already open (by another process).

Two questions:

  1. autoclose is False in to_netcdf. That generally makes sense to me, but I'm concerned that we're not being explicit enough about closing the file after each process is done interacting with it. Do we have a way to lock until the file is closed?
  2. The lock we're using is dask's SerializableLock. Is that the correct lock to be using? There is also distributed.Lock (the two are contrasted in the sketch below).

xref: https://github.com/dask/dask/issues/1892
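
(The two locks from question 2, side by side; the token string is illustrative:)

from dask.utils import SerializableLock  # pickles by token; each process that
                                         # unpickles it gets its own local lock
from distributed import Lock             # cluster-wide; mediated by the scheduler

write_lock = SerializableLock("netcdf-write")  # same token -> same underlying lock
with write_lock:
    pass  # excludes other threads in this process, but not other workers

# cluster_lock = Lock("netcdf-write")  # with a running Client, this would
# with cluster_lock: ...               # serialize writers across the whole cluster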

  fix distributed writes 283388962
362644064 https://github.com/pydata/xarray/pull/1793#issuecomment-362644064 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MjY0NDA2NA== jhamman 2443309 2018-02-02T17:03:59Z 2018-02-02T17:37:49Z MEMBER

Thanks @mrocklin for taking a look here. I reworked the tests a bit more to put the to_netcdf inside the distributed cluster section.

The bad news is that the tests are failing again. The good news is that we have a semi-informative error message indicating that we're missing a lock somewhere.

Link to most descriptive failing test: https://travis-ci.org/pydata/xarray/jobs/336643000#L5076
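
(A sketch of the reworked test shape described above, assuming distributed's utils_test.cluster helper and pytest's tmp_path fixture; not the PR's actual test:)

from distributed import Client
from distributed.utils_test import cluster
import xarray as xr

def test_to_netcdf_inside_cluster(tmp_path):
    ds = xr.Dataset({"x": ("t", [1, 2, 3])}).chunk({"t": 1})
    with cluster() as (scheduler, _workers):       # spin up a local cluster
        with Client(scheduler["address"]):
            path = str(tmp_path / "out.nc")
            ds.to_netcdf(path)                     # the write runs on the workers
            with xr.open_dataset(path) as actual:  # read back to verify
                assert actual["x"].shape == (3,)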

  fix distributed writes 283388962
361106590 https://github.com/pydata/xarray/pull/1793#issuecomment-361106590 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MTEwNjU5MA== jhamman 2443309 2018-01-28T23:31:15Z 2018-01-28T23:31:15Z MEMBER

xref: https://github.com/pydata/xarray/issues/798 and https://github.com/dask/dask/issues/2488, which both seem to be relevant to this discussion.

I also remember that @pwolfram was quite involved with the original distributed integration, so I'm pinging him to see if he is interested in this.

  fix distributed writes 283388962
360659245 https://github.com/pydata/xarray/pull/1793#issuecomment-360659245 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MDY1OTI0NQ== jhamman 2443309 2018-01-26T01:43:52Z 2018-01-26T01:43:52Z MEMBER

Yes, the zarr backend here in xarray is also using dask.array.store and seems to work with distributed just fine.
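
(For concreteness, a sketch of the dask.array.store path with a zarr target; because each chunk writes to its own keys in the store, no shared-file lock is needed:)

import dask.array as da
import zarr

source = da.ones((100, 100), chunks=(25, 25))
target = zarr.open("example.zarr", mode="w", shape=source.shape,
                   chunks=(25, 25), dtype=source.dtype)
da.store(source, target, lock=False)  # chunks are written concurrently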

  fix distributed writes 283388962
360328682 https://github.com/pydata/xarray/pull/1793#issuecomment-360328682 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MDMyODY4Mg== jhamman 2443309 2018-01-25T01:14:05Z 2018-01-25T01:15:10Z MEMBER

I've just taken another swing at this and come up empty. I'm open to ideas in the following areas:

  1. The scipy backend is failing to roundtrip a length-1 datetime array (a minimal repro sketch follows at the end of this comment): https://travis-ci.org/pydata/xarray/jobs/333068098#L4504
  2. The scipy, netcdf4, and h5netcdf backends are all failing inside dask-distributed: https://travis-ci.org/pydata/xarray/jobs/333068098#L4919

The good news here is that only 8 tests are failing after applying the array wrapper, so I suspect we're quite close. I'm hoping @shoyer may have some ideas on (1), since I think he implemented some scipy workarounds in the past. @mrocklin, I'm hoping you can point me in the right direction.

All of these tests are reproducible locally.

(BTW, I have a use case that is going to need this functionality so I'm personally motivated to see it across the finish line)
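
(A minimal sketch of the roundtrip described in (1), as a hypothetical repro rather than the actual failing test:)

import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"time": ("time", np.array(["2018-01-25"], dtype="datetime64[ns]"))}
)
ds.to_netcdf("roundtrip.nc", engine="scipy")  # NETCDF3 written via scipy
with xr.open_dataset("roundtrip.nc", engine="scipy") as actual:
    assert actual["time"].values == ds["time"].values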

  fix distributed writes 283388962
357069258 https://github.com/pydata/xarray/pull/1793#issuecomment-357069258 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM1NzA2OTI1OA== jhamman 2443309 2018-01-11T21:37:43Z 2018-01-11T21:37:43Z MEMBER

@mrocklin -

I have a test failing here with a familiar message.

E       TypeError: 'Future' object is not iterable

We saw this last week when debugging some pangeo things. Can you remind me what our solution was?

  fix distributed writes 283388962

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
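
(The page's row selection can be reproduced against this schema with Python's sqlite3; the database filename is an assumption:)

import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical path to this database
rows = conn.execute(
    """
    SELECT id, created_at, body FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 283388962
      AND "user" = 2443309
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 16, matching the row count above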