home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

18 rows where issue = 709187212 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 8

  • martindurant 6
  • rabernat 3
  • rsignell-usgs 2
  • dcherian 2
  • keewis 2
  • shoyer 1
  • rafa-guedes 1
  • pep8speaks 1

author_association 3

  • MEMBER 8
  • CONTRIBUTOR 7
  • NONE 3

issue 1

  • Allow fsspec/zarr/mfdataset · 18 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
741889071 https://github.com/pydata/xarray/pull/4461#issuecomment-741889071 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTg4OTA3MQ== rsignell-usgs 1872600 2020-12-09T16:31:37Z 2021-01-19T14:46:49Z NONE

I'm really looking forward to getting this merged so I can open the National Water Model Zarr I created last week thusly:
python ds = xr.open_dataset(s3://noaa-nwm-retro-v2.0-zarr-pds', engine='zarr', backend_kwargs={'consolidated':True, "storage_options": {'anon':True}}) @martindurant tells me this takes only 3 s with the new async capability!

That would be pretty awesome, because now it takes 1min 15s to open this dataset!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
762428604 https://github.com/pydata/xarray/pull/4461#issuecomment-762428604 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc2MjQyODYwNA== martindurant 6042212 2021-01-18T19:15:25Z 2021-01-18T19:15:25Z CONTRIBUTOR

All interested parties, please see new attempt at https://github.com/pydata/xarray/pull/4823

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
748554375 https://github.com/pydata/xarray/pull/4461#issuecomment-748554375 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ== rafa-guedes 7799184 2020-12-20T02:35:40Z 2020-12-20T09:10:27Z CONTRIBUTOR

@rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

@rsignell-usgs one other thing that can largely speed up loading of metadata / coordinates is ensuring coordinate variables are stored in one single chunk. For this particular dataset, chunk size for time coordinate is 672 yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite time coordinate setting dset.time.encoding["chunks"] = (227904,) you should see a very large performance increase. One thing we have been doing for the cases of zarr archives that are appended in time, is defining time coordinate with a very large chunk size (e.g., dset.time.encoding["chunks"] = (10000000,)) when we first write the store. This ensures time coordinate will still fit in one single chunk after appending over time dimension, and does not affect chunking of the actual data variables.

One thing we have been having performance issues with is with loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are in one single chunk. There is an open issue in dask about this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
743287803 https://github.com/pydata/xarray/pull/4461#issuecomment-743287803 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MzI4NzgwMw== martindurant 6042212 2020-12-11T16:19:26Z 2020-12-11T16:19:26Z CONTRIBUTOR

Martin has gained by implementing this PR is transferrable

I'm not sure, it's been a while now...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741949159 https://github.com/pydata/xarray/pull/4461#issuecomment-741949159 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTk0OTE1OQ== rabernat 1197350 2020-12-09T18:02:03Z 2020-12-09T18:02:11Z MEMBER

I think @shoyer has laid out the options in a very clear way.

I weakly favor option 2, as I think it preferable in terms of software architecture and our broader roadmap for Xarray. However, I am cognizant of the significant effort that @martindurant has put into this, and I don't want his effort to go to waste.

Some mitigating factors are: - The example I gave above (https://github.com/pydata/xarray/pull/4461#issuecomment-741939277) shows that one high-impact feature that users want (async capabilities in Zarr) already works, albiet with a different syntax. So this PR is more about convenience. - Presumably the knowledge about Xarray that Martin has gained by implementing this PR is transferrable to a different context, and so we would not be starting from scratch if we went with 2.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741942375 https://github.com/pydata/xarray/pull/4461#issuecomment-741942375 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTk0MjM3NQ== rsignell-usgs 1872600 2020-12-09T17:50:04Z 2020-12-09T17:50:04Z NONE

@rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741939277 https://github.com/pydata/xarray/pull/4461#issuecomment-741939277 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTkzOTI3Nw== rabernat 1197350 2020-12-09T17:44:55Z 2020-12-09T17:44:55Z MEMBER

@rsignell-usgs: note that your example works without this PR (but with the just released zarr 2.6.1) as follows python mapper = fsspec.get_mapper('s3://noaa-nwm-retro-v2.0-zarr-pds') ds = xr.open_zarr(mapper, consolidated=True)

Took 4s on my laptop (outside of AWS).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741928992 https://github.com/pydata/xarray/pull/4461#issuecomment-741928992 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTkyODk5Mg== shoyer 1217238 2020-12-09T17:37:01Z 2020-12-09T17:37:01Z MEMBER

We are excited about adding this feature! We love fsspec and think this would be very useful for xarray's users. In the long term, we would love to support fsspec for all the file formats that can handle file objects, e.g., including engine='h5netcdf' and engine='scipy'.

The concern right now is that this adds special case logic for zarr in open_dataset(), which @alexamici and @aurghs are presently (simultaneously!) trying to remove as part of paying down technical debt in the ongoing backends refactor.

I see two potential paths forwards:

  1. Merge this as is. It has good test coverage and porting should (hopefully!) be relatively straightforward to port.
  2. Insert this into the new backend API code instead, and require using the v2 backend API instead for this feature.

@alexamici could you please take a look and weigh in here? In particular, it would be helpful if you could point to where this would belong in the new refactor. This is also a good motivation for deleting the "v1" API code as soon as possible in favor of the "v2" code -- nothing is worse than needing to implement a new feature twice!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
741881966 https://github.com/pydata/xarray/pull/4461#issuecomment-741881966 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTg4MTk2Ng== martindurant 6042212 2020-12-09T16:20:33Z 2020-12-09T16:20:33Z CONTRIBUTOR

ping again

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
738214364 https://github.com/pydata/xarray/pull/4461#issuecomment-738214364 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDczODIxNDM2NA== dcherian 2448579 2020-12-03T18:48:25Z 2020-12-03T18:48:25Z MEMBER

We need to resolve this discussion in order to decide what to do about this PR. Any more thoughts from other devs.

ping @pydata/xarray

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
712144995 https://github.com/pydata/xarray/pull/4461#issuecomment-712144995 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDcxMjE0NDk5NQ== pep8speaks 24736507 2020-10-19T13:09:32Z 2020-11-30T14:39:18Z NONE

Hello @martindurant! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2020-11-30 14:39:18 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
735814666 https://github.com/pydata/xarray/pull/4461#issuecomment-735814666 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDczNTgxNDY2Ng== rabernat 1197350 2020-11-30T14:21:17Z 2020-11-30T14:21:17Z MEMBER

We let this go stale again. I just resolve the conflicts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
721365827 https://github.com/pydata/xarray/pull/4461#issuecomment-721365827 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDcyMTM2NTgyNw== martindurant 6042212 2020-11-03T20:46:57Z 2020-11-03T20:46:57Z CONTRIBUTOR

One completely unrelated failure (test_polyfit_warnings). Can I please get a final say here (@max-sixty @alexamici ?)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
712218057 https://github.com/pydata/xarray/pull/4461#issuecomment-712218057 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDcxMjIxODA1Nw== keewis 14808389 2020-10-19T14:48:23Z 2020-10-19T14:48:23Z MEMBER

(failures look like something in pandas dev)

yep, that's #4516

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
712194464 https://github.com/pydata/xarray/pull/4461#issuecomment-712194464 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDcxMjE5NDQ2NA== martindurant 6042212 2020-10-19T14:22:23Z 2020-10-19T14:22:23Z CONTRIBUTOR

(failures look like something in pandas dev)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
699179096 https://github.com/pydata/xarray/pull/4461#issuecomment-699179096 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDY5OTE3OTA5Ng== dcherian 2448579 2020-09-25T22:07:47Z 2020-09-25T22:26:45Z MEMBER

We'll have to maintain backward compatibility with older zarr versions for a bit so you'll have to skip the tests appropriately using a version check

EDIT: I didn't realize there are no tests in this PR yet. We definitely want current CI tests passing with older zarr versions.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
699174009 https://github.com/pydata/xarray/pull/4461#issuecomment-699174009 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDY5OTE3NDAwOQ== keewis 14808389 2020-09-25T21:54:06Z 2020-09-25T21:54:06Z MEMBER

is it reasonable to install from master

you might want to take a look at the upstream-dev CI which installs zarr from github (and is currently passing)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
699155033 https://github.com/pydata/xarray/pull/4461#issuecomment-699155033 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDY5OTE1NTAzMw== martindurant 6042212 2020-09-25T21:05:42Z 2020-09-25T21:05:42Z CONTRIBUTOR

Question: to eventually get tests to pass, will need changes only just now going into zarr. Those may be released some time soon, but in the meantime is it reasonable to install from master?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 18.521ms · About: xarray-datasette