issue_comments


17 rows where issue = 613012939 sorted by updated_at descending


Commenters (6): shoyer (9), rabernat (3), nbren12 (2), rafa-guedes (1), tomdurrant (1), zflamig (1)

Author association: MEMBER (12), CONTRIBUTOR (3), NONE (2)

Issue: Support parallel writes to regions of zarr stores (17 comments)
shoyer (MEMBER) · 2020-11-04T06:18:35Z · https://github.com/pydata/xarray/pull/4035#issuecomment-721534803

> Should this checking be performed for all variables, or only for data_variables?

I agree that this requirement is a little surprising. The error exists because otherwise you might be surprised that the array values for "latitude" and "longitude" get overridden rather than checked for consistency. At least if you have to explicitly drop these variables (with the suggested call to .drop()), it is clear that they will be neither checked nor overridden in the Zarr store.
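For illustration, a minimal self-contained sketch of that pattern (hypothetical store name example.zarr; drop_vars is the current spelling of the .drop() call suggested by the error message, and the store is chunked so the rewritten region lines up with Zarr chunk boundaries):

```python
import numpy as np
import xarray as xr

# Write a small store up front, chunked so the region rewritten below
# aligns exactly with Zarr chunk boundaries.
ds = xr.Dataset(
    {"sst": (("time", "lat"), np.zeros((10, 4)))},
    coords={"time": np.arange(10), "lat": [10.0, 20.0, 30.0, 40.0]},
)
ds.to_zarr(
    "example.zarr",
    mode="w",
    encoding={"sst": {"chunks": (2, 4)}, "time": {"chunks": (2,)}},
)

# Rewrite one time slice in place. "lat" has no dimension in common
# with the region, so it must be dropped explicitly; it is then
# neither checked nor overwritten.
update = ds.isel(time=slice(2, 4)) + 1
update.drop_vars("lat").to_zarr("example.zarr", region={"time": slice(2, 4)})
```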

rafa-guedes (CONTRIBUTOR) · 2020-11-04T04:23:58Z · https://github.com/pydata/xarray/pull/4035#issuecomment-721504192

@shoyer thanks for implementing this, it is going to be very useful. I am trying to write the dataset below:

dsregion:

```
<xarray.Dataset>
Dimensions:    (latitude: 2041, longitude: 4320, time: 31)
Coordinates:
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * time       (time) datetime64[ns] 2008-10-01T12:00:00 ... 2008-10-31T12:00:00
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
Data variables:
    vo         (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    uo         (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    sst        (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    ssh        (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
```

As a region of this other dataset:

dset:

```
<xarray.Dataset>
Dimensions:    (latitude: 2041, longitude: 4320, time: 9490)
Coordinates:
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
  * time       (time) datetime64[ns] 1993-01-01T12:00:00 ... 2018-12-25T12:00:00
Data variables:
    ssh        (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    sst        (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    uo         (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    vo         (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
```

Using the following call:

```python
dsregion.to_zarr(dset_url, region={"time": slice(5752, 5783)})
```

But I got stuck on the conditional below within xarray/backends/api.py:

```python
1347     non_matching_vars = [
1348         k
1349         for k, v in ds_to_append.variables.items()
1350         if not set(region).intersection(v.dims)
1351     ]
1352     import ipdb; ipdb.set_trace()
-> 1353   if non_matching_vars:
1354         raise ValueError(
1355             f"when setting `region` explicitly in to_zarr(), all "
1356             f"variables in the dataset to write must have at least "
1357             f"one dimension in common with the region's dimensions "
1358             f"{list(region.keys())}, but that is not "
1359             f"the case for some variables here. To drop these variables "
1360             f"from this dataset before exporting to zarr, write: "
1361             f".drop({non_matching_vars!r})"
1362         )
```

Apparently because "time" is not a dimension of the coordinate variables ["latitude", "longitude"]:

```
ipdb> p non_matching_vars
['latitude', 'longitude']
ipdb> p set(region)
{'time'}
```

Should this checking be performed for all variables, or only for data_variables?

shoyer (MEMBER) · 2020-10-29T23:30:48Z · https://github.com/pydata/xarray/pull/4035#issuecomment-719081563

If there are no additional reviews or objections, I will merge this tomorrow.

Reactions: 👍 3
shoyer (MEMBER) · 2020-10-24T19:12:33Z · https://github.com/pydata/xarray/pull/4035#issuecomment-716041586

Anyone else want to take a look at this?

shoyer (MEMBER) · 2020-10-20T07:21:29Z · https://github.com/pydata/xarray/pull/4035#issuecomment-712649636

OK, I think this is ready for a final review.

shoyer (MEMBER) · 2020-10-19T04:01:54Z · https://github.com/pydata/xarray/pull/4035#issuecomment-711502852

But yes, we've also been successfully using this for parallel writes for a few months now (aside from the race condition).

shoyer (MEMBER) · 2020-10-19T04:00:48Z · https://github.com/pydata/xarray/pull/4035#issuecomment-711501546

I just fixed a race condition with writing attributes. Let me spend a little bit of time responding to Ryan's review, and then I think we can submit it.

tomdurrant (NONE) · 2020-10-19T02:55:52Z · https://github.com/pydata/xarray/pull/4035#issuecomment-711482135

This is a very desirable feature for us. We have been using this branch in development, and it is working great for our use case. We are reluctant to put it into production until it is merged and released - is there any expected timeline for that?

rabernat (MEMBER) · 2020-07-10T11:57:40Z · https://github.com/pydata/xarray/pull/4035#issuecomment-656637518

Zac, you may be interested in this thread:

https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/32

Tom White managed to integrate Dask with pywren via a Dask executor, which allows you to read and write Zarr with AWS Lambda.


shoyer (MEMBER) · 2020-07-09T22:40:54Z (edited 2020-07-09T22:43:34Z) · https://github.com/pydata/xarray/pull/4035#issuecomment-656385475

> This looks nice. Is there any thought on whether this would work with functions as a service (GCP Cloud Functions, AWS Lambda, etc.) for supporting parallel transformation from netCDF to Zarr?

I haven't used functions as a service before, but yes, I imagine this might be useful for that sort of thing. As long as you can figure out the structure of the overall Zarr dataset ahead of time, you could use region to fill out different parts entirely independently.
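A sketch of that workflow under illustrative assumptions (a local out.zarr path standing in for a shared cloud store; write_chunk is a hypothetical helper, one call per function invocation): write the full structure once with compute=False, then fill each chunk-aligned region independently.

```python
import dask.array as da
import numpy as np
import xarray as xr

path = "out.zarr"  # stand-in for a shared cloud store

# 1. Write the overall structure ahead of time: compute=False skips the
#    dask-backed array values but still writes the metadata and the
#    in-memory coordinates.
template = xr.Dataset(
    {"sst": (("time", "lat"), da.zeros((100, 4), chunks=(10, 4)))},
    coords={"time": np.arange(100), "lat": [10.0, 20.0, 30.0, 40.0]},
)
template.to_zarr(path, mode="w", compute=False)

# 2. Hypothetical per-task worker (e.g. one Cloud Function / Lambda
#    invocation): produce one chunk-aligned slice and write only its
#    own region. The coordinates were already written with the
#    template, so only "sst" is written here.
def write_chunk(i: int) -> None:
    sl = slice(10 * i, 10 * (i + 1))
    chunk = xr.Dataset({"sst": (("time", "lat"), np.full((10, 4), float(i)))})
    chunk.to_zarr(path, region={"time": sl})

for i in range(10):  # in production these would run in parallel
    write_chunk(i)
```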

zflamig (NONE) · 2020-07-09T21:30:51Z · https://github.com/pydata/xarray/pull/4035#issuecomment-656361151

This looks nice. Is there any thought on whether this would work with functions as a service (GCP Cloud Functions, AWS Lambda, etc.) for supporting parallel transformation from netCDF to Zarr?

shoyer (MEMBER) · 2020-06-18T17:53:03Z · https://github.com/pydata/xarray/pull/4035#issuecomment-646216297

I've added error checking, tests, and documentation, so this is ready for review now!

Take a look here for a rendered version of the new docs section: https://xray--4035.org.readthedocs.build/en/4035/io.html#appending-to-existing-zarr-stores

nbren12 (CONTRIBUTOR) · 2020-05-13T07:22:40Z · https://github.com/pydata/xarray/pull/4035#issuecomment-627799236

@rabernat I learn something new every day. Sorry for cluttering up this PR with my ignorance, haha.

rabernat (MEMBER) · 2020-05-12T12:42:12Z (edited 2020-05-12T12:42:37Z) · https://github.com/pydata/xarray/pull/4035#issuecomment-627318136

> A similar neat feature would be to read xarray datasets from regions of zarr groups w/o dask arrays.

@nbren12 - this has always been supported. Just call open_zarr(..., chunks=False) and then subset using sel / isel.
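A sketch of that read path, reusing the hypothetical example.zarr store from the sketch further up this page (in current xarray the no-dask spelling is chunks=None):

```python
import xarray as xr

# Open without dask: variables are lazily indexed zarr arrays, so the
# isel below reads only the selected region from the store.
ds = xr.open_zarr("example.zarr", chunks=None)
subset = ds.isel(time=slice(2, 4)).load()
```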

Reactions: 👀 1
nbren12 (CONTRIBUTOR) · 2020-05-12T03:44:14Z · https://github.com/pydata/xarray/pull/4035#issuecomment-627090332

@rabernat pointed this PR out to me, and this is great progress towards allowing more database-like CRUD operations on zarr datasets. A similar neat feature would be to read xarray datasets from regions of zarr groups w/o dask arrays.

shoyer (MEMBER) · 2020-05-09T17:14:46Z · https://github.com/pydata/xarray/pull/4035#issuecomment-626207771

> I'm curious how this interacts with dimension coordinates. Your example bypasses this. But what if dimension coordinates are present? How do we handle alignment issues? For example, what if I call ds.to_zarr(path, region=selection), but the dimension coordinates of ds don't align with the dimension coordinates of the store at path?

It's entirely unsafe. Currently the coordinates would be overridden with the new values, which is consistent with how to_netcdf() with mode="a" works.

This is probably another good reason for requiring users to explicitly drop variables that don’t include a dimension in the selected region, because at least in that case there can be no user expectations about alignment with coordinates that don’t exist.

In the long term, it might make sense to make both to_netcdf and to_zarr check coordinates for alignment by default, but we wouldn't want that in all cases, because sometimes users really do want to update variables.
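In the meantime, such an alignment check is easy to do by hand. A hypothetical guard, not xarray behavior; all names here are illustrative:

```python
import numpy as np
import xarray as xr

def check_region_alignment(ds, store, region):
    """Raise if ds's dimension coordinates differ from the store's in region."""
    existing = xr.open_zarr(store, chunks=None)
    for dim, sl in region.items():
        if dim not in ds.coords:
            continue  # coordinate was dropped; nothing to compare
        expected = existing[dim].isel({dim: sl}).values
        if not np.array_equal(ds[dim].values, expected):
            raise ValueError(f"coordinate {dim!r} does not align with the store")
```

Called just before ds.to_zarr(store, region=region), this turns a silent coordinate overwrite into an explicit error.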

rabernat (MEMBER) · 2020-05-08T15:16:54Z · https://github.com/pydata/xarray/pull/4035#issuecomment-625865523

Stephan, this seems like a great addition. Thanks for getting it started!

I'm curious how this interacts with dimension coordinates. Your example bypasses this. But what if dimension coordinates are present? How do we handle alignment issues? For example, what if I call ds.to_zarr(path, region=selection), but the dimension coordinates of ds don't align with the dimension coordinates of the store at path?

> 1. Officially document that the compute argument only controls writing array values, not metadata (at least for zarr).

👍

> 4. Like (2), but raise an error instead of a warning. Require the user to explicitly drop them with .drop(). This is probably the safest behavior.

👍

I think only advanced users will want to use this feature.


Table schema:
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
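For reference, the row listing above ("17 rows where issue = 613012939 sorted by updated_at descending") corresponds to a query along these lines:

```sql
select *
  from issue_comments
 where issue = 613012939
 order by updated_at desc;
```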