html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/4035#issuecomment-721534803,https://api.github.com/repos/pydata/xarray/issues/4035,721534803,MDEyOklzc3VlQ29tbWVudDcyMTUzNDgwMw==,1217238,2020-11-04T06:18:35Z,2020-11-04T06:18:35Z,MEMBER,"> Should this checking be performed for all variables, or only for data_variables? I agree that this requirement is a little surprising. The error is raised because otherwise you might be surprised that the array values for ""latitude"" and ""longitude"" get overridden, rather than being checked for consistency. At least if you have to explicitly drop these variables (with the suggested call to `.drop()`) it is clear that they will neither be checked nor overridden in the Zarr store.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-721504192,https://api.github.com/repos/pydata/xarray/issues/4035,721504192,MDEyOklzc3VlQ29tbWVudDcyMTUwNDE5Mg==,7799184,2020-11-04T04:23:58Z,2020-11-04T04:23:58Z,CONTRIBUTOR,"@shoyer thanks for implementing this; it is going to be very useful. I am trying to write this dataset below: dsregion: ``` Dimensions: (latitude: 2041, longitude: 4320, time: 31) Coordinates: * latitude (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0 * time (time) datetime64[ns] 2008-10-01T12:00:00 ... 2008-10-31T12:00:00 * longitude (longitude) float32 -180.0 -179.91667 ... 
179.83333 179.91667 Data variables: vo (time, latitude, longitude) float32 dask.array uo (time, latitude, longitude) float32 dask.array sst (time, latitude, longitude) float32 dask.array ssh (time, latitude, longitude) float32 dask.array ``` As a region of this other dataset: dset: ``` Dimensions: (latitude: 2041, longitude: 4320, time: 9490) Coordinates: * latitude (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0 * longitude (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667 * time (time) datetime64[ns] 1993-01-01T12:00:00 ... 2018-12-25T12:00:00 Data variables: ssh (time, latitude, longitude) float64 dask.array sst (time, latitude, longitude) float64 dask.array uo (time, latitude, longitude) float64 dask.array vo (time, latitude, longitude) float64 dask.array ``` Using the following call: ``` dsregion.to_zarr(dset_url, region={""time"": slice(5752, 5783)}) ``` But I got stuck on the conditional below within `xarray/backends/api.py`: ``` 1347 non_matching_vars = [ 1348 k 1349 for k, v in ds_to_append.variables.items() 1350 if not set(region).intersection(v.dims) 1351 ] 1352 import ipdb; ipdb.set_trace() -> 1353 if non_matching_vars: 1354 raise ValueError( 1355 f""when setting `region` explicitly in to_zarr(), all "" 1356 f""variables in the dataset to write must have at least "" 1357 f""one dimension in common with the region's dimensions "" 1358 f""{list(region.keys())}, but that is not "" 1359 f""the case for some variables here. To drop these variables "" 1360 f""from this dataset before exporting to zarr, write: "" 1361 f"".drop({non_matching_vars!r})"" 1362 ) ``` Apparently because `time` is not a dimension in coordinate variables [""longitude"", ""latitude""]: ``` ipdb> p non_matching_vars ['latitude', 'longitude'] ipdb> p set(region) {'time'} ``` Should this checking be performed for all variables, or only for data_variables? 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-719081563,https://api.github.com/repos/pydata/xarray/issues/4035,719081563,MDEyOklzc3VlQ29tbWVudDcxOTA4MTU2Mw==,1217238,2020-10-29T23:30:48Z,2020-10-29T23:30:48Z,MEMBER,"If there are no additional reviews or objections, I will merge this tomorrow.","{""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-716041586,https://api.github.com/repos/pydata/xarray/issues/4035,716041586,MDEyOklzc3VlQ29tbWVudDcxNjA0MTU4Ng==,1217238,2020-10-24T19:12:33Z,2020-10-24T19:12:33Z,MEMBER,Anyone else want to take a look at this?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-712649636,https://api.github.com/repos/pydata/xarray/issues/4035,712649636,MDEyOklzc3VlQ29tbWVudDcxMjY0OTYzNg==,1217238,2020-10-20T07:21:29Z,2020-10-20T07:21:29Z,MEMBER,"OK, I think this is ready for a final review.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-711502852,https://api.github.com/repos/pydata/xarray/issues/4035,711502852,MDEyOklzc3VlQ29tbWVudDcxMTUwMjg1Mg==,1217238,2020-10-19T04:01:54Z,2020-10-19T04:01:54Z,MEMBER,"But yes, we've also been successfully using this for parallel writes for a few months now (aside from the race condition).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 
https://github.com/pydata/xarray/pull/4035#issuecomment-711501546,https://api.github.com/repos/pydata/xarray/issues/4035,711501546,MDEyOklzc3VlQ29tbWVudDcxMTUwMTU0Ng==,1217238,2020-10-19T04:00:48Z,2020-10-19T04:00:48Z,MEMBER,"I just fixed a race condition with writing attributes. Let me spend a little bit of time responding to Ryan's review, and then I think we can submit it.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-711482135,https://api.github.com/repos/pydata/xarray/issues/4035,711482135,MDEyOklzc3VlQ29tbWVudDcxMTQ4MjEzNQ==,11531133,2020-10-19T02:55:52Z,2020-10-19T02:55:52Z,NONE,"This is a very desirable feature for us. We have been using this branch in development, and it is working great for our use case. We are reluctant to put it into production until it is merged and released - is there any expected timeline for that to occur? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-656637518,https://api.github.com/repos/pydata/xarray/issues/4035,656637518,MDEyOklzc3VlQ29tbWVudDY1NjYzNzUxOA==,1197350,2020-07-10T11:57:40Z,2020-07-10T11:57:40Z,MEMBER,"Zac, you may be interested in this thread https://discourse.pangeo.io/t/best-practices-to-go-from-1000s-of-netcdf-files-to-analyses-on-a-hpc-cluster/588/32 Tom White managed to integrate dask with pywren via a Dask executor. This allows you to read / write Zarr with Lambda. > On Jul 9, 2020, at 6:41 PM, Stephan Hoyer wrote: > >  > This looks nice. Is there a thought if this would work with functions as a service (GCP cloud functions, AWS Lambda, etc) for supporting parallel transformation from netcdf to zarr? 
> > I haven't used functions as a service before, but yes, I imagine this might be useful for that sort of thing. As long as you can figure out the structure of the overall Zarr datasets ahead of time, you could use region to fill out different parts entirely independently. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-656385475,https://api.github.com/repos/pydata/xarray/issues/4035,656385475,MDEyOklzc3VlQ29tbWVudDY1NjM4NTQ3NQ==,1217238,2020-07-09T22:40:54Z,2020-07-09T22:43:34Z,MEMBER,"> This looks nice. Is there a thought if this would work with functions as a service (GCP cloud functions, AWS Lambda, etc) for supporting parallel transformation from netcdf to zarr? I haven't used functions as a service before, but yes, I imagine this might be useful for that sort of thing. As long as you can figure out the structure of the overall Zarr datasets ahead of time, you could use `region` to fill out different parts entirely independently.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-656361151,https://api.github.com/repos/pydata/xarray/issues/4035,656361151,MDEyOklzc3VlQ29tbWVudDY1NjM2MTE1MQ==,20603302,2020-07-09T21:30:51Z,2020-07-09T21:30:51Z,NONE,"This looks nice. 
Is there a thought if this would work with functions as a service (GCP cloud functions, AWS Lambda, etc) for supporting parallel transformation from netcdf to zarr?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-646216297,https://api.github.com/repos/pydata/xarray/issues/4035,646216297,MDEyOklzc3VlQ29tbWVudDY0NjIxNjI5Nw==,1217238,2020-06-18T17:53:03Z,2020-06-18T17:53:03Z,MEMBER,"I've added error checking, tests, and documentation, so this is ready for review now! Take a look here for a rendered version of the new docs section: https://xray--4035.org.readthedocs.build/en/4035/io.html#appending-to-existing-zarr-stores","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-627799236,https://api.github.com/repos/pydata/xarray/issues/4035,627799236,MDEyOklzc3VlQ29tbWVudDYyNzc5OTIzNg==,1386642,2020-05-13T07:22:40Z,2020-05-13T07:22:40Z,CONTRIBUTOR,@rabernat I learn something new every day. Sorry for cluttering up this PR with my ignorance haha.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-627318136,https://api.github.com/repos/pydata/xarray/issues/4035,627318136,MDEyOklzc3VlQ29tbWVudDYyNzMxODEzNg==,1197350,2020-05-12T12:42:12Z,2020-05-12T12:42:37Z,MEMBER,"> A similar neat feature would be to read xarray datasets from regions of zarr groups w/o dask arrays. @nbren12 - this has always been supported. Just call `open_zarr(..., chunks=False)` and then subset using `sel` / `isel`. 
","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-627090332,https://api.github.com/repos/pydata/xarray/issues/4035,627090332,MDEyOklzc3VlQ29tbWVudDYyNzA5MDMzMg==,1386642,2020-05-12T03:44:14Z,2020-05-12T03:44:14Z,CONTRIBUTOR,"@rabernat pointed this PR out to me, and this is great progress towards allowing more database-like CRUD operations on zarr datasets. A similar neat feature would be to read xarray datasets from regions of zarr groups w/o dask arrays.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-626207771,https://api.github.com/repos/pydata/xarray/issues/4035,626207771,MDEyOklzc3VlQ29tbWVudDYyNjIwNzc3MQ==,1217238,2020-05-09T17:14:46Z,2020-05-09T17:14:46Z,MEMBER,"> I'm curious how this interacts with dimension coordinates. Your example bypasses this. But what if dimension coordinates are present? How do we handle alignment issues? For example, what if I call `ds.to_zarr(path, region=selection)`, but the dimension coordinates of `ds` don't align with the dimension coordinates of the store at `path`? It’s entirely unsafe. Currently the coordinates would be overridden with the new values, which is consistent with how to_netcdf() with mode='a' works. This is probably another good reason for requiring users to explicitly drop variables that don’t include a dimension in the selected region, because at least in that case there can be no user expectations about alignment with coordinates that don’t exist. In the long term, it might make sense to make both to_netcdf and to_zarr check coordinate alignment by default, but we wouldn’t want that in all cases, because sometimes users really do want to update variables. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939 https://github.com/pydata/xarray/pull/4035#issuecomment-625865523,https://api.github.com/repos/pydata/xarray/issues/4035,625865523,MDEyOklzc3VlQ29tbWVudDYyNTg2NTUyMw==,1197350,2020-05-08T15:16:54Z,2020-05-08T15:16:54Z,MEMBER,"Stephan, this seems like a great addition. Thanks for getting it started! I'm curious how this interacts with dimension coordinates. Your example bypasses this. But what if dimension coordinates are present? How do we handle alignment issues? For example, what if I call `ds.to_zarr(path, region=selection)`, but the dimension coordinates of `ds` don't align with the dimension coordinates of the store at `path`? > 1. Officially document that the `compute` argument only controls writing array values, not metadata (at least for zarr). :+1: > 4\. Like (2), but raise an error instead of a warning. Require the user to explicitly drop them with `.drop()`. This is probably the safest behavior. :+1: I think only advanced users will want to use this feature.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939