html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/2706#issuecomment-506994997,https://api.github.com/repos/pydata/xarray/issues/2706,506994997,MDEyOklzc3VlQ29tbWVudDUwNjk5NDk5Nw==,1217238,2019-06-29T23:43:50Z,2019-06-29T23:43:50Z,MEMBER,"OK, thank you @shikharsg, @jendrikjoe and everyone else who worked on this !","{""total_count"": 4, ""+1"": 4, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-457368077,https://api.github.com/repos/pydata/xarray/issues/2706,457368077,MDEyOklzc3VlQ29tbWVudDQ1NzM2ODA3Nw==,24736507,2019-01-24T21:43:09Z,2019-06-29T22:49:41Z,NONE,"Hello @jendrikjoe! Thanks for updating this PR. We checked the lines you've touched for [PEP 8](https://www.python.org/dev/peps/pep-0008) issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 
:beers: ##### Comment last updated at 2019-06-29 22:49:41 UTC","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506984973,https://api.github.com/repos/pydata/xarray/issues/2706,506984973,MDEyOklzc3VlQ29tbWVudDUwNjk4NDk3Mw==,1217238,2019-06-29T20:27:10Z,2019-06-29T20:27:10Z,MEMBER,@shikharsg the test failure should be fixed on master (by https://github.com/pydata/xarray/pull/3059).,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506958854,https://api.github.com/repos/pydata/xarray/issues/2706,506958854,MDEyOklzc3VlQ29tbWVudDUwNjk1ODg1NA==,8643775,2019-06-29T13:57:31Z,2019-06-29T13:57:31Z,NONE,"I have implemented all the changes suggested and refactored the append tests as all tests were previously crammed into `test_write_persistence_modes` I'm not sure why the build fails. In all the failed checks, it is these two tests that are failing: ``` ================================== FAILURES =================================== _________________________ test_rolling_properties[1] __________________________ da = array([[[0.561926, 0.243845, 0.601879, 0.733398], [0.500418, 0.84942...ordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 
2000-01-21 Dimensions without coordinates: a, x def test_rolling_properties(da): rolling_obj = da.rolling(time=4) assert rolling_obj.obj.get_axis_num('time') == 1 # catching invalid args with pytest.raises(ValueError) as exception: da.rolling(time=7, x=2) > assert 'exactly one dim/window should' in str(exception) E AssertionError: assert 'exactly one dim/window should' in '' E + where '' = str() xarray\tests\test_dataarray.py:3715: AssertionError _________________________ test_rolling_properties[1] __________________________ ds = Dimensions: (time: 10, x: 8, y: 2) Coordinates: * x (x) float64 0.0 0.1429 0.2857 0.4286 0....-1.152 -0.6704 ... -0.9796 -1.884 0.4049 z2 (time, y) float64 -1.218 -0.9627 -1.398 ... -0.3552 0.1446 0.3392 def test_rolling_properties(ds): # catching invalid args with pytest.raises(ValueError) as exception: ds.rolling(time=7, x=2) > assert 'exactly one dim/window should' in str(exception) E AssertionError: assert 'exactly one dim/window should' in '' E + where '' = str() xarray\tests\test_dataset.py:4845: AssertionError ============================== warnings summary =============================== ``` I have no idea why as the same two tests pass on my local machine","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506418064,https://api.github.com/repos/pydata/xarray/issues/2706,506418064,MDEyOklzc3VlQ29tbWVudDUwNjQxODA2NA==,8643775,2019-06-27T16:27:58Z,2019-06-27T16:27:58Z,NONE,it's done. 
I fixed it by opening the zarr dataset beforehand using `xr.open_zarr`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506403594,https://api.github.com/repos/pydata/xarray/issues/2706,506403594,MDEyOklzc3VlQ29tbWVudDUwNjQwMzU5NA==,8643775,2019-06-27T15:50:36Z,2019-06-27T15:50:36Z,NONE,"> > adding a new variable currently errors if we don't provide the `append_dim` argument: > > Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code. @rabernat, the scenario I am talking about is adding a new `DataArray` to an existing `Dataset` (in which case we do not have to specify an `append_dim` argument). Yes, it is covered by tests; specifically, see the `with` clause here: https://github.com/pydata/xarray/pull/2706/files#diff-df47fcb9c2f1f7dfc0c6032d97072af2R1636 > Just to be clear, we do always require writing `append_dim` if you want to append values along a dimension, right? And we raise an informative error if you write `append_dim='not-a-valid-dimension'`? @shoyer We do always require `append_dim` when appending to an existing array, but I just realized that it does not raise an error when `append_dim='not-valid'`; it silently fails to append to the existing array. Let me write a test for that and push","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506390414,https://api.github.com/repos/pydata/xarray/issues/2706,506390414,MDEyOklzc3VlQ29tbWVudDUwNjM5MDQxNA==,1217238,2019-06-27T15:18:00Z,2019-06-27T15:18:00Z,MEMBER,"Just to be clear, we do always require writing `append_dim` if you want to append values along a dimension, right? 
And we raise an informative error if you write `append_dim='not-a-valid-dimension'`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506383042,https://api.github.com/repos/pydata/xarray/issues/2706,506383042,MDEyOklzc3VlQ29tbWVudDUwNjM4MzA0Mg==,1197350,2019-06-27T14:59:52Z,2019-06-27T14:59:52Z,MEMBER,"> adding a new variable currently errors if we don't provide the `append_dim` argument: Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506363156,https://api.github.com/repos/pydata/xarray/issues/2706,506363156,MDEyOklzc3VlQ29tbWVudDUwNjM2MzE1Ng==,8643775,2019-06-27T14:11:55Z,2019-06-27T14:11:55Z,NONE,"I have fixed the above error now and all comments have now been addressed. 
@rabernat @shoyer ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506288004,https://api.github.com/repos/pydata/xarray/issues/2706,506288004,MDEyOklzc3VlQ29tbWVudDUwNjI4ODAwNA==,8643775,2019-06-27T10:21:53Z,2019-06-27T10:21:53Z,NONE,"adding a new variable currently errors if we don't provide the `append_dim` argument: ```python >>> import xarray as xr >>> import pandas as pd >>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> store = dict() >>> ds0.to_zarr(store, mode='w') >>> ds1.to_zarr(store, mode='a') Traceback (most recent call last): File """", line 1, in File ""/home/shikhar/code/xarray/xarray/core/dataset.py"", line 1374, in to_zarr consolidated=consolidated, append_dim=append_dim) File ""/home/shikhar/code/xarray/xarray/backends/api.py"", line 1071, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File ""/home/shikhar/code/xarray/xarray/backends/api.py"", line 928, in dump_to_store unlimited_dims=unlimited_dims) File ""/home/shikhar/code/xarray/xarray/backends/zarr.py"", line 366, in store unlimited_dims=unlimited_dims) File ""/home/shikhar/code/xarray/xarray/backends/zarr.py"", line 406, in set_variables ""was not set"".format(name) ValueError: variable 'time' already exists, but append_dim was not set ``` this works: ```python >>> import xarray as xr >>> import pandas as pd >>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> store = dict() >>> ds0.to_zarr(store, mode='w') >>> ds1.to_zarr(store, mode='a', 
append_dim='asdfasdf') >>> xr.open_zarr(store) Dimensions: (time: 3) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 Data variables: pressure (time) int64 dask.array temperature (time) int64 dask.array ``` will push a fix for this in a bit","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505958006,https://api.github.com/repos/pydata/xarray/issues/2706,505958006,MDEyOklzc3VlQ29tbWVudDUwNTk1ODAwNg==,1217238,2019-06-26T16:52:59Z,2019-06-26T16:52:59Z,MEMBER,"The AppVeyor build failures are definitely not your fault, please ignore them","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505957913,https://api.github.com/repos/pydata/xarray/issues/2706,505957913,MDEyOklzc3VlQ29tbWVudDUwNTk1NzkxMw==,1217238,2019-06-26T16:52:43Z,2019-06-26T16:52:43Z,MEMBER,"+1 for saving as native variable length strings in zarr. I didn't realize that was an option when we originally wrote xarray's zarr support, but it's definitely a much cleaner way to do things in most cases.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505953845,https://api.github.com/repos/pydata/xarray/issues/2706,505953845,MDEyOklzc3VlQ29tbWVudDUwNTk1Mzg0NQ==,8643775,2019-06-26T16:41:33Z,2019-06-26T16:41:33Z,NONE,also any idea why all the AppVeyor builds are failing since yesterday? I did not change any build file in any of my commits. 
,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505953301,https://api.github.com/repos/pydata/xarray/issues/2706,505953301,MDEyOklzc3VlQ29tbWVudDUwNTk1MzMwMQ==,8643775,2019-06-26T16:40:02Z,2019-06-26T16:40:02Z,NONE,"Thanks @shoyer @rabernat for the detailed review. All the comments have been addressed except the removal of the `encode_utf8` function. In this last commit: https://github.com/pydata/xarray/pull/2706/commits/a6ff49436a620eb4071c0529d785370b2eb739b0, I have tried to do that. In the same commit I have also tried to take a shot at variable length strings using @shoyer's comments from here: https://github.com/pydata/xarray/issues/2724#issuecomment-458808896. Please let me know if this is acceptable or if I should revert this commit.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505903058,https://api.github.com/repos/pydata/xarray/issues/2706,505903058,MDEyOklzc3VlQ29tbWVudDUwNTkwMzA1OA==,1197350,2019-06-26T14:34:27Z,2019-06-26T14:34:27Z,MEMBER,Thanks @shoyer for your more careful review of this PR. As usual you pick up on all the important details.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-504477302,https://api.github.com/repos/pydata/xarray/issues/2706,504477302,MDEyOklzc3VlQ29tbWVudDUwNDQ3NzMwMg==,8643775,2019-06-21T15:56:01Z,2019-06-21T15:56:01Z,NONE,"> @shikharsg - are the issues you found in [#2706 (comment)](https://github.com/pydata/xarray/pull/2706#issuecomment-498194520) now resolved and covered by tests? 
@rabernat these are now resolved","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-504184558,https://api.github.com/repos/pydata/xarray/issues/2706,504184558,MDEyOklzc3VlQ29tbWVudDUwNDE4NDU1OA==,4711805,2019-06-20T21:11:57Z,2019-06-20T21:11:57Z,CONTRIBUTOR,"You're right @shikharsg, the `chunk_dim` argument can be removed. I was not very happy with the complexity it brought as well.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-504115125,https://api.github.com/repos/pydata/xarray/issues/2706,504115125,MDEyOklzc3VlQ29tbWVudDUwNDExNTEyNQ==,8643775,2019-06-20T17:33:04Z,2019-06-20T17:33:04Z,NONE,"I have fixed `compute=False` for appending to a zarr store, but there are two issues that remain - when appending to an existing array, I resize the array first, and then return a `dask.delayed` object which fills up the new region of the array when `compute` is called on it. So if the delayed object does not get computed for whatever reason, the new portion of the array will end up with nonsense values. For this reason I was wondering if the resize function should be in the delayed object itself so the array is not resized in advance. - the `compute=False` will not work when the `chunk_dim` argument is set, i.e. instead of lazily appending when the compute method is called on the delayed object, it will directly append to the target store when the `to_zarr` method with `mode='a'` is called. The reason is that when the `chunk_dim` argument is set, it reads the original array from memory, appends to that array in memory, and overwrites the appended array to the target store. 
I understand that this was done because of the concern that @davidbrochart raised about doing very frequent appends to the array (for example hourly or six-hourly, as happens in climate modelling), and the resulting smallness of the chunk size of the dimension being appended to. But @davidbrochart, I would almost recommend removing this `chunk_dim` argument because the concern you raised can be overcome as follows: suppose you have a Dataset as follows: ```python temp = 15 + 8 * np.random.randn(2, 2, 3) precip = 10 * np.random.rand(2, 2, 3) lon = [[-99.83, -99.32], [-99.79, -99.23]] lat = [[42.25, 42.21], [42.63, 42.59]] ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp), 'precipitation': (['x', 'y', 'time'], precip)}, coords={'lon': (['x', 'y'], lon), 'lat': (['x', 'y'], lat), 'time': pd.date_range('2014-09-06', periods=3), 'reference_time': pd.Timestamp('2014-09-05')}) ``` and want to append to it very often. When calling the `to_zarr` function the first time, call it like so: ```python >>> store = dict() >>> ds.to_zarr(store, encoding={'temperature': {'chunks':(100,100,100)}, 'precipitation': {'chunks':(100,100,100)}}) >>> import zarr >>> zarr.open_group(store)['temperature'].info Name : /temperature Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. bytes stored : 33903 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 >>> zarr.open_group(store)['precipitation'].info Name : /precipitation Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. 
bytes stored : 33906 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 ``` and then this large chunk size remains (100,100,100) (or whatever other large numbers you may want) The `chunk_dim` functionality, as it works now, is not feasible for very large arrays, since it is essentially reading the entire array into memory(and we may not have too much memory) and then overwriting the target store, because it essentially ""rechunks"" the array. Thoughts please","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-503502211,https://api.github.com/repos/pydata/xarray/issues/2706,503502211,MDEyOklzc3VlQ29tbWVudDUwMzUwMjIxMQ==,8643775,2019-06-19T10:24:47Z,2019-06-19T10:24:47Z,NONE,"Hi all, sorry for the delay. I was on break for 10 days. I have started working on it now and should be able to do this in a couple of days. A quick thing I noticed as I started working on this: currently the [zarr.Array.append](https://zarr.readthedocs.io/en/stable/api/core.html#zarr.core.Array.append) function(which handles the resize of the array) is being used to append to the array, and as of now, it's not being done asynchronously using a dask delayed object(as I pointed out in my earlier comment). So to do it asynchronously, my plan is to resize the target array, and then the delayed object would write to the appropriate region of the resized array([dask.array.store](https://docs.dask.org/en/latest/array-api.html#dask.array.store) is currently being used for this). But if for whatever reason, the delayed object does not end up being called, we would end up with nonsense values in the resized portion of the array(or whatever is the fill value). For this reason I wonder if I should put the resize in the delayed object too. 
Thoughts?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-502827736,https://api.github.com/repos/pydata/xarray/issues/2706,502827736,MDEyOklzc3VlQ29tbWVudDUwMjgyNzczNg==,9658781,2019-06-17T19:56:23Z,2019-06-17T19:56:23Z,CONTRIBUTOR,"I built a filter that raises a `ValueError` as soon as any variable has a dtype different from any subclass of np.number or np.string_. I also built tests for that and added a function to manually convert dynamically sized string arrays to fixed-size ones. I also wrote a test for @shikharsg's issue and can reproduce it. The test is currently commented out so as not to fail the pipeline, as I wanted to discuss whether this is a blocking issue or if we should merge it and raise a new issue for it. It seems to originate from the fact that we moved away from using writer.add and instead are actually calling the zarr functions directly. There should be a way to change this back to do it lazily, but that will probably take time. What do you think?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-502754545,https://api.github.com/repos/pydata/xarray/issues/2706,502754545,MDEyOklzc3VlQ29tbWVudDUwMjc1NDU0NQ==,9658781,2019-06-17T16:24:46Z,2019-06-17T16:24:46Z,CONTRIBUTOR,"> @jendrikjoe - thanks for digging in and finding this important issue! > > This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose. > > * Identify which datatypes can easily be appended now (e.g. floats, etc.) 
and which cannot (variable length strings) > > * Raise an error if append is called on the incompatible datatypes > > * Move forward with this PR, which is otherwise very nearly ready > > * Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr > > > How does that sound to everyone? This sounds like a plan. I will try to work on getting this ready tonight and tmrw. Let us see how far I can get.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-502700081,https://api.github.com/repos/pydata/xarray/issues/2706,502700081,MDEyOklzc3VlQ29tbWVudDUwMjcwMDA4MQ==,1197350,2019-06-17T14:13:45Z,2019-06-17T14:13:45Z,MEMBER,@shikharsg - are the issues you found in https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 now resolved and covered by tests?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-502699808,https://api.github.com/repos/pydata/xarray/issues/2706,502699808,MDEyOklzc3VlQ29tbWVudDUwMjY5OTgwOA==,1197350,2019-06-17T14:12:59Z,2019-06-17T14:12:59Z,MEMBER,"@jendrikjoe - thanks for digging in and finding this important issue! This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose. - Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings) - Raise an error if append is called on the incompatible datatypes - Move forward with this PR, which is otherwise very nearly ready - Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr How does that sound to everyone? 
,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-502481584,https://api.github.com/repos/pydata/xarray/issues/2706,502481584,MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA==,9658781,2019-06-16T20:05:04Z,2019-06-16T20:23:54Z,CONTRIBUTOR,"Hey there everyone, sorry for not working on this for so long from my side. I just picked it up again and realised that, the way the encoding works, all the datatypes and the maximum string lengths in the first xarray have to be representative of all others. Otherwise, the following cuts away every character after the second: ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]}) ds1 = xr.Dataset({'temperature': (['time'], ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]}) ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time') It is solvable by explicitly setting the type before writing: ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]}) ds0['temperature'] = ds0.temperature.astype(np.dtype('S5')) ds1 = xr.Dataset({'temperature': (['time'], ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]}) ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time') It gets worse, however, when using non-ASCII characters: they get encoded in [zarr.py l:218](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/backends/zarr.py#L218), but when the next chunk comes in, the check in [conventions.py l:86](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/conventions.py#L86) fails. So I think we actually have to resolve the TODO in [zarr.py l:215](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/backends/zarr.py#L215) before this can be merged. 
Otherwise, the following leads to multiple issues: ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]}) ds1 = xr.Dataset({'temperature': (['time'], ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]}) ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time') xr.open_zarr('temp').temperature.values The only way to work around this issue is to explicitly encode the data beforehand to utf-8: from xarray.coding.variables import safe_setitem, unpack_for_encoding from xarray.coding.strings import encode_string_array from xarray.core.variable import Variable def encode_utf8(var, string_max_length): dims, data, attrs, encoding = unpack_for_encoding(var) safe_setitem(attrs, '_Encoding', 'utf-8') data = encode_string_array(data, 'utf-8') data = data.astype(np.dtype(f""S{string_max_length*2}"")) return Variable(dims, data, attrs, encoding) ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]}) ds0['temperature'] = encode_utf8(ds0.temperature, 2) ds1 = xr.Dataset({'temperature': (['time'], ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]}) ds1['temperature'] = encode_utf8(ds1.temperature, 2) ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time') xr.open_zarr('temp').temperature.values Even though this is doable if it is known in advance, we should definitely mention this in the documentation or fix this by fixing the encoding itself. What do you think? 
Cheers, Jendrik","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498583209,https://api.github.com/repos/pydata/xarray/issues/2706,498583209,MDEyOklzc3VlQ29tbWVudDQ5ODU4MzIwOQ==,8643775,2019-06-04T08:52:28Z,2019-06-04T08:52:28Z,NONE,will do,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498227133,https://api.github.com/repos/pydata/xarray/issues/2706,498227133,MDEyOklzc3VlQ29tbWVudDQ5ODIyNzEzMw==,1197350,2019-06-03T11:58:19Z,2019-06-03T11:58:19Z,MEMBER,"Let’s make sure this new scenario is covered by tests!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498205860,https://api.github.com/repos/pydata/xarray/issues/2706,498205860,MDEyOklzc3VlQ29tbWVudDQ5ODIwNTg2MA==,9658781,2019-06-03T10:40:28Z,2019-06-03T10:40:28Z,CONTRIBUTOR,Gave you the permissions @shikharsg ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498199759,https://api.github.com/repos/pydata/xarray/issues/2706,498199759,MDEyOklzc3VlQ29tbWVudDQ5ODE5OTc1OQ==,4711805,2019-06-03T10:19:15Z,2019-06-03T10:19:15Z,CONTRIBUTOR,It would be great if you could fix it. 
@jendrikjoe can give you the permission to push to the branch in his fork.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498194520,https://api.github.com/repos/pydata/xarray/issues/2706,498194520,MDEyOklzc3VlQ29tbWVudDQ5ODE5NDUyMA==,8643775,2019-06-03T10:03:01Z,2019-06-03T10:03:54Z,NONE,"I think I figured out the problem. Previously the `ZarrStore` defaults to the `store` method of the parent class `AbstractWritableDataStore`. The `store` method of `AbstractWritableDataStore` uses `set_variables`(which belongs to `AbstractWritableDataStore`) which adds the ""source and target"" variables to the `ArrayWriter` class, i.e. the array is not written until the call to `_finalize_store` which happens in the `xarray.backends.api.to_zarr` function(or not in `compute=False`). Instead your PR implements another `store` function in `ZarrStore` which uses `set_variables`(belonging to `ZarrStore`) which directly writes the data to the target array, ignoring the `ArrayWriter` class. I think if you just use `ArrayWriter` in `ZarrStore.set_variables` it should fix the problem. I will be happy to push a fix, if permissions are given.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498166896,https://api.github.com/repos/pydata/xarray/issues/2706,498166896,MDEyOklzc3VlQ29tbWVudDQ5ODE2Njg5Ng==,4711805,2019-06-03T08:41:01Z,2019-06-03T08:41:01Z,CONTRIBUTOR,"Thanks @shikharsg for looking into that. 
This PR and master have diverged quite a bit, so I will need to merge the changes. I will let you know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498072960,https://api.github.com/repos/pydata/xarray/issues/2706,498072960,MDEyOklzc3VlQ29tbWVudDQ5ODA3Mjk2MA==,8643775,2019-06-02T23:07:41Z,2019-06-02T23:07:41Z,NONE,"Hi all, not sure if I could be doing something wrong myself, but the below might be a bug in this PR. I checked out the `jendrik_joe:append_zarr` branch. When calling `to_zarr` on a `Dataset` with the `compute=False` argument, the data array should not be populated in the target `zarr` store, right? But it seems to be doing just that (I thought only metadata should be initialised). ```python >>> import zarr >>> import xarray >>> ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... ) >>> store = dict() >>> ds = ds.chunk(dict(dim1=1, dim2=1)) >>> ds Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array >>> store = dict() >>> ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-72c3ec95-6ef1-473a-a0de-e81c98eb9576') >>> xarray.open_zarr(store)['arr1'].values array([[10, 11], [12, 13]]) ``` Instead this does not happen in the xarray master branch: ```python >>> import zarr >>> import xarray >>> ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... 
) >>> ds = ds.chunk(dict(dim1=1, dim2=1)) >>> ds Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array >>> store = dict() >>> ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-965b0a89-bff6-4adc-8bab-e47a77e762f3') >>> xarray.open_zarr(store)['arr1'].values array([[4611686018427387904, 4611686018427387904], [4611686018427387904, 4611686018427387904]]) ``` Python: 3.6.7 ```python >>> xarray.__version__ '0.12.1+45.g519b3986' >>> zarr.__version__ '2.2.1.dev185' >>> ``` OS: ``` Distributor ID: Ubuntu Description: Ubuntu 18.04.2 LTS Release: 18.04 Codename: bionic ``` I am also trying to figure out why this is happening(assuming this is a bug) and will post updates soon. @davidbrochart ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-494340006,https://api.github.com/repos/pydata/xarray/issues/2706,494340006,MDEyOklzc3VlQ29tbWVudDQ5NDM0MDAwNg==,4711805,2019-05-21T10:46:20Z,2019-05-21T10:46:20Z,CONTRIBUTOR,"No problem @rabernat, and thanks a lot for your time in reviewing this PR. I added a test for `chunk_dim`. Please let me know if this is clearer, and if I should explain further in the docs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-494062757,https://api.github.com/repos/pydata/xarray/issues/2706,494062757,MDEyOklzc3VlQ29tbWVudDQ5NDA2Mjc1Nw==,1197350,2019-05-20T16:38:24Z,2019-05-20T16:38:24Z,MEMBER,"Hi @davidbrochart. I'm really sorry it takes me so long between reviews of your PR. It is very important work, and I appreciate your continued patience. I looked at your new code, and I noticed that `chunk_dim` does not appear in the tests. 
I think it is important to test this parameter and verify that it works as expected. (This would also help me understand how it works, since it's not totally clear from the docs.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-485340727,https://api.github.com/repos/pydata/xarray/issues/2706,485340727,MDEyOklzc3VlQ29tbWVudDQ4NTM0MDcyNw==,4711805,2019-04-22T06:38:04Z,2019-04-22T06:38:04Z,CONTRIBUTOR,I added a `chunk_dim` parameter which allows rechunking the appended coordinate. I think it is ready for a final review now.,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-484551274,https://api.github.com/repos/pydata/xarray/issues/2706,484551274,MDEyOklzc3VlQ29tbWVudDQ4NDU1MTI3NA==,4711805,2019-04-18T15:14:24Z,2019-04-18T15:14:24Z,CONTRIBUTOR,"I don't think it's ready yet. I think I should address the chunking issue of the appended dimension, as explained in https://medium.com/pangeo/continuously-extending-zarr-datasets-c54fbad3967d. For instance, if we append along a time dimension, the time coordinate (which is a 1-D array) will have very small chunks, instead of maybe only one.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-484542216,https://api.github.com/repos/pydata/xarray/issues/2706,484542216,MDEyOklzc3VlQ29tbWVudDQ4NDU0MjIxNg==,1197350,2019-04-18T14:49:54Z,2019-04-18T14:49:54Z,MEMBER,Where do we stand on this PR? @davidbrochart - do you feel this is ready for a final review?
Or do you want advice or feedback on anything?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479932785,https://api.github.com/repos/pydata/xarray/issues/2706,479932785,MDEyOklzc3VlQ29tbWVudDQ3OTkzMjc4NQ==,4711805,2019-04-04T14:57:45Z,2019-04-04T14:57:45Z,CONTRIBUTOR,"@rabernat you're right, I took your suggestion into account in my last commit. I also rewrote the test.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479802142,https://api.github.com/repos/pydata/xarray/issues/2706,479802142,MDEyOklzc3VlQ29tbWVudDQ3OTgwMjE0Mg==,9658781,2019-04-04T08:28:56Z,2019-04-04T08:28:56Z,CONTRIBUTOR,"Nice :+1:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479800472,https://api.github.com/repos/pydata/xarray/issues/2706,479800472,MDEyOklzc3VlQ29tbWVudDQ3OTgwMDQ3Mg==,4711805,2019-04-04T08:24:01Z,2019-04-04T08:24:01Z,CONTRIBUTOR,"Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479798342,https://api.github.com/repos/pydata/xarray/issues/2706,479798342,MDEyOklzc3VlQ29tbWVudDQ3OTc5ODM0Mg==,9658781,2019-04-04T08:17:43Z,2019-04-04T08:17:43Z,CONTRIBUTOR,I added you to the fork :) But feel free to do whatever is easiest for you :) ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479788744,https://api.github.com/repos/pydata/xarray/issues/2706,479788744,MDEyOklzc3VlQ29tbWVudDQ3OTc4ODc0NA==,4711805,2019-04-04T07:46:52Z,2019-04-04T07:46:52Z,CONTRIBUTOR,Or should I open a new PR?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-479480023,https://api.github.com/repos/pydata/xarray/issues/2706,479480023,MDEyOklzc3VlQ29tbWVudDQ3OTQ4MDAyMw==,4711805,2019-04-03T13:04:24Z,2019-04-03T13:04:24Z,CONTRIBUTOR,@jendrikjoe I think you need to give me the permission to push to the branch in your fork.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, 
""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-478399527,https://api.github.com/repos/pydata/xarray/issues/2706,478399527,MDEyOklzc3VlQ29tbWVudDQ3ODM5OTUyNw==,9658781,2019-04-01T00:19:11Z,2019-04-01T00:19:11Z,CONTRIBUTOR,"Sure everyone feel welcome to join in! Sorry for the long silence. Kind of a busy time right now 😉 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-478374296,https://api.github.com/repos/pydata/xarray/issues/2706,478374296,MDEyOklzc3VlQ29tbWVudDQ3ODM3NDI5Ng==,1197350,2019-03-31T19:47:49Z,2019-03-31T19:47:49Z,MEMBER,@davidbrochart I would personally be happy to see anyone work on this.
I'm sure @jendrikjoe would not mind if we make it a team effort!,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-478365516,https://api.github.com/repos/pydata/xarray/issues/2706,478365516,MDEyOklzc3VlQ29tbWVudDQ3ODM2NTUxNg==,4711805,2019-03-31T18:22:53Z,2019-03-31T18:22:53Z,CONTRIBUTOR,May I try and take this work over?,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-475558223,https://api.github.com/repos/pydata/xarray/issues/2706,475558223,MDEyOklzc3VlQ29tbWVudDQ3NTU1ODIyMw==,4711805,2019-03-22T09:53:43Z,2019-03-22T09:53:43Z,CONTRIBUTOR,"Hi @jendrikjoe, do you plan to work on this PR again in the future? I think it would be a great contribution to xarray.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-459813678,https://api.github.com/repos/pydata/xarray/issues/2706,459813678,MDEyOklzc3VlQ29tbWVudDQ1OTgxMzY3OA==,1197350,2019-02-01T18:07:26Z,2019-02-01T18:07:26Z,MEMBER,"> We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow. This implies we should be checking for attributes compatibility before calling `zarr.append`. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-459667067,https://api.github.com/repos/pydata/xarray/issues/2706,459667067,MDEyOklzc3VlQ29tbWVudDQ1OTY2NzA2Nw==,4711805,2019-02-01T09:51:55Z,2019-02-01T09:51:55Z,CONTRIBUTOR,"When we use this feature e.g. to store data that is produced every day, we might start with a data set that has a small size on the time dimension, and thus the chunks will be chosen according to this initial shape. When we append to this data set, will the chunks be kept as in the initial zarr archive? If so, we might end up with a lot of small chunks on the time dimension, where ideally we would have chosen only one chunk.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-459637253,https://api.github.com/repos/pydata/xarray/issues/2706,459637253,MDEyOklzc3VlQ29tbWVudDQ1OTYzNzI1Mw==,1217238,2019-02-01T07:54:30Z,2019-02-01T07:54:30Z,MEMBER,"We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow. Potentially a keyword argument `ignore_alignment=True` would be a good way for users to opt out of checking index coordinates for consistency.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-459177873,https://api.github.com/repos/pydata/xarray/issues/2706,459177873,MDEyOklzc3VlQ29tbWVudDQ1OTE3Nzg3Mw==,1197350,2019-01-31T01:24:11Z,2019-01-31T01:24:11Z,MEMBER,"So the problem in @davidbrochart's example is that there are different encodings on the time variables in the two datasets.
When writing datetimes, xarray automatically picks an encoding (i.e. `days since 2000-01-01 00:00:00`) based on some heuristics. When serializing the dataset, this encoding is used to encode the `datetime64[ns]` dtype into a different dtype, and the encoding is placed in the attributes of the store. When you open the dataset, the data is automatically decoded according to CF conventions. This can be disabled by using `decode_cf=False` or `decode_times=False` when you open the dataset. In this case, xarray's heuristics are picking different encodings for the two dates. You could make this example work by manually specifying the encoding on the appended dataset to be the same as the original. This example illustrates the need for some sort of compatibility checks between the target dataset and the appended dataset. For example, checking for attribute compatibility would have caught this error. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458896024,https://api.github.com/repos/pydata/xarray/issues/2706,458896024,MDEyOklzc3VlQ29tbWVudDQ1ODg5NjAyNA==,9658781,2019-01-30T10:37:56Z,2019-01-30T10:37:56Z,CONTRIBUTOR,I will also check how xarray stores times, to see if we have to add the offset to the xarray first or if this can be resolved with a PR to zarr :) ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458866421,https://api.github.com/repos/pydata/xarray/issues/2706,458866421,MDEyOklzc3VlQ29tbWVudDQ1ODg2NjQyMQ==,4711805,2019-01-30T09:05:53Z,2019-01-30T09:05:53Z,CONTRIBUTOR,"zarr stores the reference in the `.zattrs` file: ``` { ""_ARRAY_DIMENSIONS"": [ ""time"" ], ""calendar"": ""proleptic_gregorian"", ""units"": ""days since 2000-01-01 00:00:00"" }
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458736067,https://api.github.com/repos/pydata/xarray/issues/2706,458736067,MDEyOklzc3VlQ29tbWVudDQ1ODczNjA2Nw==,9658781,2019-01-29T22:39:00Z,2019-01-29T22:39:00Z,CONTRIBUTOR,"Hey @davidbrochart, thanks for all your input, as well as for the research on how zarr stores the data. I would actually claim that the calculation of the accurate relative time should be handled by the zarr append function. An exception would of course be if xarray is storing the data as deltas to a reference as well? Then I would try collecting the minimum and offsetting the input by this. @rabernat can you provide input on that? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458730247,https://api.github.com/repos/pydata/xarray/issues/2706,458730247,MDEyOklzc3VlQ29tbWVudDQ1ODczMDI0Nw==,4711805,2019-01-29T22:19:07Z,2019-01-29T22:19:07Z,CONTRIBUTOR,"To make it work, time dimensions would have to be treated separately because zarr doesn't encode absolute time values but deltas relative to a reference (see https://github.com/davidbrochart/pangeo_upload/blob/master/py/trmm2pangeo.py#L108).","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458720920,https://api.github.com/repos/pydata/xarray/issues/2706,458720920,MDEyOklzc3VlQ29tbWVudDQ1ODcyMDkyMA==,4711805,2019-01-29T21:50:14Z,2019-01-29T21:50:14Z,CONTRIBUTOR,"Hi @jendrikjoe, Thanks for your PR, I am very interested in it because this is something I was hacking around (see
[here](https://github.com/davidbrochart/pangeo_upload/blob/master/py/trmm2pangeo.py)). In my particular case, I want to append along a time dimension, but it looks like your PR currently doesn't support it. In the following example `ds2` should have a time dimension ranging from 2000-01-01 to 2000-01-06: ```python import xarray as xr import pandas as pd ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': pd.date_range('2000-01-04', periods=3)}) ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time') ds2 = xr.open_zarr('temp') ``` But that's not the case: ``` ds2.time array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000', '2000-01-03T00:00:00.000000000', '2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000', '2000-01-03T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-03 ``` Maybe it's not intended to work with time dimensions yet?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458694955,https://api.github.com/repos/pydata/xarray/issues/2706,458694955,MDEyOklzc3VlQ29tbWVudDQ1ODY5NDk1NQ==,9658781,2019-01-29T20:29:05Z,2019-01-29T20:31:59Z,CONTRIBUTOR,"You are definitely right that there are no checks regarding the alignment. However, if any dimension other than the append_dim does not align, zarr will raise an error. If the coordinates differ, that could definitely be an issue. I did not think about that as I am dumping reshaped dask.dataframe partitions with the append mode. Therefore, I am not allowed to have a name twice anyway. Might be interesting for other users indeed. Similar point for the attributes.
I could try figuring that out as well, but that might take a while. The place where the ValueError is raised should allow adding other variables, as those are added in the KeyError exception above :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-458692011,https://api.github.com/repos/pydata/xarray/issues/2706,458692011,MDEyOklzc3VlQ29tbWVudDQ1ODY5MjAxMQ==,1197350,2019-01-29T20:19:58Z,2019-01-29T20:19:58Z,MEMBER,"Ok, with the example, I can see a bit better how this works. Here is my main concern: there doesn't appear to be any alignment checking between the target dataset and the new data. The only check that happens is whether a variable with the same name already exists in the target store; if so, append is used (rather than creating a new array). What if the coordinates differ? What if the attributes differ? I'm not sure this is a deal-breaker. But we should be very clear about this in the docs.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-457827734,https://api.github.com/repos/pydata/xarray/issues/2706,457827734,MDEyOklzc3VlQ29tbWVudDQ1NzgyNzczNA==,9658781,2019-01-26T12:35:28Z,2019-01-26T12:35:28Z,CONTRIBUTOR,"Hi @rabernat, happy to help! I love using xarray. I added the test for the append mode. One makes sure that it behaves like the 'w' mode if no data exists at the target path. The other tests what you described. The append_dim argument is actually the same as the dim argument for concat.
Hope that helps clarify my code :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-457741867,https://api.github.com/repos/pydata/xarray/issues/2706,457741867,MDEyOklzc3VlQ29tbWVudDQ1Nzc0MTg2Nw==,1197350,2019-01-25T21:49:26Z,2019-01-25T21:49:26Z,MEMBER,"Hi @jendrikjoe -- thanks for submitting a PR to address one of the most important issues in xarray (IMHO)! I am very excited about your contribution and am looking forward to getting this feature merged. I have many questions about how this works. I think the best way to move forward is to wait until we have a test for the append feature which involves the following steps: - Write a dataset to a zarr store - Open the store in append mode - Append data along a particular dimension Seeing the code that accomplishes this will help clarify for me what is happening. Thanks again for your contribution, and welcome to xarray!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
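
The time-encoding mismatch discussed in the thread (different `units` references picked for the original and the appended dataset, producing duplicated timestamps on append) can be sketched without xarray or zarr. The following is a minimal pure-Python illustration of CF-style "days since <reference>" encoding; `encode_times`/`decode_times` are hypothetical helpers standing in for the real encoders, not xarray internals:

```python
from datetime import datetime, timedelta

def encode_times(times, reference):
    # CF-style encoding: integer "days since <reference>"
    return [(t - reference).days for t in times]

def decode_times(values, reference):
    # Invert the CF-style encoding
    return [reference + timedelta(days=v) for v in values]

# First write: suppose the heuristics pick "days since 2000-01-01".
ref0 = datetime(2000, 1, 1)
ds0_times = [ref0 + timedelta(days=i) for i in range(3)]  # Jan 1-3

# Appended dataset: suppose the heuristics pick "days since 2000-01-04".
ref1 = datetime(2000, 1, 4)
ds1_times = [ref1 + timedelta(days=i) for i in range(3)]  # Jan 4-6

# Appending concatenates the raw encoded integers, while the store keeps
# its original "units" attribute -- so everything decodes against ref0.
raw = encode_times(ds0_times, ref0) + encode_times(ds1_times, ref1)
decoded = decode_times(raw, ref0)
# decoded repeats Jan 1-3 twice, matching the duplicated timestamps in
# davidbrochart's example above.

# The remedy discussed in the thread: encode the appended data against
# the target store's reference before appending.
raw_fixed = encode_times(ds0_times, ref0) + encode_times(ds1_times, ref0)
assert decode_times(raw_fixed, ref0) == ds0_times + ds1_times
```

This is why the PR ended up copying the target store's encodings onto the appended variables before writing.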