html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/pull/2706#issuecomment-457368077,https://api.github.com/repos/pydata/xarray/issues/2706,457368077,MDEyOklzc3VlQ29tbWVudDQ1NzM2ODA3Nw==,24736507,2019-01-24T21:43:09Z,2019-06-29T22:49:41Z,NONE,"Hello @jendrikjoe! Thanks for updating this PR. We checked the lines you've touched for [PEP 8](https://www.python.org/dev/peps/pep-0008) issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers: ##### Comment last updated at 2019-06-29 22:49:41 UTC","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506958854,https://api.github.com/repos/pydata/xarray/issues/2706,506958854,MDEyOklzc3VlQ29tbWVudDUwNjk1ODg1NA==,8643775,2019-06-29T13:57:31Z,2019-06-29T13:57:31Z,NONE,"I have implemented all the changes suggested and refactored the append tests as all tests were previously crammed into `test_write_persistence_modes` I'm not sure why the build fails. In all the failed checks, it is these two tests that are failing: ``` ================================== FAILURES =================================== _________________________ test_rolling_properties[1] __________________________ da = array([[[0.561926, 0.243845, 0.601879, 0.733398], [0.500418, 0.84942...ordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-21 Dimensions without coordinates: a, x def test_rolling_properties(da): rolling_obj = da.rolling(time=4) assert rolling_obj.obj.get_axis_num('time') == 1 # catching invalid args with pytest.raises(ValueError) as exception: da.rolling(time=7, x=2) > assert 'exactly one dim/window should' in str(exception) E AssertionError: assert 'exactly one dim/window should' in '' E + where '' = str() xarray\tests\test_dataarray.py:3715: AssertionError _________________________ test_rolling_properties[1] __________________________ ds = Dimensions: (time: 10, x: 8, y: 2) Coordinates: * x (x) float64 0.0 0.1429 0.2857 0.4286 0....-1.152 -0.6704 ... -0.9796 -1.884 0.4049 z2 (time, y) float64 -1.218 -0.9627 -1.398 ... -0.3552 0.1446 0.3392 def test_rolling_properties(ds): # catching invalid args with pytest.raises(ValueError) as exception: ds.rolling(time=7, x=2) > assert 'exactly one dim/window should' in str(exception) E AssertionError: assert 'exactly one dim/window should' in '' E + where '' = str() xarray\tests\test_dataset.py:4845: AssertionError ============================== warnings summary =============================== ``` I have no idea why as the same two tests pass on my local machine","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506418064,https://api.github.com/repos/pydata/xarray/issues/2706,506418064,MDEyOklzc3VlQ29tbWVudDUwNjQxODA2NA==,8643775,2019-06-27T16:27:58Z,2019-06-27T16:27:58Z,NONE,it's done. I fixed it by opening the zarr dataset beforehand using `xr.open_zarr`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506403594,https://api.github.com/repos/pydata/xarray/issues/2706,506403594,MDEyOklzc3VlQ29tbWVudDUwNjQwMzU5NA==,8643775,2019-06-27T15:50:36Z,2019-06-27T15:50:36Z,NONE,"> > adding a new variable currently errors if we don't provide the `append_dim` argument: > > Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code. @rabernat , the scenario I am talking about is adding a new `DataArray` to an existing `Dataset`(in which case we do not have to specify an `append_dim` argument). Yes it is covered by tests, specifically see the `with` clause here: https://github.com/pydata/xarray/pull/2706/files#diff-df47fcb9c2f1f7dfc0c6032d97072af2R1636 > Just to be clear, we do always requiring writing `append_dim` if you want to append values along a dimension, right? And we raise an informative error if you write `append_dim='not-a-valid-dimension'`? @shoyer We do always require `append_dim` when appending to an existing array, but I just realized that it does not raise an error when `append_dim='not-valid'`, but silently fails to append to the existing array. Let me write a test for that and push","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506363156,https://api.github.com/repos/pydata/xarray/issues/2706,506363156,MDEyOklzc3VlQ29tbWVudDUwNjM2MzE1Ng==,8643775,2019-06-27T14:11:55Z,2019-06-27T14:11:55Z,NONE,"I have fixed the above error now and all comments have now been addressed. @rabernat @shoyer ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-506288004,https://api.github.com/repos/pydata/xarray/issues/2706,506288004,MDEyOklzc3VlQ29tbWVudDUwNjI4ODAwNA==,8643775,2019-06-27T10:21:53Z,2019-06-27T10:21:53Z,NONE,"adding a new variable currently errors if we don't provide the `append_dim` argument: ```python >>> import xarray as xr >>> import pandas as pd >>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> store = dict() >>> ds0.to_zarr(store, mode='w') >>> ds1.to_zarr(store, mode='a') Traceback (most recent call last): File """", line 1, in File ""/home/shikhar/code/xarray/xarray/core/dataset.py"", line 1374, in to_zarr consolidated=consolidated, append_dim=append_dim) File ""/home/shikhar/code/xarray/xarray/backends/api.py"", line 1071, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File ""/home/shikhar/code/xarray/xarray/backends/api.py"", line 928, in dump_to_store unlimited_dims=unlimited_dims) File ""/home/shikhar/code/xarray/xarray/backends/zarr.py"", line 366, in store unlimited_dims=unlimited_dims) File ""/home/shikhar/code/xarray/xarray/backends/zarr.py"", line 406, in set_variables ""was not set"".format(name) ValueError: variable 'time' already exists, but append_dim was not set ``` this works: ```python >>> import xarray as xr >>> import pandas as pd >>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) >>> store = dict() >>> ds0.to_zarr(store, mode='w') >>> ds1.to_zarr(store, mode='a', append_dim='asdfasdf') >>> xr.open_zarr(store) Dimensions: (time: 3) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 Data variables: pressure (time) int64 dask.array temperature (time) int64 dask.array ``` will push a fix for this in a bit","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505953845,https://api.github.com/repos/pydata/xarray/issues/2706,505953845,MDEyOklzc3VlQ29tbWVudDUwNTk1Mzg0NQ==,8643775,2019-06-26T16:41:33Z,2019-06-26T16:41:33Z,NONE,also any idea why all the AppVeyor builds are failing since yesterday? I did not change any build file in any of my commits. ,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-505953301,https://api.github.com/repos/pydata/xarray/issues/2706,505953301,MDEyOklzc3VlQ29tbWVudDUwNTk1MzMwMQ==,8643775,2019-06-26T16:40:02Z,2019-06-26T16:40:02Z,NONE,"Thanks @shoyer @rabernat for the detailed review. All the comments have been addresses except the removal of the `encode_utf8` function. In this last commit: https://github.com/pydata/xarray/pull/2706/commits/a6ff49436a620eb4071c0529d785370b2eb739b0, I have tried to do that. In the same commit I have also tried to take a shot at variable length strings using @shoyer's comments from here: https://github.com/pydata/xarray/issues/2724#issuecomment-458808896. Please let me know if this is acceptable or should I revert this commit or something.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-504477302,https://api.github.com/repos/pydata/xarray/issues/2706,504477302,MDEyOklzc3VlQ29tbWVudDUwNDQ3NzMwMg==,8643775,2019-06-21T15:56:01Z,2019-06-21T15:56:01Z,NONE,"> @shikharsg - are the issues you found in [#2706 (comment)](https://github.com/pydata/xarray/pull/2706#issuecomment-498194520) now resolved and covered by tests? @rabernat these are now resolved","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-504115125,https://api.github.com/repos/pydata/xarray/issues/2706,504115125,MDEyOklzc3VlQ29tbWVudDUwNDExNTEyNQ==,8643775,2019-06-20T17:33:04Z,2019-06-20T17:33:04Z,NONE,"I have fixed the compute=False for appending to zarr store, but there are two issues that remain - when appending to an existing array, I resize the array first, and the return a `dask.delayed` object which fills up the new region of the array when `compute` is called on it. So if the delayed object does not get computed for whatever reason, the new portion of the array will end up with nonsense values. For this reason I was wondering if the resize function should be in the delayed object itself so the array is not resized in advance. - the `compute=False` will not work when the `chunk_dim` argument is set, i.e. instead of lazily appending when the compute method is called on the delayed object, it will directly append to the target store when the `to_zarr` method with `mode='a'` is called. The reason is because when the `chunk_dim` argument is set, it reads the original array from memory, appends to that array in memory, and overwrites the appended array to the target store. I understand that this was done because of the concern that @davidbrochart raised about doing very frequent appends to the array(for example hourly or six hourly as happens in climate modelling), and the resulting smallness of the chunk size of the dimension being appended to. But @davidbrochart, I would almost recommend removing this `chunk_dim` argument because the concern you raised can be overcome as follows: suppose you have a Dataset as follows: ```python temp = 15 + 8 * np.random.randn(2, 2, 3) precip = 10 * np.random.rand(2, 2, 3) lon = [[-99.83, -99.32], [-99.79, -99.23]] lat = [[42.25, 42.21], [42.63, 42.59]] ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp), 'precipitation': (['x', 'y', 'time'], precip)}, coords={'lon': (['x', 'y'], lon), 'lat': (['x', 'y'], lat), 'time': pd.date_range('2014-09-06', periods=3), 'reference_time': pd.Timestamp('2014-09-05')}) ``` and want to append it to very often. When calling the `to_zarr` function the first time call it like so: ```python >>> store = dict() >>> ds.to_zarr(store, encoding={'temperature': {'chunks':(100,100,100)}, 'precipitation': {'chunks':(100,100,100)}}) >>> import zarr >>> zarr.open_group(store)['temperature'].info Name : /temperature Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. bytes stored : 33903 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 >>> zarr.open_group(store)['precipitation'].info Name : /precipitation Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. bytes stored : 33906 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 ``` and then this large chunk size remains (100,100,100) (or whatever other large numbers you may want) The `chunk_dim` functionality, as it works now, is not feasible for very large arrays, since it is essentially reading the entire array into memory(and we may not have too much memory) and then overwriting the target store, because it essentially ""rechunks"" the array. Thoughts please","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-503502211,https://api.github.com/repos/pydata/xarray/issues/2706,503502211,MDEyOklzc3VlQ29tbWVudDUwMzUwMjIxMQ==,8643775,2019-06-19T10:24:47Z,2019-06-19T10:24:47Z,NONE,"Hi all, sorry for the delay. I was on break for 10 days. I have started working on it now and should be able to do this in a couple of days. A quick thing I noticed as I started working on this: currently the [zarr.Array.append](https://zarr.readthedocs.io/en/stable/api/core.html#zarr.core.Array.append) function(which handles the resize of the array) is being used to append to the array, and as of now, it's not being done asynchronously using a dask delayed object(as I pointed out in my earlier comment). So to do it asynchronously, my plan is to resize the target array, and then the delayed object would write to the appropriate region of the resized array([dask.array.store](https://docs.dask.org/en/latest/array-api.html#dask.array.store) is currently being used for this). But if for whatever reason, the delayed object does not end up being called, we would end up with nonsense values in the resized portion of the array(or whatever is the fill value). For this reason I wonder if I should put the resize in the delayed object too. Thoughts?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498583209,https://api.github.com/repos/pydata/xarray/issues/2706,498583209,MDEyOklzc3VlQ29tbWVudDQ5ODU4MzIwOQ==,8643775,2019-06-04T08:52:28Z,2019-06-04T08:52:28Z,NONE,will do,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498194520,https://api.github.com/repos/pydata/xarray/issues/2706,498194520,MDEyOklzc3VlQ29tbWVudDQ5ODE5NDUyMA==,8643775,2019-06-03T10:03:01Z,2019-06-03T10:03:54Z,NONE,"I think I figured out the problem. Previously the `ZarrStore` defaults to the `store` method of the parent class `AbstractWritableDataStore`. The `store` method of `AbstractWritableDataStore` uses `set_variables`(which belongs to `AbstractWritableDataStore`) which adds the ""source and target"" variables to the `ArrayWriter` class, i.e. the array is not written until the call to `_finalize_store` which happens in the `xarray.backends.api.to_zarr` function(or not in `compute=False`). Instead your PR implements another `store` function in `ZarrStore` which uses `set_variables`(belonging to `ZarrStore`) which directly writes the data to the target array, ignoring the `ArrayWriter` class. I think if you just use `ArrayWriter` in `ZarrStore.set_variables` it should fix the problem. I will be happy to push a fix, if permissions are given.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148 https://github.com/pydata/xarray/pull/2706#issuecomment-498072960,https://api.github.com/repos/pydata/xarray/issues/2706,498072960,MDEyOklzc3VlQ29tbWVudDQ5ODA3Mjk2MA==,8643775,2019-06-02T23:07:41Z,2019-06-02T23:07:41Z,NONE,"Hi all, not sure if I could be doing something wrong myself, but the below might be a bug in this PR. I checked out the `jendrik_joe:append_zarr` branch. When calling `to_zarr` on a `Dataset` with the `compute=False` argument, the data array should not be populated in the target `zarr` store right? But it seems to be doing just that(I though only metadata should be initialised). ```python >>> import zarr >>> import xarray >>> ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... ) >>> store = dict() >>> ds = ds.chunk(dict(dim1=1, dim2=1)) >>> ds Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array >>> store = dict() >>> ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-72c3ec95-6ef1-473a-a0de-e81c98eb9576') >>> xarray.open_zarr(store)['arr1'].values array([[10, 11], [12, 13]]) ``` Instead this does not happen in the xarray master branch: ```python >>> import zarr >>> import xarray >>> ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... ) >>> ds = ds.chunk(dict(dim1=1, dim2=1)) >>> ds Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array >>> store = dict() >>> ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-965b0a89-bff6-4adc-8bab-e47a77e762f3') >>> xarray.open_zarr(store)['arr1'].values array([[4611686018427387904, 4611686018427387904], [4611686018427387904, 4611686018427387904]]) ``` Python: 3.6.7 ```python >>> xarray.__version__ '0.12.1+45.g519b3986' >>> zarr.__version__ '2.2.1.dev185' >>> ``` OS: ``` Distributor ID: Ubuntu Description: Ubuntu 18.04.2 LTS Release: 18.04 Codename: bionic ``` I am also trying to figure out why this is happening(assuming this is a bug) and will post updates soon. @davidbrochart ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148