home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

59 rows where issue = 402908148 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 6

  • davidbrochart 16
  • shikharsg 13
  • rabernat 12
  • jendrikjoe 11
  • shoyer 6
  • pep8speaks 1

author_association 3

  • CONTRIBUTOR 27
  • MEMBER 18
  • NONE 14

issue 1

  • Appending to zarr store · 59 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
506994997 https://github.com/pydata/xarray/pull/2706#issuecomment-506994997 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk5NDk5Nw== shoyer 1217238 2019-06-29T23:43:50Z 2019-06-29T23:43:50Z MEMBER

OK, thank you @shikharsg, @jendrikjoe and everyone else who worked on this !

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457368077 https://github.com/pydata/xarray/pull/2706#issuecomment-457368077 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzM2ODA3Nw== pep8speaks 24736507 2019-01-24T21:43:09Z 2019-06-29T22:49:41Z NONE

Hello @jendrikjoe! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2019-06-29 22:49:41 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506984973 https://github.com/pydata/xarray/pull/2706#issuecomment-506984973 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk4NDk3Mw== shoyer 1217238 2019-06-29T20:27:10Z 2019-06-29T20:27:10Z MEMBER

@shikharsg the test failure should be fixed on master (by https://github.com/pydata/xarray/pull/3059).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506958854 https://github.com/pydata/xarray/pull/2706#issuecomment-506958854 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk1ODg1NA== shikharsg 8643775 2019-06-29T13:57:31Z 2019-06-29T13:57:31Z NONE

I have implemented all the changes suggested and refactored the append tests as all tests were previously crammed into test_write_persistence_modes

I'm not sure why the build fails. In all the failed checks, it is these two tests that are failing: ``` ================================== FAILURES =================================== ____ testrolling_properties[1] ______

da = <xarray.DataArray (a: 3, time: 21, x: 4)> array([[[0.561926, 0.243845, 0.601879, 0.733398], [0.500418, 0.84942...ordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-21 Dimensions without coordinates: a, x

def test_rolling_properties(da):
    rolling_obj = da.rolling(time=4)

    assert rolling_obj.obj.get_axis_num('time') == 1

    # catching invalid args
    with pytest.raises(ValueError) as exception:
        da.rolling(time=7, x=2)
  assert 'exactly one dim/window should' in str(exception)

E AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>' E + where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataarray.py:3715: AssertionError ____ testrolling_properties[1] ______

ds = <xarray.Dataset> Dimensions: (time: 10, x: 8, y: 2) Coordinates: * x (x) float64 0.0 0.1429 0.2857 0.4286 0....-1.152 -0.6704 ... -0.9796 -1.884 0.4049 z2 (time, y) float64 -1.218 -0.9627 -1.398 ... -0.3552 0.1446 0.3392

def test_rolling_properties(ds):
    # catching invalid args
    with pytest.raises(ValueError) as exception:
        ds.rolling(time=7, x=2)
  assert 'exactly one dim/window should' in str(exception)

E AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>' E + where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataset.py:4845: AssertionError ============================== warnings summary =============================== ```

I have no idea why as the same two tests pass on my local machine

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506418064 https://github.com/pydata/xarray/pull/2706#issuecomment-506418064 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQxODA2NA== shikharsg 8643775 2019-06-27T16:27:58Z 2019-06-27T16:27:58Z NONE

it's done. I fixed it by opening the zarr dataset beforehand using xr.open_zarr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506403594 https://github.com/pydata/xarray/pull/2706#issuecomment-506403594 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQwMzU5NA== shikharsg 8643775 2019-06-27T15:50:36Z 2019-06-27T15:50:36Z NONE

adding a new variable currently errors if we don't provide the append_dim argument:

Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.

@rabernat , the scenario I am talking about is adding a new DataArray to an existing Dataset(in which case we do not have to specify an append_dim argument). Yes it is covered by tests, specifically see the with clause here: https://github.com/pydata/xarray/pull/2706/files#diff-df47fcb9c2f1f7dfc0c6032d97072af2R1636

Just to be clear, we do always requiring writing append_dim if you want to append values along a dimension, right? And we raise an informative error if you write append_dim='not-a-valid-dimension'?

@shoyer We do always require append_dim when appending to an existing array, but I just realized that it does not raise an error when append_dim='not-valid', but silently fails to append to the existing array. Let me write a test for that and push

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506390414 https://github.com/pydata/xarray/pull/2706#issuecomment-506390414 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM5MDQxNA== shoyer 1217238 2019-06-27T15:18:00Z 2019-06-27T15:18:00Z MEMBER

Just to be clear, we do always requiring writing append_dim if you want to append values along a dimension, right? And we raise an informative error if you write append_dim='not-a-valid-dimension'?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506383042 https://github.com/pydata/xarray/pull/2706#issuecomment-506383042 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM4MzA0Mg== rabernat 1197350 2019-06-27T14:59:52Z 2019-06-27T14:59:52Z MEMBER

adding a new variable currently errors if we don't provide the append_dim argument:

Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506363156 https://github.com/pydata/xarray/pull/2706#issuecomment-506363156 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM2MzE1Ng== shikharsg 8643775 2019-06-27T14:11:55Z 2019-06-27T14:11:55Z NONE

I have fixed the above error now and all comments have now been addressed.

@rabernat @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506288004 https://github.com/pydata/xarray/pull/2706#issuecomment-506288004 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjI4ODAwNA== shikharsg 8643775 2019-06-27T10:21:53Z 2019-06-27T10:21:53Z NONE

adding a new variable currently errors if we don't provide the append_dim argument: ```python

import xarray as xr import pandas as pd ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) store = dict() ds0.to_zarr(store, mode='w') <xarray.backends.zarr.ZarrStore object at 0x7fae926505c0> ds1.to_zarr(store, mode='a') Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/shikhar/code/xarray/xarray/core/dataset.py", line 1374, in to_zarr consolidated=consolidated, append_dim=append_dim) File "/home/shikhar/code/xarray/xarray/backends/api.py", line 1071, in to_zarr dump_to_store(dataset, zstore, writer, encoding=encoding) File "/home/shikhar/code/xarray/xarray/backends/api.py", line 928, in dump_to_store unlimited_dims=unlimited_dims) File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 366, in store unlimited_dims=unlimited_dims) File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 406, in set_variables "was not set".format(name) ValueError: variable 'time' already exists, but append_dim was not set ```

this works: ```python

import xarray as xr import pandas as pd ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) store = dict() ds0.to_zarr(store, mode='w') <xarray.backends.zarr.ZarrStore object at 0x7fae926505c0> ds1.to_zarr(store, mode='a', append_dim='asdfasdf') xr.open_zarr(store) <xarray.Dataset> Dimensions: (time: 3) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 Data variables: pressure (time) int64 dask.array<shape=(3,), chunksize=(3,)> temperature (time) int64 dask.array<shape=(3,), chunksize=(3,)> ```

will push a fix for this in a bit

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505958006 https://github.com/pydata/xarray/pull/2706#issuecomment-505958006 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1ODAwNg== shoyer 1217238 2019-06-26T16:52:59Z 2019-06-26T16:52:59Z MEMBER

The Appveyor build failures are definitely not your fault, please ignore them

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505957913 https://github.com/pydata/xarray/pull/2706#issuecomment-505957913 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1NzkxMw== shoyer 1217238 2019-06-26T16:52:43Z 2019-06-26T16:52:43Z MEMBER

+1 for saving as native variable length strings in zarr. I didn't realize that was an option when we originally wrote xarray's zarr support, but its definitely a much cleaner way to do things in most cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953845 https://github.com/pydata/xarray/pull/2706#issuecomment-505953845 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1Mzg0NQ== shikharsg 8643775 2019-06-26T16:41:33Z 2019-06-26T16:41:33Z NONE

also any idea why all the AppVeyor builds are failing since yesterday? I did not change any build file in any of my commits.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953301 https://github.com/pydata/xarray/pull/2706#issuecomment-505953301 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1MzMwMQ== shikharsg 8643775 2019-06-26T16:40:02Z 2019-06-26T16:40:02Z NONE

Thanks @shoyer @rabernat for the detailed review. All the comments have been addresses except the removal of the encode_utf8 function. In this last commit: https://github.com/pydata/xarray/pull/2706/commits/a6ff49436a620eb4071c0529d785370b2eb739b0, I have tried to do that. In the same commit I have also tried to take a shot at variable length strings using @shoyer's comments from here: https://github.com/pydata/xarray/issues/2724#issuecomment-458808896. Please let me know if this is acceptable or should I revert this commit or something.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505903058 https://github.com/pydata/xarray/pull/2706#issuecomment-505903058 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTkwMzA1OA== rabernat 1197350 2019-06-26T14:34:27Z 2019-06-26T14:34:27Z MEMBER

Thanks @shoyer for your more careful review of this PR. As usual you pick up on all the important details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504477302 https://github.com/pydata/xarray/pull/2706#issuecomment-504477302 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDQ3NzMwMg== shikharsg 8643775 2019-06-21T15:56:01Z 2019-06-21T15:56:01Z NONE

@shikharsg - are the issues you found in #2706 (comment) now resolved and covered by tests?

@rabernat these are now resolved

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504184558 https://github.com/pydata/xarray/pull/2706#issuecomment-504184558 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDE4NDU1OA== davidbrochart 4711805 2019-06-20T21:11:57Z 2019-06-20T21:11:57Z CONTRIBUTOR

You're right @shikharsg, the chunk_dim argument can be removed. I was not very happy with the complexity it brought as well.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504115125 https://github.com/pydata/xarray/pull/2706#issuecomment-504115125 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDExNTEyNQ== shikharsg 8643775 2019-06-20T17:33:04Z 2019-06-20T17:33:04Z NONE

I have fixed the compute=False for appending to zarr store, but there are two issues that remain

  • when appending to an existing array, I resize the array first, and the return a dask.delayed object which fills up the new region of the array when compute is called on it. So if the delayed object does not get computed for whatever reason, the new portion of the array will end up with nonsense values. For this reason I was wondering if the resize function should be in the delayed object itself so the array is not resized in advance.
  • the compute=False will not work when the chunk_dim argument is set, i.e. instead of lazily appending when the compute method is called on the delayed object, it will directly append to the target store when the to_zarr method with mode='a' is called. The reason is because when the chunk_dim argument is set, it reads the original array from memory, appends to that array in memory, and overwrites the appended array to the target store. I understand that this was done because of the concern that @davidbrochart raised about doing very frequent appends to the array(for example hourly or six hourly as happens in climate modelling), and the resulting smallness of the chunk size of the dimension being appended to. But @davidbrochart, I would almost recommend removing this chunk_dim argument because the concern you raised can be overcome as follows: suppose you have a Dataset as follows: ```python temp = 15 + 8 * np.random.randn(2, 2, 3) precip = 10 * np.random.rand(2, 2, 3) lon = [[-99.83, -99.32], [-99.79, -99.23]] lat = [[42.25, 42.21], [42.63, 42.59]] ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp), 'precipitation': (['x', 'y', 'time'], precip)}, coords={'lon': (['x', 'y'], lon), 'lat': (['x', 'y'], lat), 'time': pd.date_range('2014-09-06', periods=3), 'reference_time': pd.Timestamp('2014-09-05')})

and want to append it to very often. When calling the `to_zarr` function the first time call it like so:python

store = dict() ds.to_zarr(store, encoding={'temperature': {'chunks':(100,100,100)}, 'precipitation': {'chunks':(100,100,100)}}) <xarray.backends.zarr.ZarrStore object at 0x7f7027778d68> import zarr zarr.open_group(store)['temperature'].info Name : /temperature Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. bytes stored : 33903 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 zarr.open_group(store)['precipitation'].info Name : /precipitation Type : zarr.core.Array Data type : float64 Shape : (2, 2, 3) Chunk shape : (100, 100, 100) Order : C Read-only : False Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0) Store type : builtins.dict No. bytes : 96 No. bytes stored : 33906 (33.1K) Storage ratio : 0.0 Chunks initialized : 1/1 `` and then this large chunk size remains (100,100,100) (or whatever other large numbers you may want) Thechunk_dim` functionality, as it works now, is not feasible for very large arrays, since it is essentially reading the entire array into memory(and we may not have too much memory) and then overwriting the target store, because it essentially "rechunks" the array.

Thoughts please

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
503502211 https://github.com/pydata/xarray/pull/2706#issuecomment-503502211 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMzUwMjIxMQ== shikharsg 8643775 2019-06-19T10:24:47Z 2019-06-19T10:24:47Z NONE

Hi all, sorry for the delay. I was on break for 10 days. I have started working on it now and should be able to do this in a couple of days.

A quick thing I noticed as I started working on this: currently the zarr.Array.append function(which handles the resize of the array) is being used to append to the array, and as of now, it's not being done asynchronously using a dask delayed object(as I pointed out in my earlier comment). So to do it asynchronously, my plan is to resize the target array, and then the delayed object would write to the appropriate region of the resized array(dask.array.store is currently being used for this). But if for whatever reason, the delayed object does not end up being called, we would end up with nonsense values in the resized portion of the array(or whatever is the fill value). For this reason I wonder if I should put the resize in the delayed object too. Thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502827736 https://github.com/pydata/xarray/pull/2706#issuecomment-502827736 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjgyNzczNg== jendrikjoe 9658781 2019-06-17T19:56:23Z 2019-06-17T19:56:23Z CONTRIBUTOR

I build a filter that is raising a value error as soon as any variable has a dtype different from any subclass of np.number or np.string_. I as well build test for that and added a function to manually convert dynamic sized string arrays to fixed sized ones.

I as well wrote a test for @shikharsg issue and can reproduce it. The test is currently commented to not fail the pipeline as I wanted to discuss if this is a blocking issue or if we should merge it and raise a new issue for it. It seems to be originating from the fact that we moved away from using writer.add and instead are actually calling the zarr functions directly. There should be a way to change this back to do it lazily, but that will probably take time. What do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502754545 https://github.com/pydata/xarray/pull/2706#issuecomment-502754545 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjc1NDU0NQ== jendrikjoe 9658781 2019-06-17T16:24:46Z 2019-06-17T16:24:46Z CONTRIBUTOR

@jendrikjoe - thanks for digging in and finding this important issue!

This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose.

* Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings)

* Raise an error if append is called on the incompatible datatypes

* Move forward with this PR, which is otherwise very nearly ready

* Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr

How does that sound to everyone?

This sounds like a plan. I will try to work on getting this ready tonight and tmrw. Let us see how far I can get.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502700081 https://github.com/pydata/xarray/pull/2706#issuecomment-502700081 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjcwMDA4MQ== rabernat 1197350 2019-06-17T14:13:45Z 2019-06-17T14:13:45Z MEMBER

@shikharsg - are the issues you found in https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 now resolved and covered by tests?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502699808 https://github.com/pydata/xarray/pull/2706#issuecomment-502699808 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjY5OTgwOA== rabernat 1197350 2019-06-17T14:12:59Z 2019-06-17T14:12:59Z MEMBER

@jendrikjoe - thanks for digging in and finding this important issue!

This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose. - Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings) - Raise an error if append is called on the incompatible datatypes - Move forward with this PR, which is otherwise very nearly ready - Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr

How does that sound to everyone?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502481584 https://github.com/pydata/xarray/pull/2706#issuecomment-502481584 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA== jendrikjoe 9658781 2019-06-16T20:05:04Z 2019-06-16T20:23:54Z CONTRIBUTOR

Hey there everyone, sorry for not working on this for so long from my side. I just picked it up again and realised that the way the encoding works, all the datatypes and the maximum string lengths in the first xarray have to be representative for all others. Otherwise the following cuts away every char after the second:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

It is solvable when explicitly setting the type before writing:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = ds0.temperature.astype(np.dtype('S5'))
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

It becomes however worse when using non-ascii characters, as they get encoded in zarr.py l:218, but with the next chunk that is coming in the check in conventions.py l:86 fails. So I think we actually have to resolve the the TODO in zarr.py l:215 before this is able to be merged. Otherwise, the following leads to multiple issues:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values

The only way to work around this issue is to explicitly encode the data beforehand to utf-8:

from xarray.coding.variables import safe_setitem, unpack_for_encoding
from xarray.coding.strings import encode_string_array
from xarray.core.variable import Variable

def encode_utf8(var, string_max_length):
    dims, data, attrs, encoding = unpack_for_encoding(var)
    safe_setitem(attrs, '_Encoding', 'utf-8')
    data = encode_string_array(data, 'utf-8')
    data = data.astype(np.dtype(f"S{string_max_length*2}"))
    return Variable(dims, data, attrs, encoding)

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = encode_utf8(ds0.temperature, 2)
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds1['temperature'] = encode_utf8(ds1.temperature, 2)
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values

Even though this is doable if it is known in advance, we should definitely mention this in the documentation or fix this by fixing the encoding itself. What do you think?

Cheers,

Jendrik

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498583209 https://github.com/pydata/xarray/pull/2706#issuecomment-498583209 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODU4MzIwOQ== shikharsg 8643775 2019-06-04T08:52:28Z 2019-06-04T08:52:28Z NONE

will do

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498227133 https://github.com/pydata/xarray/pull/2706#issuecomment-498227133 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODIyNzEzMw== rabernat 1197350 2019-06-03T11:58:19Z 2019-06-03T11:58:19Z MEMBER

Let’s make sure this new scenario is covered by tests!

Sent from my iPhone

On Jun 3, 2019, at 6:40 AM, Jendrik Jördening notifications@github.com wrote:

Gave you the permissions @shikharsg

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498205860 https://github.com/pydata/xarray/pull/2706#issuecomment-498205860 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODIwNTg2MA== jendrikjoe 9658781 2019-06-03T10:40:28Z 2019-06-03T10:40:28Z CONTRIBUTOR

Gave you the permissions @shikharsg

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498199759 https://github.com/pydata/xarray/pull/2706#issuecomment-498199759 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE5OTc1OQ== davidbrochart 4711805 2019-06-03T10:19:15Z 2019-06-03T10:19:15Z CONTRIBUTOR

It would be great if you could fix it. @jendrikjoe can give you the permission to push to the branch in his fork.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498194520 https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE5NDUyMA== shikharsg 8643775 2019-06-03T10:03:01Z 2019-06-03T10:03:54Z NONE

I think I figured out the problem. Previously the ZarrStore defaults to the store method of the parent class AbstractWritableDataStore. The store method of AbstractWritableDataStore uses set_variables(which belongs to AbstractWritableDataStore) which adds the "source and target" variables to the ArrayWriter class, i.e. the array is not written until the call to _finalize_store which happens in the xarray.backends.api.to_zarr function(or not in compute=False). Instead your PR implements another store function in ZarrStore which uses set_variables(belonging to ZarrStore) which directly writes the data to the target array, ignoring the ArrayWriter class. I think if you just use ArrayWriter in ZarrStore.set_variables it should fix the problem.

I will be happy to push a fix, if permissions are given.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498166896 https://github.com/pydata/xarray/pull/2706#issuecomment-498166896 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE2Njg5Ng== davidbrochart 4711805 2019-06-03T08:41:01Z 2019-06-03T08:41:01Z CONTRIBUTOR

Thanks @shikharsg for looking into that. This PR and master have diverged quite a bit, I will need to merge the changes, I will let you know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498072960 https://github.com/pydata/xarray/pull/2706#issuecomment-498072960 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODA3Mjk2MA== shikharsg 8643775 2019-06-02T23:07:41Z 2019-06-02T23:07:41Z NONE

Hi all, not sure if I could be doing something wrong myself, but the below might be a bug in this PR.

I checked out the jendrik_joe:append_zarr branch. When calling to_zarr on a Dataset with the compute=False argument, the data array should not be populated in the target zarr store right? But it seems to be doing just that(I though only metadata should be initialised). ```python

import zarr import xarray ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... ) store = dict() ds = ds.chunk(dict(dim1=1, dim2=1)) ds <xarray.Dataset> Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)> store = dict() ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-72c3ec95-6ef1-473a-a0de-e81c98eb9576') xarray.open_zarr(store)['arr1'].values array([[10, 11], [12, 13]]) ```

Instead this does not happen in the xarray master branch: ```python

import zarr import xarray ds = xarray.Dataset( ... {'arr1': (['dim1', 'dim2'], [[10,11],[12,13]])}, ... coords={'dim1': [1, 2], 'dim2': [1,2]} ... ) ds = ds.chunk(dict(dim1=1, dim2=1)) ds <xarray.Dataset> Dimensions: (dim1: 2, dim2: 2) Coordinates: * dim1 (dim1) int64 1 2 * dim2 (dim2) int64 1 2 Data variables: arr1 (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)> store = dict() ds.to_zarr(store, mode='w', compute=False) Delayed('_finalize_store-965b0a89-bff6-4adc-8bab-e47a77e762f3') xarray.open_zarr(store)['arr1'].values array([[4611686018427387904, 4611686018427387904], [4611686018427387904, 4611686018427387904]]) Python: 3.6.7python xarray.version '0.12.1+45.g519b3986' zarr.version '2.2.1.dev185'

OS: Distributor ID: Ubuntu Description: Ubuntu 18.04.2 LTS Release: 18.04 Codename: bionic ```

I am also trying to figure out why this is happening(assuming this is a bug) and will post updates soon.

@davidbrochart

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
494340006 https://github.com/pydata/xarray/pull/2706#issuecomment-494340006 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5NDM0MDAwNg== davidbrochart 4711805 2019-05-21T10:46:20Z 2019-05-21T10:46:20Z CONTRIBUTOR

No problem @rabernat, and thanks a lot for your time in reviewing this PR.

I added a test for chunk_dim. Please let me know if this is clearer, and if I should explain further in the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
494062757 https://github.com/pydata/xarray/pull/2706#issuecomment-494062757 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5NDA2Mjc1Nw== rabernat 1197350 2019-05-20T16:38:24Z 2019-05-20T16:38:24Z MEMBER

Hi @davidbrochart. I'm really sorry it takes me so long between reviews of your PR. It is very important work, and I appreciate your continued patience.

I looked at your new code, and I noticed that chunk_dim does not appear in the tests. I think it is important to test this parameter and verify that it works as expected. (This would also help me understand how it works, since it's not totally clear from the docs.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
485340727 https://github.com/pydata/xarray/pull/2706#issuecomment-485340727 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NTM0MDcyNw== davidbrochart 4711805 2019-04-22T06:38:04Z 2019-04-22T06:38:04Z CONTRIBUTOR

I added a chunk_dim parameter which allows to rechunk the appended coordinate. I think it is ready for a final review now.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
484551274 https://github.com/pydata/xarray/pull/2706#issuecomment-484551274 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NDU1MTI3NA== davidbrochart 4711805 2019-04-18T15:14:24Z 2019-04-18T15:14:24Z CONTRIBUTOR

I don't think it's ready yet. I think I should address the chunking issue of the appended dimension, as explained in https://medium.com/pangeo/continuously-extending-zarr-datasets-c54fbad3967d. For instance if we append along a time dimension, the time coordinate (which is a 1-D array) will have very small chunks, instead of maybe only one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
484542216 https://github.com/pydata/xarray/pull/2706#issuecomment-484542216 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NDU0MjIxNg== rabernat 1197350 2019-04-18T14:49:54Z 2019-04-18T14:49:54Z MEMBER

Where do we stand on this PR? @davidbrochart - do you feel this is ready for a final review? Or do you want advice or feedback on anything?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479932785 https://github.com/pydata/xarray/pull/2706#issuecomment-479932785 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTkzMjc4NQ== davidbrochart 4711805 2019-04-04T14:57:45Z 2019-04-04T14:57:45Z CONTRIBUTOR

@rabernat you're right, I took your suggestion into account in my last commit. I also rewrote the test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479802142 https://github.com/pydata/xarray/pull/2706#issuecomment-479802142 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTgwMjE0Mg== jendrikjoe 9658781 2019-04-04T08:28:56Z 2019-04-04T08:28:56Z CONTRIBUTOR

Nice :+1:

On Apr 4, 2019 21:24, David Brochart notifications@github.com wrote:

Thanks @jendrikjoehttps://github.com/jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pydata/xarray/pull/2706#issuecomment-479800472, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJNhnai3TAdnmcRLsMnUXXRsMx7jcf3Vks5vdbakgaJpZM4aRxJT.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479800472 https://github.com/pydata/xarray/pull/2706#issuecomment-479800472 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTgwMDQ3Mg== davidbrochart 4711805 2019-04-04T08:24:01Z 2019-04-04T08:24:01Z CONTRIBUTOR

Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479798342 https://github.com/pydata/xarray/pull/2706#issuecomment-479798342 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTc5ODM0Mg== jendrikjoe 9658781 2019-04-04T08:17:43Z 2019-04-04T08:17:43Z CONTRIBUTOR

I added you to the fork :) But feel free to do whatever is easiest for you :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479788744 https://github.com/pydata/xarray/pull/2706#issuecomment-479788744 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTc4ODc0NA== davidbrochart 4711805 2019-04-04T07:46:52Z 2019-04-04T07:46:52Z CONTRIBUTOR

Or should I open a new PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479480023 https://github.com/pydata/xarray/pull/2706#issuecomment-479480023 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTQ4MDAyMw== davidbrochart 4711805 2019-04-03T13:04:24Z 2019-04-03T13:04:24Z CONTRIBUTOR

@jendrikjoe I think you need to give me the permission to push to the branch in your fork.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478399527 https://github.com/pydata/xarray/pull/2706#issuecomment-478399527 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM5OTUyNw== jendrikjoe 9658781 2019-04-01T00:19:11Z 2019-04-01T00:19:11Z CONTRIBUTOR

Sure everyone feel welcome to join in! Sorry for the long silence. Kind of a busy time right now 😉

On Apr 1, 2019 08:47, Ryan Abernathey notifications@github.com wrote:

@davidbrocharthttps://github.com/davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoehttps://github.com/jendrikjoe would not mind if we make it a team effort!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pydata/xarray/pull/2706#issuecomment-478374296, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJNhnVvW1PlIIRFRQ-nW6QM7gd5JMAi5ks5vcRDogaJpZM4aRxJT.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478374296 https://github.com/pydata/xarray/pull/2706#issuecomment-478374296 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM3NDI5Ng== rabernat 1197350 2019-03-31T19:47:49Z 2019-03-31T19:47:49Z MEMBER

@davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoe would not mind if we make it a team effort!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478365516 https://github.com/pydata/xarray/pull/2706#issuecomment-478365516 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM2NTUxNg== davidbrochart 4711805 2019-03-31T18:22:53Z 2019-03-31T18:22:53Z CONTRIBUTOR

May I try and take this work over?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
475558223 https://github.com/pydata/xarray/pull/2706#issuecomment-475558223 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3NTU1ODIyMw== davidbrochart 4711805 2019-03-22T09:53:43Z 2019-03-22T09:53:43Z CONTRIBUTOR

Hi @jendrikjoe, do you plan to work on this PR again in the future? I think it would be a great contribution to xarray.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459813678 https://github.com/pydata/xarray/pull/2706#issuecomment-459813678 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTgxMzY3OA== rabernat 1197350 2019-02-01T18:07:26Z 2019-02-01T18:07:26Z MEMBER

We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow.

This implies we should be checking for attributes compatibility before calling zarr.append.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459667067 https://github.com/pydata/xarray/pull/2706#issuecomment-459667067 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTY2NzA2Nw== davidbrochart 4711805 2019-02-01T09:51:55Z 2019-02-01T09:51:55Z CONTRIBUTOR

When we use this feature e.g. to store data that is produced every day, we might start with a data set that has a small size on the time dimension, and thus the chunks will be chosen according to this initial shape. When we append to this data set, will the chunks be kept as in the initial zarr archive? If so, we might end up with a lot of small chunks on the time dimension, where ideally we would have chosen only one chunk.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459637253 https://github.com/pydata/xarray/pull/2706#issuecomment-459637253 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTYzNzI1Mw== shoyer 1217238 2019-02-01T07:54:30Z 2019-02-01T07:54:30Z MEMBER

We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow. Potentially a keyword argument ignore_alignment=True would be a good way for user to opt-out of checking index coordinates for consistency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459177873 https://github.com/pydata/xarray/pull/2706#issuecomment-459177873 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTE3Nzg3Mw== rabernat 1197350 2019-01-31T01:24:11Z 2019-01-31T01:24:11Z MEMBER

So the problem in @davidbrochart's example is that there are different encodings on the time variables in the two datasets.

When writing datetimes, xarray automatically picks an encoding (i.e. days since 2000-01-01 00:00:00) based on some heuristics. When serializing the dataset, this encoding is used to encode the datetime64[ns] dtype into a different dtype, and the encoding is placed in the attributes of the store. When you open the dataset, the encoding is automatically decoded according to CF conventions. This can be disabled by using decode_cf=False or decode_times=False when you open the dataset.

In this case, xarray's heuristics are picking different encodings for the two dates. You could make this example work by manually specifying encoding on the appended dataset to be the same as the original.

This example illustrates the need for some sort of compatibility checks between the target dataset and the appended dataset. For example, checking for attribute compatibility would have caught this error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458896024 https://github.com/pydata/xarray/pull/2706#issuecomment-458896024 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODg5NjAyNA== jendrikjoe 9658781 2019-01-30T10:37:56Z 2019-01-30T10:37:56Z CONTRIBUTOR

I will check as well how xarry stores times to check if we have to add the offset to the xarray first or if this can be resolved with a PR to zarr :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458866421 https://github.com/pydata/xarray/pull/2706#issuecomment-458866421 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODg2NjQyMQ== davidbrochart 4711805 2019-01-30T09:05:53Z 2019-01-30T09:05:53Z CONTRIBUTOR

zarr stores the reference in the .zattrs file: { "_ARRAY_DIMENSIONS": [ "time" ], "calendar": "proleptic_gregorian", "units": "days since 2000-01-01 00:00:00" }

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458736067 https://github.com/pydata/xarray/pull/2706#issuecomment-458736067 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODczNjA2Nw== jendrikjoe 9658781 2019-01-29T22:39:00Z 2019-01-29T22:39:00Z CONTRIBUTOR

Hey @davidbrochart, thanks for all your input and as well for the resarch on how zarr stores the data. I would actually claim that the calculation of the accurate relative time should be handled by the zarr append function. An exception would be of course if xarray is storing the data with deltas to a reference as well? Then I would try collecting the minimum and offsetting the input by this. @rabernat can you provide input on that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458730247 https://github.com/pydata/xarray/pull/2706#issuecomment-458730247 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODczMDI0Nw== davidbrochart 4711805 2019-01-29T22:19:07Z 2019-01-29T22:19:07Z CONTRIBUTOR

To make it work, time dimensions would have to be treated separately because zarr doesn't encode absolute time values but deltas relative to a reference (see https://github.com/davidbrochart/pangeo_upload/blob/master/py/trmm2pangeo.py#L108).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458720920 https://github.com/pydata/xarray/pull/2706#issuecomment-458720920 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODcyMDkyMA== davidbrochart 4711805 2019-01-29T21:50:14Z 2019-01-29T21:50:14Z CONTRIBUTOR

Hi @jendrikjoe,

Thanks for your PR, I am very interested in it because this is something I was hacking around (see here). In my particular case, I want to append along a time dimension, but it looks like your PR currently doesn't support it. In the following example ds2 should have a time dimension ranging from 2000-01-01 to 2000-01-06: ```python import xarray as xr import pandas as pd

ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': pd.date_range('2000-01-01', periods=3)}) ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])}, coords={'time': pd.date_range('2000-01-04', periods=3)})

ds0.to_zarr('temp') ds1.to_zarr('temp', mode='a', append_dim='time')

ds2 = xr.open_zarr('temp') But it's not the case: ds2.time <xarray.DataArray 'time' (time: 6)> array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000', '2000-01-03T00:00:00.000000000', '2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000', '2000-01-03T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-03 ``` Maybe it's not intended to work with time dimensions yet?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458694955 https://github.com/pydata/xarray/pull/2706#issuecomment-458694955 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODY5NDk1NQ== jendrikjoe 9658781 2019-01-29T20:29:05Z 2019-01-29T20:31:59Z CONTRIBUTOR

You are definitely right, that there are no checks regarding the alignment. However, if another shape than the append_dim does not align zarr will raise an error. If the coordinate differs that could be definitely an issue. I did not think about that as I am dumping reshaped dask.dataframe partitions with the append mode. Therefore, I am anyway not allowed to have a name twice. Might be interesting for other users indeed. Similar point for the attributes. I could try figuring that out as well, but that might take a while. The place where the ValueError is raised should allow to add other variables, as those are added in the KeyError exception above :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458692011 https://github.com/pydata/xarray/pull/2706#issuecomment-458692011 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODY5MjAxMQ== rabernat 1197350 2019-01-29T20:19:58Z 2019-01-29T20:19:58Z MEMBER

Ok, with the example, I can see a bit better how this works.

Here is my main concern: there doesn't appear to be any alignment checking between the target dataset and the new data. The only check that happens is whether a variable with the same name already exists in the target store, if so, append is used (rather than creating a new array). What if the coordinates differ? What if the attributes differ?

I'm not sure this is a deal-breaker. But we should be very clear about this in the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457827734 https://github.com/pydata/xarray/pull/2706#issuecomment-457827734 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzgyNzczNA== jendrikjoe 9658781 2019-01-26T12:35:28Z 2019-01-26T12:35:28Z CONTRIBUTOR

Hi @rabernat,

happy to help! I love using xarray. I added the test for the append mode. One is making sure, that it behaves like the 'w' mode, if no data exist at the target path. The other one is testing what you described. The append_dim argument is actually the same as the dim argument for concat. Hope that helps clarifying my code :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457741867 https://github.com/pydata/xarray/pull/2706#issuecomment-457741867 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1Nzc0MTg2Nw== rabernat 1197350 2019-01-25T21:49:26Z 2019-01-25T21:49:26Z MEMBER

Hi @jendrikjoe -- thanks for submitting a PR to address one of the most important issues in xarray (IMHO)! I am very excited about your contribution and am looking forward to getting this feature merged.

I have many questions about how this works. I think the best way to move forward is to wait until we have a test for the append feature which involves the following steps: - Write a dataset to a zarr store - Open the store in append mode - Append data along a particular dimension

Seeing the code that accomplishes this will help clarify for me what is happening.

Thanks again for your contribution, and welcome to xarray!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 18.174ms · About: xarray-datasette