issue_comments


59 rows where issue = 402908148 sorted by updated_at descending


user 6

  • davidbrochart 16
  • shikharsg 13
  • rabernat 12
  • jendrikjoe 11
  • shoyer 6
  • pep8speaks 1

author_association 3

  • CONTRIBUTOR 27
  • MEMBER 18
  • NONE 14

issue 1

  • Appending to zarr store · 59
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
506994997 https://github.com/pydata/xarray/pull/2706#issuecomment-506994997 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk5NDk5Nw== shoyer 1217238 2019-06-29T23:43:50Z 2019-06-29T23:43:50Z MEMBER

OK, thank you @shikharsg, @jendrikjoe and everyone else who worked on this!

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457368077 https://github.com/pydata/xarray/pull/2706#issuecomment-457368077 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzM2ODA3Nw== pep8speaks 24736507 2019-01-24T21:43:09Z 2019-06-29T22:49:41Z NONE

Hello @jendrikjoe! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2019-06-29 22:49:41 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506984973 https://github.com/pydata/xarray/pull/2706#issuecomment-506984973 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk4NDk3Mw== shoyer 1217238 2019-06-29T20:27:10Z 2019-06-29T20:27:10Z MEMBER

@shikharsg the test failure should be fixed on master (by https://github.com/pydata/xarray/pull/3059).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506958854 https://github.com/pydata/xarray/pull/2706#issuecomment-506958854 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk1ODg1NA== shikharsg 8643775 2019-06-29T13:57:31Z 2019-06-29T13:57:31Z NONE

I have implemented all the suggested changes and refactored the append tests, as they were previously all crammed into test_write_persistence_modes.

I'm not sure why the build fails. In all the failed checks, it is these two tests that are failing:

```
================================== FAILURES ===================================
________________________ test_rolling_properties[1] ________________________

da = <xarray.DataArray (a: 3, time: 21, x: 4)>
array([[[0.561926, 0.243845, 0.601879, 0.733398],
        [0.500418, 0.84942...
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-21
Dimensions without coordinates: a, x

    def test_rolling_properties(da):
        rolling_obj = da.rolling(time=4)

        assert rolling_obj.obj.get_axis_num('time') == 1

        # catching invalid args
        with pytest.raises(ValueError) as exception:
            da.rolling(time=7, x=2)
>       assert 'exactly one dim/window should' in str(exception)
E       AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>'
E        +  where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataarray.py:3715: AssertionError
________________________ test_rolling_properties[1] ________________________

ds = <xarray.Dataset>
Dimensions:  (time: 10, x: 8, y: 2)
Coordinates:
  * x        (x) float64 0.0 0.1429 0.2857 0.4286 0....-1.152 -0.6704 ... -0.9796 -1.884 0.4049
    z2       (time, y) float64 -1.218 -0.9627 -1.398 ... -0.3552 0.1446 0.3392

    def test_rolling_properties(ds):
        # catching invalid args
        with pytest.raises(ValueError) as exception:
            ds.rolling(time=7, x=2)
>       assert 'exactly one dim/window should' in str(exception)
E       AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>'
E        +  where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataset.py:4845: AssertionError
============================== warnings summary ===============================
```

I have no idea why, as the same two tests pass on my local machine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506418064 https://github.com/pydata/xarray/pull/2706#issuecomment-506418064 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQxODA2NA== shikharsg 8643775 2019-06-27T16:27:58Z 2019-06-27T16:27:58Z NONE

it's done. I fixed it by opening the zarr dataset beforehand using xr.open_zarr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506403594 https://github.com/pydata/xarray/pull/2706#issuecomment-506403594 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQwMzU5NA== shikharsg 8643775 2019-06-27T15:50:36Z 2019-06-27T15:50:36Z NONE

adding a new variable currently errors if we don't provide the append_dim argument:

Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.

@rabernat, the scenario I am talking about is adding a new DataArray to an existing Dataset (in which case we do not have to specify an append_dim argument). Yes, it is covered by tests; specifically, see the with clause here: https://github.com/pydata/xarray/pull/2706/files#diff-df47fcb9c2f1f7dfc0c6032d97072af2R1636

Just to be clear, we do always require writing append_dim if you want to append values along a dimension, right? And we raise an informative error if you write append_dim='not-a-valid-dimension'?

@shoyer We do always require append_dim when appending to an existing array, but I just realized that it does not raise an error when append_dim='not-valid'; it silently fails to append to the existing array. Let me write a test for that and push.
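
A regression test along these lines might look roughly like the sketch below. This is a hedged illustration only: the test name is made up, and it assumes the fixed code raises ValueError for an unknown append_dim, as discussed above. The plain dict store follows the examples used elsewhere in this thread.

```python
import pandas as pd
import pytest
import xarray as xr


def test_append_with_invalid_dim_raises():
    # sketch: appending with an append_dim that names no existing
    # dimension should raise instead of silently doing nothing
    ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
                     coords={'time': pd.date_range('2000-01-01', periods=3)})
    ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])},
                     coords={'time': pd.date_range('2000-01-04', periods=3)})
    store = dict()
    ds0.to_zarr(store, mode='w')
    with pytest.raises(ValueError):
        ds1.to_zarr(store, mode='a', append_dim='not-a-valid-dimension')
```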

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506390414 https://github.com/pydata/xarray/pull/2706#issuecomment-506390414 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM5MDQxNA== shoyer 1217238 2019-06-27T15:18:00Z 2019-06-27T15:18:00Z MEMBER

Just to be clear, we do always require writing append_dim if you want to append values along a dimension, right? And we raise an informative error if you write append_dim='not-a-valid-dimension'?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506383042 https://github.com/pydata/xarray/pull/2706#issuecomment-506383042 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM4MzA0Mg== rabernat 1197350 2019-06-27T14:59:52Z 2019-06-27T14:59:52Z MEMBER

adding a new variable currently errors if we don't provide the append_dim argument:

Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506363156 https://github.com/pydata/xarray/pull/2706#issuecomment-506363156 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM2MzE1Ng== shikharsg 8643775 2019-06-27T14:11:55Z 2019-06-27T14:11:55Z NONE

I have fixed the above error, and all comments have now been addressed.

@rabernat @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506288004 https://github.com/pydata/xarray/pull/2706#issuecomment-506288004 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjI4ODAwNA== shikharsg 8643775 2019-06-27T10:21:53Z 2019-06-27T10:21:53Z NONE

Adding a new variable currently errors if we don't provide the append_dim argument:

```python
>>> import xarray as xr
>>> import pandas as pd
>>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> store = dict()
>>> ds0.to_zarr(store, mode='w')
<xarray.backends.zarr.ZarrStore object at 0x7fae926505c0>
>>> ds1.to_zarr(store, mode='a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shikhar/code/xarray/xarray/core/dataset.py", line 1374, in to_zarr
    consolidated=consolidated, append_dim=append_dim)
  File "/home/shikhar/code/xarray/xarray/backends/api.py", line 1071, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/shikhar/code/xarray/xarray/backends/api.py", line 928, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 366, in store
    unlimited_dims=unlimited_dims)
  File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 406, in set_variables
    "was not set".format(name)
ValueError: variable 'time' already exists, but append_dim was not set
```

This works:

```python
>>> import xarray as xr
>>> import pandas as pd
>>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> store = dict()
>>> ds0.to_zarr(store, mode='w')
<xarray.backends.zarr.ZarrStore object at 0x7fae926505c0>
>>> ds1.to_zarr(store, mode='a', append_dim='asdfasdf')
>>> xr.open_zarr(store)
<xarray.Dataset>
Dimensions:      (time: 3)
Coordinates:
  * time         (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    pressure     (time) int64 dask.array<shape=(3,), chunksize=(3,)>
    temperature  (time) int64 dask.array<shape=(3,), chunksize=(3,)>
```

will push a fix for this in a bit

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505958006 https://github.com/pydata/xarray/pull/2706#issuecomment-505958006 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1ODAwNg== shoyer 1217238 2019-06-26T16:52:59Z 2019-06-26T16:52:59Z MEMBER

The Appveyor build failures are definitely not your fault, please ignore them

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505957913 https://github.com/pydata/xarray/pull/2706#issuecomment-505957913 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1NzkxMw== shoyer 1217238 2019-06-26T16:52:43Z 2019-06-26T16:52:43Z MEMBER

+1 for saving as native variable length strings in zarr. I didn't realize that was an option when we originally wrote xarray's zarr support, but it's definitely a much cleaner way to do things in most cases.
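
For reference, a minimal sketch of what native variable-length strings look like on the zarr side, using the object dtype with numcodecs' VLenUTF8 codec. This is an illustration only, not the code added in this PR:

```python
import numpy as np
import zarr
from numcodecs import VLenUTF8

# variable-length UTF-8 strings stored natively, no fixed 'S<n>' width needed
z = zarr.array(np.array(['ab', 'cde', 'üý'], dtype=object),
               dtype=object, object_codec=VLenUTF8())
z.append(np.array(['a much longer string'], dtype=object))
print(z[:])  # strings of arbitrary length survive the round trip
```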

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953845 https://github.com/pydata/xarray/pull/2706#issuecomment-505953845 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1Mzg0NQ== shikharsg 8643775 2019-06-26T16:41:33Z 2019-06-26T16:41:33Z NONE

also any idea why all the AppVeyor builds are failing since yesterday? I did not change any build file in any of my commits.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953301 https://github.com/pydata/xarray/pull/2706#issuecomment-505953301 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1MzMwMQ== shikharsg 8643775 2019-06-26T16:40:02Z 2019-06-26T16:40:02Z NONE

Thanks @shoyer @rabernat for the detailed review. All the comments have been addressed except the removal of the encode_utf8 function. In this last commit: https://github.com/pydata/xarray/pull/2706/commits/a6ff49436a620eb4071c0529d785370b2eb739b0, I have tried to do that. In the same commit I have also tried to take a shot at variable length strings using @shoyer's comments from here: https://github.com/pydata/xarray/issues/2724#issuecomment-458808896. Please let me know if this is acceptable, or whether I should revert this commit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505903058 https://github.com/pydata/xarray/pull/2706#issuecomment-505903058 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTkwMzA1OA== rabernat 1197350 2019-06-26T14:34:27Z 2019-06-26T14:34:27Z MEMBER

Thanks @shoyer for your more careful review of this PR. As usual you pick up on all the important details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504477302 https://github.com/pydata/xarray/pull/2706#issuecomment-504477302 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDQ3NzMwMg== shikharsg 8643775 2019-06-21T15:56:01Z 2019-06-21T15:56:01Z NONE

@shikharsg - are the issues you found in #2706 (comment) now resolved and covered by tests?

@rabernat these are now resolved

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504184558 https://github.com/pydata/xarray/pull/2706#issuecomment-504184558 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDE4NDU1OA== davidbrochart 4711805 2019-06-20T21:11:57Z 2019-06-20T21:11:57Z CONTRIBUTOR

You're right @shikharsg, the chunk_dim argument can be removed. I was not very happy with the complexity it brought either.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504115125 https://github.com/pydata/xarray/pull/2706#issuecomment-504115125 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDExNTEyNQ== shikharsg 8643775 2019-06-20T17:33:04Z 2019-06-20T17:33:04Z NONE

I have fixed compute=False for appending to a zarr store, but two issues remain:

  • When appending to an existing array, I resize the array first, and then return a dask.delayed object which fills up the new region of the array when compute is called on it. So if the delayed object does not get computed for whatever reason, the new portion of the array will end up with nonsense values. For this reason I was wondering if the resize should happen inside the delayed object itself, so the array is not resized in advance.
  • compute=False will not work when the chunk_dim argument is set, i.e. instead of lazily appending when the compute method is called on the delayed object, it will directly append to the target store when the to_zarr method with mode='a' is called. The reason is that when the chunk_dim argument is set, it reads the original array into memory, appends to that array in memory, and overwrites the target store with the appended array. I understand that this was done because of the concern @davidbrochart raised about doing very frequent appends to the array (for example hourly or six-hourly, as happens in climate modelling) and the resulting smallness of the chunk size along the dimension being appended to. But @davidbrochart, I would almost recommend removing this chunk_dim argument, because the concern you raised can be overcome as follows. Suppose you have a Dataset like this:

```python
temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp),
                 'precipitation': (['x', 'y', 'time'], precip)},
                coords={'lon': (['x', 'y'], lon),
                        'lat': (['x', 'y'], lat),
                        'time': pd.date_range('2014-09-06', periods=3),
                        'reference_time': pd.Timestamp('2014-09-05')})
```

and want to append to it very often. When calling the `to_zarr` function the first time, call it like so:

```python
>>> store = dict()
>>> ds.to_zarr(store, encoding={'temperature': {'chunks': (100, 100, 100)},
...                             'precipitation': {'chunks': (100, 100, 100)}})
<xarray.backends.zarr.ZarrStore object at 0x7f7027778d68>
>>> import zarr
>>> zarr.open_group(store)['temperature'].info
Name               : /temperature
Type               : zarr.core.Array
Data type          : float64
Shape              : (2, 2, 3)
Chunk shape        : (100, 100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : builtins.dict
No. bytes          : 96
No. bytes stored   : 33903 (33.1K)
Storage ratio      : 0.0
Chunks initialized : 1/1
>>> zarr.open_group(store)['precipitation'].info
Name               : /precipitation
Type               : zarr.core.Array
Data type          : float64
Shape              : (2, 2, 3)
Chunk shape        : (100, 100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : builtins.dict
No. bytes          : 96
No. bytes stored   : 33906 (33.1K)
Storage ratio      : 0.0
Chunks initialized : 1/1
```

This large chunk size of (100, 100, 100) (or whatever other large numbers you may want) then remains. The `chunk_dim` functionality, as it works now, is not feasible for very large arrays, since it essentially reads the entire array into memory (and we may not have that much memory) and then overwrites the target store, because it essentially "rechunks" the array.

Thoughts please
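
A compressed sketch of the delayed-append behaviour described in the first bullet above. It assumes the branch under review, where to_zarr(..., compute=False) returns a dask Delayed; the fill-value caveat is the one raised in that bullet:

```python
import pandas as pd
import xarray as xr

store = dict()
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
                 coords={'time': pd.date_range('2000-01-01', periods=3)})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])},
                 coords={'time': pd.date_range('2000-01-04', periods=3)})

ds0.to_zarr(store, mode='w')
delayed = ds1.chunk({'time': 1}).to_zarr(store, mode='a', append_dim='time',
                                         compute=False)
# the target array is resized up front, but the new region is only filled
# with real values once the delayed object is computed
delayed.compute()
```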

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
503502211 https://github.com/pydata/xarray/pull/2706#issuecomment-503502211 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMzUwMjIxMQ== shikharsg 8643775 2019-06-19T10:24:47Z 2019-06-19T10:24:47Z NONE

Hi all, sorry for the delay. I was on break for 10 days. I have started working on it now and should be able to do this in a couple of days.

A quick thing I noticed as I started working on this: currently the zarr.Array.append function (which handles the resize of the array) is being used to append to the array, and as of now it's not being done asynchronously using a dask delayed object (as I pointed out in my earlier comment). So to do it asynchronously, my plan is to resize the target array, and then the delayed object would write to the appropriate region of the resized array (dask.array.store is currently being used for this). But if for whatever reason the delayed object does not end up being called, we would end up with nonsense values in the resized portion of the array (or whatever the fill value is). For this reason I wonder if I should put the resize in the delayed object too. Thoughts?
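
To make the trade-off concrete, here is a rough, self-contained sketch of the strategy described above using plain zarr and dask (not the PR's actual code): resize eagerly, then defer the write into the new region.

```python
import dask.array as da
import numpy as np
import zarr

target = zarr.zeros((3,), chunks=(3,), dtype='i8')
target[:] = [1, 2, 3]

new_data = da.from_array(np.array([4, 5, 6]), chunks=3)
old_len = target.shape[0]
target.resize(old_len + new_data.shape[0])           # eager resize
delayed = da.store(new_data, target,
                   regions=(slice(old_len, None),),  # only fill the new region
                   compute=False)
# until delayed.compute() runs, target[3:] holds fill values (zeros here)
delayed.compute()
print(target[:])  # [1 2 3 4 5 6]
```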

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502827736 https://github.com/pydata/xarray/pull/2706#issuecomment-502827736 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjgyNzczNg== jendrikjoe 9658781 2019-06-17T19:56:23Z 2019-06-17T19:56:23Z CONTRIBUTOR

I built a filter that raises a ValueError as soon as any variable has a dtype that is not a subclass of np.number or np.string_. I also built tests for that and added a function to manually convert dynamically sized string arrays to fixed-size ones.

I also wrote a test for @shikharsg's issue and can reproduce it. The test is currently commented out so as not to fail the pipeline, as I wanted to discuss whether this is a blocking issue or whether we should merge and raise a new issue for it. It seems to originate from the fact that we moved away from using writer.add and instead are calling the zarr functions directly. There should be a way to change this back so it is done lazily, but that will probably take time. What do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502754545 https://github.com/pydata/xarray/pull/2706#issuecomment-502754545 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjc1NDU0NQ== jendrikjoe 9658781 2019-06-17T16:24:46Z 2019-06-17T16:24:46Z CONTRIBUTOR

@jendrikjoe - thanks for digging in and finding this important issue!

This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose.

* Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings)

* Raise an error if append is called on the incompatible datatypes

* Move forward with this PR, which is otherwise very nearly ready

* Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr

How does that sound to everyone?

This sounds like a plan. I will try to work on getting this ready tonight and tomorrow. Let us see how far I can get.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502700081 https://github.com/pydata/xarray/pull/2706#issuecomment-502700081 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjcwMDA4MQ== rabernat 1197350 2019-06-17T14:13:45Z 2019-06-17T14:13:45Z MEMBER

@shikharsg - are the issues you found in https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 now resolved and covered by tests?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502699808 https://github.com/pydata/xarray/pull/2706#issuecomment-502699808 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjY5OTgwOA== rabernat 1197350 2019-06-17T14:12:59Z 2019-06-17T14:12:59Z MEMBER

@jendrikjoe - thanks for digging in and finding this important issue!

This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose.

* Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings)

* Raise an error if append is called on the incompatible datatypes

* Move forward with this PR, which is otherwise very nearly ready

* Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr

How does that sound to everyone?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502481584 https://github.com/pydata/xarray/pull/2706#issuecomment-502481584 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA== jendrikjoe 9658781 2019-06-16T20:05:04Z 2019-06-16T20:23:54Z CONTRIBUTOR

Hey there everyone, sorry for not working on this for so long from my side. I just picked it up again and realised that the way the encoding works, all the datatypes and the maximum string lengths in the first xarray have to be representative for all others. Otherwise the following cuts away every char after the second:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

It is solvable when explicitly setting the type before writing:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = ds0.temperature.astype(np.dtype('S5'))
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

It becomes even worse, however, when using non-ASCII characters, as they get encoded in zarr.py l:218, but the check in conventions.py l:86 fails for the next chunk that comes in. So I think we actually have to resolve the TODO in zarr.py l:215 before this can be merged. Otherwise, the following leads to multiple issues:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values

The only way to work around this issue is to explicitly encode the data beforehand to utf-8:

import numpy as np

from xarray.coding.variables import safe_setitem, unpack_for_encoding
from xarray.coding.strings import encode_string_array
from xarray.core.variable import Variable

def encode_utf8(var, string_max_length):
    dims, data, attrs, encoding = unpack_for_encoding(var)
    safe_setitem(attrs, '_Encoding', 'utf-8')
    data = encode_string_array(data, 'utf-8')
    data = data.astype(np.dtype(f"S{string_max_length*2}"))
    return Variable(dims, data, attrs, encoding)

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = encode_utf8(ds0.temperature, 2)
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds1['temperature'] = encode_utf8(ds1.temperature, 2)
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values

Even though this is doable if it is known in advance, we should definitely mention this in the documentation or fix this by fixing the encoding itself. What do you think?

Cheers,

Jendrik

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498583209 https://github.com/pydata/xarray/pull/2706#issuecomment-498583209 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODU4MzIwOQ== shikharsg 8643775 2019-06-04T08:52:28Z 2019-06-04T08:52:28Z NONE

will do

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498227133 https://github.com/pydata/xarray/pull/2706#issuecomment-498227133 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODIyNzEzMw== rabernat 1197350 2019-06-03T11:58:19Z 2019-06-03T11:58:19Z MEMBER

Let’s make sure this new scenario is covered by tests!


On Jun 3, 2019, at 6:40 AM, Jendrik Jördening notifications@github.com wrote:

Gave you the permissions @shikharsg


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498205860 https://github.com/pydata/xarray/pull/2706#issuecomment-498205860 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODIwNTg2MA== jendrikjoe 9658781 2019-06-03T10:40:28Z 2019-06-03T10:40:28Z CONTRIBUTOR

Gave you the permissions @shikharsg

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498199759 https://github.com/pydata/xarray/pull/2706#issuecomment-498199759 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE5OTc1OQ== davidbrochart 4711805 2019-06-03T10:19:15Z 2019-06-03T10:19:15Z CONTRIBUTOR

It would be great if you could fix it. @jendrikjoe can give you the permission to push to the branch in his fork.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498194520 https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE5NDUyMA== shikharsg 8643775 2019-06-03T10:03:01Z 2019-06-03T10:03:54Z NONE

I think I figured out the problem. Previously, ZarrStore defaulted to the store method of the parent class AbstractWritableDataStore. The store method of AbstractWritableDataStore uses set_variables (which belongs to AbstractWritableDataStore), which adds the "source and target" variables to the ArrayWriter class, i.e. the array is not written until the call to _finalize_store, which happens in the xarray.backends.api.to_zarr function (or not at all when compute=False). Instead, your PR implements another store function in ZarrStore which uses set_variables (belonging to ZarrStore), which directly writes the data to the target array, ignoring the ArrayWriter class. I think if you just use ArrayWriter in ZarrStore.set_variables it should fix the problem.

I will be happy to push a fix, if permissions are given.
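
For readers unfamiliar with the internals described above, here is a schematic sketch of the deferred-write ArrayWriter pattern. It is illustrative only and not xarray's actual implementation: set_variables would only record source/target pairs, and the data is written when the writer is synced, which is what makes compute=False possible.

```python
import dask.array as da


class SketchArrayWriter:
    """Schematic stand-in for the deferred-write pattern described above."""

    def __init__(self):
        self.sources = []
        self.targets = []

    def add(self, source, target):
        # record the pair instead of writing immediately
        self.sources.append(source)
        self.targets.append(target)

    def sync(self, compute=True):
        # dask.array.store writes each source into its target;
        # with compute=False it returns a delayed object instead
        return da.store(self.sources, self.targets, compute=compute)
```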

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498166896 https://github.com/pydata/xarray/pull/2706#issuecomment-498166896 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE2Njg5Ng== davidbrochart 4711805 2019-06-03T08:41:01Z 2019-06-03T08:41:01Z CONTRIBUTOR

Thanks @shikharsg for looking into that. This PR and master have diverged quite a bit; I will need to merge the changes and will let you know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498072960 https://github.com/pydata/xarray/pull/2706#issuecomment-498072960 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODA3Mjk2MA== shikharsg 8643775 2019-06-02T23:07:41Z 2019-06-02T23:07:41Z NONE

Hi all, not sure if I could be doing something wrong myself, but the below might be a bug in this PR.

I checked out the jendrik_joe:append_zarr branch. When calling to_zarr on a Dataset with the compute=False argument, the data array should not be populated in the target zarr store, right? But it seems to be doing just that (I thought only the metadata should be initialised).

```python
>>> import zarr
>>> import xarray
>>> ds = xarray.Dataset(
...     {'arr1': (['dim1', 'dim2'], [[10, 11], [12, 13]])},
...     coords={'dim1': [1, 2], 'dim2': [1, 2]}
... )
>>> ds = ds.chunk(dict(dim1=1, dim2=1))
>>> ds
<xarray.Dataset>
Dimensions:  (dim1: 2, dim2: 2)
Coordinates:
  * dim1     (dim1) int64 1 2
  * dim2     (dim2) int64 1 2
Data variables:
    arr1     (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)>
>>> store = dict()
>>> ds.to_zarr(store, mode='w', compute=False)
Delayed('_finalize_store-72c3ec95-6ef1-473a-a0de-e81c98eb9576')
>>> xarray.open_zarr(store)['arr1'].values
array([[10, 11],
       [12, 13]])
```

Instead this does not happen on the xarray master branch:

```python
>>> import zarr
>>> import xarray
>>> ds = xarray.Dataset(
...     {'arr1': (['dim1', 'dim2'], [[10, 11], [12, 13]])},
...     coords={'dim1': [1, 2], 'dim2': [1, 2]}
... )
>>> ds = ds.chunk(dict(dim1=1, dim2=1))
>>> ds
<xarray.Dataset>
Dimensions:  (dim1: 2, dim2: 2)
Coordinates:
  * dim1     (dim1) int64 1 2
  * dim2     (dim2) int64 1 2
Data variables:
    arr1     (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)>
>>> store = dict()
>>> ds.to_zarr(store, mode='w', compute=False)
Delayed('_finalize_store-965b0a89-bff6-4adc-8bab-e47a77e762f3')
>>> xarray.open_zarr(store)['arr1'].values
array([[4611686018427387904, 4611686018427387904],
       [4611686018427387904, 4611686018427387904]])
```

Versions: Python 3.6.7, xarray '0.12.1+45.g519b3986', zarr '2.2.1.dev185'. OS: Ubuntu 18.04.2 LTS (bionic).

I am also trying to figure out why this is happening (assuming this is a bug) and will post updates soon.

@davidbrochart

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
494340006 https://github.com/pydata/xarray/pull/2706#issuecomment-494340006 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5NDM0MDAwNg== davidbrochart 4711805 2019-05-21T10:46:20Z 2019-05-21T10:46:20Z CONTRIBUTOR

No problem @rabernat, and thanks a lot for your time in reviewing this PR.

I added a test for chunk_dim. Please let me know if this is clearer, and if I should explain further in the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
494062757 https://github.com/pydata/xarray/pull/2706#issuecomment-494062757 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5NDA2Mjc1Nw== rabernat 1197350 2019-05-20T16:38:24Z 2019-05-20T16:38:24Z MEMBER

Hi @davidbrochart. I'm really sorry it takes me so long between reviews of your PR. It is very important work, and I appreciate your continued patience.

I looked at your new code, and I noticed that chunk_dim does not appear in the tests. I think it is important to test this parameter and verify that it works as expected. (This would also help me understand how it works, since it's not totally clear from the docs.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
485340727 https://github.com/pydata/xarray/pull/2706#issuecomment-485340727 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NTM0MDcyNw== davidbrochart 4711805 2019-04-22T06:38:04Z 2019-04-22T06:38:04Z CONTRIBUTOR

I added a chunk_dim parameter which allows rechunking the appended coordinate. I think it is ready for a final review now.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
484551274 https://github.com/pydata/xarray/pull/2706#issuecomment-484551274 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NDU1MTI3NA== davidbrochart 4711805 2019-04-18T15:14:24Z 2019-04-18T15:14:24Z CONTRIBUTOR

I don't think it's ready yet. I think I should address the chunking issue of the appended dimension, as explained in https://medium.com/pangeo/continuously-extending-zarr-datasets-c54fbad3967d. For instance if we append along a time dimension, the time coordinate (which is a 1-D array) will have very small chunks, instead of maybe only one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
484542216 https://github.com/pydata/xarray/pull/2706#issuecomment-484542216 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ4NDU0MjIxNg== rabernat 1197350 2019-04-18T14:49:54Z 2019-04-18T14:49:54Z MEMBER

Where do we stand on this PR? @davidbrochart - do you feel this is ready for a final review? Or do you want advice or feedback on anything?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479932785 https://github.com/pydata/xarray/pull/2706#issuecomment-479932785 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTkzMjc4NQ== davidbrochart 4711805 2019-04-04T14:57:45Z 2019-04-04T14:57:45Z CONTRIBUTOR

@rabernat you're right, I took your suggestion into account in my last commit. I also rewrote the test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479802142 https://github.com/pydata/xarray/pull/2706#issuecomment-479802142 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTgwMjE0Mg== jendrikjoe 9658781 2019-04-04T08:28:56Z 2019-04-04T08:28:56Z CONTRIBUTOR

Nice :+1:

On Apr 4, 2019 21:24, David Brochart notifications@github.com wrote:

Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479800472 https://github.com/pydata/xarray/pull/2706#issuecomment-479800472 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTgwMDQ3Mg== davidbrochart 4711805 2019-04-04T08:24:01Z 2019-04-04T08:24:01Z CONTRIBUTOR

Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479798342 https://github.com/pydata/xarray/pull/2706#issuecomment-479798342 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTc5ODM0Mg== jendrikjoe 9658781 2019-04-04T08:17:43Z 2019-04-04T08:17:43Z CONTRIBUTOR

I added you to the fork :) But feel free to do whatever is easiest for you :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479788744 https://github.com/pydata/xarray/pull/2706#issuecomment-479788744 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTc4ODc0NA== davidbrochart 4711805 2019-04-04T07:46:52Z 2019-04-04T07:46:52Z CONTRIBUTOR

Or should I open a new PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479480023 https://github.com/pydata/xarray/pull/2706#issuecomment-479480023 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTQ4MDAyMw== davidbrochart 4711805 2019-04-03T13:04:24Z 2019-04-03T13:04:24Z CONTRIBUTOR

@jendrikjoe I think you need to give me the permission to push to the branch in your fork.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478399527 https://github.com/pydata/xarray/pull/2706#issuecomment-478399527 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM5OTUyNw== jendrikjoe 9658781 2019-04-01T00:19:11Z 2019-04-01T00:19:11Z CONTRIBUTOR

Sure everyone feel welcome to join in! Sorry for the long silence. Kind of a busy time right now 😉

On Apr 1, 2019 08:47, Ryan Abernathey notifications@github.com wrote:

@davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoe would not mind if we make it a team effort!


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478374296 https://github.com/pydata/xarray/pull/2706#issuecomment-478374296 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM3NDI5Ng== rabernat 1197350 2019-03-31T19:47:49Z 2019-03-31T19:47:49Z MEMBER

@davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoe would not mind if we make it a team effort!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478365516 https://github.com/pydata/xarray/pull/2706#issuecomment-478365516 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM2NTUxNg== davidbrochart 4711805 2019-03-31T18:22:53Z 2019-03-31T18:22:53Z CONTRIBUTOR

May I try and take this work over?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
475558223 https://github.com/pydata/xarray/pull/2706#issuecomment-475558223 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3NTU1ODIyMw== davidbrochart 4711805 2019-03-22T09:53:43Z 2019-03-22T09:53:43Z CONTRIBUTOR

Hi @jendrikjoe, do you plan to work on this PR again in the future? I think it would be a great contribution to xarray.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459813678 https://github.com/pydata/xarray/pull/2706#issuecomment-459813678 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTgxMzY3OA== rabernat 1197350 2019-02-01T18:07:26Z 2019-02-01T18:07:26Z MEMBER

We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow.

This implies we should be checking for attributes compatibility before calling zarr.append.
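
A hypothetical illustration of the kind of check being proposed; the helper name and the set of attributes compared are made up for the example, not part of the PR:

```python
def check_append_compatible(existing_attrs, new_attrs,
                            keys=('units', 'calendar', 'dtype')):
    """Raise if encoding-relevant attributes differ between target and new data."""
    for key in keys:
        if key in existing_attrs and existing_attrs[key] != new_attrs.get(key):
            raise ValueError(
                "cannot append: attribute %r differs (%r != %r)"
                % (key, existing_attrs[key], new_attrs.get(key))
            )
```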

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459667067 https://github.com/pydata/xarray/pull/2706#issuecomment-459667067 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTY2NzA2Nw== davidbrochart 4711805 2019-02-01T09:51:55Z 2019-02-01T09:51:55Z CONTRIBUTOR

When we use this feature e.g. to store data that is produced every day, we might start with a data set that has a small size on the time dimension, and thus the chunks will be chosen according to this initial shape. When we append to this data set, will the chunks be kept as in the initial zarr archive? If so, we might end up with a lot of small chunks on the time dimension, where ideally we would have chosen only one chunk.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459637253 https://github.com/pydata/xarray/pull/2706#issuecomment-459637253 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTYzNzI1Mw== shoyer 1217238 2019-02-01T07:54:30Z 2019-02-01T07:54:30Z MEMBER

We should definitely always make sure that we write data consistently (e.g., for dates), but checking for alignment of all coordinates could be expensive/slow. Potentially a keyword argument ignore_alignment=True would be a good way for users to opt out of checking index coordinates for consistency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
459177873 https://github.com/pydata/xarray/pull/2706#issuecomment-459177873 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1OTE3Nzg3Mw== rabernat 1197350 2019-01-31T01:24:11Z 2019-01-31T01:24:11Z MEMBER

So the problem in @davidbrochart's example is that there are different encodings on the time variables in the two datasets.

When writing datetimes, xarray automatically picks an encoding (i.e. days since 2000-01-01 00:00:00) based on some heuristics. When serializing the dataset, this encoding is used to encode the datetime64[ns] dtype into a different dtype, and the encoding is placed in the attributes of the store. When you open the dataset, the encoding is automatically decoded according to CF conventions. This can be disabled by using decode_cf=False or decode_times=False when you open the dataset.

In this case, xarray's heuristics are picking different encodings for the two dates. You could make this example work by manually specifying encoding on the appended dataset to be the same as the original.

This example illustrates the need for some sort of compatibility checks between the target dataset and the appended dataset. For example, checking for attribute compatibility would have caught this error.
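
A minimal sketch of the manual workaround mentioned above: pin the time encoding of the appended dataset to match the original. The attribute values are taken from the .zattrs shown later in this thread; whether setting .encoding this way is sufficient on the branch under review is an assumption.

```python
import pandas as pd
import xarray as xr

ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
                 coords={'time': pd.date_range('2000-01-01', periods=3)})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])},
                 coords={'time': pd.date_range('2000-01-04', periods=3)})

ds0.to_zarr('temp', mode='w')
# force the appended dataset to use the same reference date and calendar
ds1.time.encoding = {'units': 'days since 2000-01-01 00:00:00',
                     'calendar': 'proleptic_gregorian'}
ds1.to_zarr('temp', mode='a', append_dim='time')
```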

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458896024 https://github.com/pydata/xarray/pull/2706#issuecomment-458896024 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODg5NjAyNA== jendrikjoe 9658781 2019-01-30T10:37:56Z 2019-01-30T10:37:56Z CONTRIBUTOR

I will also check how xarray stores times, to see whether we have to add the offset to the xarray first or whether this can be resolved with a PR to zarr :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458866421 https://github.com/pydata/xarray/pull/2706#issuecomment-458866421 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODg2NjQyMQ== davidbrochart 4711805 2019-01-30T09:05:53Z 2019-01-30T09:05:53Z CONTRIBUTOR

zarr stores the reference in the .zattrs file:

```
{
    "_ARRAY_DIMENSIONS": ["time"],
    "calendar": "proleptic_gregorian",
    "units": "days since 2000-01-01 00:00:00"
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458736067 https://github.com/pydata/xarray/pull/2706#issuecomment-458736067 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODczNjA2Nw== jendrikjoe 9658781 2019-01-29T22:39:00Z 2019-01-29T22:39:00Z CONTRIBUTOR

Hey @davidbrochart, thanks for all your input and also for the research on how zarr stores the data. I would actually claim that the calculation of the accurate relative time should be handled by the zarr append function. An exception would of course be if xarray is storing the data with deltas to a reference as well? Then I would try collecting the minimum and offsetting the input by this. @rabernat can you provide input on that?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458730247 https://github.com/pydata/xarray/pull/2706#issuecomment-458730247 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODczMDI0Nw== davidbrochart 4711805 2019-01-29T22:19:07Z 2019-01-29T22:19:07Z CONTRIBUTOR

To make it work, time dimensions would have to be treated separately because zarr doesn't encode absolute time values but deltas relative to a reference (see https://github.com/davidbrochart/pangeo_upload/blob/master/py/trmm2pangeo.py#L108).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458720920 https://github.com/pydata/xarray/pull/2706#issuecomment-458720920 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODcyMDkyMA== davidbrochart 4711805 2019-01-29T21:50:14Z 2019-01-29T21:50:14Z CONTRIBUTOR

Hi @jendrikjoe,

Thanks for your PR, I am very interested in it because this is something I was hacking around (see here). In my particular case, I want to append along a time dimension, but it looks like your PR currently doesn't support it. In the following example, ds2 should have a time dimension ranging from 2000-01-01 to 2000-01-06:

```python
import xarray as xr
import pandas as pd

ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
                 coords={'time': pd.date_range('2000-01-01', periods=3)})
ds1 = xr.Dataset({'temperature': (['time'], [53, 54, 55])},
                 coords={'time': pd.date_range('2000-01-04', periods=3)})

ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

ds2 = xr.open_zarr('temp')
```

But it's not the case:

```python
>>> ds2.time
<xarray.DataArray 'time' (time: 6)>
array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
       '2000-01-03T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
       '2000-01-02T00:00:00.000000000', '2000-01-03T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-03
```

Maybe it's not intended to work with time dimensions yet?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458694955 https://github.com/pydata/xarray/pull/2706#issuecomment-458694955 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODY5NDk1NQ== jendrikjoe 9658781 2019-01-29T20:29:05Z 2019-01-29T20:31:59Z CONTRIBUTOR

You are definitely right that there are no checks regarding the alignment. However, if a shape other than the append_dim does not align, zarr will raise an error. If the coordinates differ, that could definitely be an issue. I did not think about that, as I am dumping reshaped dask.dataframe partitions with the append mode; therefore I am not allowed to have a name twice anyway. It might indeed be interesting for other users. Similar point for the attributes. I could try figuring that out as well, but that might take a while. The place where the ValueError is raised should allow adding other variables, as those are added in the KeyError exception above :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458692011 https://github.com/pydata/xarray/pull/2706#issuecomment-458692011 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODY5MjAxMQ== rabernat 1197350 2019-01-29T20:19:58Z 2019-01-29T20:19:58Z MEMBER

Ok, with the example, I can see a bit better how this works.

Here is my main concern: there doesn't appear to be any alignment checking between the target dataset and the new data. The only check that happens is whether a variable with the same name already exists in the target store; if so, append is used (rather than creating a new array). What if the coordinates differ? What if the attributes differ?

I'm not sure this is a deal-breaker. But we should be very clear about this in the docs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457827734 https://github.com/pydata/xarray/pull/2706#issuecomment-457827734 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzgyNzczNA== jendrikjoe 9658781 2019-01-26T12:35:28Z 2019-01-26T12:35:28Z CONTRIBUTOR

Hi @rabernat,

happy to help! I love using xarray. I added the tests for the append mode. One makes sure that it behaves like the 'w' mode if no data exist at the target path. The other one tests what you described. The append_dim argument is actually the same as the dim argument for concat. Hope that helps clarify my code :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457741867 https://github.com/pydata/xarray/pull/2706#issuecomment-457741867 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1Nzc0MTg2Nw== rabernat 1197350 2019-01-25T21:49:26Z 2019-01-25T21:49:26Z MEMBER

Hi @jendrikjoe -- thanks for submitting a PR to address one of the most important issues in xarray (IMHO)! I am very excited about your contribution and am looking forward to getting this feature merged.

I have many questions about how this works. I think the best way to move forward is to wait until we have a test for the append feature which involves the following steps:

- Write a dataset to a zarr store
- Open the store in append mode
- Append data along a particular dimension

Seeing the code that accomplishes this will help clarify for me what is happening.

Thanks again for your contribution, and welcome to xarray!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);