
issue_comments


14 rows where author_association = "NONE" and issue = 402908148 sorted by updated_at descending




id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
457368077 https://github.com/pydata/xarray/pull/2706#issuecomment-457368077 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzM2ODA3Nw== pep8speaks 24736507 2019-01-24T21:43:09Z 2019-06-29T22:49:41Z NONE

Hello @jendrikjoe! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2019-06-29 22:49:41 UTC
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506958854 https://github.com/pydata/xarray/pull/2706#issuecomment-506958854 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjk1ODg1NA== shikharsg 8643775 2019-06-29T13:57:31Z 2019-06-29T13:57:31Z NONE

I have implemented all the suggested changes and refactored the append tests, as they were all previously crammed into `test_write_persistence_modes`.

I'm not sure why the build fails. In all the failed checks, it is these two tests that are failing:

```
================================== FAILURES ===================================
_________________________ test_rolling_properties[1] __________________________

da = <xarray.DataArray (a: 3, time: 21, x: 4)>
array([[[0.561926, 0.243845, 0.601879, 0.733398],
        [0.500418, 0.84942...
Coordinates:
  * time     (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-21
Dimensions without coordinates: a, x

    def test_rolling_properties(da):
        rolling_obj = da.rolling(time=4)

        assert rolling_obj.obj.get_axis_num('time') == 1

        # catching invalid args
        with pytest.raises(ValueError) as exception:
            da.rolling(time=7, x=2)
>       assert 'exactly one dim/window should' in str(exception)
E       AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>'
E        +  where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataarray.py:3715: AssertionError
_________________________ test_rolling_properties[1] __________________________

ds = <xarray.Dataset>
Dimensions:  (time: 10, x: 8, y: 2)
Coordinates:
  * x        (x) float64 0.0 0.1429 0.2857 0.4286 0....-1.152 -0.6704 ... -0.9796 -1.884 0.4049
    z2       (time, y) float64 -1.218 -0.9627 -1.398 ... -0.3552 0.1446 0.3392

    def test_rolling_properties(ds):
        # catching invalid args
        with pytest.raises(ValueError) as exception:
            ds.rolling(time=7, x=2)
>       assert 'exactly one dim/window should' in str(exception)
E       AssertionError: assert 'exactly one dim/window should' in '<ExceptionInfo ValueError tblen=4>'
E        +  where '<ExceptionInfo ValueError tblen=4>' = str(<ExceptionInfo ValueError tblen=4>)

xarray\tests\test_dataset.py:4845: AssertionError
============================== warnings summary ===============================
```

I have no idea why, as the same two tests pass on my local machine.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506418064 https://github.com/pydata/xarray/pull/2706#issuecomment-506418064 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQxODA2NA== shikharsg 8643775 2019-06-27T16:27:58Z 2019-06-27T16:27:58Z NONE

It's done. I fixed it by opening the zarr dataset beforehand using `xr.open_zarr`.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506403594 https://github.com/pydata/xarray/pull/2706#issuecomment-506403594 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjQwMzU5NA== shikharsg 8643775 2019-06-27T15:50:36Z 2019-06-27T15:50:36Z NONE

adding a new variable currently errors if we don't provide the append_dim argument:

Is this scenario now covered by the tests? Sorry if the answer is obvious; it's hard for me to discern just by looking at the code.

@rabernat, the scenario I am talking about is adding a new DataArray to an existing Dataset (in which case we do not have to specify an `append_dim` argument). Yes, it is covered by tests; specifically, see the `with` clause here: https://github.com/pydata/xarray/pull/2706/files#diff-df47fcb9c2f1f7dfc0c6032d97072af2R1636
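For illustration, a minimal sketch of that scenario (assuming the behavior described in this thread, where adding a brand-new variable with mode='a' needs no `append_dim` because no existing array grows):

```python
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=3)
ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])}, coords={'time': times})
ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])}, coords={'time': times})

store = dict()
ds0.to_zarr(store, mode='w')
# Adds the new 'pressure' variable alongside 'temperature';
# no dimension is extended, so no append_dim is required.
ds1.to_zarr(store, mode='a')
```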

Just to be clear, we do always require writing append_dim if you want to append values along a dimension, right? And we raise an informative error if you write append_dim='not-a-valid-dimension'?

@shoyer We do always require append_dim when appending to an existing array, but I just realized that it does not raise an error when append_dim='not-valid'; it silently fails to append to the existing array. Let me write a test for that and push.
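A minimal sketch of what such a test might look like (the exception type is an assumption; the eventual test may also check the error message):

```python
import pytest
import xarray as xr

def test_to_zarr_invalid_append_dim_raises():
    ds = xr.Dataset({'temperature': (['time'], [50, 51, 52])})
    store = dict()
    ds.to_zarr(store, mode='w')
    # Appending along a dimension that does not exist should raise,
    # not silently leave the store unchanged.
    with pytest.raises(ValueError):
        ds.to_zarr(store, mode='a', append_dim='not-a-valid-dimension')
```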

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506363156 https://github.com/pydata/xarray/pull/2706#issuecomment-506363156 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjM2MzE1Ng== shikharsg 8643775 2019-06-27T14:11:55Z 2019-06-27T14:11:55Z NONE

I have fixed the above error, and all comments have now been addressed.

@rabernat @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
506288004 https://github.com/pydata/xarray/pull/2706#issuecomment-506288004 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNjI4ODAwNA== shikharsg 8643775 2019-06-27T10:21:53Z 2019-06-27T10:21:53Z NONE

Adding a new variable currently errors if we don't provide the `append_dim` argument:

```python
>>> import xarray as xr
>>> import pandas as pd
>>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> store = dict()
>>> ds0.to_zarr(store, mode='w')
<xarray.backends.zarr.ZarrStore object at 0x7fae926505c0>
>>> ds1.to_zarr(store, mode='a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shikhar/code/xarray/xarray/core/dataset.py", line 1374, in to_zarr
    consolidated=consolidated, append_dim=append_dim)
  File "/home/shikhar/code/xarray/xarray/backends/api.py", line 1071, in to_zarr
    dump_to_store(dataset, zstore, writer, encoding=encoding)
  File "/home/shikhar/code/xarray/xarray/backends/api.py", line 928, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 366, in store
    unlimited_dims=unlimited_dims)
  File "/home/shikhar/code/xarray/xarray/backends/zarr.py", line 406, in set_variables
    "was not set".format(name)
ValueError: variable 'time' already exists, but append_dim was not set
```

This works:

```python
>>> import xarray as xr
>>> import pandas as pd
>>> ds0 = xr.Dataset({'temperature': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> ds1 = xr.Dataset({'pressure': (['time'], [50, 51, 52])},
...                  coords={'time': pd.date_range('2000-01-01', periods=3)})
>>> store = dict()
>>> ds0.to_zarr(store, mode='w')
<xarray.backends.zarr.ZarrStore object at 0x7fae926505c0>
>>> ds1.to_zarr(store, mode='a', append_dim='asdfasdf')
>>> xr.open_zarr(store)
<xarray.Dataset>
Dimensions:      (time: 3)
Coordinates:
  * time         (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
Data variables:
    pressure     (time) int64 dask.array<shape=(3,), chunksize=(3,)>
    temperature  (time) int64 dask.array<shape=(3,), chunksize=(3,)>
```

will push a fix for this in a bit

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953845 https://github.com/pydata/xarray/pull/2706#issuecomment-505953845 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1Mzg0NQ== shikharsg 8643775 2019-06-26T16:41:33Z 2019-06-26T16:41:33Z NONE

Also, any idea why all the AppVeyor builds have been failing since yesterday? I did not change any build files in my commits.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
505953301 https://github.com/pydata/xarray/pull/2706#issuecomment-505953301 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNTk1MzMwMQ== shikharsg 8643775 2019-06-26T16:40:02Z 2019-06-26T16:40:02Z NONE

Thanks @shoyer @rabernat for the detailed review. All the comments have been addressed except the removal of the encode_utf8 function, which I have attempted in this last commit: https://github.com/pydata/xarray/pull/2706/commits/a6ff49436a620eb4071c0529d785370b2eb739b0. In the same commit I have also taken a shot at variable-length strings, using @shoyer's comments from here: https://github.com/pydata/xarray/issues/2724#issuecomment-458808896. Please let me know if this is acceptable or whether I should revert this commit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504477302 https://github.com/pydata/xarray/pull/2706#issuecomment-504477302 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDQ3NzMwMg== shikharsg 8643775 2019-06-21T15:56:01Z 2019-06-21T15:56:01Z NONE

@shikharsg - are the issues you found in #2706 (comment) now resolved and covered by tests?

@rabernat these are now resolved

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
504115125 https://github.com/pydata/xarray/pull/2706#issuecomment-504115125 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwNDExNTEyNQ== shikharsg 8643775 2019-06-20T17:33:04Z 2019-06-20T17:33:04Z NONE

I have fixed `compute=False` for appending to a zarr store, but two issues remain:

  • When appending to an existing array, I resize the array first and then return a dask.delayed object which fills up the new region of the array when compute is called on it. So if the delayed object does not get computed for whatever reason, the new portion of the array will end up with nonsense values. For this reason I was wondering if the resize should happen inside the delayed object itself, so the array is not resized in advance (a rough sketch of this alternative follows after this list).
  • `compute=False` will not work when the `chunk_dim` argument is set: instead of appending lazily when compute is called on the delayed object, it appends directly to the target store as soon as `to_zarr` is called with mode='a'. This is because when `chunk_dim` is set, the original array is read into memory, appended to in memory, and then written back over the target store. I understand that this was done because of the concern @davidbrochart raised about very frequent appends to the array (for example hourly or six-hourly, as happens in climate modelling) leaving the appended dimension with very small chunks. But @davidbrochart, I would almost recommend removing the `chunk_dim` argument, because that concern can be overcome as follows. Suppose you have a Dataset:

```python
temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
ds = xr.Dataset({'temperature': (['x', 'y', 'time'], temp),
                 'precipitation': (['x', 'y', 'time'], precip)},
                coords={'lon': (['x', 'y'], lon),
                        'lat': (['x', 'y'], lat),
                        'time': pd.date_range('2014-09-06', periods=3),
                        'reference_time': pd.Timestamp('2014-09-05')})
```

and want to append to it very often. When calling the `to_zarr` function for the first time, call it like so:

```python
>>> store = dict()
>>> ds.to_zarr(store, encoding={'temperature': {'chunks': (100, 100, 100)},
...                             'precipitation': {'chunks': (100, 100, 100)}})
<xarray.backends.zarr.ZarrStore object at 0x7f7027778d68>
>>> import zarr
>>> zarr.open_group(store)['temperature'].info
Name               : /temperature
Type               : zarr.core.Array
Data type          : float64
Shape              : (2, 2, 3)
Chunk shape        : (100, 100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : builtins.dict
No. bytes          : 96
No. bytes stored   : 33903 (33.1K)
Storage ratio      : 0.0
Chunks initialized : 1/1
>>> zarr.open_group(store)['precipitation'].info
Name               : /precipitation
Type               : zarr.core.Array
Data type          : float64
Shape              : (2, 2, 3)
Chunk shape        : (100, 100, 100)
Order              : C
Read-only          : False
Compressor         : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type         : builtins.dict
No. bytes          : 96
No. bytes stored   : 33906 (33.1K)
Storage ratio      : 0.0
Chunks initialized : 1/1
```

and this large chunk shape of (100, 100, 100) (or whatever other large numbers you may want) then persists across appends. The `chunk_dim` functionality, as it works now, is not feasible for very large arrays, since it essentially reads the entire array into memory (and we may not have that much memory) and then overwrites the target store, because it essentially "rechunks" the array.
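Here is the rough sketch promised in the first bullet (function name and shapes are hypothetical, not the PR's actual code): doing the resize inside the delayed object, so the store is never left resized but unfilled if the delayed object is never computed.

```python
import dask
import numpy as np
import zarr

@dask.delayed
def _resize_and_append(zarr_array, new_data, axis=0):
    # Hypothetical helper: grow the target along `axis` and fill the
    # new region in one deferred step.
    old_size = zarr_array.shape[axis]
    new_shape = list(zarr_array.shape)
    new_shape[axis] += new_data.shape[axis]
    zarr_array.resize(*new_shape)
    region = tuple(slice(old_size, None) if i == axis else slice(None)
                   for i in range(zarr_array.ndim))
    zarr_array[region] = new_data
    return zarr_array.shape

target = zarr.array(np.array([50, 51, 52]), chunks=(3,))
appended = _resize_and_append(target, np.array([53, 54, 55]))
appended.compute()  # until this runs, the store is untouched
```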

Thoughts please

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
503502211 https://github.com/pydata/xarray/pull/2706#issuecomment-503502211 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMzUwMjIxMQ== shikharsg 8643775 2019-06-19T10:24:47Z 2019-06-19T10:24:47Z NONE

Hi all, sorry for the delay. I was on break for 10 days. I have started working on it now and should be able to do this in a couple of days.

A quick thing I noticed as I started working on this: currently the zarr.Array.append function (which handles resizing the array) is used to append to the array, and as of now this is not done asynchronously through a dask delayed object (as I pointed out in my earlier comment). So to do it asynchronously, my plan is to resize the target array first, and then have the delayed object write to the appropriate region of the resized array (dask.array.store is currently used for this). But if for whatever reason the delayed object never ends up being computed, we would be left with nonsense values in the resized portion of the array (or whatever the fill value is). For this reason I wonder if I should put the resize in the delayed object too. Thoughts?
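To make the plan concrete, a minimal sketch (with made-up shapes, not the PR's actual code) of resizing eagerly and then deferring only the region write via `dask.array.store(..., compute=False)`:

```python
import dask.array as da
import numpy as np
import zarr

target = zarr.array(np.arange(3), chunks=(3,))
new_data = da.from_array(np.arange(3, 6), chunks=(3,))

old_size = target.shape[0]
target.resize(old_size + new_data.shape[0])  # eager: happens immediately

# Deferred: nothing is written until the returned object is computed,
# which is exactly when the tail could be left holding fill values.
delayed = da.store([new_data], [target],
                   regions=[(slice(old_size, None),)],
                   compute=False, lock=False)
delayed.compute()
```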

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498583209 https://github.com/pydata/xarray/pull/2706#issuecomment-498583209 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODU4MzIwOQ== shikharsg 8643775 2019-06-04T08:52:28Z 2019-06-04T08:52:28Z NONE

will do

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498194520 https://github.com/pydata/xarray/pull/2706#issuecomment-498194520 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODE5NDUyMA== shikharsg 8643775 2019-06-03T10:03:01Z 2019-06-03T10:03:54Z NONE

I think I figured out the problem. Previously, ZarrStore defaulted to the store method of its parent class AbstractWritableDataStore. That store method uses set_variables (which belongs to AbstractWritableDataStore), which adds the "source and target" variables to the ArrayWriter class, i.e. the array is not written until the call to _finalize_store, which happens in the xarray.backends.api.to_zarr function (or not at all when compute=False). Your PR instead implements another store function in ZarrStore, whose set_variables (belonging to ZarrStore) writes the data directly to the target array, bypassing the ArrayWriter class. I think if you just use ArrayWriter in ZarrStore.set_variables it should fix the problem.
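For readers unfamiliar with the internals, a toy stand-in (not xarray's actual ArrayWriter, just the pattern described above) showing why routing writes through such a class defers them:

```python
import dask.array as da

class DeferredWriter:
    """Toy ArrayWriter-like class: set_variables-style code only
    records (source, target) pairs; nothing touches the store until
    sync() runs dask.array.store over all of them."""

    def __init__(self):
        self.sources = []
        self.targets = []

    def add(self, source, target):
        self.sources.append(source)
        self.targets.append(target)

    def sync(self, compute=True):
        # With compute=False this returns a delayed object, which is
        # what a _finalize_store-style step would compute later.
        return da.store(self.sources, self.targets,
                        lock=False, compute=compute)
```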

I will be happy to push a fix, if permissions are given.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498072960 https://github.com/pydata/xarray/pull/2706#issuecomment-498072960 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODA3Mjk2MA== shikharsg 8643775 2019-06-02T23:07:41Z 2019-06-02T23:07:41Z NONE

Hi all, I'm not sure whether I'm doing something wrong myself, but the below might be a bug in this PR.

I checked out the jendrik_joe:append_zarr branch. When calling to_zarr on a Dataset with the compute=False argument, the data array should not be populated in the target zarr store, right? But it seems to be doing just that (I thought only the metadata should be initialised).

```python
>>> import zarr
>>> import xarray
>>> ds = xarray.Dataset(
...     {'arr1': (['dim1', 'dim2'], [[10, 11], [12, 13]])},
...     coords={'dim1': [1, 2], 'dim2': [1, 2]}
... )
>>> store = dict()
>>> ds = ds.chunk(dict(dim1=1, dim2=1))
>>> ds
<xarray.Dataset>
Dimensions:  (dim1: 2, dim2: 2)
Coordinates:
  * dim1     (dim1) int64 1 2
  * dim2     (dim2) int64 1 2
Data variables:
    arr1     (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)>
>>> store = dict()
>>> ds.to_zarr(store, mode='w', compute=False)
Delayed('_finalize_store-72c3ec95-6ef1-473a-a0de-e81c98eb9576')
>>> xarray.open_zarr(store)['arr1'].values
array([[10, 11],
       [12, 13]])
```

Instead, this does not happen on the xarray master branch:

```python
>>> import zarr
>>> import xarray
>>> ds = xarray.Dataset(
...     {'arr1': (['dim1', 'dim2'], [[10, 11], [12, 13]])},
...     coords={'dim1': [1, 2], 'dim2': [1, 2]}
... )
>>> ds = ds.chunk(dict(dim1=1, dim2=1))
>>> ds
<xarray.Dataset>
Dimensions:  (dim1: 2, dim2: 2)
Coordinates:
  * dim1     (dim1) int64 1 2
  * dim2     (dim2) int64 1 2
Data variables:
    arr1     (dim1, dim2) int64 dask.array<shape=(2, 2), chunksize=(1, 1)>
>>> store = dict()
>>> ds.to_zarr(store, mode='w', compute=False)
Delayed('_finalize_store-965b0a89-bff6-4adc-8bab-e47a77e762f3')
>>> xarray.open_zarr(store)['arr1'].values
array([[4611686018427387904, 4611686018427387904],
       [4611686018427387904, 4611686018427387904]])
```

Python: 3.6.7

```python
>>> xarray.__version__
'0.12.1+45.g519b3986'
>>> zarr.__version__
'2.2.1.dev185'
```

OS:

```
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.2 LTS
Release:        18.04
Codename:       bionic
```

I am also trying to figure out why this is happening (assuming this is a bug) and will post updates soon.

@davidbrochart

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);