
issue_comments


11 rows where issue = 402908148 and user = 9658781 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
502827736 https://github.com/pydata/xarray/pull/2706#issuecomment-502827736 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjgyNzczNg== jendrikjoe 9658781 2019-06-17T19:56:23Z 2019-06-17T19:56:23Z CONTRIBUTOR

I built a filter that raises a ValueError as soon as any variable has a dtype that is not a subclass of np.number or np.string_. I also wrote tests for it and added a function to manually convert dynamically sized string arrays to fixed-size ones.
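A minimal sketch of the kind of dtype check described here, not the code from the PR. The helper names `check_appendable_dtype` and `to_fixed_width` are hypothetical, and `np.bytes_` is used as the current NumPy name for what older releases also exposed as `np.string_`.

```python
import numpy as np


def check_appendable_dtype(name, dtype):
    """Raise ValueError unless dtype is numeric or fixed-width bytes.

    Hypothetical helper for illustration; not the function in the PR.
    """
    dtype = np.dtype(dtype)
    if not (np.issubdtype(dtype, np.number) or np.issubdtype(dtype, np.bytes_)):
        raise ValueError(
            f"variable {name!r} has dtype {dtype}, which cannot be appended"
        )


def to_fixed_width(values, width):
    """Convert a sequence of ASCII strings to a fixed-width bytes array."""
    return np.array([v.encode("ascii") for v in values], dtype=f"S{width}")


check_appendable_dtype("temperature", np.float64)  # numeric: accepted
check_appendable_dtype("label", to_fixed_width(["ab", "cd"], 5).dtype)  # S5: accepted

try:
    # Variable-length strings typically arrive as object arrays.
    check_appendable_dtype("label", object)
    rejected = False
except ValueError:
    rejected = True
```

Note that NumPy truncates silently when a value is longer than `width`, so the width has to be chosen for the longest string that will ever be appended.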

I also wrote a test for @shikharsg's issue and can reproduce it. The test is currently commented out so that it does not fail the pipeline, as I wanted to discuss whether this is a blocking issue or whether we should merge and open a new issue for it. It seems to originate from the fact that we moved away from using writer.add and are instead calling the zarr functions directly. There should be a way to change this back so that it works lazily, but that will probably take time. What do you think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502754545 https://github.com/pydata/xarray/pull/2706#issuecomment-502754545 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjc1NDU0NQ== jendrikjoe 9658781 2019-06-17T16:24:46Z 2019-06-17T16:24:46Z CONTRIBUTOR

@jendrikjoe - thanks for digging in and finding this important issue!

This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose.

* Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings)

* Raise an error if append is called on the incompatible datatypes

* Move forward with this PR, which is otherwise very nearly ready

* Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr

How does that sound to everyone?

This sounds like a plan. I will try to get this ready tonight and tomorrow. Let us see how far I can get.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
502481584 https://github.com/pydata/xarray/pull/2706#issuecomment-502481584 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA== jendrikjoe 9658781 2019-06-16T20:05:04Z 2019-06-16T20:23:54Z CONTRIBUTOR

Hey everyone, sorry for not working on this for so long. I just picked it up again and realised that, the way the encoding works, all the datatypes and the maximum string lengths in the first xarray dataset have to be representative of all the later ones. Otherwise the following cuts off every character after the second:

import numpy as np
import xarray as xr

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')

It is solvable when explicitly setting the type before writing:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = ds0.temperature.astype(np.dtype('S5'))
ds1 = xr.Dataset({'temperature': (['time'],  ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
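The truncation can be illustrated with NumPy alone. This is only a sketch of the mechanism presumably at play (a fixed-width dtype sized for the first batch), and it does not go through xarray or zarr; the variable names are made up for the example.

```python
import numpy as np

# Fixed-width bytes arrays: the width is inferred from the longest element.
first = np.array([b"ab", b"cd", b"ef"])        # inferred dtype: S2
second = np.array([b"abc", b"def", b"ghijk"])  # inferred dtype: S5

# Casting the second batch to the first batch's dtype silently truncates.
truncated = second.astype(first.dtype)

# Widening the first batch up front, as with astype('S5'), avoids the loss.
widened = first.astype("S5")
combined = np.concatenate([widened, second])
```

Here `truncated` holds only the first two bytes of each string, while `combined` keeps all six values intact.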

It becomes worse, however, with non-ASCII characters: they get encoded in zarr.py l:218, but for the next chunk that comes in, the check in conventions.py l:86 fails. So I think we actually have to resolve the TODO in zarr.py l:215 before this can be merged. Otherwise, the following leads to multiple issues:

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values

The only way to work around this issue is to explicitly encode the data to UTF-8 beforehand:

import numpy as np
import xarray as xr
from xarray.coding.variables import safe_setitem, unpack_for_encoding
from xarray.coding.strings import encode_string_array
from xarray.core.variable import Variable

def encode_utf8(var, string_max_length):
    dims, data, attrs, encoding = unpack_for_encoding(var)
    safe_setitem(attrs, '_Encoding', 'utf-8')
    data = encode_string_array(data, 'utf-8')
    data = data.astype(np.dtype(f"S{string_max_length*2}"))
    return Variable(dims, data, attrs, encoding)

ds0 = xr.Dataset({'temperature': (['time'],  ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = encode_utf8(ds0.temperature, 2)
ds1 = xr.Dataset({'temperature': (['time'],  ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds1['temperature'] = encode_utf8(ds1.temperature, 2)
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values
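The factor of two in `string_max_length*2` works for these examples because every character used here takes at most two bytes in UTF-8. A small standard-library sketch shows the byte lengths involved; `max_utf8_bytes` is a hypothetical helper, not part of xarray.

```python
def max_utf8_bytes(strings):
    """Return the longest UTF-8 encoded length, in bytes, among the strings."""
    return max(len(s.encode("utf-8")) for s in strings)


ascii_batch = ["ab", "cd", "ef"]
latin_batch = ["üý", "ãä", "õö"]

ascii_width = max_utf8_bytes(ascii_batch)  # 2 characters -> 2 bytes
latin_width = max_utf8_bytes(latin_batch)  # 2 characters -> 4 bytes

# Characters outside this range can need more: the euro sign takes 3 bytes.
euro_width = max_utf8_bytes(["€"])
```

Since a single character can take up to four bytes in UTF-8, a fixed multiplier of two is not a general bound; measuring the encoded length, as above, is the safer way to pick the width.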

Although this is doable when the string lengths and encoding are known in advance, we should definitely mention it in the documentation or fix the encoding itself. What do you think?

Cheers,

Jendrik

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
498205860 https://github.com/pydata/xarray/pull/2706#issuecomment-498205860 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ5ODIwNTg2MA== jendrikjoe 9658781 2019-06-03T10:40:28Z 2019-06-03T10:40:28Z CONTRIBUTOR

Gave you the permissions @shikharsg

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479802142 https://github.com/pydata/xarray/pull/2706#issuecomment-479802142 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTgwMjE0Mg== jendrikjoe 9658781 2019-04-04T08:28:56Z 2019-04-04T08:28:56Z CONTRIBUTOR

Nice :+1:

On Apr 4, 2019 21:24, David Brochart notifications@github.com wrote:

Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
479798342 https://github.com/pydata/xarray/pull/2706#issuecomment-479798342 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3OTc5ODM0Mg== jendrikjoe 9658781 2019-04-04T08:17:43Z 2019-04-04T08:17:43Z CONTRIBUTOR

I added you to the fork :) But feel free to do whatever is easiest for you :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
478399527 https://github.com/pydata/xarray/pull/2706#issuecomment-478399527 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ3ODM5OTUyNw== jendrikjoe 9658781 2019-04-01T00:19:11Z 2019-04-01T00:19:11Z CONTRIBUTOR

Sure, everyone is welcome to join in! Sorry for the long silence. It is kind of a busy time right now 😉

On Apr 1, 2019 08:47, Ryan Abernathey notifications@github.com wrote:

@davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoe would not mind if we make it a team effort!


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458896024 https://github.com/pydata/xarray/pull/2706#issuecomment-458896024 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODg5NjAyNA== jendrikjoe 9658781 2019-01-30T10:37:56Z 2019-01-30T10:37:56Z CONTRIBUTOR

I will also check how xarray stores times, to see whether we have to add the offset in xarray first or whether this can be resolved with a PR to zarr :)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458736067 https://github.com/pydata/xarray/pull/2706#issuecomment-458736067 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODczNjA2Nw== jendrikjoe 9658781 2019-01-29T22:39:00Z 2019-01-29T22:39:00Z CONTRIBUTOR

Hey @davidbrochart, thanks for all your input and also for the research on how zarr stores the data. I would actually argue that the calculation of the accurate relative time should be handled by the zarr append function. An exception would of course be if xarray also stores the data as deltas to a reference. In that case I would try collecting the minimum and offsetting the input by it. @rabernat, can you provide input on that?
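To make the concern concrete, here is a sketch under the assumption that times are stored as integer offsets from a reference fixed by the first write. It uses plain NumPy rather than xarray or zarr, and `encode_relative` is a hypothetical name.

```python
import numpy as np


def encode_relative(times, reference):
    """Encode datetime64[D] values as whole days since `reference`."""
    return (times - reference).astype("int64")


stored = np.array(["2019-01-01", "2019-01-02"], dtype="datetime64[D]")
incoming = np.array(["2019-01-03", "2019-01-04"], dtype="datetime64[D]")

# The reference is fixed by the first write.
reference = stored.min()
stored_offsets = encode_relative(stored, reference)

# Re-deriving the reference from the new chunk restarts at zero,
# so the appended offsets collide with the ones already stored.
wrong_offsets = encode_relative(incoming, incoming.min())

# Reusing the stored reference keeps the appended values consistent.
right_offsets = encode_relative(incoming, reference)
```

Under that assumption, whichever layer performs the append has to know the reference of the existing data before encoding the new chunk.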

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
458694955 https://github.com/pydata/xarray/pull/2706#issuecomment-458694955 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1ODY5NDk1NQ== jendrikjoe 9658781 2019-01-29T20:29:05Z 2019-01-29T20:31:59Z CONTRIBUTOR

You are definitely right that there are no checks regarding alignment. However, if the shape along any dimension other than append_dim does not align, zarr will raise an error. If the coordinates differ, that could definitely be an issue. I did not think about that, as I am dumping reshaped dask.dataframe partitions with the append mode, so I am not allowed to have a name twice anyway. It might indeed be interesting for other users. The same applies to the attributes. I could try figuring that out as well, but that might take a while. The place where the ValueError is raised should allow adding other variables, as those are added in the KeyError exception handler above :)
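The shape constraint mentioned here can be sketched in plain Python. This is an illustration of the rule only, not zarr's or xarray's implementation, and `check_append_shapes` is a hypothetical name.

```python
def check_append_shapes(existing_shape, new_shape, dims, append_dim):
    """Return the shape after appending, or raise ValueError on a mismatch.

    Every dimension except `append_dim` must have the same size in both arrays.
    """
    if len(existing_shape) != len(dims) or len(new_shape) != len(dims):
        raise ValueError("shape and dims must have the same length")
    for dim, old, new in zip(dims, existing_shape, new_shape):
        if dim != append_dim and old != new:
            raise ValueError(f"dimension {dim!r} has size {new}, expected {old}")
    axis = dims.index(append_dim)
    result = list(existing_shape)
    result[axis] += new_shape[axis]
    return tuple(result)


appended_shape = check_append_shapes((3, 4), (2, 4), ("time", "x"), "time")

try:
    check_append_shapes((3, 4), (2, 5), ("time", "x"), "time")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A check like this only compares sizes. Two arrays with equal sizes but different coordinate values along `x` would still pass, which is the gap in alignment checking discussed in this comment.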

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148
457827734 https://github.com/pydata/xarray/pull/2706#issuecomment-457827734 https://api.github.com/repos/pydata/xarray/issues/2706 MDEyOklzc3VlQ29tbWVudDQ1NzgyNzczNA== jendrikjoe 9658781 2019-01-26T12:35:28Z 2019-01-26T12:35:28Z CONTRIBUTOR

Hi @rabernat,

Happy to help! I love using xarray. I added the tests for the append mode. One makes sure that it behaves like the 'w' mode if no data exists at the target path. The other one tests what you described. The append_dim argument is actually the same as the dim argument for concat. I hope that helps clarify my code :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Appending to zarr store 402908148


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);