html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/pull/2706#issuecomment-502827736,https://api.github.com/repos/pydata/xarray/issues/2706,502827736,MDEyOklzc3VlQ29tbWVudDUwMjgyNzczNg==,9658781,2019-06-17T19:56:23Z,2019-06-17T19:56:23Z,CONTRIBUTOR,"I built a filter that raises a ValueError as soon as any variable has a dtype that is neither a subclass of np.number nor of np.string_. I also built a test for that and added a function to manually convert dynamically sized string arrays to fixed-size ones.
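A minimal sketch of the two pieces described above, assuming an xarray.Dataset as input. The names check_append_dtypes and to_fixed_width_bytes are hypothetical and not the functions in this PR. It uses np.bytes_ because np.string_ was only an alias for it and has been removed in NumPy 2.0.

```python
import numpy as np
import xarray as xr


def check_append_dtypes(ds):
    # Hypothetical filter: reject any variable whose dtype is neither
    # numeric (np.number) nor fixed-width bytes (np.bytes_).
    for name, var in ds.variables.items():
        if not (np.issubdtype(var.dtype, np.number)
                or np.issubdtype(var.dtype, np.bytes_)):
            raise ValueError(
                f'Invalid dtype for variable {name!r}: {var.dtype}. '
                'Only numeric and fixed-width byte string dtypes can be appended.'
            )


def to_fixed_width_bytes(var, encoding='utf-8'):
    # Hypothetical converter: encode each string and store the result in a
    # fixed-width bytes dtype sized to the longest encoded value.
    encoded = np.array([s.encode(encoding) for s in var.values.ravel()], dtype=object)
    max_len = max((len(b) for b in encoded), default=1)
    data = encoded.astype(f'S{max_len}').reshape(var.shape)
    return var.copy(data=data)
```

Measuring the width after encoding (rather than counting characters) keeps multi-byte characters intact.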
I also wrote a test for @shikharsg's issue and can reproduce it. The test is currently commented out so that it does not fail the pipeline, as I wanted to discuss whether this is a blocking issue or whether we should merge and open a new issue for it.
It seems to originate from the fact that we moved away from using writer.add and now call the zarr functions directly. There should be a way to make this lazy again, but that will probably take time.
What do you think?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-502754545,https://api.github.com/repos/pydata/xarray/issues/2706,502754545,MDEyOklzc3VlQ29tbWVudDUwMjc1NDU0NQ==,9658781,2019-06-17T16:24:46Z,2019-06-17T16:24:46Z,CONTRIBUTOR,"> @jendrikjoe - thanks for digging in and finding this important issue!
>
> This PR has been hanging around for a long time. (A lot of that is on me!) It would be good to get something merged soon. Here's what I propose.
>
> * Identify which datatypes can easily be appended now (e.g. floats, etc.) and which cannot (variable length strings)
>
> * Raise an error if append is called on the incompatible datatypes
>
> * Move forward with this PR, which is otherwise very nearly ready
>
> * Open a new issue to keep track of the outstanding incompatible types, which require upstream resolution in zarr
>
> How does that sound to everyone?

This sounds like a plan. I will try to get this ready tonight and tomorrow.
Let us see how far I can get.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-502481584,https://api.github.com/repos/pydata/xarray/issues/2706,502481584,MDEyOklzc3VlQ29tbWVudDUwMjQ4MTU4NA==,9658781,2019-06-16T20:05:04Z,2019-06-16T20:23:54Z,CONTRIBUTOR,"Hey there everyone, sorry for not working on this for so long. I just picked it up again and realised that, because of the way the encoding works, the dtypes and maximum string lengths of the first dataset written have to be representative of all datasets appended later. Otherwise, the following truncates every string after its second character:
```python
import numpy as np
import xarray as xr

ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'], ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp')
ds1.to_zarr('temp', mode='a', append_dim='time')
```
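A NumPy-only illustration of the truncation, assuming the appended values end up cast to the fixed-width unicode dtype that the first write created (no zarr involved):

```python
import numpy as np

# The first write fixes the stored dtype to that of ds0 ('<U2' here).
store_dtype = np.array(['ab', 'cd', 'ef']).dtype

# Casting later values to that width silently drops every character
# past the second.
appended = np.array(['abc', 'def', 'ghijk']).astype(store_dtype)
print(store_dtype)  # <U2
print(appended)     # ['ab' 'de' 'gh']
```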
This can be solved by explicitly setting the dtype before writing:
```python
ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
# Fix the width to the longest string that will ever be appended
ds0['temperature'] = ds0.temperature.astype(np.dtype('S5'))
ds1 = xr.Dataset({'temperature': (['time'], ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp', mode='w')  # overwrite the store from the previous example
ds1.to_zarr('temp', mode='a', append_dim='time')
```
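Rather than hard-coding S5, the width can be derived from every batch that will be appended. A minimal sketch with a hypothetical helper common_bytes_dtype; it assumes ASCII-only strings, because astype to an 'S' dtype cannot encode other characters. Both datasets can then be written and appended exactly as in the snippet above.

```python
import numpy as np
import xarray as xr


def common_bytes_dtype(*arrays):
    # Hypothetical helper: fixed-width bytes dtype wide enough for the
    # longest string across all batches (ASCII-only input assumed).
    width = max(len(s) for a in arrays for s in np.asarray(a).ravel())
    return np.dtype(f'S{width}')


ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'], ['abc', 'def', 'ghijk'])}, coords={'time': [0, 1, 2]})

dtype = common_bytes_dtype(ds0.temperature.values, ds1.temperature.values)
ds0['temperature'] = ds0.temperature.astype(dtype)
ds1['temperature'] = ds1.temperature.astype(dtype)
```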
It gets worse, however, with non-ASCII characters: they are encoded in [zarr.py l:218](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/backends/zarr.py#L218), but when the next chunk comes in, the check in [conventions.py l:86](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/conventions.py#L86) fails. So I think we actually have to resolve the TODO in [zarr.py l:215](https://github.com/pydata/xarray/blob/442e938c2c5dcc0f192f0db2348cd679d07c16cb/xarray/backends/zarr.py#L215) before this can be merged. Otherwise, the following leads to multiple issues:
```python
ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'], ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds0.to_zarr('temp', mode='w')  # overwrite the store from the previous example
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values
```
The only way to work around this is to explicitly encode the data to UTF-8 beforehand:
```python
import numpy as np
import xarray as xr
from xarray.coding.variables import safe_setitem, unpack_for_encoding
from xarray.coding.strings import encode_string_array
from xarray.core.variable import Variable


def encode_utf8(var, string_max_length):
    # Encode a unicode string variable to fixed-width UTF-8 bytes and record
    # the encoding in the _Encoding attribute so it is decoded again on read.
    dims, data, attrs, encoding = unpack_for_encoding(var)
    safe_setitem(attrs, '_Encoding', 'utf-8')
    data = encode_string_array(data, 'utf-8')
    # Reserve two bytes per character (enough for the characters used here)
    data = data.astype(np.dtype(f'S{string_max_length * 2}'))
    return Variable(dims, data, attrs, encoding)


ds0 = xr.Dataset({'temperature': (['time'], ['ab', 'cd', 'ef'])}, coords={'time': [0, 1, 2]})
ds0['temperature'] = encode_utf8(ds0.temperature, 2)
ds1 = xr.Dataset({'temperature': (['time'], ['üý', 'ãä', 'õö'])}, coords={'time': [0, 1, 2]})
ds1['temperature'] = encode_utf8(ds1.temperature, 2)
ds0.to_zarr('temp', mode='w')  # overwrite the store from the previous example
ds1.to_zarr('temp', mode='a', append_dim='time')
xr.open_zarr('temp').temperature.values
```
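encode_utf8 reserves string_max_length*2 bytes, i.e. it assumes no character needs more than two bytes in UTF-8. That holds for the Latin characters above but not in general: '€' takes three bytes. A small sketch (the helper utf8_width is hypothetical) of sizing the dtype from the encoded byte length instead:

```python
import numpy as np


def utf8_width(values):
    # Hypothetical helper: bytes needed by the longest value once UTF-8
    # encoded; an 'S' dtype has to hold this many bytes, not characters.
    return max(len(str(s).encode('utf-8')) for s in np.asarray(values).ravel())


samples = ['ab', 'üý', '€']
print([len(s) for s in samples])           # [2, 2, 1]
print([utf8_width([s]) for s in samples])  # [2, 4, 3]
```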
Even though this works if the problem is known in advance, we should definitely either mention it in the documentation or fix the encoding itself. What do you think?
Cheers,
Jendrik","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-498205860,https://api.github.com/repos/pydata/xarray/issues/2706,498205860,MDEyOklzc3VlQ29tbWVudDQ5ODIwNTg2MA==,9658781,2019-06-03T10:40:28Z,2019-06-03T10:40:28Z,CONTRIBUTOR,Gave you the permissions @shikharsg ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/issues/2927#issuecomment-489142071,https://api.github.com/repos/pydata/xarray/issues/2927,489142071,MDEyOklzc3VlQ29tbWVudDQ4OTE0MjA3MQ==,9658781,2019-05-03T15:46:41Z,2019-05-03T15:46:41Z,CONTRIBUTOR,"I am seeing the same behaviour.
For me, though, writing to a path prefixed with s3:// is no problem (even appending to zarr works when using the related PR).
Only opening cannot cope with the prepended s3://.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,438166604
https://github.com/pydata/xarray/pull/2706#issuecomment-479802142,https://api.github.com/repos/pydata/xarray/issues/2706,479802142,MDEyOklzc3VlQ29tbWVudDQ3OTgwMjE0Mg==,9658781,2019-04-04T08:28:56Z,2019-04-04T08:28:56Z,CONTRIBUTOR,"Nice :+1:
On Apr 4, 2019 21:24, David Brochart wrote:
> Thanks @jendrikjoe, I just pushed to your fork: to make sure that the encoding of the appended variables is compatible with the target store, we explicitly put the target store encodings in the appended variable.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-479798342,https://api.github.com/repos/pydata/xarray/issues/2706,479798342,MDEyOklzc3VlQ29tbWVudDQ3OTc5ODM0Mg==,9658781,2019-04-04T08:17:43Z,2019-04-04T08:17:43Z,CONTRIBUTOR,I added you to the fork :) But feel free to do whatever is easiest for you :) ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-478399527,https://api.github.com/repos/pydata/xarray/issues/2706,478399527,MDEyOklzc3VlQ29tbWVudDQ3ODM5OTUyNw==,9658781,2019-04-01T00:19:11Z,2019-04-01T00:19:11Z,CONTRIBUTOR,"Sure everyone feel welcome to join in! Sorry for the long silence. Kind of a busy time right now 😉
On Apr 1, 2019 08:47, Ryan Abernathey wrote:
> @davidbrochart I would personally be happy to see anyone work on this. I'm sure @jendrikjoe would not mind if we make it a team effort!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-458896024,https://api.github.com/repos/pydata/xarray/issues/2706,458896024,MDEyOklzc3VlQ29tbWVudDQ1ODg5NjAyNA==,9658781,2019-01-30T10:37:56Z,2019-01-30T10:37:56Z,CONTRIBUTOR,I will also check how xarray stores times to see whether we have to add the offset in xarray first or whether this can be resolved with a PR to zarr :) ,"{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-458736067,https://api.github.com/repos/pydata/xarray/issues/2706,458736067,MDEyOklzc3VlQ29tbWVudDQ1ODczNjA2Nw==,9658781,2019-01-29T22:39:00Z,2019-01-29T22:39:00Z,CONTRIBUTOR,"Hey @davidbrochart,
thanks for all your input and also for the research on how zarr stores the data.
I would actually argue that computing the correct relative time should be handled by the zarr append function.
The exception, of course, would be if xarray also stores the data as deltas to a reference.
In that case I would collect the minimum and offset the input by it.
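For context: when writing, xarray encodes datetime64 values following CF conventions, as numbers relative to a reference date recorded in a units attribute such as days since 2019-01-01. The NumPy-only sketch below (hypothetical helper encode_relative, made-up dates) shows why each appended batch has to be encoded against the reference already in the store rather than against its own minimum.

```python
import numpy as np


def encode_relative(times, reference, unit='D'):
    # Hypothetical helper: integer offsets from a reference date, in the
    # spirit of CF units such as 'days since <reference>'.
    return (times - reference) // np.timedelta64(1, unit)


first = np.array(['2019-01-01', '2019-01-02', '2019-01-03'], dtype='datetime64[D]')
second = np.array(['2019-01-04', '2019-01-05', '2019-01-06'], dtype='datetime64[D]')

# Each batch encoded against its own minimum: the offsets collide.
print(encode_relative(first, first.min()))    # [0 1 2]
print(encode_relative(second, second.min()))  # [0 1 2]

# Encoding the appended batch against the stored reference keeps them consistent.
print(encode_relative(second, first.min()))   # [3 4 5]
```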
@rabernat can you provide input on that?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-458694955,https://api.github.com/repos/pydata/xarray/issues/2706,458694955,MDEyOklzc3VlQ29tbWVudDQ1ODY5NDk1NQ==,9658781,2019-01-29T20:29:05Z,2019-01-29T20:31:59Z,CONTRIBUTOR,"You are definitely right, that there are no checks regarding the alignment.
However, if another shape than the append_dim does not align zarr will raise an error.
If the coordinate differs that could be definitely an issue. I did not think about that as I am dumping reshaped dask.dataframe partitions with the append mode. Therefore, I am anyway not allowed to have a name twice. Might be interesting for other users indeed. Similar point for the attributes. I could try figuring that out as well, but that might take a while.
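A minimal sketch of such a pre-append check, assuming existing is the dataset already in the store (for example opened with xr.open_zarr) and new is the batch to append. The helper check_non_append_coords is hypothetical and only compares dimension coordinates, not attributes.

```python
import numpy as np
import xarray as xr


def check_non_append_coords(existing, new, append_dim):
    # Hypothetical check: every dimension coordinate of new other than
    # append_dim must equal the stored one, otherwise appended values
    # would be written against the wrong labels.
    for dim in new.sizes:
        if dim == append_dim or dim not in new.coords:
            continue
        if dim not in existing.coords or not existing[dim].equals(new[dim]):
            raise ValueError(f'coordinate {dim!r} does not match the existing store')
```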
The place where the ValueError is raised should still allow adding other variables, since those are added in the KeyError handler above :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/pull/2706#issuecomment-457827734,https://api.github.com/repos/pydata/xarray/issues/2706,457827734,MDEyOklzc3VlQ29tbWVudDQ1NzgyNzczNA==,9658781,2019-01-26T12:35:28Z,2019-01-26T12:35:28Z,CONTRIBUTOR,"Hi @rabernat,
happy to help! I love using xarray. I added tests for the append mode.
One makes sure that it behaves like the 'w' mode if no data exist at the target path.
The other tests what you described. The append_dim argument plays the same role as the dim argument of concat.
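A minimal sketch of that equivalence with made-up values: writing ds0 and then appending ds1 with append_dim='time' should produce the same dataset as xr.concat([ds0, ds1], dim='time'). The on-disk round trip needs zarr, so it is left as comments.

```python
import xarray as xr

ds0 = xr.Dataset({'temperature': (['time'], [1.0, 2.0, 3.0])}, coords={'time': [0, 1, 2]})
ds1 = xr.Dataset({'temperature': (['time'], [4.0, 5.0, 6.0])}, coords={'time': [3, 4, 5]})

# In-memory result that appending along 'time' is expected to reproduce.
expected = xr.concat([ds0, ds1], dim='time')

# With zarr installed, the round trip could be checked like this:
# ds0.to_zarr('temp', mode='w')
# ds1.to_zarr('temp', mode='a', append_dim='time')
# xr.testing.assert_equal(xr.open_zarr('temp').load(), expected)
```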
Hope that helps clarify my code :)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,402908148
https://github.com/pydata/xarray/issues/2689#issuecomment-456201092,https://api.github.com/repos/pydata/xarray/issues/2689,456201092,MDEyOklzc3VlQ29tbWVudDQ1NjIwMTA5Mg==,9658781,2019-01-21T21:17:02Z,2019-01-21T21:17:02Z,CONTRIBUTOR,"Okay, I will have a look at #1887 first before going forward with this request :) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,400678252
https://github.com/pydata/xarray/issues/2689#issuecomment-456128567,https://api.github.com/repos/pydata/xarray/issues/2689,456128567,MDEyOklzc3VlQ29tbWVudDQ1NjEyODU2Nw==,9658781,2019-01-21T16:21:29Z,2019-01-21T16:21:29Z,CONTRIBUTOR,"Hey Shoyer,
sure, I am happy to propose one.
Given the input from the xarray example page (http://xarray.pydata.org/en/stable/examples/weather-data.html), I would imagine something like this:
```python
xarr = xarr.loc[xarr['tmin'] > 5]
```
If the DataArray is one-dimensional, this is straightforward to achieve by altering the _LocIndexer as follows:
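```python
from xarray.core import utils


class _LocIndexer(object):
    def __init__(self, dataset):
        self.dataset = dataset

    def __getitem__(self, key):
        if not utils.is_dict_like(key):
            # key is a boolean DataArray: for each of its dimensions, keep
            # the coordinate labels where the mask is True
            selector = {dim: key[dim][key] for dim in key.dims}
            # keep only the data variables that have all of the mask's dimensions
            # (builtin all(): np.all() on a generator is always truthy)
            keep_vars = []
            for var in self.dataset.data_vars:
                if all(dim in self.dataset[var].dims for dim in key.dims):
                    keep_vars.append(var)
            return self.dataset[keep_vars].sel(selector)
        return self.dataset.sel(**key)
```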
This does not work for higher dimensions, though, since 2-dimensional boolean indexing is not supported.
It would also drop all other DataArrays that do not share the indexer's dimensions.
There is probably a better place to do this than in the loc function. However, I think it would be great for people who need to filter their data by something other than the array dimensions.
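For comparison, in the 1-D case the same filtering can already be spelled with existing methods. A minimal sketch on a small made-up dataset (variable names follow the weather-data example, values are invented): isel with the mask's values, or where with drop=True, both keep only the times where tmin > 5.

```python
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=5)
ds = xr.Dataset(
    {'tmin': ('time', [3.0, 6.0, 4.0, 8.0, 7.0]),
     'tmax': ('time', [10.0, 12.0, 9.0, 15.0, 14.0])},
    coords={'time': times},
)

mask = ds['tmin'] > 5

# Positional boolean indexing along the mask's single dimension
filtered = ds.isel(time=mask.values)

# where(..., drop=True) drops the labels where the 1-D mask is False
filtered_where = ds.where(mask, drop=True)
```

Unlike the proposed loc, where keeps variables that lack the mask's dimension instead of dropping them.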
Cheers,
Jendrik","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,400678252