issue_comments


40 rows where author_association = "COLLABORATOR" and user = 35919497 sorted by updated_at descending


issue 24

  • Zarr chunking fixes 6
  • Flexible backends - Harmonise zarr chunking with other backends chunking 4
  • Fix open_dataset regression 4
  • groupby beahaviour w.r.t. non principal coordinates 2
  • Feature Request: Hierarchical storage and processing in xarray 2
  • Error when rechunking from Zarr store 2
  • Group together decoding options into a single argument 2
  • Allow using a custom engine class directly in xr.open_dataset 2
  • DataArrayResample.interpolate coordinates out of bound. 1
  • deprecate pynio backend 1
  • Failing main branch — test_save_mfdataset_compute_false_roundtrip 1
  • Backends entrypoints 1
  • APIv2 internal cleanups 1
  • Fix warning on chunks compatibility 1
  • ImportError: module 'xarray.backends.*' has no attribute '*_backend' 1
  • open_dataset regression 1
  • error with cfgrib + eccodes 1
  • Better error message when no backend engine is found. 1
  • Suggesting specific IO backends to install when open_dataset() fails 1
  • xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file 1
  • Kwargs to rasterio open 1
  • Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests 1
  • Allow to parse more backend kwargs to pydap backend 1
  • In backends, support expressing a dimension's preferred chunk sizes as a tuple of integers 1

Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue
1091315261 https://github.com/pydata/xarray/pull/6334#issuecomment-1091315261 https://api.github.com/repos/pydata/xarray/issues/6334 IC_kwDOAMm_X85BDCY9 aurghs 35919497 2022-04-07T08:33:43Z 2022-04-07T08:37:33Z COLLABORATOR

Thank you for this fix! It looks good to me. I would prefer a separate function for the check, but that's fine too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  In backends, support expressing a dimension's preferred chunk sizes as a tuple of integers 1160073438
1041249397 https://github.com/pydata/xarray/pull/6276#issuecomment-1041249397 https://api.github.com/repos/pydata/xarray/issues/6276 IC_kwDOAMm_X84-EDR1 aurghs 35919497 2022-02-16T08:46:05Z 2022-02-16T08:46:24Z COLLABORATOR

I would prefer to avoid using `**kwargs`; an explicit list of parameters would make the code more readable. But I think it's fine that way too :)
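For illustration, a minimal sketch of the trade-off being discussed (parameter names are hypothetical, not the actual pydap backend signature):

```python
# explicit parameters: self-documenting, discoverable, typo-checked by Python
def open_dataset(url, application=None, session=None, timeout=None):
    ...

# forwarding everything via **kwargs: shorter, but opaque to readers and tools
def open_dataset(url, **kwargs):
    ...
```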

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow to parse more backend kwargs to pydap backend 1138440632
969122428 https://github.com/pydata/xarray/pull/5959#issuecomment-969122428 https://api.github.com/repos/pydata/xarray/issues/5959 IC_kwDOAMm_X845w6J8 aurghs 35919497 2021-11-15T17:08:38Z 2021-11-15T17:08:38Z COLLABORATOR

@alexamici Could you please have a final look into this?

Thank you @kmuehlbauer! LGTM

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests 1048309254
887324339 https://github.com/pydata/xarray/pull/5609#issuecomment-887324339 https://api.github.com/repos/pydata/xarray/issues/5609 IC_kwDOAMm_X840436z aurghs 35919497 2021-07-27T08:37:35Z 2021-07-27T08:38:30Z COLLABORATOR

> I would try to stay as close to open_dataset as possible, which would make migrating to rioxarray's engine easier once we deprecate open_rasterio. If I understand the signature of open_dataset correctly, this is called backend_kwargs?

The idea was to deprecate open_dataset's backend_kwargs in the future. Currently, in the open_dataset signature, you can use either backend_kwargs or plain kwargs to pass additional parameters to the backend. I would avoid adding backend_kwargs to the open_rasterio interface.
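For context, a sketch of the two currently equivalent spellings in open_dataset (the engine and option shown are just an example):

```python
import xarray as xr

# via the explicit backend_kwargs dict
ds = xr.open_dataset("file.nc", engine="h5netcdf",
                     backend_kwargs={"phony_dims": "sort"})

# via plain keyword arguments forwarded to the backend
ds = xr.open_dataset("file.nc", engine="h5netcdf", phony_dims="sort")
```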

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Kwargs to rasterio open 945434599
843328294 https://github.com/pydata/xarray/pull/5300#issuecomment-843328294 https://api.github.com/repos/pydata/xarray/issues/5300 MDEyOklzc3VlQ29tbWVudDg0MzMyODI5NA== aurghs 35919497 2021-05-18T16:31:13Z 2021-05-18T16:31:13Z COLLABORATOR

That's perfect.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Better error message when no backend engine is found. 891253662
843101733 https://github.com/pydata/xarray/issues/5329#issuecomment-843101733 https://api.github.com/repos/pydata/xarray/issues/5329 MDEyOklzc3VlQ29tbWVudDg0MzEwMTczMw== aurghs 35919497 2021-05-18T11:48:37Z 2021-05-18T12:24:21Z COLLABORATOR

I think it's not a bug: filename_or_obj in open_dataset can be a file, a file-like object, bytes, or a URL. The accepted inputs depend on the engine, so it doesn't make sense to raise a FileNotFoundError if the engine is not defined by the user or not automatically detected by xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray 0.18.0 raises ValueError, not FileNotFoundError, when opening a non-existent file 894125618
840820163 https://github.com/pydata/xarray/issues/5302#issuecomment-840820163 https://api.github.com/repos/pydata/xarray/issues/5302 MDEyOklzc3VlQ29tbWVudDg0MDgyMDE2Mw== aurghs 35919497 2021-05-13T20:39:14Z 2021-05-13T20:39:14Z COLLABORATOR

Me too, I was thinking about something like that to fix the error message.
Let me try to implement it. But I really have no time this week or the next one, sorry. I can do it after the 23rd, if that's ok for you.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Suggesting specific IO backends to install when open_dataset() fails 891281614
819654025 https://github.com/pydata/xarray/pull/5033#issuecomment-819654025 https://api.github.com/repos/pydata/xarray/issues/5033 MDEyOklzc3VlQ29tbWVudDgxOTY1NDAyNQ== aurghs 35919497 2021-04-14T16:31:52Z 2021-04-14T16:31:52Z COLLABORATOR

@Illviljan I see your point, but the subclassing doesn't add too much complexity, and for consistency it would be better to add a check on the class. After that, I think we can merge it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow using a custom engine class directly in xr.open_dataset 831008649
819571176 https://github.com/pydata/xarray/issues/5150#issuecomment-819571176 https://api.github.com/repos/pydata/xarray/issues/5150 MDEyOklzc3VlQ29tbWVudDgxOTU3MTE3Ng== aurghs 35919497 2021-04-14T14:40:34Z 2021-04-14T14:41:19Z COLLABORATOR

I can try to reproduce the error and fix it, but I need to know at least the xarray version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  error with cfgrib + eccodes 856915051
819563132 https://github.com/pydata/xarray/pull/5135#issuecomment-819563132 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxOTU2MzEzMg== aurghs 35919497 2021-04-14T14:29:59Z 2021-04-14T14:29:59Z COLLABORATOR

@bcbnz could you check if this also fixes #5132?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
818808309 https://github.com/pydata/xarray/pull/5135#issuecomment-818808309 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxODgwODMwOQ== aurghs 35919497 2021-04-13T15:03:39Z 2021-04-13T15:03:39Z COLLABORATOR

> @aurghs can you confirm that all failures are unrelated to the changes?

In my understanding, the errors are not related to the changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
816517328 https://github.com/pydata/xarray/pull/5033#issuecomment-816517328 https://api.github.com/repos/pydata/xarray/issues/5033 MDEyOklzc3VlQ29tbWVudDgxNjUxNzMyOA== aurghs 35919497 2021-04-09T08:31:05Z 2021-04-09T08:31:27Z COLLABORATOR

> Making a backend doesn't have to be super difficult either, depending on whether you already have a nice 3rd-party module you can thinly wrap to return a Dataset instead of whatever is the default.

I agree. Adding a plugin is not really very difficult, but in some cases it could be discouraging, especially if you are just exploring how the backends work.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow using a custom engine class directly in xr.open_dataset 831008649
816050897 https://github.com/pydata/xarray/pull/5135#issuecomment-816050897 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxNjA1MDg5Nw== aurghs 35919497 2021-04-08T18:37:46Z 2021-04-08T18:37:46Z COLLABORATOR

At this point, probably api.normalize_path is the best choice.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
816021103 https://github.com/pydata/xarray/pull/5135#issuecomment-816021103 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxNjAyMTEwMw== aurghs 35919497 2021-04-08T17:52:46Z 2021-04-08T17:52:46Z COLLABORATOR

> LGTM. I don't know how we would test this...

I don't have any good ideas about it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
811810701 https://github.com/pydata/xarray/pull/5065#issuecomment-811810701 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTgxMDcwMQ== aurghs 35919497 2021-04-01T10:21:15Z 2021-04-01T11:01:44Z COLLABORATOR

```python
new_var = var.chunk(chunks, name=name2, lock=lock)
new_var.encoding = var.encoding
```

Here you are modifying _maybe_chunk, but _maybe_chunk is also used in Dataset.chunk. It would probably be better to change backends/api.py, here: https://github.com/pydata/xarray/blob/ddc352faa6de91f266a1749773d08ae8d6f09683/xarray/backends/api.py#L296-L307

But maybe in this case we also want to drop encoding["chunks"] if the chunks are not compatible with the dask ones.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811820701 https://github.com/pydata/xarray/issues/5098#issuecomment-811820701 https://api.github.com/repos/pydata/xarray/issues/5098 MDEyOklzc3VlQ29tbWVudDgxMTgyMDcwMQ== aurghs 35919497 2021-04-01T10:40:37Z 2021-04-01T10:41:44Z COLLABORATOR

I think this is a consequence of the refactor done by @alexamici when he removed _normalize_path: https://github.com/pydata/xarray/pull/4701. We decided to delegate path interpretation to the backends.

I'll have a look to understand how to fix it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset regression 847014702
811818035 https://github.com/pydata/xarray/pull/5065#issuecomment-811818035 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTgxODAzNQ== aurghs 35919497 2021-04-01T10:35:29Z 2021-04-01T10:36:01Z COLLABORATOR

> Hmm. I would also be happy with explicitly deleting chunks from encoding for now. It's not adding a lot of technical debt.

I see two reasons for keeping it:
- We should be able to read and write the data with the same structure on disk.
- The user may be interested in this information.

But it seems to me that having two different definitions of chunks (the dask one and the encoded one) is not very intuitive, and it's not easy to define a clear default in writing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811209453 https://github.com/pydata/xarray/pull/5065#issuecomment-811209453 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTIwOTQ1Mw== aurghs 35919497 2021-03-31T16:27:05Z 2021-03-31T17:50:19Z COLLABORATOR

~~rechunk~~ Variable.chunk is always used when you open data with dask, even if you are using the default chunking. So this way, you will drop the encoding whenever dask is used (≈ always).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811284237 https://github.com/pydata/xarray/pull/5065#issuecomment-811284237 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTI4NDIzNw== aurghs 35919497 2021-03-31T17:45:29Z 2021-03-31T17:49:17Z COLLABORATOR

> Does it actually get called every time we load a dataset with chunks?

Yes

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
811199910 https://github.com/pydata/xarray/pull/5065#issuecomment-811199910 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgxMTE5OTkxMA== aurghs 35919497 2021-03-31T16:20:30Z 2021-03-31T16:31:32Z COLLABORATOR

> Should the Zarr backend be setting this?

Yes, they are already defined in zarr: preferred_chunks=chunks. We decided to separate the chunks and the preferred_chunks:
- preferred_chunks is used by the backend to define the default chunks to be used by xarray.
- chunks are the on-disk chunks.

They are not necessarily the same. Maybe we can drop the preferred_chunks after they are used.
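A minimal runnable sketch of the distinction (values are illustrative, not taken from the PR):

```python
import xarray as xr

var = xr.Variable(("x",), list(range(5000)))
# on-disk Zarr chunks, as stored (illustrative)
var.encoding["chunks"] = (1000,)
# chunks the backend suggests xarray use by default, e.g. when chunks={}
var.encoding["preferred_chunks"] = {"x": 1000}
```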

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
808399567 https://github.com/pydata/xarray/pull/5065#issuecomment-808399567 https://api.github.com/repos/pydata/xarray/issues/5065 MDEyOklzc3VlQ29tbWVudDgwODM5OTU2Nw== aurghs 35919497 2021-03-26T17:34:44Z 2021-03-26T18:08:04Z COLLABORATOR

Perhaps we could also remove overwrite_encoded_chunks; it shouldn't be necessary any more.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr chunking fixes 837243943
808057690 https://github.com/pydata/xarray/issues/4118#issuecomment-808057690 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDgwODA1NzY5MA== aurghs 35919497 2021-03-26T09:09:38Z 2021-03-26T09:09:38Z COLLABORATOR

We could also provide a use case in remote sensing: it would be really useful in interferometric processing for managing Sentinel-1 IW and EW SLC data, which have multiple tiles (bursts) partially overlapping in one direction (azimuth).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
806403993 https://github.com/pydata/xarray/issues/4118#issuecomment-806403993 https://api.github.com/repos/pydata/xarray/issues/4118 MDEyOklzc3VlQ29tbWVudDgwNjQwMzk5Mw== aurghs 35919497 2021-03-25T06:41:09Z 2021-03-25T06:42:58Z COLLABORATOR

@alexamici and I can write the technical part of the proposal.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
801920057 https://github.com/pydata/xarray/issues/5053#issuecomment-801920057 https://api.github.com/repos/pydata/xarray/issues/5053 MDEyOklzc3VlQ29tbWVudDgwMTkyMDA1Nw== aurghs 35919497 2021-03-18T13:19:27Z 2021-03-18T16:09:17Z COLLABORATOR

Unfortunately, in a previous version there were internal plugins for the backends. They have since been removed, but you need to re-install xarray to remove their entrypoints.
There was no release with the internal plugins, so users should not hit this problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ImportError: module 'xarray.backends.*' has no attribute '*_backend' 834641104
786652376 https://github.com/pydata/xarray/issues/4491#issuecomment-786652376 https://api.github.com/repos/pydata/xarray/issues/4491 MDEyOklzc3VlQ29tbWVudDc4NjY1MjM3Ng== aurghs 35919497 2021-02-26T13:37:20Z 2021-02-26T13:37:20Z COLLABORATOR

If you want, I can take care of moving pynio to an external repository, but we should decide where.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  deprecate pynio backend 715730538
751489633 https://github.com/pydata/xarray/issues/4380#issuecomment-751489633 https://api.github.com/repos/pydata/xarray/issues/4380 MDEyOklzc3VlQ29tbWVudDc1MTQ4OTYzMw== aurghs 35919497 2020-12-27T16:43:56Z 2021-01-08T07:20:24Z COLLABORATOR

> Does encoding['chunks'] serve any purpose after you've loaded a Zarr store and all the variables are defined as dask arrays?

> No. I run into this frequently and it is annoying. @rabernat do you remember why you chose to keep chunks around in encoding

encoding["chunks"] is used in to_zarr. That seems reasonable: I expect to be able to read and re-write a Zarr store without modifying the chunking on disk. It seems to me that the dask chunks are used in writing only when encoding["chunks"] is not defined or is no longer compatible with the variable shapes; in the other cases encoding["chunks"] is used. So if you want to use the encoded chunks, you have to be sure that they are still compatible with the variable shapes and that each Zarr chunk is contained in only one dask chunk. If you want to use the dask chunks instead, you can (see the sketch below):
- delete the encoded chunking, as done by @eric-czech;
- use encoding when you write: ds.to_zarr('/tmp/ds3.zarr', mode='w', encoding={'x': {}}).

Maybe this interface is a little bit confusing. It would probably be better to move overwrite_encoded_chunks from open_dataset to to_zarr: the open_dataset interface would be cleaner, and it would be clear how to use the dask chunks in writing.

Concerning different chunking per variable, here is a related issue: https://github.com/pydata/xarray/issues/4623

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when rechunking from Zarr store 686608969
751481163 https://github.com/pydata/xarray/issues/4380#issuecomment-751481163 https://api.github.com/repos/pydata/xarray/issues/4380 MDEyOklzc3VlQ29tbWVudDc1MTQ4MTE2Mw== aurghs 35919497 2020-12-27T15:32:10Z 2020-12-27T15:32:29Z COLLABORATOR

I'm not sure, but this error seems to be a bug. There is a check on the final chunk whose inequality seems to have the wrong direction. The part of the code that decides which chunking should be used, when both dask chunking and encoded chunking are defined, is the following: https://github.com/pydata/xarray/blob/ac234619d5471e789b0670a673084dbb01df4f9e/xarray/backends/zarr.py#L141-L173 The aim of these checks, as described in the comment, is to avoid having multiple dask chunks in one zarr chunk. According to this logic, the inequality at line 163: https://github.com/pydata/xarray/blob/ac234619d5471e789b0670a673084dbb01df4f9e/xarray/backends/zarr.py#L163 has the wrong direction. It should be if dchunks[-1] < zchunk, but it seems to me that this last condition is always satisfied.
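A simplified paraphrase of the check being discussed (not the exact code at the link; dchunks are the dask chunk sizes along one dimension, zchunk the corresponding Zarr chunk size):

```python
def check_chunks_compatible(dchunks, zchunk):
    # every interior dask chunk must be an exact multiple of the zarr chunk,
    # so that no zarr chunk spans two dask chunks
    for dchunk in dchunks[:-1]:
        if dchunk % zchunk:
            raise ValueError("dask chunks would split a zarr chunk")
    # the comment above argues the final-chunk inequality should read
    # `dchunks[-1] < zchunk`, not the direction used at the linked line 163
```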

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Error when rechunking from Zarr store 686608969
750313634 https://github.com/pydata/xarray/pull/4726#issuecomment-750313634 https://api.github.com/repos/pydata/xarray/issues/4726 MDEyOklzc3VlQ29tbWVudDc1MDMxMzYzNA== aurghs 35919497 2020-12-23T14:02:54Z 2020-12-23T14:29:21Z COLLABORATOR

Btw & not for here. There are other warnings from the backends refactor. Would be nice if you could hunt them down and either fix or suppress them.

I was just doing that now. Some of them are fixed in https://github.com/pydata/xarray/pull/4728.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix warning on chunks compatibility 773717776
749649237 https://github.com/pydata/xarray/pull/4721#issuecomment-749649237 https://api.github.com/repos/pydata/xarray/issues/4721 MDEyOklzc3VlQ29tbWVudDc0OTY0OTIzNw== aurghs 35919497 2020-12-22T16:45:58Z 2020-12-22T16:45:58Z COLLABORATOR

This cleanup involves only apiv2. I'll merge it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  APIv2 internal cleanups 773048869
747391520 https://github.com/pydata/xarray/issues/2148#issuecomment-747391520 https://api.github.com/repos/pydata/xarray/issues/2148 MDEyOklzc3VlQ29tbWVudDc0NzM5MTUyMA== aurghs 35919497 2020-12-17T11:47:44Z 2020-12-17T11:47:44Z COLLABORATOR

I think this has been fixed at some point. It can be closed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby beahaviour w.r.t. non principal coordinates 324032926
735372002 https://github.com/pydata/xarray/issues/4496#issuecomment-735372002 https://api.github.com/repos/pydata/xarray/issues/4496 MDEyOklzc3VlQ29tbWVudDczNTM3MjAwMg== aurghs 35919497 2020-11-29T10:29:34Z 2020-11-29T10:29:34Z COLLABORATOR

@ravwojdyla I think that currently there is no way to do this. But it would be nice to have an interface that allows defining different chunks for each variable. The main problem I see in implementing that is keeping the `xr.open_dataset(..., chunks=)`, `ds.chunk` and `ds.chunks` interfaces backwards compatible. A new issue would probably be better for that, since this refactor is already a little bit tricky and your proposal could be implemented separately.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible backends - Harmonise zarr chunking with other backends chunking 717410970
732208262 https://github.com/pydata/xarray/pull/4577#issuecomment-732208262 https://api.github.com/repos/pydata/xarray/issues/4577 MDEyOklzc3VlQ29tbWVudDczMjIwODI2Mg== aurghs 35919497 2020-11-23T14:47:45Z 2020-11-24T06:48:10Z COLLABORATOR
  • I have replaced entrypoints with pkg_resources. I can't see any drawback in this change, only advantages; the main one is that we remove an external dependency.
  • I have added the tests.
  • I didn't remove the signature inspection, since we have already discussed it at length with @shoyer, and in the end we decided to keep the inspection and add checks on the signature to make sure that neither `*args` nor `**kwargs` will be used.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backends entrypoints 741714847
721590240 https://github.com/pydata/xarray/issues/4496#issuecomment-721590240 https://api.github.com/repos/pydata/xarray/issues/4496 MDEyOklzc3VlQ29tbWVudDcyMTU5MDI0MA== aurghs 35919497 2020-11-04T08:35:08Z 2020-11-04T09:22:01Z COLLABORATOR

@weiji14 Thank you very much for your feedback. I think we should also align xr.open_mfdataset. In the case of engine="zarr" and chunks=-1 there is a UserWarning also in xr.open_dataset, but I think it should be removed.

Maybe in the future we should consider integrating/using the dask function dask.array.core.normalize_chunks (https://docs.dask.org/en/latest/array-api.html#dask.array.core.normalize_chunks) with the key previous_chunks (see this comment: https://github.com/pydata/xarray/pull/2530#discussion_r247352940). It could be particularly useful for (re-)chunking taking into account the previous chunks or the on-disk chunks, especially if the on-disk chunks are small.
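For reference, a small example of the dask function mentioned above (shape and chunk values are illustrative):

```python
import dask.array as da

# previous_chunks biases "auto" chunking toward multiples of the existing
# (e.g. on-disk) chunks instead of arbitrary sizes
da.core.normalize_chunks(
    "auto", shape=(100_000,), dtype="f8", previous_chunks=(100,)
)
```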

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible backends - Harmonise zarr chunking with other backends chunking 717410970
720785384 https://github.com/pydata/xarray/issues/4496#issuecomment-720785384 https://api.github.com/repos/pydata/xarray/issues/4496 MDEyOklzc3VlQ29tbWVudDcyMDc4NTM4NA== aurghs 35919497 2020-11-02T23:32:48Z 2020-11-03T09:28:48Z COLLABORATOR

I think we can keep talking here about the xarray chunking interface.

It seems that the chunking interface is a tricky problem in xarray. There are several interfaces already involved:
- dask: da.rechunk, da.from_array
- xarray: xr.open_dataset
- xarray: ds.chunk
- xarray-zarr: xr.open_dataset(engine="zarr") (≈ xr.open_zarr)

They are similar, but there are some inconsistencies.

dask
The allowed values for chunking in dask are:
- a dictionary (or tuple)
- integers > 0
- -1: no chunking (along this dimension)
- "auto": allow the chunking (in this dimension) to accommodate ideal chunk sizes (default 128MiB)

The allowed values inside the dictionary are: -1, "auto", None (no change to the chunking along this dimension).
Note: None isn't supported outside the dictionary.
Note: if chunking along some dimension is not specified, then the chunking along this dimension will not change (e.g. {} is equivalent to {0: None}).

xarray: xr.open_dataset for all engines != "zarr"
It works as in dask, but None is also supported. If chunks is None then it doesn't use dask at all.

xarray: ds.chunk
It works as in dask, but None is also supported. None is equivalent to a dictionary with all values None (and equivalent to the empty dictionary).

xarray: xr.open_dataset(engine="zarr")
It works as in dask, except that:
- None is supported. If chunks is None then it doesn't use dask at all.
- If chunking along some dimension is not specified, then the encoded chunks are used.
- "auto" is equivalent to the empty dictionary: the encoded chunks are used.
- "auto" inside the dictionary is passed on to dask and behaves as in dask.
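For concreteness, a few calls exercising the behaviours just listed (file paths are placeholders; behaviour as described in this comment, at the time it was written):

```python
import xarray as xr

ds0 = xr.open_dataset("data.nc")                  # chunks=None: no dask at all
ds1 = xr.open_dataset("data.nc", chunks={})       # dask; chunking unchanged, per dask rules
ds2 = xr.open_dataset("data.nc", chunks="auto")   # dask auto-chunking (~128MiB targets)
ds3 = xr.open_dataset("store.zarr", engine="zarr",
                      chunks={})                  # dask; encoded on-disk chunks
ds4 = ds1.chunk(None)                             # same as ds1.chunk({}): no change
```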

Points to be discussed:

1) "auto" and {}
The main problem is how to make dask and xarray-zarr uniform.

Option 1
Maybe the encoded chunking provided by the backend can be seen just as the current on-disk data chunking. According to the dask interface, if the chunks for some dimension are None or not defined in a dictionary, then the current chunking along that dimension doesn't change. From this perspective, we would have:
- with "auto", dask auto-chunking is used.
- with -1, dask is used but with no chunking.
- with {}, the backend encoded chunks (when available) are used for on-disk data (xr.open_dataset), and the current chunking for already opened datasets (ds.chunk).

Note: the ds.chunk behaviour would be unchanged.
Note: xr.open_dataset would be unchanged, except for engine="zarr", since currently var.encoding["chunks"] is defined only by zarr.

Option 2
We could use a distinct new value for the encoded chunks (e.g. "encoded", TBC). Something like:
open_dataset(chunks="encoded")
open_dataset(chunks={"x": "encoded", "y": 10, ...})
Both expressions could be supported.
Cons:
- chunks="encoded": with zarr, the user would probably always need to ask explicitly for the encoded chunks.
- chunks={"x": "encoded", ...}: the user must specify explicitly in the dictionary which dimensions should be chunked with the encoded chunks, which is very inconvenient (but is it really used? @weiji14 do you have some idea about it?).

2) None
chunks=None should produce the same result in xr.open_dataset and ds.chunk.

@shoyer, @alexamici, @jhamman, @dcherian, @weiji14 suggestions are welcome

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible backends - Harmonise zarr chunking with other backends chunking 717410970
718346664 https://github.com/pydata/xarray/issues/4490#issuecomment-718346664 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcxODM0NjY2NA== aurghs 35919497 2020-10-29T04:07:46Z 2020-10-29T04:07:46Z COLLABORATOR

Taking into account the comments in this issue and the calls, I would propose this solution: https://github.com/pydata/xarray/pull/4547

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721
716527698 https://github.com/pydata/xarray/issues/4539#issuecomment-716527698 https://api.github.com/repos/pydata/xarray/issues/4539 MDEyOklzc3VlQ29tbWVudDcxNjUyNzY5OA== aurghs 35919497 2020-10-26T12:56:00Z 2020-10-26T12:56:00Z COLLABORATOR

I've tried to replicate the error, but I couldn't.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Failing main branch — test_save_mfdataset_compute_false_roundtrip 729117202
706098129 https://github.com/pydata/xarray/issues/4496#issuecomment-706098129 https://api.github.com/repos/pydata/xarray/issues/4496 MDEyOklzc3VlQ29tbWVudDcwNjA5ODEyOQ== aurghs 35919497 2020-10-09T10:18:10Z 2020-10-09T10:18:10Z COLLABORATOR
> The key value auto is redundant because it has the same behavior as {}; we could remove one of them.

That's not completely true: with no dask installed, "auto" uses chunks=None, while {} raises an error. That probably makes sense.
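A paraphrase of that difference (assuming an environment where dask is not installed, per the comment above):

```python
import xarray as xr

xr.open_dataset("data.nc", chunks="auto")  # falls back to chunks=None, no dask used
xr.open_dataset("data.nc", chunks={})      # raises: dask would be required
```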

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Flexible backends - Harmonise zarr chunking with other backends chunking 717410970
704607745 https://github.com/pydata/xarray/issues/4490#issuecomment-704607745 https://api.github.com/repos/pydata/xarray/issues/4490 MDEyOklzc3VlQ29tbWVudDcwNDYwNzc0NQ== aurghs 35919497 2020-10-06T23:35:30Z 2020-10-06T23:35:30Z COLLABORATOR

I agree, open_dataset() currently has a very long signature that should be changed. The interface you proposed is obviously clearer, but a class could give the false idea that all backends support all the decoding options listed in the class. I see two other alternatives:
- Instead of a class we could use a dictionary. Pros 1, 2 and 3 would still hold.
- With the interface proposed by @alexamici in #4309, pros 2 and 3 would still hold, and partially 1 (since the open_dataset interface would be greatly simplified).

For both these proposals we would lose tab autocompletion but, on the other hand, the user would be relieved of managing a class. Finally, I'm not sure the separation between backend_kwargs and decode would be clear to the user, since they both contain arguments that will be passed to the backend, especially if the backend needs more specific decoding options that must be set in backend_kwargs. In this sense, #4309 seems less error-prone.
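A runnable signature sketch of the dictionary alternative (hypothetical, not an existing xarray API):

```python
# hypothetical: decoding options grouped into one dict argument
def open_dataset(filename_or_obj, *, engine=None, decode=None, **backend_kwargs):
    # e.g. decode={"mask_and_scale": True, "decode_times": False}
    decode = {} if decode is None else dict(decode)
    ...
```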

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Group together decoding options into a single argument 715374721
393898567 https://github.com/pydata/xarray/issues/2197#issuecomment-393898567 https://api.github.com/repos/pydata/xarray/issues/2197 MDEyOklzc3VlQ29tbWVudDM5Mzg5ODU2Nw== aurghs 35919497 2018-06-01T14:30:46Z 2018-06-01T14:32:53Z COLLABORATOR

Also with oversampling we have the same problem (2007-02-02 02:00:00 is out of bounds):

``` python
import numpy as np
import pandas as pd
import xarray as xr

time = np.arange('2007-01-01 00:00:00', '2007-02-02 00:00:00', dtype='datetime64[ns]')
arr = xr.DataArray(
    np.arange(time.size), coords=[time], dims=('time',), name='data'
)
resampler = arr.resample(time='3h', base=2, label='right')

resampler
# DatetimeIndex(['2007-01-01 02:00:00', '2007-01-01 05:00:00',
#                '2007-01-01 08:00:00', '2007-01-01 11:00:00',
#                '2007-01-01 14:00:00', '2007-01-01 17:00:00',
#                '2007-01-01 20:00:00', '2007-01-01 23:00:00',
#                '2007-01-02 02:00:00', '2007-01-02 05:00:00',
#                ...
#                '2007-01-31 23:00:00', '2007-02-01 02:00:00',
#                '2007-02-01 05:00:00', '2007-02-01 08:00:00',
#                '2007-02-01 11:00:00', '2007-02-01 14:00:00',
#                '2007-02-01 17:00:00', '2007-02-01 20:00:00',
#                '2007-02-01 23:00:00', '2007-02-02 02:00:00'],
#               dtype='datetime64[ns]', name='time', length=257, freq='3H')
```

The fix is really easy; I can try to make a pull request.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArrayResample.interpolate coordinates out of bound. 327591169
389935163 https://github.com/pydata/xarray/issues/2148#issuecomment-389935163 https://api.github.com/repos/pydata/xarray/issues/2148 MDEyOklzc3VlQ29tbWVudDM4OTkzNTE2Mw== aurghs 35919497 2018-05-17T16:52:16Z 2018-05-17T17:27:07Z COLLABORATOR

The coordinates are grouped correctly:

```python
list(arr.groupby('x'))

[(1, <xarray.DataArray (x: 3)>
  array([1., 1., 1.])
  Coordinates:
    * x    (x) int64 1 1 1
      x2   (x) int64 1 2 3),
 (2, <xarray.DataArray (x: 2)>
  array([1., 1.])
  Coordinates:
    * x    (x) int64 2 2
      x2   (x) int64 4 5)]
```

I think the grouping makes sense. But once the groups are collapsed with some operation, I'm not sure a corresponding meaningful operation can be found to apply to the grouped coordinates.

In the following cases the mean after groupby() works as expected:

```python
arr = xr.DataArray(
    np.ones(5),
    dims=('x',),
    coords={
        'x': ('x', np.array([1, 1, 1, 2, 2])),
        'x1': ('x', np.array([1, 1, 1, 2, 2])),
        'x2': ('x', np.array([1, 2, 3, 4, 5])),
    }
)

arr.groupby('x1').mean('x')
<xarray.DataArray (x1: 2)>
array([1., 1.])
Coordinates:
  * x1   (x1) int64 1 2

arr.groupby(xr.DataArray([1, 1, 1, 2, 2], dims=('x'), name='x3')).mean('x')
<xarray.DataArray (x3: 2)>
array([1., 1.])
Coordinates:
  * x3   (x3) int64 1 2
```

Also, if I try to group by an array named like the dimension along which we perform the mean, I get the same problem:

```python
arr.groupby(xr.DataArray([1, 1, 1, 2, 2], dims=('x'), name='x')).mean('x')
<xarray.DataArray (x: 2)>
array([1., 1.])
Coordinates:
  * x    (x) int64 1 2
    x1   (x) int64 1 1 1 2 2
    x2   (x) int64 1 2 3 4 5
```

If I try to use another dimension name, we again get a strange behaviour:

```python
arr = xr.DataArray(
    np.ones((5, 2)),
    dims=('x', 'y'),
    coords={
        'x': ('x', np.array([1, 1, 1, 2, 2])),
        'x1': ('x', np.array([1, 1, 1, 2, 2])),
        'x2': ('x', np.array([1, 2, 3, 4, 5])),
    }
)

arr.groupby(xr.DataArray([1, 1, 1, 2, 2], dims=('x'), name='y')).mean('x')
<xarray.DataArray (y: 4)>
array([1., 1., 1., 1.])
Coordinates:
  * y    (y) int64 1 2
```

In this case it should probably raise an error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby beahaviour w.r.t. non principal coordinates 324032926

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);