

issue_comments


591 rows where user = 35968931 sorted by updated_at descending


issue >30

  • Generalize handling of chunked array types 17
  • release v0.18.0 15
  • Feature Request: Hierarchical storage and processing in xarray 14
  • Feature: N-dimensional auto_combine 13
  • API for N-dimensional combine 13
  • Concatenate across multiple dimensions with open_mfdataset 10
  • Feature Proposal: `xarray.interactive` module 10
  • Add histogram method 10
  • Allow xr.combine_by_coords to work with point coordinates? 8
  • Awkward array backend? 8
  • Xarray combine_by_coords return the monotonic global index error 7
  • combine_by_coords can succed when it shouldn't 7
  • Automatic duck array testing - reductions 7
  • Rely on NEP-18 to dispatch to dask in duck_array_ops 7
  • Duck array compatibility meeting 7
  • Hypothesis strategies in xarray.testing.strategies 7
  • list available backends and basic descriptors 7
  • Periodic Boundary Index 7
  • Add to_numpy() and as_numpy() methods 6
  • Recommended way to extend xarray Datasets using accessors? 5
  • New inline_array kwarg for open_dataset 5
  • Import datatree in xarray? 5
  • [WIP] Feature: Animated 1D plots 4
  • convert DataArray to DataSet before combine 4
  • xr.combine_nested() fails when passed nested DataSets 4
  • Add option to choose mfdataset attributes source. 4
  • Plots get labels from pint arrays 4
  • Release v0.19? 4
  • Release v0.20? 4
  • Update minimum dependencies for 0.20 4
  • …

user 1

  • TomNicholas · 591 ✖

author_association 1

  • MEMBER 591
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1561543105 https://github.com/pydata/xarray/issues/7870#issuecomment-1561543105 https://api.github.com/repos/pydata/xarray/issues/7870 IC_kwDOAMm_X85dE0HB TomNicholas 35968931 2023-05-24T16:31:30Z 2023-05-24T16:31:30Z MEMBER

Thanks for raising this @vhaasteren ! We want to do what we can to support users from all fields of science :)

I would be okay with that change (especially as it's not really special-casing pint-pulsar, so much as generalizing an existing error-catching mechanism), but would defer to the opinion of @keewis on this.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Name collision with Pulsar Timing package 'PINT'  1722614979
1561504841 https://github.com/pydata/xarray/issues/7856#issuecomment-1561504841 https://api.github.com/repos/pydata/xarray/issues/7856 IC_kwDOAMm_X85dEqxJ TomNicholas 35968931 2023-05-24T16:16:41Z 2023-05-24T16:26:15Z MEMBER

Solution for those who just found this issue:

Just re-install xarray: `pip install -e .` is sufficient. Re-installing in any way through pip/conda should register the dask chunkmanager entry point.


@Illviljan I brought this up in the xarray team call today, and we decided that since this only affects people who have previously cloned the xarray repository, are using a development install, and then updated by pulling changes from main, this problem only affects maybe ~10-20 people worldwide, all of whom are developers who are equipped to quickly solve it.

I'm going to add a note into the what's new entry for this version now - if you think we need to do more then let me know.

EDIT: I added a note to whatsnew in https://github.com/pydata/xarray/commit/69445c62953958488a6b35fafd8b9cfd6c0374a5, and updated the release notes.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unrecognized chunk manager dask - must be one of: [] 1718410975
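The fix above works because entry points are read from installed package metadata, so a development install only sees a newly added entry-point group after re-running `pip install -e .`. A minimal sketch of how that discovery looks with the standard library (the group name `"xarray.chunkmanagers"` is an assumption here, not confirmed by these comments):

```python
from importlib.metadata import entry_points

def list_registered(group="xarray.chunkmanagers"):
    """Return the names of entry points registered under `group`.

    If the package metadata predates the group being added (e.g. a stale
    dev install), this comes back empty - matching the `must be one of: []`
    error discussed above.
    """
    try:
        eps = entry_points(group=group)  # Python >= 3.10 API
    except TypeError:
        eps = entry_points().get(group, [])  # older dict-returning API
    return sorted(ep.name for ep in eps)

print(list_registered())
```

An empty list here is the symptom; re-installing regenerates the metadata and the group reappears.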
1556201913 https://github.com/pydata/xarray/issues/7856#issuecomment-1556201913 https://api.github.com/repos/pydata/xarray/issues/7856 IC_kwDOAMm_X85cwcG5 TomNicholas 35968931 2023-05-21T15:04:05Z 2023-05-21T15:04:05Z MEMBER

The only reason I didn't separate the chunkmanager entry points into local and other entry points was simplicity of code.

I didn't realise that might make a difference when it came to whether or not you have to pip install - I assumed that adding a new type of entry point would require re-installing no matter how I implemented it. If that's not the case perhaps we should adjust it (and re-release).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unrecognized chunk manager dask - must be one of: [] 1718410975
1556191719 https://github.com/pydata/xarray/issues/7856#issuecomment-1556191719 https://api.github.com/repos/pydata/xarray/issues/7856 IC_kwDOAMm_X85cwZnn TomNicholas 35968931 2023-05-21T14:19:27Z 2023-05-21T14:35:22Z MEMBER

Yes, but I'm wondering what functional difference is that making here?

Have you tried doing the local pip install of the xarray dev version again? I.e. pip install -e . from the xarray folder.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unrecognized chunk manager dask - must be one of: [] 1718410975
1556182520 https://github.com/pydata/xarray/issues/7856#issuecomment-1556182520 https://api.github.com/repos/pydata/xarray/issues/7856 IC_kwDOAMm_X85cwXX4 TomNicholas 35968931 2023-05-21T13:36:22Z 2023-05-21T13:36:22Z MEMBER

Hmm, it's acting as if dask is not installed/importable. Any idea what's different about your setup vs the xarray CI?

Yes daskmanager is also registered via a different entry point, but that should already be set up to happen by default.

To see which chunk managers it can find you can call

```python
from xarray.core.parallelcompat import list_chunkmanagers

list_chunkmanagers()
```

I expect it will return an empty list in your case, but that's the code we should be trying to debug on your system.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unrecognized chunk manager dask - must be one of: [] 1718410975
1553614542 https://github.com/pydata/xarray/issues/7848#issuecomment-1553614542 https://api.github.com/repos/pydata/xarray/issues/7848 IC_kwDOAMm_X85cmkbO TomNicholas 35968931 2023-05-18T20:36:35Z 2023-05-18T21:07:33Z MEMBER

np.pad is an interesting example in the context of chunked arrays (xref #6807) - dask implements it (parallelized using various approaches, including map_blocks internally), but cubed currently doesn't implement it because it's not part of the array API standard. (cc @tomwhite)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Compatibility with the Array API standard  1716228662
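For reference, the behaviour being discussed is plain `np.pad`, which dask parallelizes but which is not part of the array API standard. A minimal numpy illustration:

```python
import numpy as np

# Pad one element before and two after, filling with the constant 0.
a = np.arange(4)
padded = np.pad(a, (1, 2), mode="constant")
padded  # array([0, 0, 1, 2, 3, 0, 0])
```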
1553588837 https://github.com/pydata/xarray/pull/7815#issuecomment-1553588837 https://api.github.com/repos/pydata/xarray/issues/7815 IC_kwDOAMm_X85cmeJl TomNicholas 35968931 2023-05-18T20:10:43Z 2023-05-18T20:10:43Z MEMBER

Closing in favour of #7847

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array API fixes for astype 1695244129
1553395594 https://github.com/pydata/xarray/pull/7019#issuecomment-1553395594 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85clu-K TomNicholas 35968931 2023-05-18T17:37:22Z 2023-05-18T17:37:22Z MEMBER

Woooo thanks @dcherian !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1550564976 https://github.com/pydata/xarray/pull/7019#issuecomment-1550564976 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85ca75w TomNicholas 35968931 2023-05-17T01:39:08Z 2023-05-17T01:39:08Z MEMBER

@Illviljan thanks for all your comments!

Would you (or @keewis?) be willing to approve this PR now? I would really like to merge this so that I can release a version of xarray that I can use as a dependency for cubed-xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1540307585 https://github.com/pydata/xarray/pull/7815#issuecomment-1540307585 https://api.github.com/repos/pydata/xarray/issues/7815 IC_kwDOAMm_X85bzzqB TomNicholas 35968931 2023-05-09T14:58:41Z 2023-05-09T14:58:41Z MEMBER

Just merged in https://github.com/pydata/xarray/pull/7820, which should hopefully fix this :crossed_fingers:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array API fixes for astype 1695244129
1535108194 https://github.com/pydata/xarray/issues/7813#issuecomment-1535108194 https://api.github.com/repos/pydata/xarray/issues/7813 IC_kwDOAMm_X85bf-Ri TomNicholas 35968931 2023-05-04T17:07:15Z 2023-05-04T17:07:15Z MEMBER

If you hover over a node in the SVG representation you'll get a tooltip that shows the call stack and the line number of the top-level user function that invoked the computation. Does that help at all?

That's neat!

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on

I realise now that this is not true - but can we make it true for cubed in xarray? Using cubed with xarray creates arrays with names like array-002, but couldn't we use the DataArray's .name attribute to give this node of the graph the name "U" for example?

BTW should this be moved to a cubed issue?

I raised it here because it relates to an unfinished part of #7019 - where there is still dask-specific logic for naming individual tasks. I think that to solve this we will need to alter xarray code to allow ChunkManager objects to decide how they want to name their tasks, but using information passed from xarray (i.e. DataArray.name).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Task naming for general chunkmanagers 1694956396
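The idea floated above - let the chunk manager build task names from information xarray passes in (such as `DataArray.name`) rather than from dask-specific logic - could be sketched like this. `make_task_name` is a hypothetical helper for illustration, not an xarray or cubed API:

```python
import uuid

def make_task_name(array_name, op):
    """Build a human-readable task name from an operation and an array name.

    The uuid suffix stands in for dask's deterministic token; a real chunk
    manager would choose its own uniqueness scheme.
    """
    suffix = uuid.uuid4().hex[:8]
    prefix = array_name if array_name is not None else "array"
    return f"{op}-{prefix}-{suffix}"

make_task_name("U", "mean")  # e.g. 'mean-U-1a2b3c4d'
```

The point is the division of labour: xarray supplies the name, the ChunkManager decides how to use it.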
1534087275 https://github.com/pydata/xarray/pull/7019#issuecomment-1534087275 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85bcFBr TomNicholas 35968931 2023-05-04T04:41:22Z 2023-05-04T04:41:22Z MEMBER

(Okay now the failures are from https://github.com/pydata/xarray/pull/7815 which I've separated out, and from https://github.com/pydata/xarray/pull/7561 being recently merged into main which is definitely not my fault :sweat_smile: https://github.com/pydata/xarray/pull/7019/commits/316c63d55f4e2c317b028842f752a40596f16c6d shows that this PR passes the tests by itself.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1531992793 https://github.com/pydata/xarray/pull/7019#issuecomment-1531992793 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85bUFrZ TomNicholas 35968931 2023-05-02T18:58:23Z 2023-05-02T19:01:20Z MEMBER

I would like to merge this now please! It works, it passes the tests, including mypy.

The main feature not in this PR is using parallel=True with open_mfdataset, which is still coupled to dask.delayed - I made #7811 to track that so I could get this PR merged.

If we merge this I can start properly testing cubed with xarray (in cubed-xarray).

@shoyer @dcherian if one of you could merge this or otherwise tell me anything else you think is still required!

{
    "total_count": 4,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 2,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1530111638 https://github.com/pydata/xarray/pull/7799#issuecomment-1530111638 https://api.github.com/repos/pydata/xarray/issues/7799 IC_kwDOAMm_X85bM6aW TomNicholas 35968931 2023-05-01T19:30:05Z 2023-05-01T19:30:05Z MEMBER

I was not aware of https://github.com/pydata/xarray/issues/6894, which is definitely my bad for not searching properly before setting off :smile:

No worries! :grin:

It looks like the changes I'm proposing here are probably orthogonal to work in https://github.com/pydata/xarray/issues/6894 though?

I think generally yes they are, I agree.

the goal of this PR is to generalise the existing unit testing to make it a bit easier to run tests with different unit libraries

Any work that helps generalise xarray's support of units beyond specifically just pint is going to be useful!

My main point to draw your attention to is the idea that eventually, one-day, it would be nice to move all array-library specific testing out of the xarray core repo in favour of an approach similar to that proposed in #6894.

I think that testing for unit libraries is a bit less general than the duck array testing stuff, because there's a host of extra information you need to be a unit library compared to a general duck array.

This is also true. Maybe that means for example the base class you are writing here has a long-term future as an optional part of xarray's testing framework in #6894, specifically for use when testing units libraries? Just thinking out loud

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Start making unit testing more general 1690019325
1529785512 https://github.com/pydata/xarray/issues/7515#issuecomment-1529785512 https://api.github.com/repos/pydata/xarray/issues/7515 IC_kwDOAMm_X85bLqyo TomNicholas 35968931 2023-05-01T14:40:39Z 2023-05-01T14:40:39Z MEMBER

Just wanted to drop in and remind people interested in this that we hold a bi-weekly pangeo working group for distributed array computing, which is the perfect place to come and ask about any questions over zoom! I'll be there at 1pm EST today.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aesara as an array backend in Xarray 1575494367
1529775846 https://github.com/pydata/xarray/pull/7799#issuecomment-1529775846 https://api.github.com/repos/pydata/xarray/issues/7799 IC_kwDOAMm_X85bLobm TomNicholas 35968931 2023-05-01T14:28:24Z 2023-05-01T14:28:24Z MEMBER

Hi @dstansby, thanks for taking initiative on this! Supporting other units-aware packages would be awesome.

Are you aware of our efforts around https://github.com/pydata/xarray/issues/6894? The idea there was to create a general framework for downstream testing of duck-array libraries, including any implementations of units.

I think the ideas you are proposing here are useful and important, but we should probably discuss what we want the end state of duck-array test suites to look like.

cc @keewis

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Start making unit testing more general 1690019325
1527695510 https://github.com/pydata/xarray/pull/5704#issuecomment-1527695510 https://api.github.com/repos/pydata/xarray/issues/5704 IC_kwDOAMm_X85bDsiW TomNicholas 35968931 2023-04-28T14:57:54Z 2023-04-28T14:57:54Z MEMBER

For the benefit of anyone else reading this having come from https://github.com/pydata/xarray/issues/7792 or similar questions - see https://github.com/pydata/xarray/issues/4628 and https://github.com/pydata/xarray/issues/5081 to see what needs to be done. Also see discussion in https://github.com/pydata/xarray/issues/6807 for non-dask lazy backends.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow in-memory arrays with open_mfdataset 970245117
1524029516 https://github.com/pydata/xarray/issues/7764#issuecomment-1524029516 https://api.github.com/repos/pydata/xarray/issues/7764 IC_kwDOAMm_X85a1thM TomNicholas 35968931 2023-04-26T20:50:59Z 2023-04-26T20:50:59Z MEMBER

I support this (it seems just like what we do for bottleneck), but maybe don't use the word backend for the kwarg again :sweat_smile: In fact, as we're only talking about one function, could our kwarg literally point to that function? i.e.

```python
def dot(..., einsum_func=np.einsum):
    ...
```

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support opt_einsum in xr.dot 1672288892
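The kwarg-injection idea suggested above can be sketched with plain numpy - the computation function itself is the parameter, defaulting to `np.einsum`. This `dot` is a toy stand-in, not xarray's `xr.dot`:

```python
import numpy as np

def dot(a, b, einsum_func=np.einsum):
    """Matrix product via an injectable einsum implementation.

    A caller could pass opt_einsum's contract function here instead of
    np.einsum, with no new 'backend' terminology needed.
    """
    return einsum_func("ij,jk->ik", a, b)

a = np.eye(2)
b = np.array([[1.0, 2.0], [3.0, 4.0]])
dot(a, b)  # same result as a @ b
```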
1516802286 https://github.com/pydata/xarray/issues/7772#issuecomment-1516802286 https://api.github.com/repos/pydata/xarray/issues/7772 IC_kwDOAMm_X85aaJDu TomNicholas 35968931 2023-04-20T18:58:48Z 2023-04-20T18:58:48Z MEMBER

Thanks for raising this @dabhicusp !

So why have that if block at line 396?

Because xarray can wrap many different types of numpy-like arrays, and for some of those types the self.size * self.dtype.itemsize approach may not return the correct size. Think of a sparse matrix for example - its size in memory is designed to be much smaller than the size of the matrix would suggest. That's why in general we defer to the underlying array itself to tell us its size if it can (i.e. if it has a .nbytes attribute).

But you're not using an unusual type of array, you're just opening a netCDF file as a numpy array, in theory lazily. The memory usage you're seeing is not desired, so something weird must be happening in the .nbytes call. Going deeper into the stack at that point would be helpful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Process getting killed due to high memory consumption of xarray's nbytes method 1676561243
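The fallback logic described in that comment - prefer the array's own `.nbytes`, otherwise estimate from `size * itemsize` - can be sketched as follows (the helper name is illustrative, not xarray's actual code):

```python
import numpy as np

def estimated_nbytes(arr):
    """Report memory usage, deferring to the array's own accounting.

    Duck arrays like sparse matrices can occupy far less memory than their
    shape implies, so their .nbytes is authoritative when present.
    """
    if hasattr(arr, "nbytes"):
        return arr.nbytes
    return arr.size * arr.dtype.itemsize

a = np.zeros((100, 100), dtype="float64")
estimated_nbytes(a)  # 80000 bytes: 100 * 100 elements * 8 bytes each
```

For a lazily-opened netCDF variable neither branch should force a load, which is why unexpected memory use points at the `.nbytes` call itself.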
1514925779 https://github.com/pydata/xarray/issues/7767#issuecomment-1514925779 https://api.github.com/repos/pydata/xarray/issues/7767 IC_kwDOAMm_X85aS-7T TomNicholas 35968931 2023-04-19T15:20:11Z 2023-04-19T15:20:31Z MEMBER

So while xr.where(cond, x, y) is semantically, "where condition is true, x, else y", da.where(cond, x) is "where condition is true da, else x".

Adding this description to both docstrings would be a helpful clarification IMO.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between xr.where() and da.where() 1674532233
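The two semantics in that comment can be illustrated with plain numpy, whose `np.where` matches the three-argument `xr.where(cond, x, y)` form:

```python
import numpy as np

# xr.where(cond, x, y): "where cond is true, x, else y"
# da.where(cond, x):    "where cond is true, da, else x"
cond = np.array([True, False, True])
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

np.where(cond, x, y)  # array([ 1, 20,  3])
```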
1499791533 https://github.com/pydata/xarray/pull/7019#issuecomment-1499791533 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85ZZQCt TomNicholas 35968931 2023-04-07T00:32:47Z 2023-04-07T00:59:03Z MEMBER

I'm having problems with ensuring the behaviour of the chunks='auto' option is consistent between .chunk and open_dataset

Update on this rabbit hole: This commit to dask changed the behaviour of dask's auto-chunking logic, such that if I run my little test script test_old_get_chunk.py on dask releases before and after that commit I get different chunking patterns:

```python
import itertools
import warnings
from numbers import Number

import dask
import dask.array as da
import numpy as np
import xarray as xr
from dask.array.core import normalize_chunks
from xarray.core.variable import IndexVariable


# This function is copied from xarray, but calls dask.array.core.normalize_chunks.
# It is used in open_dataset, but not in Dataset.chunk
def _get_chunk(var, chunks):
    """
    Return map from each dim to chunk sizes, accounting for backend's preferred chunks.
    """
    if isinstance(var, IndexVariable):
        return {}
    dims = var.dims
    shape = var.shape

    # Determine the explicit requested chunks.
    preferred_chunks = var.encoding.get("preferred_chunks", {})
    preferred_chunk_shape = tuple(
        preferred_chunks.get(dim, size) for dim, size in zip(dims, shape)
    )
    if isinstance(chunks, Number) or (chunks == "auto"):
        chunks = dict.fromkeys(dims, chunks)
    chunk_shape = tuple(
        chunks.get(dim, None) or preferred_chunk_sizes
        for dim, preferred_chunk_sizes in zip(dims, preferred_chunk_shape)
    )
    chunk_shape = normalize_chunks(
        chunk_shape, shape=shape, dtype=var.dtype, previous_chunks=preferred_chunk_shape
    )

    # Warn where requested chunks break preferred chunks, provided that the variable
    # contains data.
    if var.size:
        for dim, size, chunk_sizes in zip(dims, shape, chunk_shape):
            try:
                preferred_chunk_sizes = preferred_chunks[dim]
            except KeyError:
                continue
            # Determine the stop indices of the preferred chunks, but omit the last stop
            # (equal to the dim size). In particular, assume that when a sequence
            # expresses the preferred chunks, the sequence sums to the size.
            preferred_stops = (
                range(preferred_chunk_sizes, size, preferred_chunk_sizes)
                if isinstance(preferred_chunk_sizes, Number)
                else itertools.accumulate(preferred_chunk_sizes[:-1])
            )
            # Gather any stop indices of the specified chunks that are not a stop index
            # of a preferred chunk. Again, omit the last stop, assuming that it equals
            # the dim size.
            breaks = set(itertools.accumulate(chunk_sizes[:-1])).difference(
                preferred_stops
            )
            if breaks:
                warnings.warn(
                    "The specified Dask chunks separate the stored chunks along "
                    f'dimension "{dim}" starting at index {min(breaks)}. This could '
                    "degrade performance. Instead, consider rechunking after loading."
                )
    return dict(zip(dims, chunk_shape))


chunks = "auto"
encoded_chunks = 100
dask_arr = da.from_array(
    np.ones((500, 500), dtype="float64"), chunks=encoded_chunks
)
var = xr.core.variable.Variable(data=dask_arr, dims=["x", "y"])
with dask.config.set({"array.chunk-size": "1MiB"}):
    chunks_suggested = _get_chunk(var, chunks)
    print(chunks_suggested)
```

```
(cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ git checkout 2022.9.2
Previous HEAD position was 7fe622b44 Add docs on running Dask in a standalone Python script (#9513)
HEAD is now at 3ef47422b bump version to 2022.9.2
(cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ python ../experimentation/bugs/auto_chunking/test_old_get_chunk.py
{'x': (362, 138), 'y': (362, 138)}
(cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ git checkout 2022.9.1
Previous HEAD position was 3ef47422b bump version to 2022.9.2
HEAD is now at b944abf68 bump version to 2022.9.1
(cubed) tom@tom-XPS-9315:~/Documents/Work/Code/dask$ python ../experimentation/bugs/auto_chunking/test_old_get_chunk.py
{'x': (250, 250), 'y': (250, 250)}
```

(I was absolutely tearing my hair out trying to find this bug, because after the change normalize_chunks became a pure function, but before the change it actually wasn't, so I was trying to call normalize_chunks with the exact same set of input arguments and was still not able to reproduce the bug :angry: )

Anyway what this means is as this PR vendors dask.array.core.normalize_chunks, but the behaviour of dask.array.core.normalize_chunks changed between the version in CI job min-all-deps and the other CI jobs, the single vendored function cannot possibly match both behaviours.

I think one simple way to fix this failure would be to upgrade the minimum version of dask to >=2022.9.2 (from 2022.1.1, where it currently is).

EDIT: I tried changing the minimum version of dask-core in min-all-deps.yml but the conda solve failed. But also would updating to 2022.9.2 now violate xarray's minimum dependency versions policy?

EDIT2: Another way to fix this should be to un-vendor dask.array.core.normalize_chunks within xarray. We could still achieve the goal of running cubed without dask by making normalize_chunks the responsibility of the chunkmanager instead, as cubed's vendored version of that function is not subject to xarray's minimum dependencies requirement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
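The intuition behind the 'auto' chunk sizes in the comment above: for a square 2-D array, pick a chunk edge so each chunk fits under the byte limit. This is a toy sketch of that idea, not dask's actual `normalize_chunks` algorithm (whose behaviour, as described, even changed between releases):

```python
import math

def auto_chunk_edge(shape_edge, itemsize, limit_bytes):
    """Largest square-chunk edge keeping edge**2 * itemsize <= limit_bytes,
    capped at the array's own edge length."""
    max_elems = limit_bytes // itemsize
    edge = int(math.sqrt(max_elems))
    return min(edge, shape_edge)

# 500x500 float64 with a 1 MiB limit, as in the test script above:
auto_chunk_edge(500, 8, 1 * 1024**2)  # 362
```

That 362 matches the first chunk in the post-2022.9.2 output `{'x': (362, 138), ...}`; the earlier release instead balanced chunks to `(250, 250)`, which is exactly the behavioural difference being debugged.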
1499432372 https://github.com/pydata/xarray/pull/7019#issuecomment-1499432372 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85ZX4W0 TomNicholas 35968931 2023-04-06T18:03:48Z 2023-04-06T18:07:24Z MEMBER

I'm having problems with ensuring the behaviour of the chunks='auto' option is consistent between .chunk and open_dataset. These problems appeared since vendoring dask.array.core.normalize_chunks. Right now the only failing tests use chunks='auto' (e.g. xarray/tests/test_backends.py::test_chunking_consintency[auto] - yes there's a typo in that test's name), and they fail because xarray decides on different sizes for the automatically-chosen chunks.

What's weird is that all tests pass for me locally but these failures occur on just some of the CI jobs (and which CI jobs is not even consistent apparently???). I have no idea why this would behave differently on only some of the CI jobs, especially after double-checking that array-chunk-size is being correctly determined from the dask config variable within normalize_chunks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1492401032 https://github.com/pydata/xarray/pull/7681#issuecomment-1492401032 https://api.github.com/repos/pydata/xarray/issues/7681 IC_kwDOAMm_X85Y9DuI TomNicholas 35968931 2023-03-31T18:11:34Z 2023-03-31T18:11:34Z MEMBER

@harshitha1201 if it passes the CI, it should be fine. The error you are getting looks like something to do with importing rasterio, which is an optional backend anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  restructure the contributing guide 1641188400
1492399834 https://github.com/pydata/xarray/issues/4637#issuecomment-1492399834 https://api.github.com/repos/pydata/xarray/issues/4637 IC_kwDOAMm_X85Y9Dba TomNicholas 35968931 2023-03-31T18:10:24Z 2023-03-31T18:10:24Z MEMBER

@alrho007 the code for this method on DataArray is in here

https://github.com/pydata/xarray/blob/850156cf80fe8791d45bcaff2da579cffc0cfc35/xarray/core/dataarray.py#L3303

which calls the implementation defined in xarray.core.missing

https://github.com/pydata/xarray/blob/1c81162755457b3f4dc1f551f0321c75ec9daf6c/xarray/core/missing.py#L308

I would start by trying to understand that code (looking at where things are imported to make it work), and then create a small test case example with a monotonically decreasing index that causes a problem. Then try to work out exactly which step in the code causes the issue, and whether it can be generalized to fix the issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for monotonically decreasing indices in interpolate_na 754789691
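A starting point for the "small test case" suggested above: a monotonically decreasing index is easy to construct and check with plain numpy (this is just the reproduction idea, not xarray's internal check):

```python
import numpy as np

# A decreasing coordinate, e.g. depth measured downwards or reversed time.
idx = np.array([5.0, 4.0, 3.0, 2.0])

increasing = bool(np.all(np.diff(idx) > 0))
decreasing = bool(np.all(np.diff(idx) < 0))
(increasing, decreasing)  # (False, True)
```

Any code path that assumes `increasing` is where interpolate_na would currently go wrong for such an index.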
1492125602 https://github.com/pydata/xarray/pull/7623#issuecomment-1492125602 https://api.github.com/repos/pydata/xarray/issues/7623 IC_kwDOAMm_X85Y8Aei TomNicholas 35968931 2023-03-31T15:26:10Z 2023-03-31T15:26:10Z MEMBER

Thanks @nishtha981 !

I just realised after merging that this PR should in theory have had a corresponding entry in the what's new page, as all PRs are supposed to have.

We won't worry about that this time, but try and remember to add it next time! That way you will also be listed as a contributor on the what's new page.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Added an introduction to the reshaping documentation 1623776623
1491278724 https://github.com/pydata/xarray/pull/7623#issuecomment-1491278724 https://api.github.com/repos/pydata/xarray/issues/7623 IC_kwDOAMm_X85Y4xuE TomNicholas 35968931 2023-03-31T04:40:58Z 2023-03-31T04:40:58Z MEMBER

I've also just told the readthedocs to rebuild just now

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Added an introduction to the reshaping documentation 1623776623
1491278420 https://github.com/pydata/xarray/pull/7623#issuecomment-1491278420 https://api.github.com/repos/pydata/xarray/issues/7623 IC_kwDOAMm_X85Y4xpU TomNicholas 35968931 2023-03-31T04:40:23Z 2023-03-31T04:40:23Z MEMBER

Hi @nishtha981 - that's great that you identified what was causing the docs ci builds to fail! I was wondering why that was!

Fixing this for xarray may require more than just re-running the ci. For example it might require us to pin a particular version of a library (here sphinx_book_theme) in order to guarantee the CI works again. If it doesn't work immediately now, what we normally do is to open a new github issue on xarray's issue tracker to track the problem. That way if the problem comes up in multiple PRs we can just refer all of them back to the one issue, until it gets resolved completely.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Added an introduction to the reshaping documentation 1623776623
1490687686 https://github.com/pydata/xarray/pull/7623#issuecomment-1490687686 https://api.github.com/repos/pydata/xarray/issues/7623 IC_kwDOAMm_X85Y2hbG TomNicholas 35968931 2023-03-30T17:43:55Z 2023-03-30T17:43:55Z MEMBER

I've made a final edit to remove the mention of a specific method in the intro, and remove some blanks lines. This looks good to me now so I'll merge it! Thanks @nishtha981 !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Added an introduction to the reshaping documentation 1623776623
1488750990 https://github.com/pydata/xarray/pull/7677#issuecomment-1488750990 https://api.github.com/repos/pydata/xarray/issues/7677 IC_kwDOAMm_X85YvImO TomNicholas 35968931 2023-03-29T14:37:35Z 2023-03-29T14:37:35Z MEMBER

In your comment, you said "If you hit commit next to my change, it will merge the change into your pull request " kindly bear with me, I can't seem to find any button that says commit next to your change.

You can ignore this comment - I suggested a change, asked you to commit it, then realised it would be simpler for you if I just used my admin rights to commit it myself, and edited my comment to remove the recommendation. Sorry for the confusion.

Moreover please is there a way I can reach out to you? I tried reaching out to you on Twitter and the provided email, but I have not been successful with that. The mail seems to be invalid.

Sorry! I've been getting a lot of messages from Outreachy applicants and have not managed to reply to them all :cry: The email should not be invalid though - where did you send it?

If you have questions about xarray itself they should be raised on the repository though - you don't need to contact me directly for that. Also if you raise them publicly on the repository it gives other people a chance to answer if I don't see it.

If you have questions about Outreachy specifically then ask them either on the discord channel or by emailing me (thomas dot nicholas at columbia dot edu).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Added a pronunciation guide to the word Xarray in the README.MD fil… 1640484371
1488734179 https://github.com/pydata/xarray/pull/7694#issuecomment-1488734179 https://api.github.com/repos/pydata/xarray/issues/7694 IC_kwDOAMm_X85YvEfj TomNicholas 35968931 2023-03-29T14:28:12Z 2023-03-29T14:28:12Z MEMBER

Hi @harshitha1201 - thanks for this!

We do already have a section covering these methods in https://docs.xarray.dev/en/stable/user-guide/computation.html#missing-values. I suggest that we don't need to duplicate all of this information on the FAQ page.

That said the examples and explanation you have written here are still useful! Perhaps they can either be used to improve the page I just linked, or go into the docstrings of those particular methods.

For the FAQ page instead I think we probably just want to provide a summary in a couple of sentences and a link to the more detailed information on specific methods. The summary should mention that:

- xarray can handle missing values,
- it uses np.NaN to do so,
- most computation methods will automatically handle missing values appropriately,
- aggregation methods have a skipna argument,
- plotting will just leave them as blank spaces (link to plotting page),
- we have a set of special methods for manipulating and filling missing values (link to here).
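As a rough illustration of the behaviours that summary would describe (a minimal sketch, not proposed FAQ wording):

```python
import numpy as np
import xarray as xr

# xarray represents missing values with np.nan
da = xr.DataArray([1.0, np.nan, 3.0], dims="x")

# aggregations skip missing values by default...
print(da.mean().item())               # 2.0

# ...and the skipna argument turns that off
print(da.mean(skipna=False).item())   # nan

# dedicated methods exist for filling/manipulating missing values
print(da.fillna(0.0).values)          # [1. 0. 3.]
```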

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How xarray handles missing values 1644759739
1487975336 https://github.com/pydata/xarray/issues/7685#issuecomment-1487975336 https://api.github.com/repos/pydata/xarray/issues/7685 IC_kwDOAMm_X85YsLOo TomNicholas 35968931 2023-03-29T05:38:20Z 2023-03-29T05:38:20Z MEMBER

Good idea @dcherian

Could you specify what kind of function you would like the bot to perform?

When a github user creates their first ever issue or pull request on the xarray repository, the bot would reply to them, welcoming them with a friendly message, thanking them for their interest, and pointing them towards useful links like the contributing guidelines and contact channels.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add welcome bot? 1642317716
1487909969 https://github.com/pydata/xarray/issues/7692#issuecomment-1487909969 https://api.github.com/repos/pydata/xarray/issues/7692 IC_kwDOAMm_X85Yr7RR TomNicholas 35968931 2023-03-29T03:57:40Z 2023-03-29T03:57:40Z MEMBER

:+1: to not having to write to_dataset before saving things.

I think the .to_X vs e.g. .save(engine='X') is important enough that we should think about it a bit more. The response to #7496 doesn't seem conclusive to me (yet).

A twitter poll? Then a discussion in a meeting?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature proposal: DataArray.to_zarr() 1644429340
1485893772 https://github.com/pydata/xarray/issues/7617#issuecomment-1485893772 https://api.github.com/repos/pydata/xarray/issues/7617 IC_kwDOAMm_X85YkPCM TomNicholas 35968931 2023-03-27T21:37:20Z 2023-03-27T21:37:20Z MEMBER

Hi @Amisha2778 - unfortunately this issue has already been resolved by pull request #7625. We just forgot to close this issue once it was resolved, sorry about that! (The closing normally happens automatically but apparently didn't this time.)

You are welcome to work on any other issue you like, but I will close this one now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  The documentation contains some non-descriptive link texts. 1620573171
1485891524 https://github.com/pydata/xarray/issues/7378#issuecomment-1485891524 https://api.github.com/repos/pydata/xarray/issues/7378 IC_kwDOAMm_X85YkOfE TomNicholas 35968931 2023-03-27T21:35:14Z 2023-03-27T21:35:14Z MEMBER

Hi @Amisha2778 - great to hear you are interested. You don't need my permission - please have a go at solving any issue that looks interesting to you, and please ask questions if you have any difficulties!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improve docstrings for better discoverability 1497131525
1485279617 https://github.com/pydata/xarray/pull/7638#issuecomment-1485279617 https://api.github.com/repos/pydata/xarray/issues/7638 IC_kwDOAMm_X85Yh5GB TomNicholas 35968931 2023-03-27T15:00:52Z 2023-03-27T15:00:52Z MEMBER

Thanks @harshitha1201 ! A great usability improvement.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Faq pull request (According to pull request #7604 & issue #1285 1627983028
1483155153 https://github.com/pydata/xarray/pull/7019#issuecomment-1483155153 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85YZybR TomNicholas 35968931 2023-03-24T17:19:44Z 2023-03-24T17:21:32Z MEMBER

I've made a bare-bones cubed-xarray package to store the CubedManager class, as well as any integration tests (yet to be written). @tomwhite you should have an invitation to be an owner of that repo. It uses the entrypoint exposed in this PR to hook in, and seems to work for me locally :grin:

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1481626515 https://github.com/pydata/xarray/pull/7019#issuecomment-1481626515 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85YT9OT TomNicholas 35968931 2023-03-23T17:47:35Z 2023-03-23T17:47:51Z MEMBER

Thanks for the review @dcherian! I agree with basically everything you wrote.

The main difficulty I have at this point is non-reproducible failures as described here

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1481504146 https://github.com/pydata/xarray/pull/7019#issuecomment-1481504146 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85YTfWS TomNicholas 35968931 2023-03-23T16:26:53Z 2023-03-23T17:36:41Z MEMBER

I would like to get to the point where you can use xarray with a chunked array without ever importing dask. I think this PR gets very close, but that would be tricky to test because cubed depends on dask (so I can't just run the test suite without dask in the environment).

I just released Cubed 0.6.0 which doesn't have a dependency on Dask, so this should be possible now.

Actually testing cubed with xarray in an environment without dask is currently blocked by rechunker's explicit dependency on dask, see https://github.com/pangeo-data/rechunker/issues/139

EDIT: We can hack around this by pip installing cubed, then pip uninstalling dask as mentioned here

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1478328995 https://github.com/pydata/xarray/pull/7019#issuecomment-1478328995 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85YHYKj TomNicholas 35968931 2023-03-21T17:40:36Z 2023-03-21T17:40:36Z MEMBER

Does this mean my comment https://github.com/pydata/xarray/pull/7019#discussion_r970713341 is valid again?

Yes I think it does @headtr1ck - thanks for the reminder about that.

I now want to finish this PR by exposing the "chunk manager" interface as a new entrypoint, copying the pattern used for xarray's backends. That would allow me to move the cubed-specific CubedManager code into a separate repository, have the choice of chunkmanager default to whatever is installed, but ask explicitly what to do if multiple chunkmanagers are installed. That should address your comment @headtr1ck.
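The backend-entrypoint pattern described above can be sketched with `importlib.metadata`; note the group name `"xarray.chunkmanagers"` below is hypothetical, chosen only to illustrate the mechanism:

```python
from importlib.metadata import entry_points

def discover_chunkmanagers(group="xarray.chunkmanagers"):
    """Collect chunk managers registered under a (hypothetical)
    entrypoint group, keyed by entrypoint name."""
    eps = entry_points()
    # entry_points() returned a dict of lists before Python 3.10,
    # and a selectable EntryPoints object afterwards
    if hasattr(eps, "select"):
        selected = eps.select(group=group)
    else:
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

A separate package (like the cubed-specific repo suggested above) would then declare an entry under that group in its packaging metadata, and xarray could default to the single installed manager or ask the user to choose when several are installed.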

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1474149056 https://github.com/pydata/xarray/pull/7356#issuecomment-1474149056 https://api.github.com/repos/pydata/xarray/issues/7356 IC_kwDOAMm_X85X3brA TomNicholas 35968931 2023-03-17T17:10:44Z 2023-03-17T17:10:44Z MEMBER

This came up in the xarray office hours today, and I'm confused why this PR made any difference to the behavior at all? The .data property just points to ._data, so why would it matter which one we check?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid loading entire dataset by getting the nbytes in an array 1475567394
1472766481 https://github.com/pydata/xarray/pull/7019#issuecomment-1472766481 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85XyKIR TomNicholas 35968931 2023-03-16T21:26:36Z 2023-03-16T21:26:36Z MEMBER

Thanks @dcherian ! Once I copied that explicit indexer business I was able to get serialization to and from zarr working with cubed!

```python
In [1]: import xarray as xr

In [2]: from cubed import Spec

In [3]: ds = xr.open_dataset(
   ...:     'airtemps.zarr',
   ...:     chunks={},
   ...:     from_array_kwargs={
   ...:         'manager': 'cubed',
   ...:         'spec': Spec(work_dir="tmp", max_mem=20e6),
   ...:     }
   ...: )
/home/tom/Documents/Work/Code/xarray/xarray/backends/plugins.py:139: RuntimeWarning: 'netcdf4' fails while guessing
  warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
/home/tom/Documents/Work/Code/xarray/xarray/backends/plugins.py:139: RuntimeWarning: 'scipy' fails while guessing
  warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)

In [4]: ds['air']
Out[4]:
<xarray.DataArray 'air' (time: 2920, lat: 25, lon: 53)>
cubed.Array<array-004, shape=(2920, 25, 53), dtype=float32, chunks=((730, 730, 730, 730), (13, 12), (27, 26))>
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Attributes:
    GRIB_id:  11
    ...

In [5]: ds.isel(time=slice(100, 300)).to_zarr("cubed_subset.zarr")
/home/tom/Documents/Work/Code/xarray/xarray/core/dataset.py:2118: SerializationWarning: saving variable None with floating point data as an integer dtype without any _FillValue to use for NaNs
  return to_zarr(  # type: ignore
Out[5]: <xarray.backends.zarr.ZarrStore at 0x7f34953033c0>
```

{
    "total_count": 8,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 6,
    "rocket": 0,
    "eyes": 2
}
  Generalize handling of chunked array types 1368740629
1470154910 https://github.com/pydata/xarray/pull/7619#issuecomment-1470154910 https://api.github.com/repos/pydata/xarray/issues/7619 IC_kwDOAMm_X85XoMie TomNicholas 35968931 2023-03-15T14:54:10Z 2023-03-15T14:54:10Z MEMBER

@Ravenin7 do you need some guidance on how to add a test for this? Happy to help if so. It would be great to get this fix merged soon because a number of other people have encountered the same bug!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  don't use `issubdtype` to check for integer dtypes in `polyval` 1621385466
1469050607 https://github.com/pydata/xarray/pull/7019#issuecomment-1469050607 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85Xj-7v TomNicholas 35968931 2023-03-15T00:30:08Z 2023-03-15T10:03:11Z MEMBER

I tried opening a zarr store into xarray with chunking via cubed, but I got an error inside the indexing adapter classes. Somehow the type is completely wrong - would be good to type hint this part of the code, because this happens despite mypy passing now.

```python
# create example zarr store
orig = xr.tutorial.open_dataset("air_temperature")
orig.to_zarr('air2.zarr')

# open it as a cubed array
ds = xr.open_dataset('air2.zarr', engine='zarr', chunks={}, from_array_kwargs={'manager': 'cubed'})

# fails at this point
ds.load()
```

```python
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:382, in Retrying.__call__(self, fn, *args, **kwargs)
    381 try:
--> 382     result = fn(*args, **kwargs)
    383 except BaseException:  # noqa: B902
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/runtime/executors/python.py:10, in exec_stage_func(func, *args, **kwargs)
      8 @retry(stop=stop_after_attempt(3))
      9 def exec_stage_func(func, *args, **kwargs):
---> 10     return func(*args, **kwargs)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/primitive/blockwise.py:66, in apply_blockwise(out_key, config)
     64     args.append(arg)
---> 66 result = config.function(*args)
     67 if isinstance(result, dict):  # structured array with named fields
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:439, in map_blocks.<locals>.func_with_block_id.<locals>.wrap(*a, **kw)
    438 block_id = offset_to_block_id(a[-1].item())
--> 439 return func(*a[:-1], block_id=block_id, **kw)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:572, in map_direct.<locals>.new_func.<locals>.wrap(block_id, *a, **kw)
    571 args = a + arrays
--> 572 return func(*args, block_id=block_id, **kw)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/ops.py:76, in _from_array(e, x, outchunks, asarray, block_id)
     75 def _from_array(e, x, outchunks=None, asarray=None, block_id=None):
---> 76     out = x[get_item(outchunks, block_id)]
     77     if asarray:
File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:627, in CopyOnWriteArray.__getitem__(self, key)
    626 def __getitem__(self, key):
--> 627     return type(self)(_wrap_numpy_scalars(self.array[key]))
File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:534, in LazilyIndexedArray.__getitem__(self, indexer)
    533     return array[indexer]
--> 534 return type(self)(self.array, self._updated_key(indexer))
File ~/Documents/Work/Code/xarray/xarray/core/indexing.py:500, in LazilyIndexedArray._updated_key(self, new_key)
    499 def _updated_key(self, new_key):
--> 500     iter_new_key = iter(expanded_indexer(new_key.tuple, self.ndim))
    501     full_key = []
AttributeError: 'tuple' object has no attribute 'tuple'

The above exception was the direct cause of the following exception:

RetryError                                Traceback (most recent call last)
Cell In[69], line 1
----> 1 ds.load()
File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:761, in Dataset.load(self, **kwargs)
    758 chunkmanager = get_chunked_array_type(*lazy_data.values())
    760 # evaluate all the chunked arrays simultaneously
--> 761 evaluated_data = chunkmanager.compute(*lazy_data.values(), **kwargs)
    763 for k, data in zip(lazy_data, evaluated_data):
    764     self.variables[k].data = data
File ~/Documents/Work/Code/xarray/xarray/core/parallelcompat.py:451, in CubedManager.compute(self, *data, **kwargs)
    448 def compute(self, *data: "CubedArray", **kwargs) -> np.ndarray:
    449     from cubed import compute
--> 451     return compute(*data, **kwargs)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/array.py:300, in compute(executor, callbacks, optimize_graph, *arrays, **kwargs)
    297     executor = PythonDagExecutor()
    299 _return_in_memory_array = kwargs.pop("_return_in_memory_array", True)
--> 300 plan.execute(
    301     executor=executor,
    302     callbacks=callbacks,
    303     optimize_graph=optimize_graph,
    304     array_names=[a.name for a in arrays],
    305     **kwargs,
    306 )
    308 if _return_in_memory_array:
    309     return tuple(a._read_stored() for a in arrays)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/core/plan.py:154, in Plan.execute(self, executor, callbacks, optimize_graph, array_names, **kwargs)
    152 if callbacks is not None:
    153     [callback.on_compute_start(dag) for callback in callbacks]
--> 154 executor.execute_dag(
    155     dag, callbacks=callbacks, array_names=array_names, **kwargs
    156 )
    157 if callbacks is not None:
    158     [callback.on_compute_end(dag) for callback in callbacks]
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/cubed/runtime/executors/python.py:22, in PythonDagExecutor.execute_dag(self, dag, callbacks, array_names, **kwargs)
     20 if stage.mappable is not None:
     21     for m in stage.mappable:
---> 22         exec_stage_func(stage.function, m, config=pipeline.config)
     23         if callbacks is not None:
     24             event = TaskEndEvent(array_name=name)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:289, in BaseRetrying.wraps.<locals>.wrapped_f(*args, **kw)
    287 @functools.wraps(f)
    288 def wrapped_f(*args: t.Any, **kw: t.Any) -> t.Any:
--> 289     return self(f, *args, **kw)
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:379, in Retrying.__call__(self, fn, *args, **kwargs)
    377 retry_state = RetryCallState(retry_object=self, fn=fn, args=args, kwargs=kwargs)
    378 while True:
--> 379     do = self.iter(retry_state=retry_state)
    380     if isinstance(do, DoAttempt):
    381         try:
File ~/miniconda3/envs/cubed/lib/python3.9/site-packages/tenacity/__init__.py:326, in BaseRetrying.iter(self, retry_state)
    324     if self.reraise:
    325         raise retry_exc.reraise()
--> 326     raise retry_exc from fut.exception()
    328 if self.wait:
    329     sleep = self.wait(retry_state)
RetryError: RetryError[<Future at 0x7fc0c69be4f0 state=finished raised AttributeError>]
```

This still works fine for dask.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1468732253 https://github.com/pydata/xarray/pull/7019#issuecomment-1468732253 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85XixNd TomNicholas 35968931 2023-03-14T19:52:38Z 2023-03-14T19:52:38Z MEMBER

Thanks @tomwhite - I think it might make sense for me to remove the CubedManager class from this PR and instead put that & cubed+xarray tests into another repo. That keeps xarray's changes minimal, doesn't require putting cubed in any xarray CI envs, and hopefully allows us to merge the ChunkManager changes here earlier.


Places dask is still explicitly imported in xarray

There are a few remaining places where I haven't generalised away the explicit import dask calls, either because dask won't be imported at runtime unless you ask for it, because cubed doesn't implement the equivalent function, because that function isn't in the array API standard, or because I'm not sure whether the dask concept used generalises to other parallel frameworks.

  • [ ] open_mfdataset(..., parallel=True) - there is no cubed.delayed to wrap the open_dataset calls in,
  • [ ] Dataset.__dask_graph__ and all the other similar dask magic methods
  • [ ] dask_array_ops.rolling - uses functions from dask.array.overlap,
  • [ ] dask_array_ops.least_squares - uses dask.array.apply_along_axis and dask.array.linalg.lstsq,
  • [ ] dask_array_ops.push - uses dask.array.reductions.cumreduction

I would like to get to the point where you can use xarray with a chunked array without ever importing dask. I think this PR gets very close, but that would be tricky to test because cubed depends on dask (so I can't just run the test suite without dask in the environment), and there are not yet any other parallel chunk-aware frameworks I know of (ramba and arkouda don't have a chunks attribute so wouldn't require this PR).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1468433743 https://github.com/pydata/xarray/issues/6899#issuecomment-1468433743 https://api.github.com/repos/pydata/xarray/issues/6899 IC_kwDOAMm_X85XhoVP TomNicholas 35968931 2023-03-14T16:31:07Z 2023-03-14T16:31:07Z MEMBER

@alrho007 great! We would welcome a pull request to update this :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  interpolate_na documentation 1333117804
1466635308 https://github.com/pydata/xarray/issues/7617#issuecomment-1466635308 https://api.github.com/repos/pydata/xarray/issues/7617 IC_kwDOAMm_X85XaxQs TomNicholas 35968931 2023-03-13T17:57:28Z 2023-03-13T17:57:28Z MEMBER

Thanks @remigathoni ! We would welcome a pull request that made these improvements :slightly_smiling_face:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  The documentation contains some non-descriptive link texts. 1620573171
1463915266 https://github.com/pydata/xarray/issues/7378#issuecomment-1463915266 https://api.github.com/repos/pydata/xarray/issues/7378 IC_kwDOAMm_X85XQZMC TomNicholas 35968931 2023-03-10T14:50:52Z 2023-03-10T14:50:52Z MEMBER

Hi @mahamtariq58, thanks for your interest! We don't normally assign issues to individuals, you are just welcome to have a go at solving any issue that interests you.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improve docstrings for better discoverability 1497131525
1462493521 https://github.com/pydata/xarray/issues/5985#issuecomment-1462493521 https://api.github.com/repos/pydata/xarray/issues/5985 IC_kwDOAMm_X85XK-FR TomNicholas 35968931 2023-03-09T17:47:58Z 2023-03-09T17:47:58Z MEMBER

Hi @alrho007 - thanks for your interest in contributing to xarray!

The code for the string accessor is in xarray/core/accessor_str.py. That's where you would need to make changes.

The idea would be to wrap xr.DataArray.str.mod in xr.DataArray.str.format.

@mathause could you expand on what you mean exactly? I'm new to this part of the codebase. There is already a .format method, are you talking about changing its behaviour?

xr.DataArray([0, 1, 2]).str would raise an error right since it's int type?

Also @ahuang11 this seems fine? Because the .str accessor doesn't actually perform a check that the passed data is a str type immediately?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Formatting data array as strings? 1052753606
1462420684 https://github.com/pydata/xarray/pull/7595#issuecomment-1462420684 https://api.github.com/repos/pydata/xarray/issues/7595 IC_kwDOAMm_X85XKsTM TomNicholas 35968931 2023-03-09T16:59:48Z 2023-03-09T16:59:48Z MEMBER

Oh right my bad - I needed to click the lower of the two "View Docs" buttons :sweat_smile:

That's definitely confusing, so I'll add this screenshot to this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Clarifications in contributors guide 1615570467
1462393425 https://github.com/pydata/xarray/pull/7595#issuecomment-1462393425 https://api.github.com/repos/pydata/xarray/issues/7595 IC_kwDOAMm_X85XKlpR TomNicholas 35968931 2023-03-09T16:45:57Z 2023-03-09T16:45:57Z MEMBER

Wait hang on the docs build in the CI is not pointing to my PR - it's pointing to the stable branch! That's not right.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Clarifications in contributors guide 1615570467
1460980509 https://github.com/pydata/xarray/pull/7019#issuecomment-1460980509 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85XFMsd TomNicholas 35968931 2023-03-08T22:37:21Z 2023-03-08T22:39:46Z MEMBER

I'm making progress with this PR, and now that @tomwhite implemented cubed.apply_gufunc I've re-routed xarray.apply_ufunc to use whatever version of apply_gufunc is defined by the chosen ChunkManager. This means many basic operations should now just work:

```python
In [1]: import xarray as xr

In [2]: da = xr.DataArray([1, 2, 3], dims='x')

In [3]: da_chunked = da.chunk(from_array_kwargs={'manager': 'cubed'})

In [4]: da_chunked
Out[4]:
<xarray.DataArray (x: 3)>
cubed.Array<array-003, shape=(3,), dtype=int64, chunks=((3,),)>
Dimensions without coordinates: x

In [5]: da_chunked.mean()
Out[5]:
<xarray.DataArray ()>
cubed.Array<array-006, shape=(), dtype=int64, chunks=()>

In [6]: da_chunked.mean().compute()
[cubed.Array<array-009, shape=(), dtype=int64, chunks=()>]
Out[6]:
<xarray.DataArray ()>
array(2)
```

(You need to install both cubed>0.5.0 and the main branch of rechunker for this to work.)

I still have a fair bit more to do on this PR (see checklist at top), but for testing should I:

  1. Start making a test_cubed.py file in xarray as part of this PR with bespoke tests,
  2. Put bespoke tests for xarray wrapping cubed somewhere else (e.g. the cubed repo or a new cubed-xarray repo),
  3. Merge this PR without cubed-specific tests and concentrate on finishing the general duck-array testing framework in #6908 so we can implement option (2) in the way we actually eventually want things to work for 3rd-party duck array libraries?

I would prefer not to have this PR grow to be thousands of lines by including tests in it, but also waiting for #6908 might take a while because that's also a fairly ambitious PR.

The fact that the tests are currently green for this PR (ignoring some mypy stuff) is evidence that the decoupling of dask from xarray is working so far.

(I have already added some tests for the ability to register custom ChunkManagers though.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1460476581 https://github.com/pydata/xarray/pull/7595#issuecomment-1460476581 https://api.github.com/repos/pydata/xarray/issues/7595 IC_kwDOAMm_X85XDRql TomNicholas 35968931 2023-03-08T16:40:27Z 2023-03-08T16:40:27Z MEMBER

There is a link in the docs to an xarray wiki:

If your test requires working with files or network connectivity, there is more information on the `testing page <https://github.com/pydata/xarray/wiki/Testing>`_ of the wiki.

This link is broken, but I'm not sure what to replace it with. I didn't even know xarray had a wiki!! Does it still exist?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Clarifications in contributors guide 1615570467
1458901771 https://github.com/pydata/xarray/issues/7556#issuecomment-1458901771 https://api.github.com/repos/pydata/xarray/issues/7556 IC_kwDOAMm_X85W9RML TomNicholas 35968931 2023-03-07T21:31:16Z 2023-03-07T21:31:16Z MEMBER

Thanks for reporting @arfriedman .

The link is supposed to be referring to the section immediately following the Datetime Indexing subsection. I think the underscores in datetime_component_indexing might be what's breaking the link.

If anyone wants to make their first small contribution to xarray this would be a great start!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  broken documentation link 1598266728
1456460042 https://github.com/pydata/xarray/issues/7588#issuecomment-1456460042 https://api.github.com/repos/pydata/xarray/issues/7588 IC_kwDOAMm_X85Wz9EK TomNicholas 35968931 2023-03-06T16:30:25Z 2023-03-06T16:30:25Z MEMBER

Wow thanks for reporting this @Metamess ! Do you see a simple fix within merge_core?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.merge with compat="minimal" returns corrupted Dataset and causes __len__ to return wrong and possibly negative values. 1611701140
1453094468 https://github.com/pydata/xarray/issues/2542#issuecomment-1453094468 https://api.github.com/repos/pydata/xarray/issues/2542 IC_kwDOAMm_X85WnHZE TomNicholas 35968931 2023-03-03T07:27:48Z 2023-03-03T07:27:48Z MEMBER

See #3980 - this issue should be solved as part of #3980, but see that issue for more general discussion of subclassing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  full_like, ones_like, zeros_like should retain subclasses 377356113
1453076251 https://github.com/pydata/xarray/issues/4213#issuecomment-1453076251 https://api.github.com/repos/pydata/xarray/issues/4213 IC_kwDOAMm_X85WnC8b TomNicholas 35968931 2023-03-03T07:07:31Z 2023-03-03T07:07:31Z MEMBER

Closing this as having answered the original question. If anyone wants to discuss mosaicing rasters in more detail we can raise another issue for that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray combine_by_coords return the monotonic global index error 654150730
1450384405 https://github.com/pydata/xarray/issues/7560#issuecomment-1450384405 https://api.github.com/repos/pydata/xarray/issues/7560 IC_kwDOAMm_X85WcxwV TomNicholas 35968931 2023-03-01T15:54:45Z 2023-03-01T15:54:45Z MEMBER

Or just xr.concat.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.merge does not respect datatypes of inputs 1600364072
1444165475 https://github.com/pydata/xarray/issues/7559#issuecomment-1444165475 https://api.github.com/repos/pydata/xarray/issues/7559 IC_kwDOAMm_X85WFDdj TomNicholas 35968931 2023-02-24T18:07:08Z 2023-02-24T18:07:08Z MEMBER

The chunk_by_labels functionality seems quite useful even when not talking about times, so I would be :+1: for that kind of option.

On the API question is there anywhere else in xarray where we have made some choice about how to let the user choose between specifying via indexes or labels? Apart from just .isel vs .sel I mean
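One way the label-based side of this could work, sketched with pandas for the date arithmetic (the "MS" frequency string and the idea of passing the resulting tuple to .chunk are illustrative assumptions, not a settled API):

```python
import pandas as pd

# hypothetical: derive one chunk per calendar month from the time labels
time = pd.date_range("2000-01-01", periods=365, freq="D")

# count how many timestamps fall in each month-start ("MS") period
sizes = tuple(int(n) for n in time.to_series().resample("MS").size())

print(sizes[:3])  # (31, 29, 31) -- 2000 is a leap year

# these sizes could then be passed on as explicit chunk sizes,
# e.g. da.chunk({"time": sizes}) (requires dask)
```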

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support specifying chunk sizes using labels (e.g. frequency string) 1599056009
1438877252 https://github.com/pydata/xarray/issues/7539#issuecomment-1438877252 https://api.github.com/repos/pydata/xarray/issues/7539 IC_kwDOAMm_X85Vw4ZE TomNicholas 35968931 2023-02-21T17:48:30Z 2023-02-21T17:48:30Z MEMBER

Or indeed at least make it clearer in the docs that something like drop_indexes or reset_coords should be used first in order to skip auto-alignment for some variables.

I think we should do this regardless. I don't know of anywhere in the docs that these kind of subtleties with concat are clearly documented.

I guess the easiest option for a concat version with no auto-alignment would be to drop the index when such a case happens.

Right - in this case that would have given the intuitive result.

We could get halfway to a better xr.concat by changing https://github.com/pydata/xarray/issues/2064 IMO.

I propose join="exact", data_vars="minimal", coords="minimal", compat="override"

That wouldn't have helped with this specific issue though right?
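A minimal sketch of the drop_indexes workaround mentioned above (assuming a recent xarray where Dataset.drop_indexes exists):

```python
import xarray as xr

ds = xr.Dataset(coords={"x": [10, 20, 30]})

# dropping the index keeps "x" as a coordinate variable but removes
# it from the set of indexes, so it no longer drives auto-alignment
no_index = ds.drop_indexes("x")

print("x" in no_index.coords)    # True
print("x" in no_index.xindexes)  # False
```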

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concat doesn't concatenate dimension coordinates along new dims 1588461863
1433726618 https://github.com/pydata/xarray/issues/4610#issuecomment-1433726618 https://api.github.com/repos/pydata/xarray/issues/4610 IC_kwDOAMm_X85VdO6a TomNicholas 35968931 2023-02-16T21:17:56Z 2023-02-16T21:17:56Z MEMBER

it was triggering a load

Can we not just test the in-memory performance by .load()-ing first? Then worry about dask performance? That's what I was vaguely getting at in my comment, trying the in-memory performance but also plotting the dask graph.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add histogram method 750985364
1433675446 https://github.com/pydata/xarray/issues/4610#issuecomment-1433675446 https://api.github.com/repos/pydata/xarray/issues/4610 IC_kwDOAMm_X85VdCa2 TomNicholas 35968931 2023-02-16T20:29:25Z 2023-02-16T20:29:25Z MEMBER

Could you show the example that's this slow, @TomNicholas ? So I can play around with it too.

I think I just timed the difference in the (unweighted) "real" example I gave in the notebook. (Not the weighted one because that didn't give the right answer with flox for some reason.)

One thing I noticed in your notebook is that you haven't used chunks={} on the open_dataset. Which seems to trigger data loading in strange places in xarray (places that call self.data), but I'm not sure this is your actual problem.

Fair point, worth trying.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add histogram method 750985364
1424477107 https://github.com/pydata/xarray/issues/7515#issuecomment-1424477107 https://api.github.com/repos/pydata/xarray/issues/7515 IC_kwDOAMm_X85U58uz TomNicholas 35968931 2023-02-09T16:33:26Z 2023-02-09T16:34:39Z MEMBER

We'll fix the compatibility issues, but we first need to understand what the expectations on something like data.shape should be in these circumstances.

At a minimum, xarray expects .shape, .ndim and .dtype to always be defined. (And the number of dims to match the shape, which Joe's example above implies aesara doesn't do?) On top of that there are extra expectations about slicing and broadcasting changing shape in the same ways as it does for numpy arrays. (@keewis correct me if I've mis-stated this or missed something important here!)

For a shared variable, it's always possible to get a value for data.shape by referencing the underlying data, but the reason we don't do that by default is—in part—due to the fact that shared variables can be updated with values that have different shapes (but the same dtypes and number of dimensions).

This sounds a bit similar to discussions we have been having about wrapping ragged arrays in xarray, for which there are multiple ways you might choose to define the shape.

The simplest way to guarantee that aesara can be wrapped by xarray is for aesara to conform to the array API standard, which has a test suite you can use to check conformance. We are also working on our own testing framework that duck-typed array libraries like aesara could import to quickly test integration with xarray.
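As a minimal sketch of those expectations (a hypothetical wrapper written for illustration, not part of xarray or aesara):

```python
import numpy as np

class MinimalDuckArray:
    """Hypothetical minimal duck array: xarray always expects .shape,
    .ndim and .dtype to be defined, with ndim matching len(shape)."""

    def __init__(self, values):
        self._values = np.asarray(values)

    @property
    def shape(self):
        return self._values.shape

    @property
    def ndim(self):
        return self._values.ndim

    @property
    def dtype(self):
        return self._values.dtype

    def __getitem__(self, key):
        # slicing must change the shape the same way numpy slicing does
        return MinimalDuckArray(self._values[key])

arr = MinimalDuckArray([[1.0, 2.0], [3.0, 4.0]])
assert arr.ndim == len(arr.shape)  # the number of dims must match the shape
```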

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aesara as an array backend in Xarray 1575494367
1423144049 https://github.com/pydata/xarray/issues/4610#issuecomment-1423144049 https://api.github.com/repos/pydata/xarray/issues/4610 IC_kwDOAMm_X85U03Rx TomNicholas 35968931 2023-02-08T19:40:58Z 2023-02-08T20:25:04Z MEMBER

Q: Use xhistogram approach or flox-powered approach?

@dcherian recently showed how his flox package can perform histograms as groupby-like reductions. This raises the question of which approach would be better to use in a histogram function in xarray.

(This is related to but better than what we had tried previously with xarray groupby and numpy_groupies.)

Here's a WIP notebook comparing the two approaches.

Both approaches can feasibly do:

  • Histograms which leave some dimensions excluded (broadcast over),
  • Multi-dimensional histograms (e.g. binning two different variables into one 2D bin),
  • Normalized histograms (return PDFs instead of counts),
  • Weighted histograms,
  • Multi-dimensional bins (as @aaronspring asks for above - but it requires work - see how to do it in flox, and my stalled PR to xhistogram).

Pros of using flox-powered reductions:

  • Much less code - the flox approach is basically one call to flox.
  • Fewer codepaths, with groupby logic and all histogram functionality flowing through the flox.xarray_reduce codepath.
  • Likely clearer code than the kinda impenetrable reshaped bincount logic lurking in the depths of xhistogram.
  • Supporting new features (e.g. multidimensional bins) should be simpler in flox because the options don't have to be propagated all the way down to the level of the np.bincount caller.

Pros of using xhistogram's blockwise bincount approach:

  • Absolute speed of xhistogram appears to be 3-4x higher, and that's using numpy_groupies in flox. Possibly flox could be faster if using numba but not sure yet.
  • Dask graphs simplicity. Xhistogram literally uses blockwise, whereas the flox graphs IIUC are blockwise-like but actually a specially-constructed HLG right now. (Also important for supporting other parallel backends.) I suspect that in practice both perform similarly well after graph optimization but I have not tested this at scale, and flox's graph might be more sensitive to extra steps in the calculation like adding weights or normalising the result.
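For reference, the bincount core that xhistogram's blockwise approach builds on can be sketched in plain numpy (ignoring the dask/reshaping machinery; this is an illustration, not xhistogram's actual code):

```python
import numpy as np

# Digitize values into bins, then count occurrences per bin with bincount.
rng = np.random.default_rng(0)
data = rng.normal(size=1000)
bins = np.linspace(-3, 3, 11)          # 11 edges -> 10 bins

idx = np.digitize(data, bins) - 1      # bin index for each sample
valid = (idx >= 0) & (idx < len(bins) - 1)   # drop out-of-range samples
counts = np.bincount(idx[valid], minlength=len(bins) - 1)

# Agrees with numpy's reference histogram for this data
expected, _ = np.histogram(data, bins=bins)
assert (counts == expected).all()
```

The appeal for dask is that this kernel can be applied blockwise and the per-block counts summed.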

Other thoughts:

  • Flox has various clever schemes for making general chunked groupby operations run more efficiently, but I don't think histogramming would really benefit from those unless there is a strong pattern to which values likely fall in which bins, that is known a priori.
  • Deepak's example using flox uses pandas.IntervalIndex to represent the bins on the result object, whereas xhistogram just returns the mid-points of the bins, throwing that info away. This seems like a cool idea on its own, but probably requires some extra work to make sure it's handled by the indexes refactor and the plotting code.
  • In my comparison notebook there's something I'm missing that's causing my "real example" (from the xhistogram docs) to not actually use the provided weights. I suspect it's something simple - any idea @dcherian?

xref https://github.com/xgcm/xhistogram/issues/60, https://github.com/xgcm/xhistogram/issues/28

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add histogram method 750985364
1422882836 https://github.com/pydata/xarray/issues/7515#issuecomment-1422882836 https://api.github.com/repos/pydata/xarray/issues/7515 IC_kwDOAMm_X85Uz3gU TomNicholas 35968931 2023-02-08T16:18:08Z 2023-02-08T16:18:08Z MEMBER

(I was already typing this when Ryan posted so I'll finish anyway :sweat_smile:)

To clarify, what @jhamman is suggesting in this specific issue is xarray wrapping aesara, as opposed to aesara wrapping xarray.

Both of these goals would be great, but in xarray we would particularly love to be able to wrap aesara because:

  • We already have a large community of users who use xarray objects as their top-level interface,
  • It continues a larger project of xarray being generalized to wrap any numpy-like "duck array",
  • Our ultimate goal is for xarray users to be able to seamlessly switch out their computing backend and find which library gives them the best performance without changing the rest of their high-level code.

For xarray to wrap aesara, aesara needs to provide a numpy-like API, ideally conforming to the python array api standard. If aesara already does this then we should try out the wrapping right now!

If you're interested in this topic I invite you to drop in to a meeting of the Pangeo working group on distributed arrays! We have so far had talks from distributed computing libraries including Arkouda, Ramba, and cubed, all of which we are hoping to support as compute backends.


If anyone is also separately interested in using xarray inside PyTensor / aesara then that's awesome, but we should try to track efforts in that direction on a different issue to keep this distinction clear. We plan to better support that direction of wrapping soon by fully exposing our (currently semi-private, internal) lightweight Variable class.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Aesara as an array backend in Xarray 1575494367
1420991427 https://github.com/pydata/xarray/pull/7506#issuecomment-1420991427 https://api.github.com/repos/pydata/xarray/issues/7506 IC_kwDOAMm_X85UspvD TomNicholas 35968931 2023-02-07T15:45:26Z 2023-02-07T15:45:26Z MEMBER

Thanks @dcherian!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix whats-new for 2023.02.0 1573030587
1420042790 https://github.com/pydata/xarray/pull/7506#issuecomment-1420042790 https://api.github.com/repos/pydata/xarray/issues/7506 IC_kwDOAMm_X85UpCIm TomNicholas 35968931 2023-02-07T01:37:59Z 2023-02-07T01:37:59Z MEMBER

Wait what version are we on right now?? :sweat_smile: Are you in middle of issuing a release? The whats-new in main doesn't match what's actually been released atm

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix whats-new for 2023.02.0 1573030587
1411454442 https://github.com/pydata/xarray/pull/7418#issuecomment-1411454442 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85UIRXq TomNicholas 35968931 2023-02-01T04:40:30Z 2023-02-01T04:40:30Z MEMBER

I think this PR is ready now, it just fails mypy (@Illviljan I added a py.typed file to datatree but the xarray mypy CI is still not happy with it).

Once this is merged we can push on with implementing open_datatree (https://github.com/pydata/xarray/pull/7437 @jthielen) and I can take a crack at fixing https://github.com/xarray-contrib/datatree/issues/146 in xarray in a follow-up PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1411324206 https://github.com/pydata/xarray/issues/7493#issuecomment-1411324206 https://api.github.com/repos/pydata/xarray/issues/7493 IC_kwDOAMm_X85UHxku TomNicholas 35968931 2023-02-01T01:42:49Z 2023-02-01T01:42:49Z MEMBER

@khider we are more than happy to help with digging into the codebase! A reasonable place to start would be just trying the operation you want to perform, and looking through the code for the functions any errors get thrown from.

You are also welcome to join our bi-weekly community meetings (there is one tomorrow morning!) or the office hours we run.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Interoperability with Pandas 2.0 non-nanosecond datetime 1563104480
1409354614 https://github.com/pydata/xarray/issues/7493#issuecomment-1409354614 https://api.github.com/repos/pydata/xarray/issues/7493 IC_kwDOAMm_X85UAQt2 TomNicholas 35968931 2023-01-30T21:18:36Z 2023-01-30T21:18:36Z MEMBER

Hi @khider , thanks for raising this.

For those of us who haven't tried to use non-nanosecond datetimes before (e.g. me), could you possibly expand a bit more on

However, most of the interesting functionalities of xarray don't seem to support this datetime out-of-box:

specifically, where are errors being thrown from within xarray? And what functions are you referring to as examples?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Interoperability with Pandas 2.0 non-nanosecond datetime 1563104480
1402777128 https://github.com/pydata/xarray/issues/3980#issuecomment-1402777128 https://api.github.com/repos/pydata/xarray/issues/3980 IC_kwDOAMm_X85TnK4o TomNicholas 35968931 2023-01-24T22:30:25Z 2023-01-24T22:30:25Z MEMBER

@pydata/xarray should this feature be added to our development roadmap? It's arguably another approach to making more flexible data structures...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make subclassing easier? 602218021
1400848135 https://github.com/pydata/xarray/pull/7019#issuecomment-1400848135 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85Tfz8H TomNicholas 35968931 2023-01-23T19:14:26Z 2023-01-23T19:14:26Z MEMBER

@drtodd13 mentioned today that ramba doesn't actually require explicit chunks to work, which I hadn't realised. So forcing wrapped libraries to implement an explicit chunks method might be too restrictive. Ramba could possibly work entirely through the numpy array API standard.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1400846131 https://github.com/pydata/xarray/pull/7019#issuecomment-1400846131 https://api.github.com/repos/pydata/xarray/issues/7019 IC_kwDOAMm_X85Tfzcz TomNicholas 35968931 2023-01-23T19:12:57Z 2023-01-23T19:12:57Z MEMBER

@drtodd13 tagging you here and linking my notes from today's distributed arrays working group meeting for the links and references to this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generalize handling of chunked array types 1368740629
1398898465 https://github.com/pydata/xarray/pull/7460#issuecomment-1398898465 https://api.github.com/repos/pydata/xarray/issues/7460 IC_kwDOAMm_X85TYX8h TomNicholas 35968931 2023-01-20T20:31:52Z 2023-01-20T20:31:52Z MEMBER

This seems like a good idea to me, but I don't know much about this part of the codebase. We should at minimum state in the docstring of these method whether they are required or optional.

From the errors thrown in the tests it seems set_variable is not defined by backends quite regularly at least.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add abstractmethods to backend classes 1549889322
1397480084 https://github.com/pydata/xarray/issues/7457#issuecomment-1397480084 https://api.github.com/repos/pydata/xarray/issues/7457 IC_kwDOAMm_X85TS9qU TomNicholas 35968931 2023-01-19T19:14:04Z 2023-01-19T19:14:04Z MEMBER

But then da.data will be of this protocol type and not the array class that you assume it has. For internal type checking this is what we want but for the user this will be confusing.

When will the user be using this type annotation? Isn't all this typing stuff basically a dev feature (internally and downstream)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Typing of internal datatypes 1548948097
1397465527 https://github.com/pydata/xarray/issues/7457#issuecomment-1397465527 https://api.github.com/repos/pydata/xarray/issues/7457 IC_kwDOAMm_X85TS6G3 TomNicholas 35968931 2023-01-19T19:02:09Z 2023-01-19T19:02:09Z MEMBER

Doesn't the python array api standard effort have some type of duck array protocol we could import? I feel like this has been mentioned before. Then we would start with @Illviljan 's suggestion and replace it with the correct duck array protocol later.

We might also consider that in the context of different distributed array backends dask arrays define a more specific API that includes methods like .chunk() too. Standardisation of that is a long-term dream though, not an immediate problem.
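A structural-typing sketch of such a protocol could look like this (hypothetical; the array API standard effort may define something more complete):

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable

import numpy as np

@runtime_checkable
class DuckArray(Protocol):
    """Hypothetical structural type for anything defining the minimal
    numpy-like attributes xarray relies on."""

    @property
    def shape(self) -> tuple[int, ...]: ...

    @property
    def ndim(self) -> int: ...

    @property
    def dtype(self) -> Any: ...

# numpy arrays satisfy the protocol structurally; plain strings do not
assert isinstance(np.zeros((2, 3)), DuckArray)
assert not isinstance("not an array", DuckArray)
```

Internal annotations could then use `DuckArray` while `da.data` still exposes the concrete wrapped type to users.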

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Typing of internal datatypes 1548948097
1383229518 https://github.com/pydata/xarray/issues/7439#issuecomment-1383229518 https://api.github.com/repos/pydata/xarray/issues/7439 IC_kwDOAMm_X85ScmhO TomNicholas 35968931 2023-01-15T19:17:35Z 2023-01-15T19:17:35Z MEMBER

Thanks for these comment @paigem! It's useful to hear things that might not occur to us.

Good points @Illviljan. Perhaps we should restructure to make it clear that running tests / pre-commit locally is optional, but still show people how to do it?

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add clarifying language to contributor's guide 1532853152
1382671908 https://github.com/pydata/xarray/issues/2368#issuecomment-1382671908 https://api.github.com/repos/pydata/xarray/issues/2368 IC_kwDOAMm_X85SaeYk TomNicholas 35968931 2023-01-14T06:10:39Z 2023-01-14T06:10:39Z MEMBER

@ronygolderku thanks for your example. Looks like it fails for the same reason as was mentioned for some of the other examples above.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
1382474227 https://github.com/pydata/xarray/pull/7437#issuecomment-1382474227 https://api.github.com/repos/pydata/xarray/issues/7437 IC_kwDOAMm_X85SZuHz TomNicholas 35968931 2023-01-13T22:31:40Z 2023-01-13T22:31:40Z MEMBER

Holla when you want a review :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DRAFT: Implement `open_datatree` in BackendEntrypoint for preliminary DataTree support 1532662115
1382233446 https://github.com/pydata/xarray/issues/7348#issuecomment-1382233446 https://api.github.com/repos/pydata/xarray/issues/7348 IC_kwDOAMm_X85SYzVm TomNicholas 35968931 2023-01-13T18:34:51Z 2023-01-13T18:34:51Z MEMBER

Thanks for the suggestion @nbren12 !

Whilst I agree that this would be a more "correct" way of providing accessor functionality, I think there is a big downside in that Entrypoints are quite a lot harder for users to use than the accessors are.

Whilst the way the accessors actually add the method is kind of black magic, all the user has to do is copy-paste the example from the docs and change the methods and accessor name to what they want, and it will immediately work.

Adding an entrypoint requires going into your setup.py (which you wouldn't even have if you're running a script or notebook), and it's conceptually complicated. The accessors stuff is useful even for relatively novice users, so I don't really want to make it harder to understand...

Interested in what other people think though?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using entry_points to register dataset and dataarray accessors? 1473152374
1382170026 https://github.com/pydata/xarray/issues/6577#issuecomment-1382170026 https://api.github.com/repos/pydata/xarray/issues/6577 IC_kwDOAMm_X85SYj2q TomNicholas 35968931 2023-01-13T17:30:08Z 2023-01-13T17:30:08Z MEMBER

(I'm looking for issues to tag as beginner-friendly)

Shall we also add a top-level import: xr.show_backends like xr.show_versions. This would be a lot more discoverable.

This is easy right? You could just add an xr.show_backends that points to the same function as xr.backends.list_engines? And point out in the docstring that they are actually the same function?
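The aliasing itself is trivial; a sketch with a stand-in function (the name and return value are illustrative, not xarray's actual implementation):

```python
# Stand-in for xr.backends.list_engines (illustrative only).
def list_engines():
    """List available backend engines and basic descriptors."""
    return {"netcdf4": "builtin", "zarr": "builtin"}

# Expose the same function object under a more discoverable top-level name,
# so the behaviour and docstring stay in sync automatically.
show_backends = list_engines

assert show_backends is list_engines
```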

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: list_engines function 1227144046
1382154696 https://github.com/pydata/xarray/issues/6827#issuecomment-1382154696 https://api.github.com/repos/pydata/xarray/issues/6827 IC_kwDOAMm_X85SYgHI TomNicholas 35968931 2023-01-13T17:14:33Z 2023-01-13T17:14:33Z MEMBER

@gabicca given that we do say in our docs that we don't recommend subclassing xarray objects, I'm going to close this issue as a downstream "subclass at your own risk" problem. If you would like it to be easier to subclass, we would welcome your input here.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.concat for datasets fails when object is a child class of Dataset 1318173644
1382077317 https://github.com/pydata/xarray/issues/7256#issuecomment-1382077317 https://api.github.com/repos/pydata/xarray/issues/7256 IC_kwDOAMm_X85SYNOF TomNicholas 35968931 2023-01-13T16:19:04Z 2023-01-13T16:19:04Z MEMBER

The datatree class you mentioned is very interesting, is it yet available for me to use as a package?

Also yes you can find datatree here, though it's a bit experimental compared to the rest of xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset class doesn't respond to groupstr kwarg 1435219725
1382076159 https://github.com/pydata/xarray/issues/7256#issuecomment-1382076159 https://api.github.com/repos/pydata/xarray/issues/7256 IC_kwDOAMm_X85SYM7_ TomNicholas 35968931 2023-01-13T16:18:02Z 2023-01-13T16:18:02Z MEMBER

@NilsHoebe to clarify, are you still unable to open the desired group even when you use group as the kwarg? If so a minimal example would help us to help you!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_dataset class doesn't respond to groupstr kwarg 1435219725
1377905161 https://github.com/pydata/xarray/pull/7418#issuecomment-1377905161 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85SISoJ TomNicholas 35968931 2023-01-10T21:33:27Z 2023-01-10T21:33:27Z MEMBER

Integrating upstream into xarray might also help with people trying to open their nested data formats as a datatree objects, because then we can immediately begin integrating with xarray's backend engines.

See for example this datatree issue asking about opening grib files as a datatree. It would be nice to be able to do

open_datatree("data.grib", engine="cfgrib")

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1373101988 https://github.com/pydata/xarray/pull/7418#issuecomment-1373101988 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R19-k TomNicholas 35968931 2023-01-06T03:35:36Z 2023-01-06T03:35:36Z MEMBER

Would it mean that if someone wants to later add any feature "x" or "y" into Xarray, they just need to implement the feature for Dataset (and possibly DataArray) and it will be guaranteed to work with Datatree?

Basically yes, it would immediately work with Datatree. Datatree currently implements most dataset methods by literally copying them and their docstrings, and they work by mapping the method over every node in the tree. We could integrate Datatree in such a way that the additional developer effort to get a method on dataset working on Datatree would be negligible (think adding a single element with the method name to an internal list, or copy-pasting a docstring).
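The "map the method over every node" idea can be sketched with a toy tree (a hypothetical structure for illustration; not datatree's actual implementation):

```python
# Toy tree where each node holds some data (standing in for a Dataset).
class Node:
    def __init__(self, data, children=()):
        self.data = data
        self.children = list(children)

    def map_over_nodes(self, func):
        """Apply func to this node's data and, recursively, to every child's."""
        return Node(func(self.data),
                    [c.map_over_nodes(func) for c in self.children])

tree = Node(1, [Node(2), Node(3, [Node(4)])])
doubled = tree.map_over_nodes(lambda x: x * 2)

assert doubled.data == 2
assert doubled.children[1].children[0].data == 8
```

Any Dataset method lifted this way gets tree support essentially for free.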

I don't think it is ideal to have such non-synchronized state within Xarray itself.

This is an argument for waiting before integrating.

I'm just speaking generally from my experience of having struggled while doing some heavy refactoring in Xarray recently :)

I appreciate the input @benbovy! I think the main difference between this effort and your (heroic) indexes effort is that Datatree doesn't touch any existing API.

I guess my main concern is that integrating prematurely into Xarray might give a false sense of stability - I don't want to later realize I should redesign Datatree, and have people be annoyed because they thought it was as stable as the rest of xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1372806772 https://github.com/pydata/xarray/pull/7418#issuecomment-1372806772 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85R0150 TomNicholas 35968931 2023-01-05T21:36:25Z 2023-01-05T21:36:25Z MEMBER

Why? Are its dependencies different from Xarray?

No, datatree has no additional dependencies. I was just asking because if we went for the "import from second repository" plan we may want to test that the import works as part of our CI. Not a major issue though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1372360850 https://github.com/pydata/xarray/pull/7418#issuecomment-1372360850 https://api.github.com/repos/pydata/xarray/issues/7418 IC_kwDOAMm_X85RzJCS TomNicholas 35968931 2023-01-05T15:24:58Z 2023-01-05T15:24:58Z MEMBER

There is also at least one bug in datatree that cannot be fixed without a (small) change to xarray, and having datatree as an optional import means I could fix it here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Import datatree in xarray? 1519552711
1362271360 https://github.com/pydata/xarray/issues/7397#issuecomment-1362271360 https://api.github.com/repos/pydata/xarray/issues/7397 IC_kwDOAMm_X85RMpyA TomNicholas 35968931 2022-12-22T01:04:39Z 2022-12-22T01:04:39Z MEMBER

Thanks for this bug report. FWIW I have also seen this bug recently when helping out a student.

The question here is whether this is an xarray, numpy, or a netcdf bug (or some combo). Can you reproduce the problem using to_zarr()? If so that would rule out netcdf as the culprit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Memory issue merging NetCDF files using xarray.open_mfdataset and to_netcdf 1506437087
1351995978 https://github.com/pydata/xarray/issues/7378#issuecomment-1351995978 https://api.github.com/repos/pydata/xarray/issues/7378 IC_kwDOAMm_X85QldJK TomNicholas 35968931 2022-12-14T19:05:19Z 2022-12-14T19:05:19Z MEMBER

That's a useful observation, thank you @maawoo!

This comes from the way we generate our code for the many different aggregations xarray can perform. We actually use this script to automatically generate all the source code for all the aggregations in this file. That script has a template that is filled in for each method.

Currently the template looks like this

```python
TEMPLATE_REDUCTION_SIGNATURE = '''
    def {method}(
        self,
        dim: Dims = None,
        *,{extra_kwargs}
        keep_attrs: bool | None = None,
        **kwargs: Any,
    ) -> {obj}:
        """
        Reduce this {obj}'s data by applying ``{method}`` along some dimension(s).

        Parameters
        ----------'''
```

where in the case of variance the method is just var, so "variance" isn't in the generated docstring anywhere.

How might we fix this? One immediate thought that might help is to change the template to use a method_name and a long_name, where method_name is var but long_name is variance for example. This shouldn't be particularly difficult, and we would welcome a PR if you would be interested in contributing? We would help you out :slightly_smiling_face:
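A toy version of that suggestion, with hypothetical method_name/long_name fields (the real generator and template live in the files linked above; this is just a cut-down illustration):

```python
# Hypothetical cut-down template: fill in both the short method name and a
# human-readable long name, so "variance" can appear in var's docstring.
TEMPLATE = '''def {method_name}(self, dim=None):
    """Reduce this {obj}'s data by applying ``{method_name}`` ({long_name})
    along some dimension(s)."""
'''

generated = TEMPLATE.format(method_name="var", obj="DataArray",
                            long_name="variance")
assert "variance" in generated
```

A search for "variance" in the docs would then find the generated var docstring.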

Or we might change the docstrings in some other, more granular way. Adding examples to aggregation methods would also have to deal with the fact that they are autogenerated https://github.com/pydata/xarray/issues/6793

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improve docstrings for better discoverability 1497131525
1341289782 https://github.com/pydata/xarray/issues/6610#issuecomment-1341289782 https://api.github.com/repos/pydata/xarray/issues/6610 IC_kwDOAMm_X85P8nU2 TomNicholas 35968931 2022-12-07T17:07:08Z 2022-12-07T17:07:08Z MEMBER

Using xr.Grouper has the advantage that you don't have to start guessing about whether or not the user wanted some complicated behaviour (especially if their input is slightly wrong somehow and you have to raise an informative error). Simple defaults would get left as is and complex use cases can be explicit and opt-in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update GroupBy constructor for grouping by multiple variables, dask arrays 1236174701
1334242366 https://github.com/pydata/xarray/issues/7344#issuecomment-1334242366 https://api.github.com/repos/pydata/xarray/issues/7344 IC_kwDOAMm_X85Phuw- TomNicholas 35968931 2022-12-01T19:24:24Z 2022-12-01T19:24:24Z MEMBER

I kinda think correctness by default is more important than performance, especially if the default performance isn't prohibitively slow.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Disable bottleneck by default? 1471685307
1333941360 https://github.com/pydata/xarray/issues/7341#issuecomment-1333941360 https://api.github.com/repos/pydata/xarray/issues/7341 IC_kwDOAMm_X85PglRw TomNicholas 35968931 2022-12-01T15:29:47Z 2022-12-01T15:29:47Z MEMBER

@keewis https://github.com/pydata/xarray/pull/7338 mentions datatree, so we could refer people back there from now on?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr groups (grouped) loader and operations 1471273638
1311911082 https://github.com/pydata/xarray/issues/7278#issuecomment-1311911082 https://api.github.com/repos/pydata/xarray/issues/7278 IC_kwDOAMm_X85OMiyq TomNicholas 35968931 2022-11-11T16:21:21Z 2022-11-11T16:21:21Z MEMBER

I realise that this was probably a function intended for internal use only

Yes, everything under xarray.core is internal, and unfortunately internal functions aren't publicly supported, so we provide no guarantee that they will still work/exist between versions.

Is there a better way to do this? What replaces this function?

@benbovy should know!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  remap_label_indexers removed without deprecation update? 1444752393
1302293898 https://github.com/pydata/xarray/issues/4285#issuecomment-1302293898 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Nn22K TomNicholas 35968931 2022-11-03T15:34:57Z 2022-11-03T15:34:57Z MEMBER

The email that you have listed here doesn't work (bounced back).

Oops - use thomas dot nicholas at columbia dot edu please!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1302240686 https://github.com/pydata/xarray/issues/4285#issuecomment-1302240686 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Nnp2u TomNicholas 35968931 2022-11-03T14:58:11Z 2022-11-03T14:58:11Z MEMBER

I should be able to join today as well @jpivarski ! Will need the zoom address

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1293975333 https://github.com/pydata/xarray/issues/7211#issuecomment-1293975333 https://api.github.com/repos/pydata/xarray/issues/7211 IC_kwDOAMm_X85NIH8l TomNicholas 35968931 2022-10-27T19:33:55Z 2022-10-27T19:40:19Z MEMBER

Hi @airton-neto - here is a much shorter example that reproduces the same error

```python
url = 'https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2022/20220201/gfs.0p25.2022020100.f000.grib2'

dataset = xr.open_dataset(url, cache=True, engine="netcdf4")

from dagster._utils import frozenlist

variables = list(['u-component_of_wind_height_above_ground'])  # <--- This way it runs

variables = frozenlist(['u-component_of_wind_height_above_ground'])

dataset[variables]
```

```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1317, in Dataset._construct_dataarray(self, name)
   1316 try:
-> 1317     variable = self._variables[name]
   1318 except KeyError:

KeyError: ['u-component_of_wind_height_above_ground']

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
Input In [43], in <cell line: 1>()
----> 1 dataset[variables]

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1410, in Dataset.__getitem__(self, key)
   1408     return self.isel(**key)
   1409 if utils.hashable(key):
-> 1410     return self._construct_dataarray(key)
   1411 if utils.iterable_of_hashable(key):
   1412     return self._copy_listed(key)

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:1319, in Dataset._construct_dataarray(self, name)
   1317     variable = self._variables[name]
   1318 except KeyError:
-> 1319     _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
   1321 needed_dims = set(variable.dims)
   1323 coords: dict[Hashable, Variable] = {}

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/xarray/core/dataset.py:171, in _get_virtual_variable(variables, key, dim_sizes)
    168     return key, key, variable
    170 if not isinstance(key, str):
--> 171     raise KeyError(key)
    173 split_key = key.split(".", 1)
    174 if len(split_key) != 2:

KeyError: ['u-component_of_wind_height_above_ground']
```

I'm not immediately sure why the elements of the list are not properly extracted, but the frozenlist class is a private dagster internal, and the problem is specific to how that object interacts with the current xarray code. As a downstream user of xarray, can dagster not just change its code to pass the type xarray expects (i.e. a normal list)?
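For reference, the failure mode can be sketched without dagster at all: `Dataset.__getitem__` treats hashable keys as a single variable name, and a list subclass that defines `__hash__` (as dagster's frozenlist appears to) takes that branch instead of the multi-variable one. A minimal sketch, where `HashableList` and `dispatch` are hypothetical stand-ins for frozenlist and the real `__getitem__` logic:

```python
from collections.abc import Hashable

class HashableList(list):
    """Hypothetical stand-in for dagster's frozenlist, which defines __hash__."""
    def __hash__(self):
        return hash(tuple(self))

def dispatch(key):
    # Simplified sketch of Dataset.__getitem__'s dispatch: hashable keys are
    # looked up as a single variable name; iterables of hashables select many.
    if isinstance(key, Hashable):
        return "single-variable lookup"  # a *list* of names here raises KeyError
    return "multi-variable selection"

print(dispatch(["u-component_of_wind_height_above_ground"]))
# → multi-variable selection  (plain list: list.__hash__ is None)
print(dispatch(HashableList(["u-component_of_wind_height_above_ground"])))
# → single-variable lookup  (the subclass is hashable, so it looks like a name)
```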

Having said that, if you want to submit a PR which fixes this bug (and doesn't require special-casing dagster somehow) then that would be welcome too!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Incorrect handle to Dagster frozenlists in Dataset object 1422460071
1292252691 https://github.com/pydata/xarray/issues/7227#issuecomment-1292252691 https://api.github.com/repos/pydata/xarray/issues/7227 IC_kwDOAMm_X85NBjYT TomNicholas 35968931 2022-10-26T15:48:36Z 2022-10-26T15:53:14Z MEMBER

Well that's frustrating - I didn't see that.

They also mention that if PEP 637 hadn't been rejected there would be more use, which is ironic because we also wanted PEP 637 for indexing (like `da[time=slice(5, 10)]`).
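For context, PEP 637 would have allowed keyword arguments inside subscripts; since it was rejected, dict-valued keys in `__getitem__` are the usual workaround for dimension-named indexing. A toy sketch of that pattern (`MiniArray` is hypothetical, not xarray's implementation):

```python
class MiniArray:
    """Toy container showing the dict-key workaround for the rejected PEP 637."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, key):
        # Without PEP 637, `arr[time=slice(5, 10)]` is a syntax error, so
        # named indexing is spelled with a dict: arr[{"time": slice(5, 10)}]
        if isinstance(key, dict):
            (dim, indexer), = key.items()
            return self.data[indexer]
        return self.data[key]

arr = MiniArray(list(range(20)))
print(arr[{"time": slice(5, 10)}])  # → [5, 6, 7, 8, 9]
```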

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Typing with Variadic Generics in python 3.11 (PEP 646) 1424215477
1290896977 https://github.com/pydata/xarray/pull/7214#issuecomment-1290896977 https://api.github.com/repos/pydata/xarray/issues/7214 IC_kwDOAMm_X85M8YZR TomNicholas 35968931 2022-10-25T17:25:48Z 2022-10-25T17:25:48Z MEMBER

Big moves!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pass indexes directly to the DataArray and Dataset constructors 1422543378
1289312995 https://github.com/pydata/xarray/pull/7200#issuecomment-1289312995 https://api.github.com/repos/pydata/xarray/issues/7200 IC_kwDOAMm_X85M2Vrj TomNicholas 35968931 2022-10-24T16:47:33Z 2022-10-24T16:47:45Z MEMBER

> If one wants to get fancy, we could wrap the `list_engines` dict into a custom dict with a nice repr (maybe even html repr).

Whilst I don't want to discourage eager devs, in one of the community calls we did discuss how this would almost certainly be overkill for a little-used advanced feature.
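For the record, the suggestion itself is only a few lines: a dict subclass overriding `__repr__`. A minimal sketch (the `EngineRegistry` name and entries are hypothetical, not xarray's actual API):

```python
class EngineRegistry(dict):
    """Hypothetical dict subclass giving an engines mapping a friendlier repr."""
    def __repr__(self):
        lines = ["Available engines:"]
        # One aligned line per registered backend engine
        lines += [f"  {name:<10} {desc}" for name, desc in self.items()]
        return "\n".join(lines)

engines = EngineRegistry({"netcdf4": "netCDF files", "zarr": "Zarr stores"})
print(engines)
# Available engines:
#   netcdf4    netCDF files
#   zarr       Zarr stores
```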

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 2,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backends descriptions 1419882372
1287194617 https://github.com/pydata/xarray/pull/7192#issuecomment-1287194617 https://api.github.com/repos/pydata/xarray/issues/7192 IC_kwDOAMm_X85MuQf5 TomNicholas 35968931 2022-10-21T16:34:44Z 2022-10-21T16:35:04Z MEMBER

> LGTM as long as it renders nicely.

It renders nicely locally, but any idea why the RTD build fails so opaquely?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Example using Coarsen.construct to split map into regions 1417378270


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3226.942ms · About: xarray-datasette