id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1974681146,PR_kwDOAMm_X85edMm-,8404,Hypothesis strategy for generating Variable objects,35968931,closed,0,,,6,2023-11-02T17:04:03Z,2023-12-05T22:45:57Z,2023-12-05T22:45:57Z,MEMBER,,0,pydata/xarray/pulls/8404,"Breaks out just the part of #6908 needed for generating arbitrary `xarray.Variable` objects. (so ignore the ginormous number of commits)

EDIT: [Check out this test](https://github.com/pydata/xarray/pull/8404#discussion_r1382313965) which performs a mean on any subset of any Variable object!

```python
In [36]: from xarray.testing.strategies import variables

In [37]: variables().example()
array([-2.22507386e-313-6.62447795e+016j, nan-6.46207519e+185j,
       -2.22507386e-309+3.33333333e-001j])
```

@andersy005 @maxrjones @jhamman I thought this might be useful for the `NamedArray` testing. (xref #8370 and #8244)

@keewis and @Zac-HD sorry for letting that PR languish for literally a year :sweat_smile: This PR addresses [your feedback about accepting a callable](https://github.com/pydata/xarray/pull/6908#discussion_r974956861) that returns a strategy generating arrays. That suggestion makes some things a bit more complex in user code, but it actually allows me to simplify the internals of the `variables` strategy significantly.

I'm really happy with this PR - I think it solves what we were discussing, and is a sensible checkpoint to merge before going back to making strategies for generating composite objects like DataArrays/Datasets work.
- [x] Closes part of #6911
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
- [x] New functions/methods are listed in `api.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8404/reactions"", ""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 1, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1200309334,PR_kwDOAMm_X842BOIk,6471,Support **kwargs form in `.chunk()`,35968931,closed,0,,,6,2022-04-11T17:37:38Z,2022-04-12T03:34:49Z,2022-04-11T19:36:40Z,MEMBER,,0,pydata/xarray/pulls/6471,"Also adds some explicit tests (and type hinting) for `Variable.chunk()`, as I don't think it had dedicated tests before.

- [x] Closes #6459
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6471/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
1020282789,I_kwDOAMm_X8480Eel,5843,Why are `da.chunks` and `ds.chunks` properties inconsistent?,35968931,closed,0,,,6,2021-10-07T17:21:01Z,2021-10-29T18:12:22Z,2021-10-29T18:12:22Z,MEMBER,,,,"Basically the title, but what I'm referring to is this:

```python
In [2]: da = xr.DataArray([[0, 1], [2, 3]], name='foo').chunk(1)

In [3]: ds = da.to_dataset()

In [4]: da.chunks
Out[4]: ((1, 1), (1, 1))

In [5]: ds.chunks
Out[5]: Frozen({'dim_0': (1, 1), 'dim_1': (1, 1)})
```

Why does `DataArray.chunks` return a tuple while `Dataset.chunks` returns a frozen dictionary?
This seems a bit silly, for a few reasons:

1) It means that some perfectly reasonable code might fail unnecessarily if passed a DataArray instead of a Dataset or vice versa, such as

   ```python
   def is_core_dim_chunked(obj, core_dim):
       return len(obj.chunks[core_dim]) > 1
   ```

   which will work as intended for a dataset but raises a `TypeError` for a dataarray.
2) It breaks the pattern we use for `.sizes`, where

   ```python
   In [14]: da.sizes
   Out[14]: Frozen({'dim_0': 2, 'dim_1': 2})

   In [15]: ds.sizes
   Out[15]: Frozen({'dim_0': 2, 'dim_1': 2})
   ```

3) If you want the chunks as a tuple they are always accessible via `da.data.chunks`, which is a more sensible place to look to find the chunks without dimension names.
4) It's an undocumented difference, as the docstrings for `ds.chunks` and `da.chunks` both only say `""""""Block dimensions for this dataset’s data or None if it’s not a dask array.""""""`, which doesn't tell me anything about the return type, or warn me that the return types are different.

EDIT: In fact `DataArray.chunk` doesn't even appear to be listed on the API docs page at all.
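One way to wash the difference out in user code is to normalise `.chunks` to a dict before indexing by dimension name. This is just a sketch, not xarray API: `chunks_as_dict` is a made-up helper, and `DummyDataArray`/`DummyDataset` below are hypothetical stand-ins mimicking the two return types. It assumes only that `.chunks` is either a mapping or a tuple aligned with `.dims`:

```python
from collections.abc import Mapping

def chunks_as_dict(obj):
    # Normalise .chunks to a {dim: chunk_sizes} dict, whether obj behaves
    # like a DataArray (tuple of tuples) or a Dataset (mapping).
    chunks = obj.chunks
    if isinstance(chunks, Mapping):
        return dict(chunks)
    return dict(zip(obj.dims, chunks))

def is_core_dim_chunked(obj, core_dim):
    # Works for either object type once the chunks are normalised.
    return len(chunks_as_dict(obj)[core_dim]) > 1

# Hypothetical stand-ins for the two objects in the example above:
class DummyDataArray:
    dims = ('dim_0', 'dim_1')
    chunks = ((1, 1), (1, 1))

class DummyDataset:
    dims = ('dim_0', 'dim_1')
    chunks = {'dim_0': (1, 1), 'dim_1': (1, 1)}
```

With this, `is_core_dim_chunked` gives the same answer for both stand-ins instead of raising a `TypeError` for the tuple case.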
In our codebase this difference is mostly washed out by us using `._to_temp_dataset()` all the time, and also by the way that the `.chunk()` method accepts both the tuple and dict forms, so both of these invariants hold (but in different ways):

```
ds == ds.chunk(ds.chunks)
da == da.chunk(da.chunks)
```

I'm not sure whether making this consistent is worth the effort of a significant breaking change though :confused:

(Sort of related to https://github.com/pydata/xarray/issues/2103)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5843/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1033884661,PR_kwDOAMm_X84tkKtA,5886,Use .to_numpy() for quantified facetgrids,35968931,closed,0,,,6,2021-10-22T19:25:24Z,2021-10-28T22:42:43Z,2021-10-28T22:41:59Z,MEMBER,,0,pydata/xarray/pulls/5886,"Follows on from https://github.com/pydata/xarray/pull/5561 by replacing `.values` with `.to_numpy()` in more places in the plotting code. This allows `pint.Quantity` arrays to be plotted without issuing a `UnitStrippedWarning` (and will generalise better to other duck arrays later).

I noticed the need for this when trying out [this example](https://pint-xarray.readthedocs.io/en/latest/examples/plotting.html#plot) (but trying it without the `.dequantify()` call first).
(@Illviljan in theory `.values` should be replaced with `.to_numpy()` everywhere in the plotting code, by the way)

- [ ] Closes #xxxx
- [x] Tests added
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5886/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
935317034,MDExOlB1bGxSZXF1ZXN0NjgyMjU1NDE5,5561,Plots get labels from pint arrays,35968931,closed,0,,,6,2021-07-02T00:44:28Z,2021-07-21T23:06:21Z,2021-07-21T22:38:34Z,MEMBER,,0,pydata/xarray/pulls/5561,"Stops you needing to call `.pint.dequantify()` before plotting. Builds on top of #5568, so that should be merged first.

- [x] Closes (1) from https://github.com/pydata/xarray/issues/3245#issue-484240082
- [x] Tests added
- [x] Tests passing
- [x] Passes `pre-commit run --all-files`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5561/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
367763373,MDU6SXNzdWUzNjc3NjMzNzM=,2473,Recommended way to extend xarray Datasets using accessors?,35968931,closed,0,,,6,2018-10-08T12:19:21Z,2018-10-31T09:58:05Z,2018-10-31T09:58:05Z,MEMBER,,,,"Hi, I'm now regularly using xarray (& dask) for organising and analysing the output of the simulation code I use ([BOUT++](https://boutproject.github.io/)), and it's very helpful, thank you! However my current approach is quite clunky at dealing with the extra information and functionality that's specific to the simulation code I'm using, and I have questions about what the recommended way to extend the xarray Dataset class is.
This seems like a general enough problem that I thought I would make an issue for it.

### Desired

What I ideally want to do is extend the xarray.Dataset class to accommodate extra attributes and methods, while retaining as much xarray functionality as possible, but avoiding reimplementing any of the API. This might not be possible, but ideally I want to make a `BoutDataset` class which contains extra attributes to hold information about the run which doesn't naturally fit into the xarray data model, and extra methods to perform analysis/plotting which only users of this code would require, but which can also use xarray-specific methods and top-level functions:

```python
bd = BoutDataset('/path/to/data')

ds = bd.data  # access the wrapped xarray dataset
extra_data = bd.extra_data  # access the BOUT-specific data

bd.isel(time=-1)  # use xarray dataset methods

bd2 = BoutDataset('/path/to/other/data')
concatenated_bd = xr.concat([bd, bd2])  # apply top-level xarray functions to the data

bd.plot_tokamak()  # methods implementing bout-specific functionality
```

### Problems with my current approach

I have read the documentation about [extending xarray](http://xarray.pydata.org/en/stable/internals.html#extending-xarray), and the issue threads about subclassing Datasets (#706) and accessors (#1080), but I wanted to check that what I'm doing is the recommended approach.
Right now I'm [trying](https://github.com/TomNicholas/xcollect/blob/master/boutdataset.py) to do something like

```python
@xr.register_dataset_accessor('bout')
class BoutDataset:
    def __init__(self, path):
        self.data = collect_data(path)  # collect all my numerical data from output files
        self.extra_data = read_extra_data(path)  # collect extra data about the simulation

    def plot_tokamak(self):
        plot_in_bout_specific_way(self.data, self.extra_data)
```

which works in the sense that I can do

```python
bd = BoutDataset('/path/to/data')

ds = bd.bout.data  # access the wrapped xarray dataset
extra_data = bd.bout.extra_data  # access the BOUT-specific data

bd.bout.plot_tokamak()  # methods implementing bout-specific functionality
```

but not so well with

```python
bd.isel(time=-1)  # AttributeError: 'BoutDataset' object has no attribute 'isel'
bd.bout.data.isel(time=-1)  # have to do this instead, but this returns an xr.Dataset, not a BoutDataset

concatenated_bd = xr.concat([bd1, bd2])  # TypeError: can only concatenate xarray Dataset and DataArray objects, got
concatenated_ds = xr.concat([bd1.bout.data, bd2.bout.data])  # again have to do this instead, which again returns an xr.Dataset, not a BoutDataset
```

If I have to reimplement the API for methods like `.isel()` and top-level functions like `concat()`, then why should I not just subclass `xr.Dataset`? There aren't very many top-level xarray functions, so reimplementing them would be okay, but there are loads of Dataset methods. However I think I know how I want my `BoutDataset` class to behave when an `xr.Dataset` method is called on it: I want it to call that method on the underlying dataset and return the full BoutDataset with the extra data and attributes still attached. Is it possible to do something like: ""if calling an `xr.Dataset` method on an instance of `BoutDataset`, call the corresponding method on the wrapped dataset and return a BoutDataset that has the extra BOUT-specific data propagated through""?
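The forwarding behaviour described above can be sketched with plain `__getattr__` delegation. This is a minimal illustration only, not xarray's recommended pattern: the `WrappedDataset` name and the toy `FakeDataset` stand-in are hypothetical, and it assumes any forwarded method that returns a dataset-like object should be re-wrapped:

```python
import functools

class FakeDataset:
    # Hypothetical stand-in for xr.Dataset, just to make the sketch runnable.
    def __init__(self, value):
        self.value = value

    def isel(self, **indexers):
        # Pretend selection returns a new dataset.
        return FakeDataset(self.value + 1)

class WrappedDataset:
    # Wraps a dataset plus extra data; unknown attribute lookups are
    # forwarded to the wrapped dataset, and dataset-like results are
    # re-wrapped so the extra data propagates through method calls.
    def __init__(self, data, extra_data):
        self.data = data
        self.extra_data = extra_data

    def __getattr__(self, name):
        attr = getattr(self.data, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def method(*args, **kwargs):
            result = attr(*args, **kwargs)
            if isinstance(result, type(self.data)):
                return WrappedDataset(result, self.extra_data)
            return result

        return method

bd = WrappedDataset(FakeDataset(0), extra_data={'run': 'sim-01'})
selected = bd.isel(time=-1)  # forwarded to FakeDataset.isel, result re-wrapped
```

Here `isel` is looked up on the wrapped object, called there, and the result re-wrapped with the same extra data, which is roughly the propagation behaviour asked about; top-level functions like `xr.concat` would still need separate handling.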
Thanks in advance, and apologies if this is either impossible or relatively trivial - I just thought other xarray users might have the same questions.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2473/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue