id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1945654275,PR_kwDOAMm_X85c7HL_,8319,Move parallelcompat and chunkmanagers to NamedArray,35968931,closed,0,,,9,2023-10-16T16:34:26Z,2024-02-12T22:09:24Z,2024-02-12T22:09:24Z,MEMBER,,0,pydata/xarray/pulls/8319,"@dcherian I got to this point before realizing that simply moving `parallelcompat.py` over isn't [what it says in the design doc](https://github.com/pydata/xarray/blob/main/design_notes/named_array_design_doc.md#appendix-implementation-details), which instead talks about > - Could this functionality be left in Xarray proper for now? Alternative array types like JAX also have some notion of ""chunks"" for parallel arrays, but the details differ in a number of ways from the Dask/Cubed. > - Perhaps variable.chunk/load methods should become functions defined in xarray that convert Variable objects. This is easy so long as xarray can reach in and replace `.data` I personally think that simply moving parallelcompat makes sense so long as you expect people to use chunked `NamedArray` objects. I see the chunked arrays as special cases of duck arrays, and my understanding is that `NamedArray` is supposed to have full support for duckarrays. cc @andersy005 - [x] As requested in #8238 - [ ] ~~Tests added~~ - [ ] ~~User visible changes (including notable bug fixes) are documented in `whats-new.rst`~~ - [ ] ~~New functions/methods are listed in `api.rst`~~ ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8319/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 2088695240,I_kwDOAMm_X858fvXI,8619,Docs sidebar is squished,35968931,open,0,,,9,2024-01-18T16:54:55Z,2024-01-23T18:38:38Z,,MEMBER,,,,"### What happened? Since the v2024.01.0 release yesterday, there seems to be a rendering error in the website - the sidebar is squished up to the left: ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8619/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue 1084220684,PR_kwDOAMm_X84wDPg5,6086,Type protocol for internal variable mapping,35968931,closed,0,,,9,2021-12-19T23:32:04Z,2023-12-06T17:20:48Z,2023-12-06T17:19:30Z,MEMBER,,1,pydata/xarray/pulls/6086,"In #5961 and #6083 I've been experimenting extending `Dataset` to store variables in a custom mapping object (instead of always in a `dict`), so as to eventually fix [this mutability problem](https://github.com/TomNicholas/datatree/issues/38) with `DataTree`. I've been writing out new storage class implementations in those PRs, but on Friday @shoyer suggested that I could instead simply alter the allowed type for `._variables` in `xarray.Dataset`'s type hints. That would allow me to mess about with storage class implementations outside of xarray, whilst guaranteeing type compatibility with xarray `main` itself with absolutely minimal changes (hopefully no runtime changes to `Dataset` at all!). The idea is to define a [protocol](https://www.python.org/dev/peps/pep-0544/) in xarray which specifies the structural subtyping behaviour of any custom variable storage class that I might want to set as `Dataset._variables`. 
The type hint for the `._variables` attribute then refers to this protocol, and will be satisfied as long as whatever object is set as `._variables` has compatibly-typed methods. Adding type hints to the `._construct_direct` and `._replace` constructors is enough to propagate this new type specification all over the codebase. In practice this means writing a protocol which describes the type behaviour of all the methods on `dict` that currently get used by `._variables` accesses. So far I've written out a `CopyableMutableMapping` protocol which defines all the methods needed. The issues I'm stuck on at the moment are: 1) The typing behaviour of overloaded methods, specifically `update`. (`setdefault` also has similar problems but I think I can safely omit that from the protocol definition because we don't call `._variables.setdefault()` anywhere.) Mypy complains that `CopyableMutableMapping` is not a compatible type when `Dict` is specified because the type specification of overloaded methods isn't quite right somehow: ``` xarray/core/computation.py:410: error: Argument 1 to ""_construct_direct"" of ""Dataset"" has incompatible type ""Dict[Hashable, Variable]""; expected ""CopyableMutableMapping[Hashable, Variable]"" [arg-type] xarray/core/computation.py:410: note: Following member(s) of ""Dict[Hashable, Variable]"" have conflicts: xarray/core/computation.py:410: note: Expected: xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, other: Mapping[Hashable, Variable], **kwargs: Variable) -> None xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, other: Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None xarray/core/computation.py:410: note: <1 more overload not shown> xarray/core/computation.py:410: note: Got: xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, Mapping[Hashable, Variable], **kwargs: Variable) -> None xarray/core/computation.py:410: note: @overload xarray/core/computation.py:410: note: def update(self, Iterable[Tuple[Hashable, Variable]], **kwargs: Variable) -> None ``` I don't understand what the inconsistency is because I literally looked up the exact way that [the type stubs](https://github.com/python/typeshed/blob/e6911530d4d52db0fbdf05be3aff89e520ee39bc/stdlib/typing.pyi#L490) for `Dict` were written (via `MutableMapping`). 2) Making functions which expect a `Mapping` accept my `CopyableMutableMapping`. I would have thought this would just work because I think my protocol defines all the methods which `Mapping` has, so `CopyableMutableMapping` should automatically become a subtype of `Mapping`. But instead I get errors like this with no further information as to what to do about it. ```xarray/core/dataset.py:785: error: Argument 1 to ""Frozen"" has incompatible type ""CopyableMutableMapping[Hashable, Variable]""; expected ""Mapping[Hashable, Variable]"" [arg-type]``` 3) I'm expecting to get a runtime problem whenever we `assert isinstance(ds._variables, dict)`, which happens in a few places. I'm not sure what the best way to deal with that is, but I'm hoping that simply [adding `@typing.runtime_checkable`](https://www.python.org/dev/peps/pep-0544/#runtime-checkable-decorator-and-narrowing-types-by-isinstance) to the protocol class definition will be enough? 
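To make this concrete, here is a minimal sketch of the shape of protocol I mean (illustrative only - the real definition needs to cover every `dict` method that gets called on `._variables`, so don't treat the method list here as authoritative):

```python
from typing import (Iterable, Iterator, Mapping, Protocol, Tuple,
                    TypeVar, overload, runtime_checkable)

K = TypeVar('K')
V = TypeVar('V')

@runtime_checkable
class CopyableMutableMapping(Protocol[K, V]):
    # The dict behaviour that Dataset relies on when touching ._variables
    def __getitem__(self, key: K) -> V: ...
    def __setitem__(self, key: K, value: V) -> None: ...
    def __delitem__(self, key: K) -> None: ...
    def __iter__(self) -> Iterator[K]: ...
    def __len__(self) -> int: ...
    def copy(self) -> 'CopyableMutableMapping[K, V]': ...

    # update is overloaded, mirroring the typeshed stubs for MutableMapping
    @overload
    def update(self, other: Mapping[K, V], **kwargs: V) -> None: ...
    @overload
    def update(self, other: Iterable[Tuple[K, V]], **kwargs: V) -> None: ...
    def update(self, other=(), **kwargs): ...
```

One caveat on point 3: an `isinstance` check against a `@runtime_checkable` protocol only verifies that methods with the right *names* exist, not their signatures, so the runtime check would be weaker than the static one.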
Once that passes mypy I will write a test that checks that if I define my own custom variable storage class I can `_construct_direct` a `Dataset` which uses it without any errors. At that point I can be confident that `Dataset` is general enough to hold whichever exact variable storage class I end up needing for `DataTree`. @max-sixty this is entirely a typing challenge, so I'm tagging you in case you're interested :) - [ ] Would supersede #5961 and #6083 - [ ] Tests added - [ ] Passes `pre-commit run --all-files` EDIT: Also using `Protocol` at all is only available in Python 3.8+","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6086/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 602218021,MDU6SXNzdWU2MDIyMTgwMjE=,3980,Make subclassing easier?,35968931,open,0,,,9,2020-04-17T20:33:13Z,2023-10-04T16:27:28Z,,MEMBER,,,,"### Suggestion We relatively regularly have [users](https://github.com/pydata/xarray/issues/728) [asking](https://github.com/pydata/xarray/issues/3959) [about](https://groups.google.com/forum/#!topic/xarray/wzprk6M-Mfg) [subclassing](https://github.com/pydata/xarray/issues/706) `DataArray` and `Dataset`, and I know of at least a few cases where people have [gone](https://github.com/pennmem/ptsa_new/blob/master/ptsa/data/timeseries.py) [through](https://github.com/pydata/xarray/issues/2176#issuecomment-391470885) with it. However we currently [explicitly discourage doing this](https://docs.xarray.dev/en/stable/internals/extending-xarray.html#composition-over-inheritance), on the basis that basically all operations will return a bare xarray object instead of the subclassed version, it's full of trip hazards, and we have the accessor interface to point people to instead. However, while useful, the accessors aren't enough for some users, and I think we could probably do better. If we refactored internally we might be able to make it much easier to subclass. ### Example to follow in Pandas Pandas takes an interesting approach: while they also explicitly discourage subclassing, they still try to make it easier, and [show you what you need to do](https://pandas.pydata.org/docs/development/extending.html#subclassing-pandas-data-structures) in order for it to work. They ask you to override some constructor properties with your own, and allow you to define your own original properties (see the sketch just after the list below). ### Potential complications - `.construct_dataarray` and `DataArray.__init__` are used a lot internally to reconstruct a DataArray from `dims`, `coords`, `data` etc. before returning the result of a method call. We would probably need to standardise this, before allowing users to override it. - Pandas actually has multiple constructor properties you need to override: `_constructor`, `_constructor_sliced`, and `_constructor_expanddim`. What's the minimum set of similar constructors we would need? - Blocking access to attributes - we currently stop people from adding their own attributes quite aggressively, so that we can have attributes as an alias for variables and attrs. We would need to either relax this or, better, allow users to set a list of their own `_properties` which they want to register, similar to pandas. - `__slots__` - I think something funky can happen if you inherit from a class that defines `__slots__`? 
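For reference, the pattern from the pandas docs linked above looks roughly like this (paraphrased from their subclassing guide, so check that page for the authoritative version):

```python
import pandas as pd

class SubclassedSeries(pd.Series):
    # pandas calls _constructor whenever an operation returns a new object,
    # so overriding it preserves the subclass through operations
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    # _constructor_sliced is used when an operation reduces dimensionality,
    # e.g. selecting a single column out of a DataFrame
    @property
    def _constructor_sliced(self):
        return SubclassedSeries
```

An xarray equivalent would presumably need something analogous for the `DataArray`/`Dataset` pair, which is what the second bullet above is asking about.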
### Documentation I think if we do this we should also slightly refactor the relevant docs to make clear the distinction between 3 groups of people: - **Users** - People who import and use xarray at the top-level with (ideally) no particular concern as to how it works. This is who the vast majority of the documentation is for. - **Developers** - People who are actually improving and developing xarray upstream. This is who the [Contributing to xarray](http://xarray.pydata.org/en/stable/contributing.html) page is for. - **Extenders** - People who want to subclass, accessorize or wrap xarray objects, in order to do something more complicated. These people are probably writing a domain-specific library which will then bring in a new set of users. There maybe aren't as many of these people, but they are really important IMO. This is implicitly who the [xarray internals](http://xarray.pydata.org/en/stable/internals.html#xarray-internals) page is aimed at, but it would be nice to make that distinction much more clear. It might also be nice to give them a guide as to ""I want to achieve X, should I use wrapping/subclassing/accessors?"" @max-sixty you had some ideas about what would need to be done for this to work?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3980/reactions"", ""total_count"": 11, ""+1"": 11, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 936045730,MDExOlB1bGxSZXF1ZXN0NjgyODYzMjgz,5568,Add to_numpy() and as_numpy() methods,35968931,closed,0,,,9,2021-07-02T20:17:40Z,2021-07-21T22:06:47Z,2021-07-21T21:42:48Z,MEMBER,,0,pydata/xarray/pulls/5568," - [x] Closes #3245 - [x] Tests added - [x] Passes `pre-commit run --all-files` - [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst` - [x] New functions/methods are listed in `api.rst` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/5568/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 671609109,MDU6SXNzdWU2NzE2MDkxMDk=,4300,General curve fitting method,35968931,closed,0,,,9,2020-08-02T12:35:49Z,2021-03-31T16:55:53Z,2021-03-31T16:55:53Z,MEMBER,,,,"Xarray should have a general curve-fitting function as part of its main API. ## Motivation Yesterday I wanted to fit a simple decaying exponential function to the data in a DataArray and realised there currently isn't an immediate way to do this in xarray. You have to either pull out the `.values` (losing the power of dask), or use `apply_ufunc` (complicated - see the sketch below). This is an incredibly common, domain-agnostic task, so although I don't think we should support various kinds of unusual optimisation procedures (which could always go in an extension package instead), I think a basic fitting method is within scope for the main library. There are [SO questions](https://stackoverflow.com/questions/62987617/using-scipy-curve-fit-with-dask-xarray) asking how to achieve this. We already have [`.polyfit` and `polyval` anyway](https://github.com/pydata/xarray/pull/3733/files#), which are more specific. (@AndrewWilliams3142 and @aulemahal I expect you will have thoughts on how to implement this generally.) 
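To show what 'complicated' means in practice, here is roughly the `apply_ufunc` incantation needed today for a 1D fit along `x` (a sketch with made-up example data - the helper function and names are just for illustration):

```python
import numpy as np
import scipy.optimize
import xarray as xr

def exponential_decay(x, A, L):
    return A * np.exp(-x / L)

def _fit_decay(y, x):
    # curve_fit returns (optimal parameters, covariance); keep just the params
    popt, _ = scipy.optimize.curve_fit(exponential_decay, x, y)
    return popt

# Example data: dims ('y', 'x'), with noise, and an 'x' coordinate
x = np.linspace(0.1, 10, 50)
da = xr.DataArray(
    3 * np.exp(-x / 2) + 0.1 * np.random.rand(5, 50),
    dims=('y', 'x'),
    coords={'x': x},
)

fitted_params = xr.apply_ufunc(
    _fit_decay, da, da['x'],
    input_core_dims=[['x'], ['x']],
    output_core_dims=[['param']],
    vectorize=True,  # loop the fit over the remaining 'y' dim
)
```

This works, but the user has to know about core dims, vectorization, and how to wrap the scipy call - exactly the boilerplate a built-in fitting method could hide.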
## Proposed syntax I want something like this to work: ```python def exponential_decay(xdata, A=10, L=5): return A*np.exp(-xdata/L) # returns a dataset containing the optimised values of each parameter fitted_params = da.fit(exponential_decay) fitted_line = exponential_decay(da.x, A=fitted_params['A'], L=fitted_params['L']) # Compare da.plot(ax=ax) fitted_line.plot(ax=ax) ``` It would also be nice to be able to fit in multiple dimensions. That means both, for example, fitting a 2D function to 2D data: ```python def hat(xdata, ydata, h=2, r0=1): r = xdata**2 + ydata**2 return h*np.exp(-r/r0) fitted_params = da.fit(hat) fitted_hat = hat(da.x, da.y, h=fitted_params['h'], r0=fitted_params['r0']) ``` but also repeatedly fitting a 1D function to 2D data: ```python # da now has a y dimension too fitted_params = da.fit(exponential_decay, fit_along=['x']) # As fitted_params now has y-dependence, broadcasting means fitted_lines does too fitted_lines = exponential_decay(da.x, A=fitted_params.A, L=fitted_params.L) ``` The latter would be useful for fitting the same curve to multiple model runs, but means we need some kind of `fit_along` or `dim` argument, which would default to all dims. So the method docstring would end up like ```python def fit(self, f, fit_along=None, skipna=None, full=False, cov=False): """""" Fits the function f to the DataArray. Expects the function f to have a signature like `result = f(*coords, **params)` for example `result_da = f(da.xcoord, da.ycoord, da.zcoord, A=5, B=None)` The names of the `**params` kwargs will be used to name the output variables. Returns ------- fit_results - A single dataset which contains the variables (for each parameter in the fitting function): `param1` The optimised fit coefficients for parameter one. `param1_residuals` The residuals of the fit for parameter one. ... """""" ``` ## Questions 1) Should it wrap `scipy.optimize.curve_fit`, or reimplement it? Wrapping it is simpler, but since it just calls `least_squares` [under the hood](https://github.com/scipy/scipy/blob/v1.5.2/scipy/optimize/minpack.py#L532-L834), reimplementing it would mean we could use the dask-powered version of `least_squares` (like [`da.polyfit` does](https://github.com/pydata/xarray/blob/9058114f70d07ef04654d1d60718442d0555b84b/xarray/core/dataset.py#L5987)). 2) What form should we expect the curve-defining function to come in? `scipy.optimize.curve_fit` expects the curve to act as `ydata = f(xdata, *params) + eps`, but in xarray `xdata` could be one or multiple coords or dims, not necessarily a single array. Might it work to require a signature like `result_da = f(da.xcoord, da.ycoord, da.zcoord, ..., **params)`? Then the `.fit` method would work out how many coords to pass to `f` based on the dimension of `da` and the `fit_along` argument. But then the order of coord arguments in the signature of `f` would matter, which doesn't seem very xarray-like. 3) Is it okay to inspect parameters of the curve-defining function? If we tell the user the curve-defining function has to have a signature like `da = func(*coords, **params)`, then we could read the names of the parameters by inspecting the function kwargs. Is that a good idea or might it end up being unreliable? Is the `inspect` standard library module the right thing to use for that? 
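A rough sketch of that inspection, reusing the `exponential_decay` example from above (just to show that `inspect.signature` can recover both the parameter names and their defaults):

```python
import inspect
import numpy as np

def exponential_decay(xdata, A=10, L=5):
    return A * np.exp(-xdata / L)

sig = inspect.signature(exponential_decay)
# Keep only the arguments that have defaults, i.e. the fitting parameters
params = {
    name: p.default
    for name, p in sig.parameters.items()
    if p.default is not inspect.Parameter.empty
}
# params == {'A': 10, 'L': 5}
```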
This could also be used to provide default guesses for the fitting parameters.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4300/reactions"", ""total_count"": 4, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 1}",,completed,13221727,issue 349026158,MDU6SXNzdWUzNDkwMjYxNTg=,2355,Animated plots - a suggestion for implementation,35968931,closed,0,,,9,2018-08-09T08:23:17Z,2020-08-16T08:07:12Z,2020-08-16T08:07:12Z,MEMBER,,,,"**It'd be awesome if one could animate the plots xarray creates using matplotlib just by specifying the dimension over which to animate the plot.** This would allow for rapid visualisation of time-evolving data and could potentially be very powerful (imagine a grid of faceted 2d plots, all evolving together over time). I know that there are already some libraries which can create animated plots of xarray data (e.g. Holoviews), but I think that it's within xarray's scope (#2030) to add another dimension to its default matplotlib-style plotting capabilities. **How?** I saw this new package for making it easier to animate matplotlib plots using the funcanimation module: [animatplot](https://github.com/t-makaro/animatplot). It essentially works by wrapping matplotlib commands like `plt.imshow()` to instead return ""blocks"". These blocks can then be animated by feeding them into an `animation` class. An introductory script to plot line data can be found [here](https://animatplot.readthedocs.io/en/latest/tutorial/getting_started..html), but basically has the form ```python import animatplot as amp import matplotlib.pyplot as plt X, Y = load_data_somehow block = amp.blocks.Line(X, Y) anim = amp.Animation([block]) anim.save_gif(""animated_line"") plt.show() ``` which creates a basic gif like this: ![animated line gif](https://user-images.githubusercontent.com/35968931/43885402-a3373002-9b6d-11e8-9b3d-f4e588a71a22.gif) I think that it might be possible to integrate this kind of animation-plotting tool by adding an optional dimension argument to xarray's plotting methods, which if given causes the function to call the wrapped animatplot plotting command instead of the bare matplotlib one. It would then return the corresponding ""block"" ready to be animated. Using the resulting code might only require a few lines to create an impressive visualisation: ```python turb2d = xr.load_dataset(""turbulent_fluid_data.nc"") block = turb2d[""density""].plot.imshow(animate_over='time') anim = Animation([block]) anim.save_gif(""fluid_density.gif"") plt.show() ``` ![n_over_time](https://user-images.githubusercontent.com/35968931/43887058-83d4161c-9b72-11e8-978d-fcb8e071a37a.gif) **What would need changing?** If we take the `da.plot.imshow()` example, then the way I'm imagining this would be done is to add the optional argument `animate_over` to the `plot_2d` decorator, and use it to choose between returning the matplotlib artist (as it does currently) or the ""block"". It would also mean altering the logic inside `plot_2d` and `imshow` to account for the fact you would be calling this on a 3D dataarray instead of a 2D one. I wanted to ask about this before delving into the code too much or submitting a pull request, in case there is some problem with the idea. What do you think?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2355/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue