
issues

15 rows where repo = 13221727, state = "open" and user = 43316012 sorted by updated_at descending

Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at ▲ · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
#8503 · Add option to define custom format of units in plots · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 5 · created: 2023-12-01T21:09:18Z · updated: 2024-02-02T22:09:11Z · author_association: COLLABORATOR · draft: 0 · pull_request: pydata/xarray/pulls/8503 · id: 2021585639 · node_id: PR_kwDOAMm_X85g77tr
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

We ran into the requirement of plotting units as (unit) instead of [unit]. This PR enables exactly that; it is easier to change this at the source ;)

I think setting this as a global option is the correct approach, but feel free to propose alternatives :)
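For illustration, a minimal sketch of how such a global option might be used; the option name "display_units_format" is hypothetical, and the PR's actual option name and format-string convention may differ:

```python
import numpy as np
import xarray as xr

# Hypothetical option name, for illustration only.
with xr.set_options(display_units_format="({units})"):
    da = xr.DataArray(
        np.arange(5),
        dims="t",
        attrs={"units": "m/s", "long_name": "speed"},
    )
    da.plot()  # the axis label would then read "speed (m/s)" instead of "speed [m/s]"
```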

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8503/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8520 · Allow configuring open_dataset via backend instances · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 9 · created: 2023-12-04T21:03:12Z · updated: 2024-01-14T21:40:38Z · author_association: COLLABORATOR · draft: 0 · pull_request: pydata/xarray/pulls/8520 · id: 2024737017 · node_id: PR_kwDOAMm_X85hGgaB

Support passing instances of BackendEntryPoints as the engine argument.

Closes #8447

Instead of passing a long list of options to the open_dataset method directly, you can now also configure the entrypoint in its constructor and pass the instance as the engine.

It would look something like this:

```python
engine = NetCDF4BackendEntrypoint(mode="a", clobber=False)
ds = xr.open_dataset("some_file.nc", engine=engine)
```

While this is actually even more lines of code, the main advantage is better discoverability of the options.
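For backend authors, the pattern presumably looks something like the following sketch; MyBackendEntrypoint and its constructor options are invented for illustration:

```python
from xarray.backends import BackendEntrypoint


class MyBackendEntrypoint(BackendEntrypoint):
    """Hypothetical backend whose options live in the constructor."""

    def __init__(self, mode: str = "r", clobber: bool = True) -> None:
        self.mode = mode
        self.clobber = clobber

    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        # open filename_or_obj using self.mode / self.clobber
        # and return an xarray.Dataset
        ...
```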

TODO:

  • [x] Adapt netcdf4 backend
  • [x] Adapt h5netcdf backend
  • [x] Find out if h5netcdf backend should have "autoclose" and "mode" options (https://github.com/pydata/xarray/pull/8520#pullrequestreview-1769368001)
  • [x] What to do with "decode_vlen_strings" option in h5netcdf (was this deprecated?)
  • [x] Adapt zarr backend
  • [x] Adapt scipy backend
  • [x] Adapt pydap backend
  • [ ] output_grid seems to always be set to True? Is this intentional? Why not remove it instead?
  • [x] ~verify and user_charset are non-existent in pydap?~ > I still had pydap version 3.2, in 3.4 they exist...
  • [x] typing is only my first impression. Not easy if upstream libs are untyped :/
  • [x] ~Adapt pynio backend~ > Won't adapt because deprecated
  • [x] Fix docstrings to include init options
  • [x] Check if lock=True is allowed > Not allowed, otherwise scipy backend breaks
  • [ ] Change default to lock=True instead of None? Maybe a later PR?
  • [ ] Rename XXXBackendEntrypoint > XXXBackend ?
  • [x] ~The autoclose argument seems to do nothing?~ > Actually it is used in BaseNetCDF4Array, all good
  • [x] ~Move group to open_dataset instead of backend option?~ > It's not really a decoder either. Not sure; for now leave it in the init...
  • [ ] Improve _resolve_decoders_kwargs; this function has a lot of implicit assumptions? Maybe remove open_dataset_parameters altogether?
  • [x] Add tests for passing backend directly via engine argument
  • [x] open_dataset now has **kwargs to support backwards compatibility. Probably we should raise if unsupported stuff is added (e.g. typos), otherwise this could be confusing (see the test in zarr that checks for the deprecated auto_chunk).
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8520/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8548 · Shaping the future of Backends · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 3 · created: 2023-12-12T22:08:50Z · updated: 2023-12-15T17:14:59Z · author_association: COLLABORATOR · id: 2038622503 · node_id: I_kwDOAMm_X855gukn

What is your issue?

Backends in xarray are used to read and write files (or in general objects) and transform them into useful xarray Datasets.

This issue will collect ideas on how to continuously improve them.

Current state

Along the reading and writing process there are many implicit and explicit configuration possibilities. There are many backend-specific options and many en-/decoder-specific options. Most of them are currently difficult or even impossible to discover.

There is the infamous open_dataset method which can do everything, but there are also some specialized methods like open_zarr or to_netcdf.

The only really formalized way to extend xarray's capabilities is via the BackendEntrypoint, currently only for reading files. This has proven to work, and things are going so well that people are discussing getting rid of the special reading methods (#7495). A major critique in this thread is, again, the discoverability of configuration options.

Problems

To name a few:

  • Discoverability of configuration options is poor
  • No distinction between backend and encoding options
  • New options are simply added as another keyword argument to open_dataset
  • No writing support for backends

What already improved

  • Adding URL and description attributes to the backends (#7000, #7200)
  • Add static typing
  • Allow creating instances of backends with their respective options (#8520)

The future

After listing all the problems, let's see how we can improve the situation and make backends an all-round solution for reading and writing all kinds of files.

What happens behind the scenes

In general the reading and writing of Datasets in xarray is a three-step process.

```
[------- done by backend.open_dataset -------]
Dataset < chunking < decoding < opening_in_store < file
Dataset > validating > encoding > storing_in_store > file
```

Probably you could consider combining the chunking and decoding, as well as the validation and encoding, into a single logical step in the pipeline. This view should help decide how to set up a future architecture of backends.

You can see that there is a common middle object in this process: an in-memory representation of the file on disk that sits between en-/decoding and the abstract store. This is actually an xarray.Dataset and is internally called a "backend dataset".

write_dataset method

A quite natural extension of backends would be to implement a write_dataset method (name pending). This would allow backends to fulfill the complete right side of the pipeline.
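A sketch of what that could look like on the entrypoint class; the method name and signature are this issue's placeholder ("name pending"), not an existing API:

```python
import xarray as xr
from xarray.backends import BackendEntrypoint


class MyBackend(BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        ...  # existing reading side

    # hypothetical writing counterpart, completing the right side of the pipeline
    def write_dataset(self, ds: xr.Dataset, filename_or_obj, **kwargs) -> None:
        ...  # encode `ds` and store it in the target
```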

Transformer class

For lack of a common word for a class that handles "encoding" and "decoding", I will call it a transformer here.

The process of en- and decoding is currently hardcoded in the respective open_dataset and to_netcdf methods. One could imagine introducing the concept of a common class that handles both.

This class could handle the implemented CF or netcdf encoding conventions, but it would also allow users to define their own storing conventions (why not create a custom transformer that adds indexes based on variable attributes?). The possibilities are endless, and an interface that fulfills all the requirements still has to be found.
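One possible shape for such an interface, as a minimal sketch; the Transformer name comes from this issue, while the method names are assumptions:

```python
from typing import Protocol

import xarray as xr


class Transformer(Protocol):
    def decode(self, backend_ds: xr.Dataset) -> xr.Dataset:
        """Turn a raw "backend dataset" into the user-facing Dataset."""
        ...

    def encode(self, ds: xr.Dataset) -> xr.Dataset:
        """Turn a user-facing Dataset back into a storable backend dataset."""
        ...
```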

This would homogenize the reading and writing process to

```
Dataset <> Transformer <> Backend <> file
```

As a bonus this would increase the discoverability of the decoding options (then transformer arguments).

The new interface then could be

```python
backend = Netcdf4BackendEntrypoint(group="data")
decoder = CFTransformer(cftime=True)

ds = xr.open_dataset("file.nc", engine=backend, decoder=decoder)
```

while of course still allowing all options to be passed simply as kwargs (since this is still the easiest way of telling beginners how to open files).

The final improvement here would be to add additional entrypoints for these transformers ;)

Disclaimer

Now, this issue is just a bunch of random ideas that require quite some refinement, or they might even turn out to be nonsense. So let's have an exciting discussion about these things :) If you have something to add to the above points I will include your ideas as well. This is meant as a collection of ideas on how to improve our backends :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8548/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#8309 · Move variable typed ops to NamedArray · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 1 · created: 2023-10-14T20:22:07Z · updated: 2023-10-26T21:55:01Z · author_association: COLLABORATOR · draft: 1 · pull_request: pydata/xarray/pulls/8309 · id: 1943539215 · node_id: PR_kwDOAMm_X85c0AkW
  • xref https://github.com/pydata/xarray/issues/8238

This is highly WIP and probably everything is broken right now... Just creating this now so other people don't work on the same thing :) Feel free to continue here with me.

@pydata/xarray

  1. What do we do with commonly used functions, is it ok to copy them?
  2. Moving the typed ops requires a lot of functions to be added to NamedArray; is there a consensus on what we want to move? Is it basically everything?
  3. Slowly the utils module is becoming a graveyard of stuff we don't want to put elsewhere; maybe we should at least move the typing stuff over to a types module.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8309/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8276 · Give NamedArray Generic dimension type · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 3 · created: 2023-10-05T20:02:56Z · updated: 2023-10-16T13:41:45Z · author_association: COLLABORATOR · draft: 1 · pull_request: pydata/xarray/pulls/8276 · id: 1928972239 · node_id: PR_kwDOAMm_X85cC_Wb
  • [x] Towards #8199
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

This aims at making the dimension type a generic parameter. I thought I would start with NamedArray when testing this out because it is much less interconnected.
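Roughly, the idea is the following (a simplified sketch, not the PR's actual code):

```python
from typing import Any, Generic, Hashable, TypeVar

DimT = TypeVar("DimT", bound=Hashable)


class NamedArray(Generic[DimT]):
    """Sketch: NamedArray parametrized by its dimension-name type."""

    def __init__(self, dims: tuple[DimT, ...], data: Any) -> None:
        self.dims = dims
        self.data = data


arr = NamedArray(("x", "y"), [[1, 2], [3, 4]])
# inferred as NamedArray[str] instead of dims being a plain Hashable tuple
```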

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8276/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8199 · Use Generic Types instead of Hashable or Any · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 2 · created: 2023-09-17T19:41:39Z · updated: 2023-09-18T14:16:02Z · author_association: COLLABORATOR · id: 1899895419 · node_id: I_kwDOAMm_X85xPhp7

Is your feature request related to a problem?

Currently, part of the static type of a DataArray or Dataset is a Mapping[Hashable, DataArray]. I'm quite sure that 99% of users will actually use str keys (aka variable names), while some exotic people (me included) want to use e.g. Enums for their keys. Currently we allow anything as keys as long as it is hashable, but once the DataArray/set is created, the type information of the keys is lost.

Consider e.g.:

```python
for name, da in Dataset({"a": ("t", np.arange(5))}).items():
    reveal_type(name)     # Hashable
    reveal_type(da.dims)  # tuple[Hashable, ...]
```

Wouldn't it be nice if this actually returned `str`, so you don't have to cast it or assert it every time?

This could be solved by making these classes generic.

Another related issue is the underlying data. This could be introduced as a Generic type as well. Probably this would need some common ground across all the wrapping array libs out there: each one would use a generic Array class that keeps track of the type of the wrapped array, e.g. dask.array.core.Array[np.ndarray]. In return, we could do DataArray[np.ndarray] or even DataArray[dask.array.core.Array[np.ndarray]].
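As a toy illustration of the wrapping idea (the names are made up; real array libs would have to adopt something similar):

```python
from typing import Generic, TypeVar

import numpy as np

InnerT = TypeVar("InnerT")


class WrappedArray(Generic[InnerT]):
    """Toy array wrapper that remembers the type of what it wraps."""

    def __init__(self, data: InnerT) -> None:
        self.data = data


arr: WrappedArray[np.ndarray] = WrappedArray(np.arange(3))
# arr.data is statically known to be np.ndarray
```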

Describe the solution you'd like

The implementation would be something along the lines of:

```python
KeyT = TypeVar("KeyT", bound=Hashable)
DataT = TypeVar("DataT", bound=<some protocol?>)


class DataArray(Generic[KeyT, DataT]):

    _coords: dict[KeyT, Variable[DataT]]
    _indexes: dict[KeyT, Index[DataT]]
    _name: KeyT | None
    _variable: Variable[DataT]

    def __init__(
        self,
        data: DataT = dtypes.NA,
        coords: Sequence[Sequence[DataT] | pd.Index | DataArray[KeyT]]
        | Mapping[KeyT, DataT]
        | None = None,
        dims: str | Sequence[KeyT] | None = None,
        name: KeyT | None = None,
        attrs: Mapping[KeyT, Any] | None = None,
        # internal parameters
        indexes: Mapping[KeyT, Index] | None = None,
        fastpath: bool = False,
    ) -> None:
        ...
```

Now you could create a "classical" DataArray:

```python
da = DataArray(np.arange(10), {"t": np.arange(10)}, dims=["t"])
# will be of type DataArray[str, np.ndarray]
```

while you could also create something more fancy:

```python
da2 = DataArray(dask.array.array([1, 2, 3]), {}, dims=[("tup1", "tup2"),])
# will be of type DataArray[tuple[str, str], dask.array.core.Array]
```

And whenever you access the dimensions / coord names / underlying data you will get the correct type.

For now I only see three major problems:

  1. Non-array types (like lists or anything iterable) will get cast to a np.ndarray, and I have no idea how to tell the type checker that DataArray([1, 2, 3], {}, "a") should be DataArray[str, np.ndarray] and not DataArray[str, list[int]]. Depending on the Protocol in the bound TypeVar this might even fail static type analysis or require tons of special casing and overloads.
  2. How does the type checker extract the dimension type for Datasets? This is quite convoluted and I am not sure it can be typed correctly...
  3. The parallel compute workflows are quite dynamic and I am not sure if static type checking can keep track of the underlying datatype... What does DataArray([1, 2, 3], dims="a").chunk({"a": 2}) return? Is it DataArray[str, dask.array.core.Array]? But what about other chunking frameworks?

Describe alternatives you've considered

One could even extend this and add more Generic types.

Different types for dimensions and variable names would be a first (and probably quite a nice) feature addition.

One could even go so far as to type the keys and values of variables and coords (for Datasets) differently. This came up e.g. in https://github.com/pydata/xarray/issues/3967. However, this would create a ridiculous amount of Generic types and is probably more confusing than helpful.

Additional context

Probably this feature should be done in consecutive PRs that implement one Generic each, otherwise this will be a giant task!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8199/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6704 · Future of `DataArray.rename` · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 11 · created: 2022-06-18T10:14:43Z · updated: 2023-09-11T00:53:31Z · author_association: COLLABORATOR · id: 1275752720 · node_id: I_kwDOAMm_X85MCnEQ

What is your issue?

In https://github.com/pydata/xarray/pull/6665 the question came up what to do with DataArray.rename in light of the new index refactor.

To be consistent with Dataset we should introduce:

  • DataArray.rename_dims
  • DataArray.rename_vars
  • DataArray.rename

Several open questions about the behavior (Similar things apply to Dataset.rename{, _dims, _vars}):

  • [ ] Should rename_dims also rename indexes (dimension coordinates)?
  • [ ] Should rename_vars also rename the DataArray?
  • [ ] What to do if the DataArray has the same name as one of its coordinates?
  • [ ] Should rename still rename everything (like it is now) or only the name (Possibly with some deprecation cycle)?

The current implementation of DataArray.rename is a bit inconsistent:

As stated by @max-sixty in https://github.com/pydata/xarray/issues/6665#issuecomment-1154368202:

  • rename operates on the DataArray as described in https://github.com/pydata/xarray/pull/6665#issuecomment-1150810485. Generally I'm less keen on "different types have different semantics", and here a positional arg would mean a DataArray rename, and a kwarg would mean a var rename. But it does work locally to DataArray quite well.
  • rename only exists on DataArrays for the name of the DataArray, and we use rename_vars & rename_dims for both DataArrays & Datasets. So Dataset.rename is soft-deprecated.
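For reference, a minimal illustration of the current semantics under discussion:

```python
import xarray as xr

da = xr.DataArray([1, 2], dims="x", coords={"x": [10, 20]}, name="a")

da.rename("b")         # a str argument renames the DataArray itself
da.rename({"x": "t"})  # a mapping renames dims and coords
da.rename(x="t")       # kwargs behave like the mapping form
```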

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6704/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#7457 · Typing of internal datatypes · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 5 · created: 2023-01-19T11:08:43Z · updated: 2023-01-19T19:49:19Z · author_association: COLLABORATOR · id: 1548948097 · node_id: I_kwDOAMm_X85cUxKB

Is your feature request related to a problem?

Currently there is no static typing of the underlying data structures used in DataArrays. Simply running reveal_type(da.data) returns Any.

Adding static typing support to that is unfortunately non-trivial since xarray supports a wide variety of duck-types.

This also comes with internal typing difficulties.

Describe the solution you'd like

I think the way to go is making the DataArray class generic in its underlying data type. Something like DataArray[np.ndarray] or DataArray[dask.array].

The implementation would require a TypeVar that is bound to some minimal required Protocol for internal consistency (I think at least it needs dtype and shape attributes).
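A minimal sketch of such a protocol-bound TypeVar, assuming dtype and shape are the minimum requirements (as suggested above); the protocol name is made up:

```python
from typing import Protocol, TypeVar

import numpy as np


class _DuckArray(Protocol):
    """Hypothetical minimal duck-array protocol."""

    @property
    def dtype(self) -> np.dtype: ...

    @property
    def shape(self) -> tuple[int, ...]: ...


DataT = TypeVar("DataT", bound=_DuckArray)
```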

Datasets would have to be typed the same way; this means only one datatype for all variables is possible. When you mix datatypes, it will fall back to the common ancestor, which will be the aforementioned protocol. This is basically the same restriction that a dict has.

Now to the main issue that I see with this approach: I don't know how to type coordinates. They have the same problems as mentioned above for Datasets. I think it is very common to have dask arrays in the variables but simple numpy arrays in the coordinates, so either one excludes them from the typing, or in such cases the common generic typing falls back to the protocol again. I am not sure what the best approach is here.

Describe alternatives you've considered

Since the most common workflow for beginners and intermediate-to-advanced users is to stick with the DataArrays themselves and never touch the underlying data, I am not sure if this change is as beneficial as I want it to be. Maybe it just complicates things, and leaving it as Any is easier for advanced users, who then have to cast or ignore this.

Additional context

It came up in this discussion: https://github.com/pydata/xarray/pull/7020#discussion_r972617770

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7457/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#7020 · Typing of abstract base classes · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 6 · created: 2022-09-11T10:27:01Z · updated: 2023-01-19T10:48:20Z · author_association: COLLABORATOR · draft: 0 · pull_request: pydata/xarray/pulls/7020 · id: 1368900431 · node_id: PR_kwDOAMm_X84-u2Jv

This PR adds some typing to several abstract base classes that are used in xarray.

Most of it is working, only one major point I could not figure out:

What is the type of NDArrayMixin.array? I would appreciate it if someone who has more insight into this could help me.

Several minor open points:

  • What is the return value of ExplicitlyIndexed.__getitem__
  • What is the return value of ExplicitlyIndexed.transpose
  • What is the return value of AbstractArray.data
  • Variable.values seems to be able to return scalar values which is incompatible with the AbstractArray definition.

Overall it seems that typing has helped to find some problems again :)

Mypy should fail for the tests; I have not adapted them yet, as I want to solve the outstanding issues first.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7020/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#7094 · Align typing of dimension inputs · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 5 · created: 2022-09-27T20:59:17Z · updated: 2022-10-13T18:02:16Z · author_association: COLLABORATOR · id: 1388372090 · node_id: I_kwDOAMm_X85SwOB6

What is your issue?

Currently the input type for "one or more dims" varies from function to function. There are some open PRs that move to str | Iterable[Hashable], which allows the use of tuples as dimensions.

Some changes are still required:

  • [ ] Accept None in all functions that accept dims as default; this would simplify typing a lot (see https://github.com/pydata/xarray/pull/7048#discussion_r973813607)
  • [ ] Check if we can always include ellipsis "..." in dim arguments (see https://github.com/pydata/xarray/pull/7048#pullrequestreview-1111498309)
  • [ ] Iterable[Hashable] includes sets, which do not preserve the ordering (see https://github.com/pydata/xarray/pull/6971#discussion_r981166670). This means we need to distinguish between the cases where the order matters (constructor, transpose etc.) and where it does not (drop_dims, reductions etc.). Probably the ordered case needs to be typed as str | Sequence[Hashable] (a numpy.ndarray is not a Sequence, but who uses that for dimensions anyway?); see the sketch below.
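A sketch of the two flavours of annotation discussed in the last point (the alias names are made up; `X | Y` union syntax assumes Python 3.10+):

```python
from collections.abc import Hashable, Iterable, Sequence

# order does not matter (drop_dims, reductions, ...): sets are acceptable
DimsUnordered = str | Iterable[Hashable]

# order matters (constructor, transpose, ...): excludes sets
DimsOrdered = str | Sequence[Hashable]
```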

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7094/reactions",
    "total_count": 5,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#7117 · Expermimental mypy plugin · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 2 · created: 2022-10-03T17:07:59Z · updated: 2022-10-03T18:53:10Z · author_association: COLLABORATOR · draft: 1 · pull_request: pydata/xarray/pulls/7117 · id: 1395053809 · node_id: PR_kwDOAMm_X85AEpA1

I was playing around a bit with a mypy plugin and this was the best I could come up with. Unfortunately the mypy docs about plugins are not very detailed...

This plugin makes mypy recognize the user defined accessors.

There is a quite severe bug in there (probably due to my lack of understanding of mypy internals) which makes it work only on the first run; when you change a line in your code and run mypy again, it will crash... (you can delete the cache to make it work one more time :)

Any chance that a mypy expert can figure this out? haha
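For context, the general shape of a mypy plugin entry point (a generic skeleton of mypy's documented plugin API, not this PR's actual code):

```python
from typing import Callable, Optional

from mypy.plugin import AttributeContext, Plugin
from mypy.types import Type


class AccessorPlugin(Plugin):
    """Skeleton: resolve attribute accesses such as registered accessors."""

    def get_attribute_hook(
        self, fullname: str
    ) -> Optional[Callable[[AttributeContext], Type]]:
        # return a callback for attributes we know how to type, else None
        return None


def plugin(version: str) -> type[Plugin]:
    # mypy calls this function to obtain the plugin class
    return AccessorPlugin
```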

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7117/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#6749 · What should `Dataset.count` return for missing dims? · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 5 · created: 2022-07-03T11:49:12Z · updated: 2022-07-14T17:27:23Z · author_association: COLLABORATOR · id: 1292284929 · node_id: I_kwDOAMm_X85NBrQB

What is your issue?

When using a dataset with multiple variables, Dataset.count("x") will return ones for variables that are missing dimension "x", e.g.:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})
ds.count("x")
# returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int32 3
#     b        (y) int32 1 1
```

I can understand why "1" can be a valid answer, but the result is probably a bit philosophical.

For my use case I would like it to return an array of ds.sizes["x"] / 0. I think this is also a valid return value, considering the broadcasting rules, where the size of the missing dimension is actually known in the dataset.

Maybe one could make this behavior adjustable with a kwarg, e.g. missing_dim_value: {int, "size"}, default 1.
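Sketched usage of such a kwarg; missing_dim_value does not exist, it is only the proposal above:

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})

ds.count("x")  # today: b -> 1 1
# hypothetical:
# ds.count("x", missing_dim_value=0)       # b -> 0 0
# ds.count("x", missing_dim_value="size")  # b -> 3 3, i.e. ds.sizes["x"]
```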

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6749/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6703 · Add coarsen, rolling and weighted to generate_reductions · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 1 · created: 2022-06-18T09:49:22Z · updated: 2022-06-18T16:04:15Z · author_association: COLLABORATOR · id: 1275747776 · node_id: I_kwDOAMm_X85MCl3A

Is your feature request related to a problem?

Coarsen reductions are currently added dynamically, which is not very useful for typing. This is a follow-up to @Illviljan in https://github.com/pydata/xarray/pull/6702#discussion_r900700532

The same goes for Weighted, and similarly for Rolling (not sure if it is exactly the same though).

Describe the solution you'd like

Extend the generate_reductions script to include DataArrayCoarsen and DatasetCoarsen. Once finished: use type checking in all test_coarsen tests.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6306 · Assigning to dataset with missing dim raises ValueError · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 1 · created: 2022-02-25T16:08:04Z · updated: 2022-05-21T20:35:52Z · author_association: COLLABORATOR · id: 1150618439 · node_id: I_kwDOAMm_X85ElQtH

What happened?

I tried to assign values to a dataset with a selector-dict where a variable is missing the dim from the selector-dict. This raises a ValueError.

What did you expect to happen?

I expect that assigning works the same as selecting and it will ignore the missing dims.

Minimal Complete Verifiable Example

```python
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2, 3]), "b": ("y", [4, 5])})

ds[{"x": 1}]
# this works and returns:
# <xarray.Dataset>
# Dimensions:  (y: 2)
# Dimensions without coordinates: y
# Data variables:
#     a        int64 2
#     b        (y) int64 4 5

ds[{"x": 1}] = 1
# this fails and raises a ValueError:
# ValueError: Variable 'b': indexer {'x': 1} not available
```

Relevant log output

```python
Traceback (most recent call last):
  File "xarray/core/dataset.py", line 1591, in _setitem_check
    var_k = var[key]
  File "xarray/core/dataarray.py", line 740, in __getitem__
    return self.isel(indexers=self._item_key_to_dict(key))
  File "xarray/core/dataarray.py", line 1204, in isel
    variable = self._variable.isel(indexers, missing_dims=missing_dims)
  File "xarray/core/variable.py", line 1181, in isel
    indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
  File "xarray/core/utils.py", line 834, in drop_dims_from_indexers
    raise ValueError(
ValueError: Dimensions {'x'} do not exist. Expected one or more of ('y',)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xarray/core/dataset.py", line 1521, in __setitem__
    value = self._setitem_check(key, value)
  File "xarray/core/dataset.py", line 1593, in _setitem_check
    raise ValueError(
ValueError: Variable 'b': indexer {'x': 1} not available
```

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.1 (default, Jan 13 2021, 15:21:08) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.0
libnetcdf: 4.7.4

xarray: 0.21.1
pandas: 1.4.0
numpy: 1.21.5
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.5.1
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 49.2.1
pip: 22.0.3
conda: None
pytest: 6.2.5
IPython: 8.0.0
sphinx: None
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6306/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#6549 · Improved Dataset broadcasting · user: headtr1ck (43316012) · state: open · locked: 0 · comments: 3 · created: 2022-04-30T17:51:37Z · updated: 2022-05-01T14:37:43Z · author_association: COLLABORATOR · id: 1221885425 · node_id: I_kwDOAMm_X85I1H3x

Is your feature request related to a problem?

I am a bit puzzled about how xarray broadcasts Datasets. It seems to always add all dimensions to all variables. Is this what you want in general?

See this example:

```python
import xarray as xr

da = xr.DataArray([[1, 2, 3]], dims=("x", "y"))
# <xarray.DataArray (x: 1, y: 3)>
# array([[1, 2, 3]])

ds = xr.Dataset({"a": ("x", [1]), "b": ("z", [2, 3])})
# <xarray.Dataset>
# Dimensions:  (x: 1, z: 2)
# Dimensions without coordinates: x, z
# Data variables:
#     a        (x) int32 1
#     b        (z) int32 2 3

ds.broadcast_like(da)
# returns:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y, z) int32 1 1 1 1 1 1
#     b        (x, y, z) int32 2 3 2 3 2 3

# I think it should return:
# <xarray.Dataset>
# Dimensions:  (x: 1, y: 3, z: 2)
# Dimensions without coordinates: x, y, z
# Data variables:
#     a        (x, y) int32 1 1 1   # notice here without "z" dim
#     b        (x, y, z) int32 2 3 2 3 2 3
```

Describe the solution you'd like

I would like broadcasting to behave the same way as e.g. a simple addition. In the example above, da + ds produces the dimensions that I want.

Describe alternatives you've considered

ds + xr.zeros_like(da) works, but seems more like a "dirty hack".

Additional context

Maybe one can add an option to broadcasting that controls this behavior?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6549/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);