id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2234142680,PR_kwDOAMm_X85sK0g8,8923,"`""source""` encoding for datasets opened from `fsspec` objects",14808389,open,0,,,5,2024-04-09T19:12:45Z,2024-04-23T16:54:09Z,,MEMBER,,0,pydata/xarray/pulls/8923,"When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `""source""` encoding. This is useful when we need the original path for additional processing, like writing to a similarly named file or extracting additional metadata.

This would also be useful when using `fsspec` to open remote files. In this PR, I'm extracting the `path` attribute that most `fsspec` objects have and using it to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default approach, but the list of potential classes is too big to be practical (at least 4 classes just within `fsspec` itself).

If this sounds like a good idea, I'll update the documentation of the `""source""` encoding to mention this feature.

- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8923/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
683142059,MDU6SXNzdWU2ODMxNDIwNTk=,4361,restructure the contributing guide,14808389,open,0,,,5,2020-08-20T22:51:39Z,2023-03-31T17:39:00Z,,MEMBER,,,,"From #4355, @max-sixty:

> Stepping back on the contributing doc — I admit I haven't looked at it in a while — I wonder whether we can slim it down a bit, for example by linking to other docs for generic tooling — I imagine we're unlikely to have the best docs on working with GH, for example. Or referencing our PR template rather than the (now out-of-date) PR checklist.

We could also add a docstring guide, since the `numpydoc` guide does not cover every little detail (for example, `default` notation, type spec vs. type hint, space before the colon separating parameter names from types, no colon for parameters without types, etc.).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4361/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
789106802,MDU6SXNzdWU3ODkxMDY4MDI=,4825,clean up the API for renaming and changing dimensions / coordinates,14808389,open,0,,,5,2021-01-19T15:11:55Z,2021-09-10T15:04:14Z,,MEMBER,,,,"From #4108:

I wonder if it would be better to first ""reorganize"" all of the existing functions: we currently have `rename` (and `Dataset.rename_dims` / `Dataset.rename_vars`), `set_coords`, `reset_coords`, `set_index`, `reset_index` and `swap_dims`, which overlap partially. For example, the code sample from #4417 works if instead of

```python
ds = ds.rename(b='x')
ds = ds.set_coords('x')
```

we use

```python
ds = ds.set_index(x=""b"")
```

and something similar holds for the code sample in #4107.
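
To make that concrete, here is a minimal, self-contained sketch of the `set_index` spelling; the dataset (variables `a` and `b` along a coordinate-less dimension `x`) is invented for illustration:

```python
import xarray as xr

# hypothetical data: dimension ""x"" has no coordinate yet, and ""b"" is a plain variable along it
ds = xr.Dataset({""a"": (""x"", [1.0, 2.0, 3.0]), ""b"": (""x"", [10, 20, 30])})

# promote ""b"" to the dimension coordinate (index) of ""x""
ds = ds.set_index(x=""b"")
```

Here `set_index` consumes `b` and makes its values the index of `x` in a single operation.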

I believe we currently have these use cases (not sure if that list is complete, though):

- rename a `DataArray` → `rename`
- rename an existing variable to a name that is not yet in the object → `rename` / `Dataset.rename_vars` / `Dataset.rename_dims`
- convert a data variable to a coordinate (not a dimension coordinate) → `set_coords`
- convert a coordinate (not a dimension coordinate) to a data variable → `reset_coords`
- swap an existing dimension coordinate with a coordinate (which may not exist) and rename the dimension → `swap_dims`
- use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → `set_index`
- stop using a coordinate as a dimension coordinate and append `_` to its name (do not rename the dimension) → `reset_index`
- use two existing coordinates / data variables as a `MultiIndex` → `set_index`
- stop using a `MultiIndex` as a dimension coordinate and use its levels as coordinates → `reset_index`

Sometimes, some of these can be emulated by combinations of others, for example:

```python
# x is a dimension without coordinates
assert_identical(ds.set_index({""x"": ""b""}), ds.swap_dims({""x"": ""b""}).rename({""b"": ""x""}))
assert_identical(ds.swap_dims({""x"": ""b""}), ds.set_index({""x"": ""b""}).rename({""x"": ""b""}))
```

and, with this PR:

```python
assert_identical(ds.set_index({""x"": ""b""}), ds.set_coords(""b"").rename({""b"": ""x""}))
assert_identical(ds.swap_dims({""x"": ""b""}), ds.rename({""b"": ""x""}))
```

which means that it would increase the overlap of `rename`, `set_index`, and `swap_dims`.

In any case, I think we should add a guide which explains which method to pick in which situation (or extend `howdoi`).

_Originally posted by @keewis in https://github.com/pydata/xarray/issues/4108#issuecomment-761907785_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4825/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
791277757,MDU6SXNzdWU3OTEyNzc3NTc=,4837,expose _to_temp_dataset / _from_temp_dataset as semi-public API?,14808389,open,0,,,5,2021-01-21T16:11:32Z,2021-01-22T02:07:08Z,,MEMBER,,,,"When writing accessors which behave the same for both `Dataset` and `DataArray`, it would be incredibly useful to be able to use `DataArray._to_temp_dataset` / `DataArray._from_temp_dataset` to deduplicate code. Is it safe to use those in external packages (like `pint-xarray`)?

Otherwise, I guess it would be possible to use

```python
# fall back to a placeholder name for unnamed arrays
name = da.name if da.name is not None else ""__temp""
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    # restore the original (empty) name
    new_da = new_da.rename(da.name)
assert_identical(da, new_da)
```

but that seems less efficient.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4837/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
517195073,MDU6SXNzdWU1MTcxOTUwNzM=,3483,assign_coords with mixed DataArray / array args removes coords,14808389,open,0,,,5,2019-11-04T14:38:40Z,2019-11-07T15:46:15Z,,MEMBER,,,,"I'm not sure if using `assign_coords` to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results:

```python
>>> obj = xr.DataArray(
...     data=[6, 3, 4, 6],
...     coords={""x"": list(""abcd""), ""y"": (""x"", range(4))},
...     dims=""x"",
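...     # ""y"" is a non-dimension coordinate along ""x""; note below whether it survives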
... )
>>> obj
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c' 'd'
    y        (x) int64 0 1 2 3
>>> # works as expected
>>> obj.assign_coords(coords={""x"": list(""efgh""), ""y"": (""x"", [0, 2, 4, 6])})
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # works, too (same as .data / .values)
>>> obj.assign_coords(coords={
...     ""x"": obj.x.copy(data=list(""efgh"")).variable,
...     ""y"": (""x"", [0, 2, 4, 6]),
... })
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # this drops ""y""
>>> obj.assign_coords(coords={
...     ""x"": obj.x.copy(data=list(""efgh"")),
...     ""y"": (""x"", [0, 2, 4, 6]),
... })
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
>>> # and mixed kwargs raise instead
>>> obj.assign_coords(x=list(""efgh""), y=obj.y * 2)
xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined:
first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x')
second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x')
```

I would expect the result to be the same regardless of the type of the new coords.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3483/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue