id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2234142680,PR_kwDOAMm_X85sK0g8,8923,"`""source""` encoding for datasets opened from `fsspec` objects",14808389,open,0,,,5,2024-04-09T19:12:45Z,2024-04-23T16:54:09Z,,MEMBER,,0,pydata/xarray/pulls/8923,"When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `""source""` encoding. This is useful when we need the original path for additional processing, like writing to a similarly named file or extracting additional metadata.

This would also be useful when using `fsspec` to open remote files. In this PR, I'm extracting the `path` attribute that most `fsspec` objects have and using it to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default approach, but the list of potential classes is too big to be practical (at least 4 classes just within `fsspec` itself).

If this sounds like a good idea, I'll update the documentation of the `""source""` encoding to mention this feature.

- [x] Tests added
- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8923/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
683142059,MDU6SXNzdWU2ODMxNDIwNTk=,4361,restructure the contributing guide,14808389,open,0,,,5,2020-08-20T22:51:39Z,2023-03-31T17:39:00Z,,MEMBER,,,,"From #4355, @max-sixty:

> Stepping back on the contributing doc — I admit I haven't looked at it in a while — I wonder whether we can slim it down a bit, for example by linking to other docs for generic tooling — I imagine we're unlikely to have the best docs on working with GH, for example. Or referencing our PR template rather than the (now out-of-date) PR checklist.

We could also add a docstring guide, since the `numpydoc` guide does not cover every little detail (for example, `default` notation, type spec vs. type hint, space before the colon separating parameter names from types, no colon for parameters without types, etc.).","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4361/reactions"", ""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
789106802,MDU6SXNzdWU3ODkxMDY4MDI=,4825,clean up the API for renaming and changing dimensions / coordinates,14808389,open,0,,,5,2021-01-19T15:11:55Z,2021-09-10T15:04:14Z,,MEMBER,,,,"From #4108:

I wonder if it would be better to first ""reorganize"" all of the existing functions: we currently have `rename` (and `Dataset.rename_dims` / `Dataset.rename_vars`), `set_coords`, `reset_coords`, `set_index`, `reset_index` and `swap_dims`, which overlap partially. For example, the code sample from #4417 works if instead of

```python
ds = ds.rename(b='x')
ds = ds.set_coords('x')
```

we use

```python
ds = ds.set_index(x=""b"")
```

and something similar holds for the code sample in #4107.
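
To make that concrete, here is a minimal, self-contained sketch of the `set_index` spelling; the dataset (variables `a` and `b` along a coordinate-less dimension `x`) is invented for illustration:

```python
import xarray as xr

# hypothetical data: dimension ""x"" has no coordinate yet, and ""b"" is a plain variable along it
ds = xr.Dataset({""a"": (""x"", [1.0, 2.0, 3.0]), ""b"": (""x"", [10, 20, 30])})

# promote ""b"" to the dimension coordinate (index) of ""x""
ds = ds.set_index(x=""b"")
```

Here `set_index` consumes `b` and makes its values the index of `x` in a single operation.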

I believe we currently have these use cases (not sure if that list is complete, though):

- rename a `DataArray` → `rename`
- rename an existing variable to a name that is not yet in the object → `rename` / `Dataset.rename_vars` / `Dataset.rename_dims`
- convert a data variable to a coordinate (not a dimension coordinate) → `set_coords`
- convert a coordinate (not a dimension coordinate) to a data variable → `reset_coords`
- swap an existing dimension coordinate with a coordinate (which may not exist) and rename the dimension → `swap_dims`
- use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → `set_index`
- stop using a coordinate as a dimension coordinate and append `_` to its name (do not rename the dimension) → `reset_index`
- use two existing coordinates / data variables as a `MultiIndex` → `set_index`
- stop using a `MultiIndex` as a dimension coordinate and use its levels as coordinates → `reset_index`

Sometimes, some of these can be emulated by combinations of others, for example:

```python
# x is a dimension without coordinates
assert_identical(ds.set_index({""x"": ""b""}), ds.swap_dims({""x"": ""b""}).rename({""b"": ""x""}))
assert_identical(ds.swap_dims({""x"": ""b""}), ds.set_index({""x"": ""b""}).rename({""x"": ""b""}))
```

and, with this PR:

```python
assert_identical(ds.set_index({""x"": ""b""}), ds.set_coords(""b"").rename({""b"": ""x""}))
assert_identical(ds.swap_dims({""x"": ""b""}), ds.rename({""b"": ""x""}))
```

which means that it would increase the overlap of `rename`, `set_index`, and `swap_dims`.

In any case, I think we should add a guide which explains which method to pick in which situation (or extend `howdoi`).

_Originally posted by @keewis in https://github.com/pydata/xarray/issues/4108#issuecomment-761907785_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4825/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
791277757,MDU6SXNzdWU3OTEyNzc3NTc=,4837,expose _to_temp_dataset / _from_temp_dataset as semi-public API?,14808389,open,0,,,5,2021-01-21T16:11:32Z,2021-01-22T02:07:08Z,,MEMBER,,,,"When writing accessors which behave the same for both `Dataset` and `DataArray`, it would be incredibly useful to be able to use `DataArray._to_temp_dataset` / `DataArray._from_temp_dataset` to deduplicate code. Is it safe to use those in external packages (like `pint-xarray`)?

Otherwise, I guess it would be possible to use

```python
# fall back to a placeholder name for unnamed arrays
name = da.name if da.name is not None else ""__temp""
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    # restore the original (empty) name
    new_da = new_da.rename(da.name)
assert_identical(da, new_da)
```

but that seems less efficient.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4837/reactions"", ""total_count"": 2, ""+1"": 2, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
517195073,MDU6SXNzdWU1MTcxOTUwNzM=,3483,assign_coords with mixed DataArray / array args removes coords,14808389,open,0,,,5,2019-11-04T14:38:40Z,2019-11-07T15:46:15Z,,MEMBER,,,,"I'm not sure if using `assign_coords` to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results:

```python
>>> obj = xr.DataArray(
...     data=[6, 3, 4, 6],
...     coords={""x"": list(""abcd""), ""y"": (""x"", range(4))},
...     dims=""x"",
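...     # ""y"" is a non-dimension coordinate along ""x""; note below whether it survives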
... )
>>> obj
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c' 'd'
    y        (x) int64 0 1 2 3
>>> # works as expected
>>> obj.assign_coords(coords={""x"": list(""efgh""), ""y"": (""x"", [0, 2, 4, 6])})
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # works, too (same as .data / .values)
>>> obj.assign_coords(coords={
...     ""x"": obj.x.copy(data=list(""efgh"")).variable,
...     ""y"": (""x"", [0, 2, 4, 6]),
... })
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6
>>> # this drops ""y""
>>> obj.assign_coords(coords={
...     ""x"": obj.x.copy(data=list(""efgh"")),
...     ""y"": (""x"", [0, 2, 4, 6]),
... })
<xarray.DataArray (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
>>> # and mixed kwargs raise instead
>>> obj.assign_coords(x=list(""efgh""), y=obj.y * 2)
xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined:
first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x')
second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x')
```

I would expect the result to be the same regardless of the type of the new coords.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3483/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue