id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1198668507,I_kwDOAMm_X85Hcjrb,6462,Provide protocols for creating structural subtypes of DataArray/Dataset,29104956,open,0,,,5,2022-04-09T15:09:40Z,2023-09-16T19:55:59Z,,NONE,,,,"### Is your feature request related to a problem?

I frequently find myself wanting to annotate functions in terms of xarray objects that adhere to a particular schema. Given that a dataset's adherence to a schema is a matter of its structure/contents, it is unnatural to try to describe a schema as a subtype of `xr.Dataset` (or `DataArray`) (i.e. a type-checker ought not care that a dataset is an instance of a specific subclass of `Dataset`). 

### Describe the solution you'd like

Instead, it would be ideal to define a schema as a [Protocol (structural subtype)](https://peps.python.org/pep-0544/) of `xr.Dataset`. Unfortunately, one cannot [subclass a normal class to create a protocol](https://peps.python.org/pep-0544/#protocols-subclassing-normal-classes).

Thus, I am proposing that `xarray` provide Protocol-based descriptions of `DataArray` and `Dataset` so that users can describe schemas as **structural subtypes** of these classes. E.g.

```python
from typing import Protocol

from xarray import DataArray
from xarray.typing import DatasetProtocol

class ClimateData(DatasetProtocol, Protocol):
    lat: DataArray
    lon: DataArray
    temp: DataArray
    precip: DataArray

def process_climate_data(ds: ClimateData):
    ds.banana  # type checker flags as unknown attribute
    ds.temp  # type checker sees ""DataArray"" (as informed by ClimateData)
    ds.sel(lat=1.0)  # type checker sees `Dataset` (as informed by `DatasetProtocol`)
```

The contents of `DatasetProtocol` would essentially look like a modified type stub for `xarray.Dataset` so the implementation details are relatively simple, I believe.




### Describe alternatives you've considered

Creating a strict subtype of `Dataset` is not ideal for a few reasons:

1. Static type checkers would then expect to see that datasets must derive from that particular subclass, which is generally not the case.
2. The annotations / design of `xarray.Dataset` is too broad for describing a schema. E.g. the presence of `__getattr__` prevents type checkers from flagging access to non-existent data variables and coordinates during static analysis. `DatasetProtocol` would need to be designed to be less permissive than this.

### Additional context

Hopefully this could be leveraged by the likes of [xarray-schema](https://github.com/carbonplan/xarray-schema) so that xarray schemas can be used to provide both runtime *and* static validation capabilities.

I'd love to get feedback on this, and would be happy to open a PR if xarray devs are willing to weigh in on the design of these protocols.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6462/reactions"", ""total_count"": 11, ""+1"": 11, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1226931933,I_kwDOAMm_X85JIX7d,6576,Basic examples for creating data structures fail type-checking,29104956,closed,0,,,2,2022-05-05T16:42:00Z,2022-05-27T18:01:33Z,2022-05-27T18:01:33Z,NONE,,,,"### What happened?

The examples provided by [this documentation](https://docs.xarray.dev/en/stable/user-guide/data-structures.html) reveal issues with the type-annotations for `DataArray` and `Dataset`. Running mypy and pyright on these basic use-cases, only slightly modified, produce type-checking errors.



### What did you expect to happen?

The annotations for these classes should accommodate these common use-cases.

### Minimal Complete Verifiable Example

```Python
# run mypy or pyright on the following file to reproduce the errors

import numpy as np
import xarray as xr
import pandas as pd

data = np.random.rand(4, 3)
locs = [""IA"", ""IL"", ""IN""]
times = pd.date_range(""2000-01-01"", periods=4)

foo = xr.DataArray(
    data,
    coords=[times, locs],  # error: List item 1 has incompatible type ""List[str]""; expected ""Tuple[Any, ...]""
    dims=[""time"", ""space""],
)


temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

A = {
    ""temperature"": ([""x"", ""y"", ""time""], temp), 
    ""precipitation"": ([""x"", ""y"", ""time""], precip),
}

C = {
    ""lon"": ([""x"", ""y""], lon),
    ""lat"": ([""x"", ""y""], lat),
    ""time"": pd.date_range(""2014-09-06"", periods=3),
    ""reference_time"": pd.Timestamp(""2014-09-05""),
}

ds = xr.Dataset(
    A, # error: Argument 1 to ""Dataset"" has incompatible type ""Dict[str, Tuple[List[str], Any]]""; expected ""Optional[Mapping[Hashable, Any]]""
    coords=C,  # error: Argument ""coords"" to ""Dataset"" has incompatible type ""Dict[str, Any]""; expected ""Optional[Mapping[Hashable, Any]]""
)
```


### MVCE confirmation

- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

Some of these errors are circumvented when one provides a literal inline, and thus exploit [bidrectional inference](https://github.com/microsoft/pyright/blob/main/docs/type-inference.md#bidirectional-type-inference-expected-types), which may be why the current mypy tests ran in your CI miss these.

E.g.

```python
from typing import Dict, Hashable, Any

def f(x: Dict[Hashable, Any]): ...

f({""hi"": 1})  # this is ok -- uses bidirectional inference to see Dict[Hashable, Any]

x = {""hi"": 1}
f(x)  # error: Dict[Hashable, Any] is invariant in Hashable, and is incompatible with str
```

This is a sticky situation as key is invariant even in `Mapping`: https://github.com/python/typing/issues/445. IMHO it would be great to tweak these annotations, e.g. `Hashable -> Hashable | str | <other common coord types>` to ensure that users don't face such  false positives.

### Environment

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) 
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-153-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 0.19.0
pandas: 1.3.3
numpy: 1.20.3
scipy: 1.7.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: None
distributed: None
matplotlib: 3.5.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 59.5.0
pip: 21.3
conda: None
pytest: 6.2.5
IPython: 7.28.0
sphinx: 4.5.0


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6576/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
634869703,MDU6SXNzdWU2MzQ4Njk3MDM=,4131,Why am I able to load data from a closed dataset?,29104956,closed,0,,,10,2020-06-08T19:14:46Z,2022-04-05T18:35:06Z,2022-04-05T18:35:06Z,NONE,,,,"<!-- A short summary of the issue, if appropriate -->
I don't understand why I am able to open and close a dataset, but then proceed to read data from said dataset.

I can open a 4 GB dataset and promptly close is, and then still access the data within, which appears to still be loading lazily. Does querying a closed dataset automatically reopen it?


#### MCVE Code Sample
<!-- In order for the maintainers to efficiently understand and prioritize issues, we ask you post a ""Minimal, Complete and Verifiable Example"" (MCVE): http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports -->

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({""foo"": ((""x"",), np.random.rand(4,))}, coords={""x"": [10, 20, 30, 40]})
ds.to_netcdf(""tmp_example.nc"")
```
```python
>>> data = xr.open_dataset(""tmp_example.nc"")
>>> data.close()
>>> data.foo
<xarray.DataArray 'foo' (x: 4)>
array([0.894788, 0.017935, 0.696086, 0.827004])
Coordinates:
  * x        (x) int64 10 20 30 40
```

#### Expected Output
Because netCDF data sets are loaded lazily, I would imagine that, having not been touched when opened, that closing the data set would render it inaccessible



#### Versions

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-166-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1

xarray: 0.15.0
pandas: 1.0.3
numpy: 1.16.3
scipy: 1.4.1
netCDF4: 1.4.1
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.14.0
distributed: None
matplotlib: 3.0.3
cartopy: None
seaborn: None
numbagg: None
setuptools: 46.1.3.post20200330
pip: 20.0.2
conda: None
pytest: 5.4.1
IPython: 7.5.0
sphinx: 2.4.4


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4131/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue