github: issues: 3 rows where repo = 13221727, type = "issue" and user = 29104956 sorted by updated

3 rows where repo = 13221727, type = "issue" and user = 29104956 sorted by updated_at descending

id	node_id	number	title	user	state	comments	created_at	updated_at ▲	closed_at	author_association	body	reactions	state_reason	repo	type
1198668507	I_kwDOAMm_X85Hcjrb	6462	Provide protocols for creating structural subtypes of DataArray/Dataset	rsokl 29104956	open	5	2022-04-09T15:09:40Z	2023-09-16T19:55:59Z		NONE	Is your feature request related to a problem? I frequently find myself wanting to annotate functions in terms of xarray objects that adhere to a particular schema. Given that a dataset's adherence to a schema is a matter of its structure/contents, it is unnatural to try to describe a schema as a subtype of `xr.Dataset` (or `DataArray`) (i.e. a type-checker ought not care that a dataset is an instance of a specific subclass of `Dataset`). Describe the solution you'd like Instead, it would be ideal to define a schema as a Protocol (structural subtype) of `xr.Dataset`. Unfortunately, one cannot subclass a normal class to create a protocol. Thus, I am proposing that `xarray` provide Protocol-based descriptions of `DataArray` and `Dataset` so that users can describe schemas as structural subtypes of these classes. E.g. ```python from typing import Protocol from xarray import DataArray from xarray.typing import DatasetProtocol class ClimateData(DatasetProtocol, Protocol): lat: DataArray lon: DataArray temp: DataArray precip: DataArray def process_climate_data(ds: ClimateData): ds.banana # type checker flags as unknown attribute ds.temp # type checker sees "DataArray" (as informed by ClimateData) ds.sel(lat=1.0) # type checker sees `Dataset` (as informed by `DatasetProtocol`) ``` The contents of `DatasetProtocol` would essentially look like a modified type stub for `xarray.Dataset` so the implementation details are relatively simple, I believe. Describe alternatives you've considered Creating a strict subtype of `Dataset` is not ideal for a few reasons: Static type checkers would then expect to see that datasets must derive from that particular subclass, which is generally not the case. The annotations / design of `xarray.Dataset` is too broad for describing a schema. E.g. the presence of `__getattr__` prevents type checkers from flagging access to non-existent data variables and coordinates during static analysis. `DatasetProtocol` would need to be designed to be less permissive than this. Additional context Hopefully this could be leveraged by the likes of xarray-schema so that xarray schemas can be used to provide both runtime and static validation capabilities. I'd love to get feedback on this, and would be happy to open a PR if xarray devs are willing to weigh in on the design of these protocols.	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6462/reactions", "total_count": 11, "+1": 11, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		xarray 13221727	issue
1226931933	I_kwDOAMm_X85JIX7d	6576	Basic examples for creating data structures fail type-checking	rsokl 29104956	closed	2	2022-05-05T16:42:00Z	2022-05-27T18:01:33Z	2022-05-27T18:01:33Z	NONE	What happened? The examples provided by this documentation reveal issues with the type-annotations for `DataArray` and `Dataset`. Running mypy and pyright on these basic use-cases, only slightly modified, produces type-checking errors. What did you expect to happen? The annotations for these classes should accommodate these common use-cases. Minimal Complete Verifiable Example ```Python # run mypy or pyright on the following file to reproduce the errors import numpy as np import xarray as xr import pandas as pd data = np.random.rand(4, 3) locs = ["IA", "IL", "IN"] times = pd.date_range("2000-01-01", periods=4) foo = xr.DataArray( data, coords=[times, locs], # error: List item 1 has incompatible type "List[str]"; expected "Tuple[Any, ...]" dims=["time", "space"], ) temp = 15 + 8 * np.random.randn(2, 2, 3) precip = 10 * np.random.rand(2, 2, 3) lon = [[-99.83, -99.32], [-99.79, -99.23]] lat = [[42.25, 42.21], [42.63, 42.59]] A = { "temperature": (["x", "y", "time"], temp), "precipitation": (["x", "y", "time"], precip), } C = { "lon": (["x", "y"], lon), "lat": (["x", "y"], lat), "time": pd.date_range("2014-09-06", periods=3), "reference_time": pd.Timestamp("2014-09-05"), } ds = xr.Dataset( A, # error: Argument 1 to "Dataset" has incompatible type "Dict[str, Tuple[List[str], Any]]"; expected "Optional[Mapping[Hashable, Any]]" coords=C, # error: Argument "coords" to "Dataset" has incompatible type "Dict[str, Any]"; expected "Optional[Mapping[Hashable, Any]]" ) ``` MVCE confirmation [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray. [x] Complete example — the example is self-contained, including all data and the text of any traceback. [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result. [x] New issue — a search of GitHub Issues suggests this is not a duplicate. Relevant log output No response Anything else we need to know? Some of these errors are circumvented when one provides a literal inline, and thus exploits bidirectional inference, which may be why the current mypy tests run in your CI miss these. E.g. ```python from typing import Dict, Hashable, Any def f(x: Dict[Hashable, Any]): ... f({"hi": 1}) # this is ok -- uses bidirectional inference to see Dict[Hashable, Any] x = {"hi": 1} f(x) # error: Dict[Hashable, Any] is invariant in Hashable, and is incompatible with str ``` This is a sticky situation as key is invariant even in `Mapping`: https://github.com/python/typing/issues/445. IMHO it would be great to tweak these annotations, e.g. `Hashable -> Hashable | str | <other common coord types>` to ensure that users don't face such false positives. Environment INSTALLED VERSIONS ------------------ commit: None python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18) [GCC 10.3.0] python-bits: 64 OS: Linux OS-release: 4.15.0-153-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: ('en_US', 'UTF-8') libhdf5: None libnetcdf: None xarray: 0.19.0 pandas: 1.3.3 numpy: 1.20.3 scipy: 1.7.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: None distributed: None matplotlib: 3.5.2 cartopy: None seaborn: None numbagg: None pint: None setuptools: 59.5.0 pip: 21.3 conda: None pytest: 6.2.5 IPython: 7.28.0 sphinx: 4.5.0	{ "url": "https://api.github.com/repos/pydata/xarray/issues/6576/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
634869703	MDU6SXNzdWU2MzQ4Njk3MDM=	4131	Why am I able to load data from a closed dataset?	rsokl 29104956	closed	10	2020-06-08T19:14:46Z	2022-04-05T18:35:06Z	2022-04-05T18:35:06Z	NONE	I don't understand why I am able to open and close a dataset, but then proceed to read data from said dataset. I can open a 4 GB dataset and promptly close it, and then still access the data within, which appears to still be loading lazily. Does querying a closed dataset automatically reopen it? MCVE Code Sample ```python import numpy as np import xarray as xr ds = xr.Dataset({"foo": (("x",), np.random.rand(4,))}, coords={"x": [10, 20, 30, 40]}) ds.to_netcdf("tmp_example.nc") data = xr.open_dataset("tmp_example.nc") data.close() data.foo <xarray.DataArray 'foo' (x: 4)> array([0.894788, 0.017935, 0.696086, 0.827004]) Coordinates: * x (x) int64 10 20 30 40 ``` Expected Output Because netCDF data sets are loaded lazily, I would imagine that, having not been touched when opened, closing the data set would render it inaccessible. Versions Output of `xr.show_versions()` INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 4.4.0-166-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.2 libnetcdf: 4.6.1 xarray: 0.15.0 pandas: 1.0.3 numpy: 1.16.3 scipy: 1.4.1 netCDF4: 1.4.1 pydap: None h5netcdf: None h5py: 2.8.0 Nio: None zarr: None cftime: 1.1.3 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.14.0 distributed: None matplotlib: 3.0.3 cartopy: None seaborn: None numbagg: None setuptools: 46.1.3.post20200330 pip: 20.0.2 conda: None pytest: 5.4.1 IPython: 7.5.0 sphinx: 2.4.4	{ "url": "https://api.github.com/repos/pydata/xarray/issues/4131/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	completed	xarray 13221727	issue
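
Issue 6576 in the table above explains that a dict literal passed inline type-checks through bidirectional inference, while a dict bound to a variable first is inferred as `Dict[str, int]` and rejected. A minimal sketch of one possible workaround, assuming the caller is free to annotate the variable where it is defined; the function name `count_keys` is hypothetical and not part of xarray:

```python
from typing import Any, Dict, Hashable


def count_keys(x: Dict[Hashable, Any]) -> int:
    """Hypothetical stand-in for an API annotated with Dict[Hashable, Any]."""
    return len(x)


# Without an annotation, a type checker infers Dict[str, int] here, so
# passing `inferred` to count_keys is reported as an error (Dict is
# invariant in its key type), even though it works at runtime.
inferred = {"hi": 1}

# Annotating at the point of definition lets the checker apply the
# declared type to the literal, so this call passes static analysis.
annotated: Dict[Hashable, Any] = {"hi": 1}

result = count_keys(annotated)
```

This only moves the annotation burden to the caller; it does not address the issue's suggestion of widening the library's own annotations.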

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
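
The filter in the page heading ("repo = 13221727, type = "issue" and user = 29104956 sorted by updated_at descending") can be reproduced against this schema. The following is a sketch only: it uses an in-memory SQLite database and a trimmed copy of the `issues` table (a subset of columns, foreign keys omitted), populated with the three rows listed above. It is not the query the page itself runs.

```python
import sqlite3

# In-memory database with a trimmed version of the [issues] table.
# Only the columns needed for the filter and ordering are kept (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [title] TEXT,
        [user] INTEGER,
        [updated_at] TEXT,
        [repo] INTEGER,
        [type] TEXT
    )
    """
)

# The three rows shown in the table above.
rows = [
    (1198668507, 6462,
     "Provide protocols for creating structural subtypes of DataArray/Dataset",
     29104956, "2023-09-16T19:55:59Z", 13221727, "issue"),
    (1226931933, 6576,
     "Basic examples for creating data structures fail type-checking",
     29104956, "2022-05-27T18:01:33Z", 13221727, "issue"),
    (634869703, 4131,
     "Why am I able to load data from a closed dataset?",
     29104956, "2022-04-05T18:35:06Z", 13221727, "issue"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
query = """
    SELECT [number], [title], [updated_at]
    FROM [issues]
    WHERE [repo] = ? AND [type] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, (13221727, "issue", 29104956)).fetchall()
numbers = [row[0] for row in result]
```

Parameter placeholders (`?`) are used instead of string formatting so the filter values are passed separately from the SQL text.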
