
issues


5 rows where state = "open" and user = 45271239 sorted by updated_at descending


Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at ▲ · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
#8761 · Use ruff for formatting · etienneschalk (45271239) · open · draft PR · 10 comments · created 2024-02-17T16:04:18Z · updated 2024-02-27T20:11:57Z · CONTRIBUTOR · id 2140225209 · node_id PR_kwDOAMm_X85nLLgJ · pydata/xarray/pulls/8761
  • [ ] Closes #8760
  • ~~[ ] Tests added~~
  • ~~[ ] User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • ~~[ ] New functions/methods are listed in api.rst~~

Note: many inline ... get their own line. Running `black .` would have produced the same result.
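A quick way to sanity-check that claim locally is to run both formatters in check mode; a minimal sketch, assuming `black` and `ruff` are both installed and on PATH:

```python
# Minimal sketch: verify that black and ruff-format agree on the working tree.
# Both CLIs support a --check mode that exits non-zero when reformatting
# would change files.
import subprocess

black = subprocess.run(["black", "--check", "--quiet", "."])
ruff = subprocess.run(["ruff", "format", "--check", "."])
print("black clean:      ", black.returncode == 0)
print("ruff format clean:", ruff.returncode == 0)
```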

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8761/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8760 · Use `ruff` for formatting · etienneschalk (45271239) · open · 0 comments · created 2024-02-17T15:07:17Z · updated 2024-02-26T05:58:53Z · CONTRIBUTOR · id 2140173727 · node_id I_kwDOAMm_X85_kHWf

What is your issue?

Use ruff for formatting

Context

Ruff was introduced in https://github.com/pydata/xarray/issues/7458. Arguments in favor were that it is faster and combines multiple tools into a single one (e.g. flake8, pyflakes, isort, pyupgrade).

This switches our primary linter to Ruff. As advertised, Ruff is very fast. Plus, we get the benefit of using a single tool that combines the previous functionality of pyflakes, isort, and pyupgrade.

Suggestion

To move forward with the ruff replacement of tools, introduce ruff-format to replace black (see the ruff Usage documentation for integration with pre-commit).

Pandas uses ruff and ruff-format: https://github.com/pandas-dev/pandas/blob/63dc0f76faa208450b8aaa57246029fcf94d015b/.pre-commit-config.yaml#L24

Ruff is capable of docstring formatting: https://docs.astral.sh/ruff/formatter/#docstring-formatting

Ruff can format Jupyter Notebooks: https://docs.astral.sh/ruff/faq/#does-ruff-support-jupyter-notebooks

So, introducing the ruff formatter might remove the need for black and blackdoc.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8760/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#8763 · Documentation 404 not found for "Suggest Edit" link in "API Reference" pages · etienneschalk (45271239) · open · 0 comments · created 2024-02-18T12:39:25Z · updated 2024-02-18T12:39:25Z · CONTRIBUTOR · id 2140968762 · node_id I_kwDOAMm_X85_nJc6

What happened?

Concrete example: let's say I am currently reading the documentation of DataArray.resample. I would like to have a look at the internals and see the code directly on GitHub.

We can see a GitHub icon with 3 links:

  • Repository: leads to the home page of the repo: https://github.com/pydata/xarray
  • Suggest edit: leads to a 404 Not Found, as it points to the generated documentation
  • Open issue: generic link to open an issue

The [source] link does what is expected: it leads to the source code https://github.com/pydata/xarray/blob/main/xarray/core/dataset.py#L10471-L10565

What did you expect to happen?

The second link, "Suggest edit", should actually lead to the source code, as the documentation is auto-generated from the docstrings themselves. Maybe it could be renamed to something like "View source".
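For docs built on sphinx-book-theme (which xarray's documentation appears to use), these three header links come from `html_theme_options` in `conf.py`; a sketch of the relevant options, with option names taken from that theme's documentation and illustrative values:

```python
# Sketch of the sphinx-book-theme options behind the three links
# (values are illustrative; assumes the docs use sphinx-book-theme).
html_theme_options = {
    "repository_url": "https://github.com/pydata/xarray",  # "Repository" link
    "repository_branch": "main",
    "path_to_docs": "doc",  # where the .rst sources live, relative to the repo root
    "use_repository_button": True,
    # "Suggest edit" builds {repository_url}/edit/{branch}/{path_to_docs}/<page>,
    # which 404s for autogenerated API pages that have no .rst source.
    "use_edit_page_button": True,
    "use_issues_button": True,  # "Open issue" link
}
```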

Example of other repos having this feature:

Minimal Complete Verifiable Example

N/A

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

N/A

Anything else we need to know?

No response

Environment

N/A

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8763/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#8749 · Lack of resilience towards missing `_ARRAY_DIMENSIONS` xarray's special zarr attribute #280 · etienneschalk (45271239) · open · 2 comments · created 2024-02-14T21:52:34Z · updated 2024-02-15T19:15:59Z · CONTRIBUTOR · id 2135262747 · node_id I_kwDOAMm_X85_RYYb

What is your issue?

Original issue: https://github.com/xarray-contrib/datatree/issues/280

Note: this issue description was generated from a notebook; you can use it to reproduce the bug locally.

Lack of resilience towards missing _ARRAY_DIMENSIONS xarray's special Zarr attribute

```python
from pathlib import Path
import json
from typing import Any

import numpy as np
import xarray as xr
```

Utilities

This section only declares utility functions and does not contain any additional value for the reader.

```python
# Set to True to get rich HTML representations in an interactive notebook session
# Set to False to get textual representations ready to be converted to markdown
# for an issue report
INTERACTIVE = False

# Convert to markdown with:
# jupyter nbconvert --to markdown notebooks/datatree-zarr.ipynb
```

```python
def show(obj: Any) -> Any:
    if isinstance(obj, Path):
        if INTERACTIVE:
            return obj.resolve()
        else:
            print(obj)
    else:
        if INTERACTIVE:
            return obj
        else:
            print(obj)


def load_json(path: Path) -> dict:
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)
```

Data Creation

I create a dummy Dataset containing a single (label, z)-dimensional DataArray named my_xda.

```python
xda = xr.DataArray(
    np.arange(3 * 18).reshape(3, 18),
    coords={"label": list("abc"), "z": list(range(18))},
)
xda = xda.chunk({"label": 2, "z": 4})
show(xda)
```

<xarray.DataArray (label: 3, z: 18)>
dask.array<xarray-<this-array>, shape=(3, 18), dtype=int64, chunksize=(2, 4), chunktype=numpy.ndarray>
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

```python
xds = xr.Dataset({"my_xda": xda})
show(xds)
```

<xarray.Dataset>
Dimensions:  (label: 3, z: 18)
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Data variables:
    my_xda   (label, z) int64 dask.array<chunksize=(2, 4), meta=np.ndarray>

Data Writing

I persist the Dataset to Zarr

```python
zarr_path = Path() / "../generated/zarrounet.zarr"
xds.to_zarr(zarr_path, mode="w")
show(zarr_path)
```

../generated/zarrounet.zarr

Data Initial Reading

I successfully read the Dataset

```python
show(xr.open_zarr(zarr_path).my_xda)
```

<xarray.DataArray 'my_xda' (label: 3, z: 18)>
dask.array<open_dataset-my_xda, shape=(3, 18), dtype=int64, chunksize=(2, 4), chunktype=numpy.ndarray>
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

Data Alteration

Then, I alter the Zarr store by successively removing _ARRAY_DIMENSIONS from each of the variables' .zattrs (z, label, my_xda), trying to reopen the Zarr after each removal. In all cases, it is a success. ✔️

```python
# Corrupt the variables' _ARRAY_DIMENSIONS xarray attribute
for varname in ("z/.zattrs", "label/.zattrs", "my_xda/.zattrs"):
    zattrs_path = zarr_path / varname
    assert zattrs_path.is_file()
    zattrs_path.write_text("{}")

# Note: it has no impact; only the root .zmetadata seems to be used
```

```python
show(xr.open_zarr(zarr_path))
```

<xarray.Dataset>
Dimensions:  (label: 3, z: 18)
Coordinates:
  * label    (label) <U1 'a' 'b' 'c'
  * z        (z) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
Data variables:
    my_xda   (label, z) int64 dask.array<chunksize=(2, 4), meta=np.ndarray>

However, the last alteration, removing the _ARRAY_DIMENSIONS key-value pair from one of the variables in the .zmetadata file present at the root of the Zarr store, results in an exception when reading. The error message is explicit: KeyError: '_ARRAY_DIMENSIONS' ❌

This means xarray cannot open arbitrary Zarr stores, but only those that possess xarray's special private attribute, _ARRAY_DIMENSIONS.

Because of these choices, Xarray cannot read arbitrary array data, but only Zarr data with valid _ARRAY_DIMENSIONS

See https://docs.xarray.dev/en/latest/internals/zarr-encoding-spec.html
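For comparison, the underlying arrays stay readable with the zarr library itself; a workaround sketch (reusing `zarr_path` and `show` from the snippets above, and supplying the dimension names by hand since the attribute is gone):

```python
# Workaround sketch: bypass xarray and rebuild the DataArray manually.
# Dimension names are supplied by hand because _ARRAY_DIMENSIONS is missing.
import xarray as xr
import zarr

group = zarr.open_group(str(zarr_path), mode="r")
raw = group["my_xda"][:]  # plain numpy array, no dimension metadata
xda_manual = xr.DataArray(raw, dims=("label", "z"))
show(xda_manual)
```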

As a first step, the error message could probably be made more explicit (better than a low-level KeyError), explaining that xarray cannot yet open arbitrary Zarr data.
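A hypothetical sketch of what a friendlier failure could look like (the helper name is invented; this is not xarray's actual code):

```python
# Hypothetical helper (invented name, NOT xarray's actual code): turn the
# low-level KeyError into a message that explains the limitation.
def get_xarray_dimensions(zarr_obj) -> list:
    try:
        return list(zarr_obj.attrs["_ARRAY_DIMENSIONS"])
    except KeyError:
        raise KeyError(
            "Zarr object has no '_ARRAY_DIMENSIONS' attribute. xarray can only "
            "open Zarr data written with xarray's encoding conventions; see "
            "https://docs.xarray.dev/en/latest/internals/zarr-encoding-spec.html"
        ) from None
```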

```python
zmetadata_path = zarr_path / ".zmetadata"
assert zmetadata_path.is_file()
zmetadata = load_json(zmetadata_path)
zmetadata["metadata"]["z/.zattrs"] = {}
zmetadata_path.write_text(json.dumps(zmetadata, indent=4))
```

1925

```python
show(xr.open_zarr(zarr_path))
```

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:212, in _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr)
    210 try:
    211     # Xarray-Zarr
--> 212     dimensions = zarr_obj.attrs[dimension_key]
    213 except KeyError as e:


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/zarr/attrs.py:73, in Attributes.__getitem__(self, item)
     72 def __getitem__(self, item):
---> 73     return self.asdict()[item]


KeyError: '_ARRAY_DIMENSIONS'


During handling of the above exception, another exception occurred:


TypeError                                 Traceback (most recent call last)

Cell In[11], line 1
----> 1 show(xr.open_zarr(zarr_path))


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:900, in open_zarr(store, group, synchronizer, chunks, decode_cf, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, consolidated, overwrite_encoded_chunks, chunk_store, storage_options, decode_timedelta, use_cftime, zarr_version, chunked_array_type, from_array_kwargs, **kwargs)
    886     raise TypeError(
    887         "open_zarr() got unexpected keyword arguments " + ",".join(kwargs.keys())
    888     )
    890 backend_kwargs = {
    891     "synchronizer": synchronizer,
    892     "consolidated": consolidated,
   (...)
    897     "zarr_version": zarr_version,
    898 }
--> 900 ds = open_dataset(
    901     filename_or_obj=store,
    902     group=group,
    903     decode_cf=decode_cf,
    904     mask_and_scale=mask_and_scale,
    905     decode_times=decode_times,
    906     concat_characters=concat_characters,
    907     decode_coords=decode_coords,
    908     engine="zarr",
    909     chunks=chunks,
    910     drop_variables=drop_variables,
    911     chunked_array_type=chunked_array_type,
    912     from_array_kwargs=from_array_kwargs,
    913     backend_kwargs=backend_kwargs,
    914     decode_timedelta=decode_timedelta,
    915     use_cftime=use_cftime,
    916     zarr_version=zarr_version,
    917 )
    918 return ds


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/api.py:573, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, inline_array, chunked_array_type, from_array_kwargs, backend_kwargs, **kwargs)
    561 decoders = _resolve_decoders_kwargs(
    562     decode_cf,
    563     open_backend_dataset_parameters=backend.open_dataset_parameters,
   (...)
    569     decode_coords=decode_coords,
    570 )
    572 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 573 backend_ds = backend.open_dataset(
    574     filename_or_obj,
    575     drop_variables=drop_variables,
    576     **decoders,
    577     **kwargs,
    578 )
    579 ds = _dataset_from_backend_dataset(
    580     backend_ds,
    581     filename_or_obj,
   (...)
    591     **kwargs,
    592 )
    593 return ds


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:982, in ZarrBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, synchronizer, consolidated, chunk_store, storage_options, stacklevel, zarr_version)
    980 store_entrypoint = StoreBackendEntrypoint()
    981 with close_on_error(store):
--> 982     ds = store_entrypoint.open_dataset(
    983         store,
    984         mask_and_scale=mask_and_scale,
    985         decode_times=decode_times,
    986         concat_characters=concat_characters,
    987         decode_coords=decode_coords,
    988         drop_variables=drop_variables,
    989         use_cftime=use_cftime,
    990         decode_timedelta=decode_timedelta,
    991     )
    992 return ds


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/store.py:43, in StoreBackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     29 def open_dataset(  # type: ignore[override]  # allow LSP violation, not supporting **kwargs
     30     self,
     31     filename_or_obj: str | os.PathLike[Any] | BufferedIOBase | AbstractDataStore,
   (...)
     39     decode_timedelta=None,
     40 ) -> Dataset:
     41     assert isinstance(filename_or_obj, AbstractDataStore)
---> 43     vars, attrs = filename_or_obj.load()
     44     encoding = filename_or_obj.get_encoding()
     46     vars, attrs, coord_names = conventions.decode_cf_variables(
     47         vars,
     48         attrs,
   (...)
     55         decode_timedelta=decode_timedelta,
     56     )


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/common.py:210, in AbstractDataStore.load(self)
    188 def load(self):
    189     """
    190     This loads the variables and attributes simultaneously.
    191     A centralized loading function makes it easier to create
   (...)
    207     are requested, so care should be taken to make sure its fast.
    208     """
    209     variables = FrozenDict(
--> 210         (_decode_variable_name(k), v) for k, v in self.get_variables().items()
    211     )
    212     attributes = FrozenDict(self.get_attrs())
    213     return variables, attributes


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:519, in ZarrStore.get_variables(self)
    518 def get_variables(self):
--> 519     return FrozenDict(
    520         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    521     )


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/core/utils.py:471, in FrozenDict(*args, **kwargs)
    470 def FrozenDict(*args, **kwargs) -> Frozen:
--> 471     return Frozen(dict(*args, **kwargs))


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:520, in <genexpr>(.0)
    518 def get_variables(self):
    519     return FrozenDict(
--> 520         (k, self.open_store_variable(k, v)) for k, v in self.zarr_group.arrays()
    521     )


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:496, in ZarrStore.open_store_variable(self, name, zarr_array)
    494 data = indexing.LazilyIndexedArray(ZarrArrayWrapper(name, self))
    495 try_nczarr = self._mode == "r"
--> 496 dimensions, attributes = _get_zarr_dims_and_attrs(
    497     zarr_array, DIMENSION_KEY, try_nczarr
    498 )
    499 attributes = dict(attributes)
    501 # TODO: this should not be needed once
    502 # https://github.com/zarr-developers/zarr-python/issues/1269 is resolved.


File ~/.cache/pypoetry/virtualenvs/datatree-experimentation-Sa4oWCLA-py3.10/lib/python3.10/site-packages/xarray/backends/zarr.py:222, in _get_zarr_dims_and_attrs(zarr_obj, dimension_key, try_nczarr)
    220 # NCZarr defines dimensions through metadata in .zarray
    221 zarray_path = os.path.join(zarr_obj.path, ".zarray")
--> 222 zarray = json.loads(zarr_obj.store[zarray_path])
    223 try:
    224     # NCZarr uses Fully Qualified Names
    225     dimensions = [
    226         os.path.basename(dim) for dim in zarray["_NCZARR_ARRAY"]["dimrefs"]
    227     ]


File ~/.pyenv/versions/3.10.12/lib/python3.10/json/__init__.py:339, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    337 else:
    338     if not isinstance(s, (bytes, bytearray)):
--> 339         raise TypeError(f'the JSON object must be str, bytes or bytearray, '
    340                         f'not {s.__class__.__name__}')
    341     s = s.decode(detect_encoding(s), 'surrogatepass')
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):


TypeError: the JSON object must be str, bytes or bytearray, not dict

xr.show_versions()

```python
import warnings

warnings.filterwarnings("ignore")
xr.show_versions()
```

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 (main, Aug 15 2023, 11:50:32) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-92-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2023.10.1
pandas: 2.1.3
numpy: 1.25.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.16.1
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.11.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.8.0
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: 8.17.2
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8749/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#8705 · More granularity in the CI, separating code and docs changes? · etienneschalk (45271239) · open · 7 comments · created 2024-02-04T20:54:30Z · updated 2024-02-15T14:51:12Z · CONTRIBUTOR · id 2117299976 · node_id I_kwDOAMm_X85-M28I

What is your issue?

Hi,

TLDR: Is there a way to run only the relevant CI checks (e.g. documentation) when a new commit is pushed to a PR's branch?

The following issue is written from a naive user's point of view; indeed, I do not know how the CI works on this project. I noticed that when updating an existing Pull Request, the whole test battery is re-executed. However, it is a common scenario that someone wants to update only the documentation, for instance. In that case, it might make sense to re-trigger only the documentation checks, a little bit like pre-commit, which only runs on the updated files. Achieving that level of granularity per file is not desirable, as even a small code change could make distant, seemingly unrelated tests fail; however, a high-level separation, between code and docs for instance, might relieve the pipelines a little. This assumes the docs do not depend at all on the code. Maybe other separations exist, but the first I can think of is code vs docs.

Another separation would be to have an "order" / dependency system in the pipeline, e.g. A -> B -> C: if A fails, there is no point in spending resources to compute B, as we know for sure the rest will fail. Such a hierarchy might be difficult for the test matrix, which is unordered (e.g. Python version x OS; on this project it seems to be more or less (3.9, 3.10, 3.11, 3.12) x (Ubuntu, macOS, Windows)).

There is also a notion of frequency and execution time: the pipeline stages that are empirically most likely to fail and the shortest to run should be run first, to avoid having them fail due to flakiness and bad luck after all the other checks have already passed. Such a stage exists: CI / ubuntu-latest py3.10 flaky (it is in the name). Taking that into account, the CI Additional / Mypy stage qualifies on both criteria and should be run before everything else, for instance: it is static code checking, very likely to fail, something a developer might also run locally before committing / pushing, and it only takes one minute to run (compared to several minutes for each stage of the Python version x OS matrix). The goal here is to save resources (at the cost of losing the "completeness" of the CI run).
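A toy sketch of that "most likely to fail, cheapest to run first" heuristic (stage names, failure rates, and durations are invented for illustration):

```python
# Toy sketch: order CI stages so the fail-prone, cheap ones run first.
# All numbers below are invented for illustration.
stages = [
    {"name": "mypy", "fail_rate": 0.30, "minutes": 1},
    {"name": "docs", "fail_rate": 0.10, "minutes": 8},
    {"name": "py3.10 / ubuntu-latest", "fail_rate": 0.05, "minutes": 25},
    {"name": "py3.10 flaky", "fail_rate": 0.40, "minutes": 12},
]
# Highest expected failures-per-minute first.
stages.sort(key=lambda s: s["fail_rate"] / s["minutes"], reverse=True)
print([s["name"] for s in stages])
# -> mypy first (likely to fail, cheap to run), the full matrix last
```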

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8705/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
