issues


7 rows where comments = 3, state = "open" and user = 35968931 sorted by updated_at descending
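The filter description above corresponds to a simple SQL query against the `issues` table whose schema appears at the bottom of this page. A minimal sketch of running it directly, assuming the underlying SQLite file is named `github.db`:

```python
# Reconstruction of the query behind this page; `github.db` is an assumed filename.
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT * FROM issues
    WHERE comments = 3 AND state = 'open' AND [user] = 35968931
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # 7, matching the row count above
```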


Facets:

  • type: issue 5 · pull 2
  • state: open 7
  • repo: xarray 7
#8714 Avoid coercing to numpy in `as_shared_dtypes`
id: 2120340151 · node_id: PR_kwDOAMm_X85mHqI0 · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2024-02-06T09:35:22Z · updated_at: 2024-03-28T18:31:50Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/8714
  • [x] Solves the problem in https://github.com/pydata/xarray/pull/8712#issuecomment-1929037299
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
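For context, a rough sketch of the behaviour the PR title points at: deciding a shared dtype without materialising duck arrays as numpy. This is illustrative only, not xarray's actual implementation; it assumes every input exposes `.dtype` and `.astype`.

```python
import numpy as np

def as_shared_dtype_sketch(arrays):
    # Inspect dtypes only: never call np.asarray() on the inputs, so duck
    # arrays (cupy, sparse, ...) are not coerced to numpy just to compare them.
    dtype = np.result_type(*(a.dtype for a in arrays))
    return [a.astype(dtype, copy=False) for a in arrays]
```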
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8714/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#8699 Wrapping a `kerchunk.Array` object directly with xarray
id: 2116695961 · node_id: I_kwDOAMm_X85-KjeZ · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2024-02-03T22:15:07Z · updated_at: 2024-02-04T21:15:14Z · author_association: MEMBER

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores

```python
ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',   # 'by_coords' would require actually reading coordinate data
    parallel=True,      # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

# kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them
# (which could also be done in parallel if writing to parquet)
ds.kerchunk.to_zarr(store='out.zarr')
```

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to (see the sketch after this list):

  • Wrap a kerchunk.Array object which barely defines any array API methods, including basically not supporting indexing at all,
  • Store all the information present in a kerchunked Zarr store but without ever loading any data,
  • Not create any indexes by default during dataset construction or during xr.concat,
  • Not try to do anything else that can't be defined for a kerchunk.Array,
  • Possibly have the Lazy Indexing classes support concatenation (https://github.com/pydata/xarray/issues/4628).
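As a sketch of the kind of object involved (a hypothetical class, not kerchunk's actual API): an array that carries only metadata and chunk references, and refuses any attempt to load values:

```python
import numpy as np

class ReferenceArray:
    """Hypothetical stand-in for kerchunk.Array: metadata only, no loadable data."""

    def __init__(self, shape, dtype, chunk_refs):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)
        self.ndim = len(self.shape)
        # mapping of chunk key -> (url, offset, length), as in a kerchunk reference dict
        self.chunk_refs = chunk_refs

    def __array__(self, dtype=None):
        # xarray must never hit this path during construction or concat
        raise NotImplementedError("reference arrays cannot load data")
```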

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
repo: xarray (13221727) · type: issue
#8008 "Deep linking" disparate documentation resources together
id: 1812811751 · node_id: I_kwDOAMm_X85sDU_n · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2023-07-19T22:18:55Z · updated_at: 2023-10-12T18:36:52Z · author_association: MEMBER

What is your issue?

Our docs have a general issue with having lots of related resources that are not necessarily linked together in a useful way. This results in users (including myself!) getting "stuck" in one part of the docs and being unaware of material that would help them solve their specific issue.

To give a concrete example, if a user wants to know about coarsen, there is relevant material:

  • In the coarsen class docstring
  • On the reshaping page
  • On the computations page
  • On the "how do I?" page
  • On the tutorial repository

Different types of material are great, but only some of these resources are linked to others. Coarsen is actually pretty well covered overall, but for other functions there might be no useful linking at all, or no examples in the docstrings.


The biggest missed opportunity here is the way all the great content on the tutorial.xarray.dev repository is not linked from anywhere on the main documentation site (I believe). To address that we could either (a) integrate the tutorial.xarray.dev material into the main site or (b) add a lot more cross-linking between the two sites.

Identifying sections that could be linked and adding links would be a great task for new contributors.
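As an illustration of what such cross-linking could look like at the docstring level (the targets below are examples, not necessarily existing entries):

```python
def coarsen(self, window=None, boundary="exact"):
    """Coarsen this object by applying a windowed reduction along its dimensions.

    See Also
    --------
    Dataset.rolling
    Dataset.resample

    References
    ----------
    User guide on reshaping: https://docs.xarray.dev/en/stable/user-guide/reshaping.html
    Tutorial material: https://tutorial.xarray.dev
    """
```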

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8008/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#7813 Task naming for general chunkmanagers
id: 1694956396 · node_id: I_kwDOAMm_X85lBvts · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2023-05-03T22:56:46Z · updated_at: 2023-05-05T10:30:39Z · author_association: MEMBER

What is your issue?

(Follow-up to #7019)

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or according to whether they represent an open_dataset call.
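For example (a minimal illustration of the dask behaviour described; the exact token suffix varies):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"temperature": (["x"], np.arange(10))}).chunk({"x": 5})
# the variable's name is baked into the dask array name, and hence into task keys
print(ds["temperature"].data.name)  # e.g. 'xarray-temperature-<token>'
```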

Currently for cubed this doesn't work, as shown for example by the graph in https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877.

cc @tomwhite @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#7403 Zarr error when trying to overwrite part of existing store
id: 1512290017 · node_id: I_kwDOAMm_X85aI7bh · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2022-12-28T00:40:16Z · updated_at: 2023-01-11T21:26:10Z · author_association: MEMBER

What happened?

to_zarr threw an error when I tried to overwrite part of an existing zarr store.

What did you expect to happen?

With `mode="w"` I was expecting it to overwrite part of the store with no complaints.

I expected that because that's what the docstring of to_zarr says:

mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist);

The default mode is "w", so I was expecting it to overwrite.
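So, per the quoted docstring, repeated writes with `mode="w"` should simply recreate the store. A minimal sketch of the expected semantics (paths illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"data": (["x"], np.arange(4))})
ds.to_zarr("store.zarr", mode="w")  # create (overwrite if exists)
ds.to_zarr("store.zarr", mode="w")  # should overwrite the existing store without complaint
```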

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

np.random.seed(0)

ds = xr.Dataset()
ds["data"] = (['x', 'y'], np.random.random((100, 100)))
ds.to_zarr("test.zarr")
print(ds["data"].mean().compute())
# returns array(0.49645889) as expected

ds = xr.open_dataset("test.zarr", engine='zarr', chunks={})
ds["data"].mean().compute()
print(ds["data"].mean().compute())
# still returns array(0.49645889) as expected

ds.to_zarr("test.zarr", mode="a")
```

```python
<xarray.DataArray 'data' ()>
array(0.49645889)
<xarray.DataArray 'data' ()>
array(0.49645889)
Traceback (most recent call last):
  File "/home/tom/Documents/Work/Code/experimentation/bugs/datatree_nans/mwe_xarray.py", line 16, in <module>
    ds.to_zarr("test.zarr")
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/dataset.py", line 2091, in to_zarr
    return to_zarr(  # type: ignore
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/api.py", line 1628, in to_zarr
    zstore = backends.ZarrStore.open_group(
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/zarr.py", line 420, in open_group
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/zarr/hierarchy.py", line 1389, in open_group
    raise ContainsGroupError(path)
zarr.errors.ContainsGroupError: path '' contains a group
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I would like to know what the intended result is supposed to be here, so that I can make sure datatree behaves the same way, see https://github.com/xarray-contrib/datatree/issues/168.

Environment

Main branch of xarray, zarr v2.13.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7403/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
#5400 Multidimensional histogram
id: 906023492 · node_id: MDExOlB1bGxSZXF1ZXN0NjU3MDYxODI5 · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2021-05-28T20:38:53Z · updated_at: 2022-11-21T22:41:01Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/5400

Initial work on integrating the multi-dimensional dask-powered histogram functionality from xhistogram into xarray. Just working on the skeleton to fit around the histogram algorithm for now, to be filled in later.

  • [x] Closes #4610
  • [x] API skeleton
  • [x] Input checking
  • [ ] Internal blockwise algorithm from https://github.com/xgcm/xhistogram/pull/49
  • [x] Redirect plot.hist
  • [x] da.weighted().hist()
  • [ ] Tests added for results
  • [x] Hypothesis tests for different chunking patterns
  • [ ] Examples in documentation
  • [ ] Examples in docstrings
  • [x] Type hints (first time trying these so might be wrong)
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
  • [x] Range argument
  • [ ] Handle multidimensional bins (for a future PR? - See https://github.com/xgcm/xhistogram/pull/59)
  • [ ] Handle np.datetime64 dtypes by refactoring to use np.searchsorted (for a future PR? See discussion)
  • [ ] Fast path for uniform bin widths (for a future PR? See suggestion)

Question: da.hist() or da.histogram()?
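For comparison, the workaround available without this PR is to push np.histogram through apply_ufunc over the named dimensions. A sketch (not the PR's blockwise implementation; names like `temp_bin` are illustrative):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(500), dims=["time"], name="temp")
bins = np.linspace(-4, 4, 51)

counts = xr.apply_ufunc(
    lambda x: np.histogram(x, bins=bins)[0],
    da,
    input_core_dims=[["time"]],       # reduce over 'time'
    output_core_dims=[["temp_bin"]],  # new bin dimension of length len(bins) - 1
)
print(counts.sizes)  # {'temp_bin': 50}
```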

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5400/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#2686 Is `create_test_data()` public API?
id: 400289716 · node_id: MDU6SXNzdWU0MDAyODk3MTY= · user: TomNicholas (35968931) · state: open · locked: 0 · comments: 3 · created_at: 2019-01-17T14:00:20Z · updated_at: 2022-04-09T01:48:14Z · author_association: MEMBER

We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this.

One function I keep using when writing code which uses xarray is xarray.tests.test_dataset.create_test_data(). This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is not ideal if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed.

Is there any reason why it shouldn't be public API? Is there something I should use instead?
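For reference, this is how a downstream test suite ends up depending on the helper today (the import path is its current internal location, not a stable API):

```python
from xarray.tests.test_dataset import create_test_data

def test_roundtrip_is_lossless():
    # create_test_data() returns a small Dataset with several variables and dims
    ds = create_test_data()
    assert ds.copy(deep=True).identical(ds)
```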

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);