issues


6 rows where state = "open", type = "issue" and user = 1386642 sorted by updated_at descending

#2799 · Performance: numpy indexes small amounts of data 1000 faster than xarray
id 416962458 · node_id MDU6SXNzdWU0MTY5NjI0NTg= · user nbren12 (1386642) · state open · locked 0 · comments 42 · created_at 2019-03-04T19:44:17Z · updated_at 2024-03-18T17:51:25Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

Machine learning applications often require iterating over every index along some of the dimensions of a dataset. For instance, iterating over all the (lat, lon) pairs in a 4D dataset with dimensions (time, level, lat, lon). Unfortunately, this is very slow with xarray objects compared to numpy (or h5py) arrays. When the Pangeo machine learning working group met today, we found that several of us have struggled with this.

I made some simplified benchmarks, which show that xarray is about 1000 times slower than numpy when repeatedly grabbing a small amount of data from an array. This is a problem with both isel and [] indexing. After doing some profiling, the main culprits seem to be xarray routines like _validate_indexers and _broadcast_indexes.

While Python will always be slower than C when iterating over an array in this fashion, I would hope that xarray could be nearly as fast as numpy. I am not sure what the best way to improve this is, though.
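
A minimal sketch of the kind of benchmark described above (array sizes and names are illustrative; exact timings will vary by machine and versions):

```python
import timeit

import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(10_000), dims=["x"])
np_arr = da.values

# Repeatedly grab a single element, the access pattern described above.
t_xr = timeit.timeit(lambda: da.isel(x=42), number=10_000)
t_np = timeit.timeit(lambda: np_arr[42], number=10_000)
print(f"xarray is ~{t_xr / t_np:.0f}x slower than numpy here")
```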

Reactions: total 9 (+1 × 9)
#7348 · Using entry_points to register dataset and dataarray accessors?
id 1473152374 · node_id I_kwDOAMm_X85XzoV2 · user nbren12 (1386642) · state open · locked 0 · comments 4 · created_at 2022-12-02T16:48:42Z · updated_at 2023-09-14T19:53:46Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

Is your feature request related to a problem?

External libraries often use the dataset/dataarray accessor pattern (e.g. metpy). These accessors are not available until the external package that performs the registration is imported. This means scripts using these accessors must include an apparently unused import that linters will complain about, e.g.:

```
import metpy  # linter complains here: imported but "unused"

# some data
ds: xr.Dataset = ...

ds.metpy...
```

Describe the solution you'd like

Use importlib entry points to register accessors, so that registration is handled automatically. This is currently supported for array backends, but not for accessors (e.g. metpy's setup.cfg).
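
For reference, a rough sketch of what an entry-point-driven loader could look like. The group name "xarray.accessors" and the loader function are hypothetical; register_dataset_accessor is xarray's existing decorator:

```python
from importlib.metadata import entry_points  # Python 3.10+ selection API

import xarray as xr


@xr.register_dataset_accessor("demo")  # how accessors are registered today
class DemoAccessor:
    def __init__(self, ds: xr.Dataset):
        self._ds = ds


def load_accessor_plugins() -> None:
    # Hypothetical: xarray could scan a dedicated entry-point group at
    # import time, so user scripts no longer need "import metpy".
    for ep in entry_points(group="xarray.accessors"):
        ep.load()  # importing the module triggers the decorator above
```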

Describe alternatives you've considered

No response

Additional context

No response

Reactions: total 2 (+1 × 1, eyes × 1)
#4628 · Lazy concatenation of arrays
id 753852119 · node_id MDU6SXNzdWU3NTM4NTIxMTk= · user nbren12 (1386642) · state open · locked 0 · comments 5 · created_at 2020-11-30T22:32:08Z · updated_at 2022-05-10T17:02:34Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

Is your feature request related to a problem? Please describe.

Concatenating xarray objects forces the data to load. I recently learned about the internal object that allows lazy indexing into DataArrays/Datasets without using dask. Concatenation along a single dimension is the inverse operation of slicing, so it seems natural to support it as well. Also, concatenating along a dimension (e.g. "run"/"simulation"/"ensemble") is a common merging workflow.

Describe the solution you'd like

`xr.concat([a, b], dim=...)` does not load any data in `a` or `b`.
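
For comparison, a minimal sketch of the requested semantics using dask-backed arrays, where concatenation already stays lazy (assumes dask is installed); the request is for the same behavior without dask:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(np.zeros((4, 4)), dims=("x", "y")).chunk({"x": 2})
b = xr.DataArray(np.ones((4, 4)), dims=("x", "y")).chunk({"x": 2})

# With dask-backed arrays, concat only builds a task graph: nothing loads.
# The request is for xr.concat to behave the same way on xarray's own
# lazily indexed arrays, without requiring dask.
c = xr.concat([a, b], dim="run")
print(c.chunks)  # still lazy; no data has been computed yet
```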

Describe alternatives you've considered

One could rename the variables in `a` and `b` to allow them to be merged (e.g. a['air_temperature'] -> "air_temperature_a"), but it's more natural to make a new dimension.

Additional context

This is useful when not using dask for performance reasons (e.g. using another parallelism engine like Apache Beam).

Reactions: total 8 (+1 × 8)
#3894 · Add public API for Dataset._copy_listed
id 588112617 · node_id MDU6SXNzdWU1ODgxMTI2MTc= · user nbren12 (1386642) · state open · locked 0 · comments 15 · created_at 2020-03-26T02:39:34Z · updated_at 2022-04-18T16:41:39Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

In my data pipelines, I have been repeatedly burned using indexing notation to grab a few variables from a dataset in the following way:

```
ds = xr.Dataset(...)
vars = ('a', 'b', 'c')
ds[vars]        # this errors
ds[list(vars)]  # this is ok
```

Moreover, because `Dataset.__getitem__` is type-unstable, it is hard to detect this kind of error using mypy, so it often appears 30 minutes into a long data pipeline. It would be great to have a type-stable method that can take any sequence of variable names and return the Dataset consisting of only those variables and their coordinates. In fact, this method already exists, but it is currently not public API. Could we make it so? Thanks.
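
A minimal sketch of the kind of type-stable wrapper being requested; the name select_vars is hypothetical, and it simply delegates to the list-based indexing that already works:

```python
from typing import Iterable

import xarray as xr


def select_vars(ds: xr.Dataset, names: Iterable[str]) -> xr.Dataset:
    """Hypothetical public wrapper: accept any sequence of variable
    names and always return a Dataset (never a DataArray)."""
    return ds[list(names)]
```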

Reactions: none
#1387 · FacetGrid with independent colorbars
id 224846826 · node_id MDU6SXNzdWUyMjQ4NDY4MjY= · user nbren12 (1386642) · state open · locked 0 · comments 7 · created_at 2017-04-27T16:47:44Z · updated_at 2022-04-13T11:07:49Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

Sometimes the magnitude of a variable varies dramatically across a given coordinate, which makes the 2d plots generated by xr.FacetGrid difficult to interpret. It would be useful if xr.FacetGrid accepted an option that gives each subplot its own colorbar.
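
A minimal sketch of the current workaround under these assumptions: toy data with a time dimension to facet over, and subplots built by hand so each panel draws its own colorbar:

```python
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Toy data whose magnitude varies strongly across the faceted coordinate.
da = xr.DataArray(
    np.random.rand(4, 8, 8) * 10.0 ** np.arange(4)[:, None, None],
    dims=("time", "lat", "lon"),
)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for i, ax in enumerate(axes.flat):
    # Each plot call adds a colorbar scaled to that panel's own data,
    # instead of the single shared colorbar FacetGrid produces.
    da.isel(time=i).plot(ax=ax, add_colorbar=True)
fig.tight_layout()
```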

Reactions: total 2 (+1 × 2)
#6269 · Adding CDL Parser/`open_cdl`?
id 1132894350 · node_id I_kwDOAMm_X85DhpiO · user nbren12 (1386642) · state open · locked 0 · comments 7 · created_at 2022-02-11T17:31:36Z · updated_at 2022-02-14T17:18:38Z · author_association CONTRIBUTOR · repo xarray (13221727) · type issue

Is your feature request related to a problem?

No.

Describe the solution you'd like

It would be nice to load/generate xarray datasets from Common Data Language (CDL) descriptions. CDL is a DSL that defines a netCDF dataset, and it is quite nice for testing. We use it to build mock datasets for e.g. integration testing of plotting routines and complex data analysis. CDL provides a concise format for storing the schema of this data, and the schema can be used for validation or generation (using the CLI ncgen).

CDL is basically the format produced by xarray.Dataset.info. It looks like this:

```
netcdf example {   // example of CDL notation
dimensions:
    lon = 3 ;
    lat = 8 ;
variables:
    float rh(lon, lat) ;
        rh:units = "percent" ;
        rh:long_name = "Relative humidity" ;
// global attributes
    :title = "Simple example, lacks some conventions" ;
data:   // optional; ncgen will still build without it
    rh = 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37,
         41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89 ;
}
```
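
A round trip into xarray already works today via the netCDF command-line tools; a minimal sketch, assuming ncgen is on PATH and the CDL above is saved as example.cdl (the proposed open_cdl would skip the intermediate file):

```python
import subprocess

import xarray as xr

# ncgen compiles the CDL description into a real netCDF file...
subprocess.run(["ncgen", "-o", "example.nc", "example.cdl"], check=True)

# ...which xarray can then open like any other dataset.
ds = xr.open_dataset("example.nc")
print(ds)
```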

I wrote a small pure Python parser for CDL last night and it seems to work! There are similar projects on GitHub, but sadly they seem to be abandoned, so it would be nice to attach this to an effort like xarray.

Describe alternatives you've considered

Some kind of schema object that can be used to validate or generate an xarray Dataset, but does not contain any data.

Additional context

No response

Reactions: total 1 (+1 × 1)

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
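
Given the schema above, the query behind this page can be reproduced directly against the underlying SQLite database; a minimal sketch, assuming the Datasette database file is available locally as github.db (hypothetical filename):

```python
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, number, title, updated_at
    FROM issues
    WHERE state = 'open' AND type = 'issue' AND [user] = 1386642
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```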