issues


7 rows where comments = 8, type = "issue" and user = 1217238 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
98587746 MDU6SXNzdWU5ODU4Nzc0Ng== 508 Ignore missing variables when concatenating datasets? shoyer 1217238 closed 0     8 2015-08-02T06:03:57Z 2023-01-20T16:04:28Z 2023-01-20T16:04:28Z MEMBER      

Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables.

With the current xray.concat, you need to awkwardly create dummy variables filled with NaN in datasets that don't have them (or drop mismatched variables entirely). Neither of these is a great option -- concat should have an option (the default?) to take care of this for the user.

This would also be more consistent with pd.concat, which takes a more relaxed approach to matching dataframes with different variables (it does an outer join).
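For comparison, the relaxed pandas behavior the issue points to can be seen directly (a pandas illustration, not xray's API): columns missing from one frame are kept and filled with NaN, i.e., an outer join on columns.

```python
import pandas as pd

# pd.concat's relaxed matching, which this issue asks xray.concat to
# emulate: "y" is absent from frame b, but the result keeps the column
# and fills the missing rows with NaN -- no dummy variables needed.
a = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
b = pd.DataFrame({"x": [5, 6]})  # has no "y" column

result = pd.concat([a, b], ignore_index=True)
# result["y"] holds [3.0, 4.0, NaN, NaN]
```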

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/508/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
711626733 MDU6SXNzdWU3MTE2MjY3MzM= 4473 Wrap numpy-groupies to speed up Xarray's groupby aggregations shoyer 1217238 closed 0     8 2020-09-30T04:43:04Z 2022-05-15T02:38:29Z 2022-05-15T02:38:29Z MEMBER      

Is your feature request related to a problem? Please describe.

Xarray's groupby aggregations (e.g., groupby(..).sum()) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659.

Describe the solution you'd like

We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package.

Additional context

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now.

In particular, it might make sense to stick with the current algorithm if there are many chunks in the arrays to aggregate along the "grouped" dimension (depending on the size of the unique group values).
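The speedup comes from doing the grouped reduction in a single vectorized pass rather than indexing out each group and concatenating. A plain-NumPy sketch of that idea (illustrative only, not the actual numpy-groupies API):

```python
import numpy as np

# group_idx assigns each element to a group; np.bincount with weights
# sums per group in one C-level pass, avoiding the per-group
# indexing + concatenate that makes the current groupby path slow.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
group_idx = np.array([0, 1, 0, 2, 1])  # element -> group label

grouped_sum = np.bincount(group_idx, weights=values)
# grouped_sum is [4., 7., 4.]: the sums for groups 0, 1 and 2
```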

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4473/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
269700511 MDU6SXNzdWUyNjk3MDA1MTE= 1672 Append along an unlimited dimension to an existing netCDF file shoyer 1217238 open 0     8 2017-10-30T18:09:54Z 2020-11-29T17:35:04Z   MEMBER      

This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the extended dimension.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1672/reactions",
    "total_count": 21,
    "+1": 21,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
169274464 MDU6SXNzdWUxNjkyNzQ0NjQ= 939 Consider how to deal with the proliferation of decoder options on open_dataset shoyer 1217238 closed 0     8 2016-08-04T01:57:26Z 2020-10-06T15:39:11Z 2020-10-06T15:39:11Z MEMBER      

There are already lots of keyword arguments, and users want even more! (#843)

Maybe we should use some sort of object to encapsulate desired options?
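One hypothetical shape for such an object (the names below are illustrative, not xarray's actual API): a small dataclass bundling the decode-related flags that open_dataset currently takes as separate keyword arguments.

```python
from dataclasses import dataclass

# A sketch of an options object encapsulating decoder settings.
# Field names mirror some of open_dataset's keywords but are
# assumptions for illustration, not a real xarray class.
@dataclass
class DecodeOptions:
    mask_and_scale: bool = True
    decode_times: bool = True
    decode_coords: bool = True
    concat_characters: bool = True

# Callers tweak one bundle instead of passing many keyword arguments.
opts = DecodeOptions(decode_times=False)
```

This keeps open_dataset's signature stable as new decoder options are added, at the cost of one extra object for users to learn.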

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/939/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
454168102 MDU6SXNzdWU0NTQxNjgxMDI= 3009 Xarray test suite failing with dask-master shoyer 1217238 closed 0     8 2019-06-10T13:21:50Z 2019-06-23T16:49:23Z 2019-06-23T16:49:23Z MEMBER      

There are a wide variety of failures, mostly related to backends and indexing, e.g., AttributeError: 'tuple' object has no attribute 'tuple'. By the looks of it, something is going wrong with xarray's internal ExplicitIndexer objects, which are getting converted into something else.

I'm pretty sure this is due to the recent merge of the Array._meta pull request: https://github.com/dask/dask/pull/4543

There are 81 test failures, but my guess is that there are probably only a handful (at most) of underlying causes.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3009/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
171077425 MDU6SXNzdWUxNzEwNzc0MjU= 967 sortby() or sort_index() method for Dataset and DataArray shoyer 1217238 closed 0   1.0 741199 8 2016-08-14T20:40:13Z 2017-05-12T00:29:12Z 2017-05-12T00:29:12Z MEMBER      

They should function like the pandas methods of the same name.

Under the covers, I believe it would suffice to simply remap ds.sort_index('time') -> ds.isel(time=ds.indexes['time'].argsort()).
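The proposed remapping in plain pandas/NumPy terms (toy data, assumed for illustration): argsort on the index yields the integer positions that put it in order, and positional selection with those positions sorts the values -- the same thing ds.isel(time=ds.indexes['time'].argsort()) would do.

```python
import numpy as np
import pandas as pd

# index.argsort() returns the positions that sort the index;
# indexing the values with them applies that order positionally,
# mirroring ds.isel(time=...) in the remapping above.
index = pd.Index([3, 1, 2])
values = np.array([30.0, 10.0, 20.0])

order = index.argsort()        # positions [1, 2, 0]
sorted_values = values[order]  # [10., 20., 30.]
```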

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/967/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
40225000 MDU6SXNzdWU0MDIyNTAwMA== 212 Get ride of "noncoordinates" as a name? shoyer 1217238 closed 0   0.3 740776 8 2014-08-14T05:52:30Z 2014-09-22T00:55:22Z 2014-09-22T00:55:22Z MEMBER      

As @ToddSmall has pointed out (in #202), "noncoordinates" is a confusing name -- it's something defined by what it isn't, not what it is.

Unfortunately, our best alternative is "variables", which already has a lot of meaning from the netCDF world (and which we already use).

Related: #211

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/212/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
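The filter shown at the top of the page ("comments = 8, type = "issue" and user = 1217238 sorted by updated_at descending") can be reproduced against this schema with sqlite3; a minimal sketch using a trimmed version of the table and made-up sample rows:

```python
import sqlite3

# In-memory database with only the columns the page's filter touches;
# the sample rows below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues (id INTEGER PRIMARY KEY, comments INTEGER, "
    "type TEXT, user INTEGER, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?)",
    [
        (508, 8, "issue", 1217238, "2023-01-20T16:04:28Z"),
        (4473, 8, "issue", 1217238, "2022-05-15T02:38:29Z"),
        (999, 3, "issue", 1217238, "2021-01-01T00:00:00Z"),  # filtered out
    ],
)
# ISO-8601 timestamps sort correctly as plain strings, so ORDER BY
# updated_at DESC gives newest-first without date parsing.
rows = conn.execute(
    "SELECT id FROM issues WHERE comments = 8 AND type = 'issue' "
    "AND user = 1217238 ORDER BY updated_at DESC"
).fetchall()
# rows: [(508,), (4473,)]
```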
Powered by Datasette · About: xarray-datasette