

issues


12 rows where comments = 8, repo = 13221727, and user = 1217238, sorted by updated_at descending




Facets:
  • type: issue 7, pull 5
  • state: closed 11, open 1
  • repo: xarray 12
Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at (sorted descending) · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
id: 98587746 · node_id: MDU6SXNzdWU5ODU4Nzc0Ng== · number: 508 · title: Ignore missing variables when concatenating datasets? · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2015-08-02T06:03:57Z · updated_at: 2023-01-20T16:04:28Z · closed_at: 2023-01-20T16:04:28Z · author_association: MEMBER

Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables.

With the current xray.concat, you need to awkwardly create dummy variables filled with NaN in datasets that don't have them (or drop mismatched variables entirely). Neither of these is a great option -- concat should have an option (the default?) to take care of this for the user.

This would also be more consistent with pd.concat, which takes a more relaxed approach to matching dataframes with different variables (it does an outer join).
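For concreteness, here is a minimal sketch of that dummy-variable workaround, written against the modern xarray import (this issue predates the xray → xarray rename); the dataset contents are illustrative:

```python
import numpy as np
import xarray as xr

# Two datasets that share a dimension but not all variables.
ds1 = xr.Dataset({"a": ("t", [1.0, 2.0])})
ds2 = xr.Dataset({"b": ("t", [3.0, 4.0])})

# The awkward workaround: pad each dataset with NaN-filled dummy
# variables so that concat sees a consistent set of variables.
all_vars = set(ds1.data_vars) | set(ds2.data_vars)
for ds in (ds1, ds2):
    for name in all_vars - set(ds.data_vars):
        ds[name] = ("t", np.full(ds.sizes["t"], np.nan))

combined = xr.concat([ds1, ds2], dim="t")  # both 'a' and 'b', NaN-padded
```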

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/508/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 711626733 · node_id: MDU6SXNzdWU3MTE2MjY3MzM= · number: 4473 · title: Wrap numpy-groupies to speed up Xarray's groupby aggregations · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2020-09-30T04:43:04Z · updated_at: 2022-05-15T02:38:29Z · closed_at: 2022-05-15T02:38:29Z · author_association: MEMBER

Is your feature request related to a problem? Please describe.

Xarray's groupby aggregations (e.g., groupby(..).sum()) are very slow compared to pandas, as described in https://github.com/pydata/xarray/issues/659.

Describe the solution you'd like

We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package.

Additional context

One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now.

In particular, it might make sense to stick with the current algorithm if there are many chunks along the "grouped" dimension in the arrays being aggregated (depending on the number of unique group values).
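As a rough illustration of the idea (a sketch assuming numpy-groupies is installed, not the eventual implementation), a grouped sum reduces to one vectorized call over integer group codes:

```python
import numpy as np
import numpy_groupies as npg
import xarray as xr

da = xr.DataArray(
    np.arange(6.0), dims="x",
    coords={"label": ("x", ["a", "b", "a", "b", "a", "b"])},
)

# Factorize group labels into integer codes, as groupby does internally.
uniques, codes = np.unique(da.label.values, return_inverse=True)

# One vectorized call replaces the indexing + concatenate loop.
sums = npg.aggregate(codes, da.values, func="sum")
result = xr.DataArray(sums, dims="label", coords={"label": uniques})
```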

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4473/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 269700511 · node_id: MDU6SXNzdWUyNjk3MDA1MTE= · number: 1672 · title: Append along an unlimited dimension to an existing netCDF file · user: shoyer (1217238) · state: open · locked: 0 · comments: 8 · created_at: 2017-10-30T18:09:54Z · updated_at: 2020-11-29T17:35:04Z · author_association: MEMBER

This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks

It should be relatively straightforward to add, too, building on the existing support for writing files with unlimited dimensions. The user-facing API would probably be a new keyword argument to to_netcdf(), e.g., extend='time' to indicate the dimension to extend.
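As a sketch of how that keyword might look in user code (extend='time' is the proposal above, not an argument that to_netcdf() actually accepts; write_step and the filename are illustrative):

```python
import xarray as xr

def write_step(ds: xr.Dataset, step: int) -> None:
    """Write one simulation time step, appending after the first."""
    if step == 0:
        # unlimited_dims is existing to_netcdf() functionality.
        ds.to_netcdf("run.nc", unlimited_dims=["time"])
    else:
        # Proposed (hypothetical): append along the unlimited dimension.
        ds.to_netcdf("run.nc", mode="a", extend="time")
```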

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1672/reactions",
    "total_count": 21,
    "+1": 21,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 169274464 · node_id: MDU6SXNzdWUxNjkyNzQ0NjQ= · number: 939 · title: Consider how to deal with the proliferation of decoder options on open_dataset · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2016-08-04T01:57:26Z · updated_at: 2020-10-06T15:39:11Z · closed_at: 2020-10-06T15:39:11Z · author_association: MEMBER

There are already lots of keyword arguments, and users want even more! (#843)

Maybe we should use some sort of object to encapsulate desired options?
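One hypothetical shape for such an options object (illustrative only, not an existing xarray API; the field names mirror current open_dataset() keywords):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecodeOptions:
    mask_and_scale: bool = True
    decode_times: bool = True
    decode_coords: bool = True
    concat_characters: bool = True

# One argument instead of an ever-growing list of keywords (hypothetical):
# ds = xr.open_dataset("file.nc", decode=DecodeOptions(decode_times=False))
```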

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/939/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 187625917 · node_id: MDExOlB1bGxSZXF1ZXN0OTI1MjQzMjg= · number: 1087 · title: WIP: New DataStore / Encoder / Decoder API for review · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2016-11-07T05:02:04Z · updated_at: 2020-04-17T18:37:45Z · closed_at: 2020-04-17T18:37:45Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/1087

The goal here is to make something extensible that we can live with for quite some time, and to clean up the internals of xarray's backend interface.

Most of these are analogues of existing xarray classes with a cleaned up interface. I have not yet worried about backwards compatibility or tests -- I would appreciate feedback on the approach here.
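A loose sketch of the kind of separation explored here (class and method names are illustrative, not the PR's actual interfaces):

```python
from abc import ABC, abstractmethod

class Decoder(ABC):
    @abstractmethod
    def decode(self, variable):
        """Return an in-memory variable decoded from its on-disk form."""

class Encoder(ABC):
    @abstractmethod
    def encode(self, variable):
        """Return the on-disk representation of an in-memory variable."""

class DataStore(ABC):
    @abstractmethod
    def load(self):
        """Return the store's raw variables and attributes."""
```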

Several parts of the logic exist for the sake of dask. I've included the word "dask" in comments to facilitate inspection by mrocklin.

CC @rabernat, @pwolfram, @jhamman, @mrocklin -- for review

CC @mcgibbon, @JoyMonteiro -- this is relevant to our discussion today about adding support for appending to netCDF files. Don't let this stop you from getting started on that with the existing interface, though.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1087/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 454168102 · node_id: MDU6SXNzdWU0NTQxNjgxMDI= · number: 3009 · title: Xarray test suite failing with dask-master · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2019-06-10T13:21:50Z · updated_at: 2019-06-23T16:49:23Z · closed_at: 2019-06-23T16:49:23Z · author_association: MEMBER

There are a wide variety of failures, mostly related to backends and indexing, e.g., AttributeError: 'tuple' object has no attribute 'tuple'. By the looks of it, something is going wrong with xarray's internal ExplicitIndexer objects, which are getting converted into something else.

I'm pretty sure this is due to the recent merge of the Array._meta pull request: https://github.com/dask/dask/pull/4543

There are 81 test failures, but my guess is that there are probably only a handful (at most) of underlying causes.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3009/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 290320242 · node_id: MDExOlB1bGxSZXF1ZXN0MTY0MjAzNzAz · number: 1847 · title: Use getitem_with_mask in reindex_variables · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2018-01-22T00:19:20Z · updated_at: 2018-05-23T21:13:42Z · closed_at: 2018-02-14T13:11:48Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/1847

This is an internal refactor of reindexing/alignment to use Variable.getitem_with_mask.

As noted back in https://github.com/pydata/xarray/pull/1751#issuecomment-348380756, there is a nice improvement for alignment with dask (~100x improvement) but we are slower in several cases with NumPy (2-3x).

ASV results (smaller ratio is better):

       before       after       ratio
       [e31cf43e]   [5830f2f8]
       4.85ms       4.86ms      1.00   reindexing.Reindex.time_1d_coarse
       98.15ms      98.97ms     1.01   reindexing.Reindex.time_1d_fine_all_found
   +   96.88ms      210.71ms    2.17   reindexing.Reindex.time_1d_fine_some_missing
       24.47ms      25.18ms     1.03   reindexing.Reindex.time_2d_coarse
       433.26ms     437.19ms    1.01   reindexing.Reindex.time_2d_fine_all_found
   +   245.20ms     711.36ms    2.90   reindexing.Reindex.time_2d_fine_some_missing
   -   23.78ms      12.79ms     0.54   reindexing.Reindex.time_reindex_coarse
   -   409.89ms     230.75ms    0.56   reindexing.Reindex.time_reindex_fine_all_found
   +   233.41ms     369.48ms    1.58   reindexing.Reindex.time_reindex_fine_some_missing
       14.39ms      14.20ms     0.99   reindexing.ReindexDask.time_1d_coarse
       184.07ms     182.64ms    0.99   reindexing.ReindexDask.time_1d_fine_all_found
   -   1.44s        277.03ms    0.19   reindexing.ReindexDask.time_1d_fine_some_missing
       95.49ms      94.49ms     0.99   reindexing.ReindexDask.time_2d_coarse
       910.11ms     916.47ms    1.01   reindexing.ReindexDask.time_2d_fine_all_found
       failed       997.33ms    n/a    reindexing.ReindexDask.time_2d_fine_some_missing

Note that reindexing.ReindexDask.time_2d_fine_some_missing timed out previously, which I think indicates that it took longer than 60 seconds.

  • [x] Tests passed (for all non-documentation changes)
  • [x] Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1847/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 171077425 · node_id: MDU6SXNzdWUxNzEwNzc0MjU= · number: 967 · title: sortby() or sort_index() method for Dataset and DataArray · user: shoyer (1217238) · state: closed · locked: 0 · milestone: 1.0 (741199) · comments: 8 · created_at: 2016-08-14T20:40:13Z · updated_at: 2017-05-12T00:29:12Z · closed_at: 2017-05-12T00:29:12Z · author_association: MEMBER

They should function like the pandas methods of the same name.

Under the covers, I believe it would suffice to simply remap ds.sort_index('time') -> ds.isel(time=ds.indexes['time'].argsort()).
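A minimal, runnable version of that remapping with today's class names (this issue predates the rename to xarray):

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"v": ("time", [10, 20, 30])},
    coords={"time": pd.to_datetime(["2000-01-03", "2000-01-01", "2000-01-02"])},
)

# ds.sort_index('time') would reduce to:
sorted_ds = ds.isel(time=ds.indexes["time"].argsort())
# sorted_ds.time is now 2000-01-01, 2000-01-02, 2000-01-03
```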

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/967/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 197083082 · node_id: MDExOlB1bGxSZXF1ZXN0OTkwNDA2MzE= · number: 1179 · title: Switch to shared Lock (SerializableLock if possible) for reading/writing · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2016-12-22T02:50:43Z · updated_at: 2017-01-04T17:12:58Z · closed_at: 2017-01-04T17:12:46Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/1179

Fixes #1172

The serializable lock will be useful for dask.distributed or multi-processing (xref #798, #1173, among others).
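A small sketch of the property that matters here (dask.utils.SerializableLock is where the class lives in dask):

```python
import pickle
from dask.utils import SerializableLock

lock = SerializableLock()
with lock:
    pass  # a guarded netCDF read/write would go here

# Unlike threading.Lock, this survives pickling, so it can travel with a
# task to another process or a dask.distributed worker.
clone = pickle.loads(pickle.dumps(lock))
```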

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1179/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 118711154 · node_id: MDExOlB1bGxSZXF1ZXN0NTE3MjI1MDY= · number: 666 · title: Shift method for shifting data · user: shoyer (1217238) · state: closed · locked: 0 · comments: 8 · created_at: 2015-11-24T21:53:11Z · updated_at: 2015-12-02T23:32:28Z · closed_at: 2015-12-02T23:32:28Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/666

Fixes #624

New shift method for shifting datasets or arrays along a dimension:

```
In [1]: import xray

In [2]: array = xray.DataArray([5, 6, 7, 8], dims='x')

In [3]: array.shift(x=2)
Out[3]:
<xray.DataArray (x: 4)>
array([ nan,  nan,   5.,   6.])
Coordinates:
  * x        (x) int64 0 1 2 3
```

Based on the API proposed for roll in https://github.com/xray/xray/issues/624

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/666/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 40225000 · node_id: MDU6SXNzdWU0MDIyNTAwMA== · number: 212 · title: Get rid of "noncoordinates" as a name? · user: shoyer (1217238) · state: closed · locked: 0 · milestone: 0.3 (740776) · comments: 8 · created_at: 2014-08-14T05:52:30Z · updated_at: 2014-09-22T00:55:22Z · closed_at: 2014-09-22T00:55:22Z · author_association: MEMBER

As @ToddSmall has pointed out (in #202), "noncoordinates" is a confusing name -- it's something defined by what it isn't, not what it is.

Unfortunately, our best alternative is "variables", which already has a lot of meaning from the netCDF world (and which we already use).

Related: #211

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/212/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
id: 33772168 · node_id: MDExOlB1bGxSZXF1ZXN0MTYwMzc5NTA= · number: 134 · title: Fix concatenating Variables with dtype=datetime64 · user: shoyer (1217238) · state: closed · locked: 0 · milestone: 0.1.1 (664063) · comments: 8 · created_at: 2014-05-19T05:39:46Z · updated_at: 2014-06-28T01:08:03Z · closed_at: 2014-05-20T19:09:28Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/134

This is an alternative to #125, which I think is a little cleaner.

Basically, there was a bug where Variable.values for datetime64 arrays always made a copy of values. This made it impossible to edit variable values in-place.
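A sketch of the symptom described above (illustrative, not the PR's actual test):

```python
import numpy as np
import xarray as xr

times = np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[ns]")
var = xr.Variable("t", times)

# With the bug, .values returned a fresh copy on every access, so this
# assignment modified a throwaway array and var was left unchanged.
var.values[0] = np.datetime64("1999-12-31", "ns")
```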

@akleeman, I would appreciate your thoughts.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/134/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);