
issues


14 rows where comments = 7, state = "closed" and user = 1217238 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
864249974 MDU6SXNzdWU4NjQyNDk5NzQ= 5202 Make creating a MultiIndex in stack optional shoyer 1217238 closed 0     7 2021-04-21T20:21:03Z 2022-03-17T17:11:42Z 2022-03-17T17:11:42Z MEMBER      

As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling stack() can be "incredibly slow and memory-demanding, since it creates a MultiIndex of every possible coordinate in the array."

This is true with how stack() works currently, but I'm not sure it's necessary. I suspect it's a vestigial design choice copied from pandas, dating from before Xarray had optional indexes. One benefit is that it makes unstack() a convenient inverse of stack(), but the MultiIndex isn't always required.

Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as ds.stack(index=False). This would be equivalent to calling reset_index() after stack() but would be cheaper because the MultiIndex is never created in the first place.
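A minimal sketch of the equivalence described above, using only the existing API (the index=False keyword is the proposal, not something that exists today); the dataset contents are made up for illustration:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": (("x", "y"), np.arange(6).reshape(2, 3))},
    coords={"x": ["a", "b"], "y": [10, 20, 30]},
)

# Today, stack() always builds a MultiIndex for the new "z" dimension ...
stacked = ds.stack(z=("x", "y"))

# ... which can then be dropped with reset_index(); the proposed index=False
# would skip creating it in the first place.
no_index = stacked.reset_index("z")
```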

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5202/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
416554477 MDU6SXNzdWU0MTY1NTQ0Nzc= 2797 Stalebot is being overly aggressive shoyer 1217238 closed 0     7 2019-03-03T19:37:37Z 2021-06-03T21:31:46Z 2021-06-03T21:22:48Z MEMBER      

E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment.

Is this something we need to reconfigure or just a bug?

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2797/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
645154872 MDU6SXNzdWU2NDUxNTQ4NzI= 4179 Consider revising our minimum dependency version policy shoyer 1217238 closed 0     7 2020-06-25T05:04:38Z 2021-02-22T05:02:25Z 2021-02-22T05:02:25Z MEMBER      

Our current policy is that xarray supports "the minor version (X.Y) initially published no more than N months ago" where N is:

  • Python: 42 months (NEP 29)
  • numpy: 24 months (NEP 29)
  • pandas: 12 months
  • scipy: 12 months
  • sparse, pint and other libraries that rely on NEP-18 for integration: very latest available versions only,
  • all other libraries: 6 months

I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 today and xarray issued a new release tomorrow, our policy would dictate that we could ask users to upgrade to the new version.

In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting "the most recent minor version (X.Y) initially published more than N months ago". This reading makes a bit more sense to me: users who upgrade their dependencies at least once every N months can always use the latest xarray release.
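A short sketch contrasting the two readings (the release dates are hypothetical, chosen so that zarr 2.3 is more than six months old when 2.4 appears):

```python
from datetime import date, timedelta

# Hypothetical release history for a dependency such as zarr.
releases = [
    ("2.2", date(2019, 5, 1)),
    ("2.3", date(2019, 12, 1)),
    ("2.4", date(2020, 6, 25)),
]
N = timedelta(days=6 * 30)   # 6-month window for "all other libraries"
today = date(2020, 6, 26)    # the day after the hypothetical 2.4 release

# Current wording: support versions initially published no more than N months ago.
recent = [v for v, d in releases if today - d <= N]
current_min = recent[0] if recent else releases[-1][0]    # -> "2.4"

# Alternative reading: support back to the most recent version published more than
# N months ago, so users never need to upgrade more often than every N months.
older = [v for v, d in releases if today - d > N]
alternative_min = older[-1] if older else releases[0][0]  # -> "2.3"

print(current_min, alternative_min)
```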

I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows.

I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4179/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
644821435 MDU6SXNzdWU2NDQ4MjE0MzU= 4176 Pre-expand data and attributes in DataArray/Variable HTML repr? shoyer 1217238 closed 0     7 2020-06-24T18:22:35Z 2020-09-21T20:10:26Z 2020-06-28T17:03:40Z MEMBER      

Proposal

Given that a major purpose for plotting an array is to look at data or attributes, I wonder if we should expand these sections by default?

- I worry that clicking on icons to expand sections may not be easy to discover.
- This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent between text and HTML).

Context

Currently the HTML repr for DataArray/Variable looks like this: [screenshot]

To see array data, you have to click on the icon: [screenshot]

(thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!)

There's also a really nice repr for nested dask arrays: [screenshot]

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4176/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
459569339 MDExOlB1bGxSZXF1ZXN0MjkwODg0OTY5 3039 Set up CI with Azure Pipelines (and remove Appveyor) shoyer 1217238 closed 0     7 2019-06-23T12:16:56Z 2019-06-28T14:44:53Z 2019-06-27T20:44:12Z MEMBER   0 pydata/xarray/pulls/3039

xref https://github.com/astropy/astropy/pull/8445

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3039/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
127068208 MDU6SXNzdWUxMjcwNjgyMDg= 719 Follow-ups on MultIndex support shoyer 1217238 closed 0     7 2016-01-17T01:42:59Z 2019-02-23T09:47:00Z 2019-02-23T09:47:00Z MEMBER      

xref #702

- [ ] Serialization to NetCDF
- [x] Better repr, showing level names/dtypes?
- [x] Indexing a scalar at a particular level should drop that level from the MultiIndex (#767)
- [x] Make levels accessible as coordinate variables (e.g., ds['time'] can pull out the 'time' level of a multi-index)
- [x] Support indexing with levels, e.g., ds.sel(time='2000-01').
- [x] ~~Make isel_points/sel_points return objects with a MultiIndex? (probably after the previous TODO, so we can preserve basic backwards compatibility)~~ (deferred until we figure out #974)
- [x] Add set_index/reset_index/swaplevel to make it easier to create and manipulate multi-indexes (see the sketch below)
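A brief sketch of the set_index/reset_index workflow and level-based selection that these follow-ups enabled (all names and values are made up for illustration):

```python
import numpy as np
import xarray as xr

# Two plain coordinates along a "sample" dimension ...
ds = xr.Dataset(
    {"value": ("sample", np.arange(6))},
    coords={
        "site": ("sample", ["a", "a", "b", "b", "c", "c"]),
        "year": ("sample", [2000, 2001, 2000, 2001, 2000, 2001]),
    },
)

# ... set_index combines them into a MultiIndex on that dimension,
mi = ds.set_index(sample=["site", "year"])

# level names can then be used directly in .sel(),
subset = mi.sel(site="a")

# and reset_index turns the levels back into ordinary coordinates.
flat = mi.reset_index("sample")
```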

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/719/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
68759727 MDU6SXNzdWU2ODc1OTcyNw== 392 Non-aggregating grouped operations on dask arrays are painfully slow to construct shoyer 1217238 closed 0     7 2015-04-15T18:45:28Z 2019-02-01T23:06:35Z 2019-02-01T23:06:35Z MEMBER      

These are both entirely lazy operations:

```
%time res = ds.groupby('time.month').mean('time')
CPU times: user 142 ms, sys: 20.3 ms, total: 162 ms
Wall time: 159 ms

%time res = ds.groupby('time.month').apply(lambda x: x - x.mean())
CPU times: user 46.1 s, sys: 4.9 s, total: 51 s
Wall time: 50.4 s
```

I suspect the issue (in part) is that _interleaved_concat_slow indexes out single elements from each dask array along the grouped axis prior to concatenating them together (unit tests for interleaved_concat can be found here). So we end up creating way too many small dask arrays.

Profiling results on slightly smaller data are in this gist.

It would be great if we could figure out a way to make this faster, because these sorts of operations are a really nice showcase for xray + dask.

CC @mrocklin in case you have any ideas.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/392/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
148757289 MDU6SXNzdWUxNDg3NTcyODk= 824 Disable lock=True in open_mfdataset when reading netCDF3 files shoyer 1217238 closed 0     7 2016-04-15T20:14:07Z 2019-01-30T04:37:50Z 2019-01-30T04:37:36Z MEMBER      

This slows things down unnecessarily.
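A minimal sketch of the behaviour in question, assuming an xarray version in which open_mfdataset still accepts a lock argument (the file glob is hypothetical); netCDF3 reads don't need the shared lock that HDF5-backed files do:

```python
import xarray as xr

# lock=False skips the global thread lock that open_mfdataset would otherwise
# hold around every read from these netCDF3 files.
ds = xr.open_mfdataset("data/*.nc", lock=False)
```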

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/824/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
344093951 MDExOlB1bGxSZXF1ZXN0MjAzNTcwODMz 2309 DOC: add initial draft of a development roadmap for xarray shoyer 1217238 closed 0     7 2018-07-24T15:39:49Z 2018-08-02T17:01:36Z 2018-07-27T23:50:55Z MEMBER   0 pydata/xarray/pulls/2309

See here for the rendered version: https://github.com/shoyer/xarray/blob/4dd4463d5f79c07eb3a01f67b7329f2f4a60dd31/doc/roadmap.rst

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2309/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
156793282 MDExOlB1bGxSZXF1ZXN0NzE0MjE2MTE= 860 Switch py2.7 CI build to use conda-forge shoyer 1217238 closed 0     7 2016-05-25T16:23:19Z 2016-05-31T05:23:35Z 2016-05-26T01:53:28Z MEMBER   0 pydata/xarray/pulls/860
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/860/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
108271509 MDExOlB1bGxSZXF1ZXN0NDU5ODA2NjU= 589 New encoding keyword argument for to_netcdf shoyer 1217238 closed 0   0.6.1 1307323 7 2015-09-25T06:24:53Z 2015-10-21T07:08:00Z 2015-10-08T01:08:52Z MEMBER   0 pydata/xarray/pulls/589

Fixes #548

In particular, it would be helpful if someone else could review the new documentation section to see if it makes sense.
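For reference, an illustrative call using the new keyword (the variable name and values here are hypothetical, but zlib/complevel/_FillValue are standard netCDF4 encoding options):

```python
import xarray as xr

ds = xr.Dataset({"temperature": (("time",), [280.0, 281.5, 279.9])})

# Per-variable encoding options are passed through to the backend, e.g.
# compression settings and a fill value for the netCDF4 engine.
ds.to_netcdf(
    "example.nc",
    encoding={"temperature": {"zlib": True, "complevel": 4, "_FillValue": -9999.0}},
)
```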

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/589/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
33639540 MDU6SXNzdWUzMzYzOTU0MA== 133 Functions for converting to and from CDAT cdms2 variables shoyer 1217238 closed 0     7 2014-05-16T01:09:14Z 2015-04-24T22:39:03Z 2014-12-19T09:11:39Z MEMBER      

Apparently CDAT has a number of useful modules for working with weather and climate data, especially for things like computing climatologies (related: #112). There's no point in duplicating that work in xray, of course (also, climatologies may be too domain specific for xray), so we should make it possible to use both xray and CDAT interchangeably.

Unfortunately, I haven't used CDAT, so it's not obvious to me what the right interface is. Also, CDAT seems to be somewhat difficult (impossible?) to install as a Python library, so it may be hard to set up automated testing.

CC @DamienIrving

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/133/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
58310637 MDU6SXNzdWU1ODMxMDYzNw== 328 Support out-of-core computation using dask shoyer 1217238 closed 0   0.5 987654 7 2015-02-20T05:02:22Z 2015-04-17T21:03:12Z 2015-04-17T21:03:12Z MEMBER      

Dask is a library for out-of-core computation somewhat similar to biggus in conception, but with slightly grander aspirations. For examples of how Dask could be applied to weather data, see this blog post by @mrocklin: http://matthewrocklin.com/blog/work/2015/02/13/Towards-OOC-Slicing-and-Stacking/

It would be interesting to explore using dask internally in xray, so that we can implement lazy/out-of-core aggregations, concat and groupby to complement the existing lazy indexing. This functionality would be quite useful for xray, even more so than merely supporting datasets-on-disk (#199).

A related issue is #79: we can easily imagine using Dask with groupby/apply to power out-of-core and multi-threaded computation.

Todos for xray:

- [x] refactor Variable.concat to make use of functions like concatenate and stack instead of in-place array modification (Dask arrays do not support mutation, for good reasons)
- [x] refactor reindex_variables to not make direct use of mutation (e.g., by using da.insert below)
- [x] add some sort of internal abstraction to represent "computable" arrays that are not necessarily numpy.ndarray objects (done: this is the data attribute)
- [x] expose reblock in the public API
- [x] load datasets into dask arrays from disk (see the sketch after these lists)
- [x] load dataset from multiple files into dask
- [x] ~~some sort of API for user controlled lazy apply on dask arrays (using groupby, most likely)~~ (not necessary for initial release)
- [x] save from dask arrays
- [x] an API for lazy ufuncs like sin and sqrt
- [x] robustly handle indexing along orthogonal dimensions if dask can't handle it directly.

Todos for dask (to be clear, none of these are blockers for a proof of concept):

- [x] support for NaN skipping aggregations
- [x] ~~support for interleaved concatenation (necessary for transformations by group, which are quite common)~~ (turns out to be a one-liner with concatenate and take, see below)
- [x] ~~support for something like take_nd from pandas: like np.take, but with -1 as a sentinel value for "missing" (necessary for many alignment operations)~~ da.insert, modeled after np.insert, would solve this problem.
- [x] ~~support "orthogonal" MATLAB-like array-based indexing along multiple dimensions~~ (taking along one axis at a time is close enough)
- [x] broadcast_to: see https://github.com/numpy/numpy/pull/5371
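For reference, a minimal sketch of the "load datasets into dask arrays from disk" item as it eventually shipped, via the chunks= keyword (the file name and chunk sizes are hypothetical):

```python
import xarray as xr

# chunks= asks the backend to wrap each variable in a dask array instead of
# loading it eagerly into memory.
ds = xr.open_dataset("big_file.nc", chunks={"time": 100})

# Aggregations and groupby stay lazy until .compute() (or .load()) is called.
monthly = ds.groupby("time.month").mean("time")
result = monthly.compute()
```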

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/328/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
44718119 MDExOlB1bGxSZXF1ZXN0MjIxNTg1MTc= 245 Modular encodings (rebased) shoyer 1217238 closed 0   0.3.1 799012 7 2014-10-02T18:05:50Z 2014-10-23T06:27:16Z 2014-10-11T21:30:07Z MEMBER   0 pydata/xarray/pulls/245

This change is rebased on master and should let us pick up from #175. CC @akleeman


Restructured Backends to make CF convention application more consistent.

Amongst other things this includes:

- EncodedDataStores which can wrap other stores and allow for modular encoding/decoding.
- Trivial indices (ds['x'] = ('x', np.arange(10))) are no longer stored on disk and are only created when accessed.
- AbstractDataStore API change. Shouldn't affect external users.
- missing_value attributes now function like _FillValue (see the sketch below).
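A small illustration of the last point as it behaves in present-day xarray (the variable name and values are made up):

```python
import numpy as np
import xarray as xr

# A variable whose missing_value attribute marks -999.0 as missing.
raw = xr.Dataset(
    {"t": ("x", np.array([1.0, -999.0, 3.0]), {"missing_value": -999.0})}
)

# Decoding masks missing_value just like _FillValue.
decoded = xr.decode_cf(raw)
print(decoded["t"].values)  # [ 1. nan  3.]
```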

All current tests are passing (though it could use more new ones).


Post rebase notes (shoyer, Oct 2, 2014): Most tests are passing, though a couple are broken:

- test_roundtrip_mask_and_scale (because this change needs a fix to not break the current API)
- test_roundtrip_strings_with_fill_value on TestCFEncodedDataStore (I don't entirely understand why, let's come back to it later)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/245/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);