
issue_comments


13 rows where author_association = "NONE" and user = 1634164 sorted by updated_at descending


issue 9

  • How should xarray use/support sparse arrays? 3
  • Handle bool in NetCDF4 conversion 2
  • Decorators for registering custom accessors in xarray 2
  • pd.Period can't be used as a 1-element coord 1
  • Importing xarray fails if old version of bottleneck is installed 1
  • sparse and other duck array issues 1
  • concat() fails when args have sparse.COO data and different fill values 1
  • Duck array compatibility meeting 1
  • RuntimeError when formatting sparse-backed DataArray in f-string 1

user 1

  • khaeru · 13

author_association 1

  • NONE · 13
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1534695467 https://github.com/pydata/xarray/issues/3213#issuecomment-1534695467 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85beZgr khaeru 1634164 2023-05-04T12:31:22Z 2023-05-04T12:31:22Z NONE

That's a totally valid scope limitation for the sparse package, and I understand the motivation.

I'm just saying that the principle of least astonishment is not being followed: at the moment, a user cannot read either the xarray or sparse docs and learn which portions of the xarray API will work when giving …, sparse=True, which portions instead require a deliberate choice to densify, or see examples of how best to mix the two. It would be helpful to clarify that; that's all.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1534231523 https://github.com/pydata/xarray/issues/3213#issuecomment-1534231523 https://api.github.com/repos/pydata/xarray/issues/3213 IC_kwDOAMm_X85bcoPj khaeru 1634164 2023-05-04T07:40:26Z 2023-05-04T07:40:26Z NONE

@jbbutler please also see this comment et seq. https://github.com/pydata/sparse/issues/1#issuecomment-792342987 and related pydata/sparse#438.

To add to @rabernat's point about sparse support being "not well documented", I suspect (but don't know, as I'm just a user of xarray, not a developer) that it's also not thoroughly tested. I expected to be able to use e.g. DataArray.cumprod when the underlying data was sparse, but could not.

IMHO, I/O to/from sparse-backed objects is less valuable if only a small subset of xarray functionality is available on those objects. Perhaps explicitly testing/confirming which parts of the API do/do not currently work with sparse would support the improvements to the docs that Ryan mentioned, and reveal the work remaining to provide full(er) support.
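The "explicitly testing/confirming which parts of the API work" step could start as a simple probe loop. A minimal sketch of the idea, deliberately independent of xarray and sparse (the `DummySparse` class and the method list are hypothetical stand-ins, not real xarray/sparse API):

```python
def probe_methods(obj, method_names):
    """Try each zero-argument method on obj; record which ones succeed."""
    results = {}
    for name in method_names:
        try:
            getattr(obj, name)()
            results[name] = True
        except Exception:
            results[name] = False
    return results

# Hypothetical stand-in for a sparse-backed object that supports
# sum() but not cumprod(), mimicking the situation described above.
class DummySparse:
    def sum(self):
        return 0
    def cumprod(self):
        raise TypeError("operation not supported on sparse data")

print(probe_methods(DummySparse(), ["sum", "cumprod"]))
# → {'sum': True, 'cumprod': False}
```

Run against a real sparse-backed DataArray and a list of DataArray method names, a loop like this would produce exactly the support table suggested above.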

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
1209127586 https://github.com/pydata/xarray/issues/6822#issuecomment-1209127586 https://api.github.com/repos/pydata/xarray/issues/6822 IC_kwDOAMm_X85IEdKi khaeru 1634164 2022-08-09T09:17:39Z 2022-08-09T09:17:39Z NONE

Thanks @Illviljan for the fix! 🙏🏾

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  RuntimeError when formatting sparse-backed DataArray in f-string 1316423844
896866803 https://github.com/pydata/xarray/issues/5648#issuecomment-896866803 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841dRnz khaeru 1634164 2021-08-11T14:18:05Z 2021-08-11T14:18:05Z NONE

👂🏾

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
541160571 https://github.com/pydata/xarray/issues/3381#issuecomment-541160571 https://api.github.com/repos/pydata/xarray/issues/3381 MDEyOklzc3VlQ29tbWVudDU0MTE2MDU3MQ== khaeru 1634164 2019-10-11T17:49:09Z 2019-10-11T17:49:09Z NONE

Thanks both for the comments. I understand sparse's behaviour; to clarify, the bug (IMO) is that xarray doesn't handle this for the user. To condense my example:

```python
# Same as above to ---
import numpy as np
import pandas as pd
import xarray as xr

foo = [f'foo{i}' for i in range(6)]
bar = [f'bar{i}' for i in range(6)]
raw = np.random.rand(len(foo) // 2, len(bar))

b_series = pd.DataFrame(raw, index=foo[3:], columns=bar) \
    .stack() \
    .rename_axis(index=['foo', 'bar'])
# ---

b = xr.DataArray.from_series(b_series, sparse=True)
c = b.sum(dim='foo').expand_dims({'foo': ['total']})
d = xr.concat([b, c], dim='foo')
```

This succeeds when sparse=False and fails when sparse=True.

  • Shouldn't it succeed automatically? I feel like it should.
  • If it does, what should be the fill value on d? I'm not clear what the intended behaviour is.

I haven't touched xarray internals before, but if time allows I will try to add some tests.
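One way to frame the fill-value question above is as an explicit resolution rule. The following is a hypothetical policy sketch in plain Python, not xarray's actual behaviour: succeed automatically when all operands agree, and force an explicit choice otherwise.

```python
def resolve_fill_value(fill_values):
    """Hypothetical policy for concatenating sparse operands:
    if all operands share one fill value, use it; otherwise require
    the caller to densify or supply an explicit fill value."""
    unique = set(fill_values)
    if len(unique) == 1:
        return unique.pop()
    raise ValueError(
        f"operands have different fill values {sorted(unique)}; "
        "densify or supply an explicit fill_value"
    )

print(resolve_fill_value([0.0, 0.0]))
# → 0.0
```

Under this rule, the b/c/d example would fail loudly with an actionable message rather than an opaque error, while the sparse=False path is unaffected.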

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat() fails when args have sparse.COO data and different fill values 503711327
539215442 https://github.com/pydata/xarray/issues/3245#issuecomment-539215442 https://api.github.com/repos/pydata/xarray/issues/3245 MDEyOklzc3VlQ29tbWVudDUzOTIxNTQ0Mg== khaeru 1634164 2019-10-07T21:37:53Z 2019-10-07T21:37:53Z NONE

As far as I can tell, the proposal here will require either:

```python
s = pd.Series(...)
xr.DataArray.from_series(s).to_series()
```

or:

```python
xr.DataArray.from_series(s, sparse=True).to_dense().to_series()
```

For any code that can't guarantee sparse/non-sparse input, the first will fail sometimes, so it will always be necessary to write the latter everywhere, which IMO is unnecessarily verbose.
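The boilerplate complained about above could be folded into a small helper. A sketch using duck typing; the helper name is mine, and it assumes only that sparse-backed objects expose a to_dense() method as in the proposal (the two stand-in classes are illustrative, not xarray types):

```python
def to_series_robust(da):
    """Round-trip to pandas regardless of whether the backing data is
    sparse: densify first when a to_dense() method is present."""
    if hasattr(da, "to_dense"):
        da = da.to_dense()
    return da.to_series()

# Minimal stand-ins for dense and sparse-backed DataArray-like objects.
class DenseArray:
    def to_series(self):
        return "series"

class SparseArray:
    def to_dense(self):
        return DenseArray()

print(to_series_robust(DenseArray()))   # → series
print(to_series_robust(SparseArray()))  # → series
```

Every caller that can't guarantee its input's backing type would still need something like this, which is the verbosity objection in a nutshell.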

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  sparse and other duck array issues 484240082
520741706 https://github.com/pydata/xarray/issues/3213#issuecomment-520741706 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDUyMDc0MTcwNg== khaeru 1634164 2019-08-13T08:31:30Z 2019-08-13T08:31:30Z NONE

This is very exciting! In energy-economic research (unlike, e.g., earth systems research), data are almost always sparse, so first-class sparse support will be broadly useful.

I'm leaving a comment here (since this seems to be a meta-issue; please link from wherever else, if needed) with two example use-cases. For the moment, #3206 seems to cover them, so I can't name any specific additional features.

  1. MESSAGEix is an energy systems optimization model framework, formulated as a linear program.
    • Some variables have many dimensions; for instance, the input coefficient for a technology has the dimensions (node_loc, technology, year_vintage, year_active, mode, node_origin, commodity, level, time, time_origin).
    • In the global version of our model, the technology dimension has over 400 labels.
    • Often two or more dimensions are tied, e.g. technology='coal power plant' will only take input from (commodity='coal', level='primary energy'); all other combinations of (commodity, level) are empty for this technology. So, this data is inherently sparse.
    • For modeling research, specifying quantities in this way is a good design because (a) it is intuitive to researchers in this domain, and (b) the optimization model is solved using various LP solvers via GAMS, which automatically prune zero rows in the resulting matrices.
    • When we were developing a dask/DAG-based system for model results post-processing, we wanted to use xarray, but had some quantities with tens of millions of elements that were less than 1% full. Here is some test code that triggered MemoryErrors using xarray. We chose to fall back on using a pd.Series subclass that mocks xarray methods.
  2. In transportation research, stock models of vehicle fleets are often used.
    • These models always have at least two time dimensions: cohort (the time period in which a vehicle was sold) and the period(s) in which it is used (and thus consumes fuel, etc.).
    • Since a vehicle sold in 2020 can't be used in 2015, these data are always triangular w.r.t. these two dimensions. (The dimensions year_vintage and year_active in example #1 above have the same relationship.)
    • Once multiplied by other dimensions (technology; fuel; size or shape or market segment; embodied materials; different variables; model runs across various scenarios or input assumptions), the overhead of dense arrays can become problematic.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
372148226 https://github.com/pydata/xarray/issues/1761#issuecomment-372148226 https://api.github.com/repos/pydata/xarray/issues/1761 MDEyOklzc3VlQ29tbWVudDM3MjE0ODIyNg== khaeru 1634164 2018-03-11T20:53:49Z 2018-03-11T20:59:08Z NONE

~Also experiencing this, though for a different method & version of bottleneck:~

Sorry, turns out this was due to a Python 3.5 → 3.6 upgrade without re-install of pip packages. Please disregard!

```
$ pip list | grep -Ei "(bottleneck|xarray)"
Bottleneck     1.2.1
xarray         0.10.1
$ python3 -c "import xarray"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/khaeru/.local/lib/python3.6/site-packages/xarray/__init__.py", line 10, in <module>
    from .core.extensions import (register_dataarray_accessor,
  File "/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/extensions.py", line 7, in <module>
    from .dataarray import DataArray
  File "/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/dataarray.py", line 16, in <module>
    from . import rolling
  File "/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/rolling.py", line 377, in <module>
    inject_bottleneck_rolling_methods(DataArrayRolling)
  File "/home/khaeru/.local/lib/python3.6/site-packages/xarray/core/ops.py", line 362, in inject_bottleneck_rolling_methods
    f = getattr(bn, bn_name)
AttributeError: module 'bottleneck' has no attribute 'move_sum'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Importing xarray fails if old version of bottleneck is installed 279456192
221960807 https://github.com/pydata/xarray/pull/401#issuecomment-221960807 https://api.github.com/repos/pydata/xarray/issues/401 MDEyOklzc3VlQ29tbWVudDIyMTk2MDgwNw== khaeru 1634164 2016-05-26T18:51:06Z 2016-05-26T18:51:06Z NONE

@jhamman thanks for taking this up and finishing it!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle bool in NetCDF4 conversion 70805273
202527380 https://github.com/pydata/xarray/pull/806#issuecomment-202527380 https://api.github.com/repos/pydata/xarray/issues/806 MDEyOklzc3VlQ29tbWVudDIwMjUyNzM4MA== khaeru 1634164 2016-03-28T18:52:01Z 2016-03-28T18:53:35Z NONE

@fmaussion that's still helpful, thanks.

For cases where you want to do custom initialization, the suggestion (which I should add) is to simply write your own function to use in place of xarray.open_dataset.

Now that I think of it, it should also be possible to use some other logic in __init__()—or even, as a kludge, store something like xarray_obj.attrs['_geoaccessor_state']—to determine whether the object is already, or needs to be, "initialized" (whatever that happens to mean for each accessor).

For instance, if the accessor creates and uses certain variables in a Dataset, it could check for their presence, and skip any initialization code if they already exist.
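The "check for state, skip re-initialization" idea above can be sketched independently of xarray's accessor machinery. All names here are illustrative (the FakeDataset class just mimics an object carrying an attrs dict; it is not an xarray type):

```python
class GeoAccessor:
    """Accessor-style wrapper whose setup is effectively idempotent:
    it records its state in the wrapped object's attrs and skips the
    one-time initialization when that state is already present."""
    STATE_KEY = "_geoaccessor_state"

    def __init__(self, obj):
        self._obj = obj
        if self.STATE_KEY not in obj.attrs:
            # One-time initialization, e.g. creating derived variables.
            obj.attrs[self.STATE_KEY] = {"initialized": True}

class FakeDataset:
    """Stand-in for a Dataset: just carries an attrs dict."""
    def __init__(self):
        self.attrs = {}

ds = FakeDataset()
GeoAccessor(ds)
GeoAccessor(ds)  # second construction finds the state and skips setup
print(ds.attrs["_geoaccessor_state"])  # → {'initialized': True}
```

The same pattern works when the "state" is the presence of variables the accessor creates, as described above: check for them in __init__() and only run the setup when they are absent.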

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decorators for registering custom accessors in xarray 143877458
202473784 https://github.com/pydata/xarray/pull/806#issuecomment-202473784 https://api.github.com/repos/pydata/xarray/issues/806 MDEyOklzc3VlQ29tbWVudDIwMjQ3Mzc4NA== khaeru 1634164 2016-03-28T16:31:59Z 2016-03-28T16:31:59Z NONE

Of the two different projects I'm working (sporadically) on that both subclass Dataset, it seems like one (pyGDX) should more properly be a backend, while the other could work as an accessor. This code looks good!

Just to be clear—xarray_obj is passed to the __init__() method of an accessor. Will this happen before, or after Dataset.__init__()/DataArray.__init__() is invoked?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decorators for registering custom accessors in xarray 143877458
202469160 https://github.com/pydata/xarray/issues/805#issuecomment-202469160 https://api.github.com/repos/pydata/xarray/issues/805 MDEyOklzc3VlQ29tbWVudDIwMjQ2OTE2MA== khaeru 1634164 2016-03-28T16:19:05Z 2016-03-28T16:19:05Z NONE

@jhamman — you're right. In truth, I was working with some more complex code using a PeriodIndex and getting errors I couldn't decipher, so I pulled those lines from the docs and played with them to try to understand what was happening. I don't know why it's that way in the docs…maybe because ds['reference_time'] or ds.reference_time is more concise than ds.attrs['reference_time']?

@shoyer — thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  pd.Period can't be used as a 1-element coord 143764621
96086024 https://github.com/pydata/xarray/pull/401#issuecomment-96086024 https://api.github.com/repos/pydata/xarray/issues/401 MDEyOklzc3VlQ29tbWVudDk2MDg2MDI0 khaeru 1634164 2015-04-24T22:40:17Z 2015-04-24T22:40:17Z NONE

Thanks—putting this up was evidently the fastest way to get pointers to those examples in the code!

I'll add those items and comment again once I have.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle bool in NetCDF4 conversion 70805273

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 1283.872ms · About: xarray-datasette