issue_comments

2 rows where issue = 902009258 and user = 4160723 sorted by updated_at descending

benbovy (MEMBER) commented at 2021-06-02T08:07:38Z · https://github.com/pydata/xarray/issues/5376#issuecomment-852833408

> What would be other examples like ImagePyramidIndex, outside of the multi-scale context?

There can be many examples: spatial indexes, complex grid indexes (e.g., selecting cell centers/faces of a staggered grid), distributed indexes, etc. Some of them are illustrated in a presentation I gave a couple of weeks ago (slides here). All of those examples do perform actual data indexing, though.

In the multi-scale context, I admit that the name "index" may sound confusing, since an ImagePyramidIndex would not really perform any data indexing based on coordinate labels. Perhaps ImageRescaler would be a better name?

Such an ImageRescaler might still fit the broad purpose of Xarray indexes well, IMHO, since it would enable efficient data visualization through extraction and resampling.

The goal with Xarray custom indexes is to allow many kinds of objects, with a scope possibly much narrower than, e.g., pandas.Index, that could be reused in a broader range of operations like data selection, resampling, alignment, etc. Xarray indexes will be explicitly part of Xarray's Dataset/DataArray data model alongside data variables, coordinates, and attributes, but unlike the latter they're not intended to wrap any (meta)data. Instead, they could wrap any structure or object that may be built from the (meta)data and that would enable efficient operations on the data (a priori based on coordinate labels, although in some contexts like multi-scale this might be more of an accessory?).
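
As a concrete illustration, here is a minimal sketch of the kind of object that could back an ImagePyramidIndex. This is not the actual Xarray extension API (which was still being designed at the time of this discussion); the class name, its methods, and the coarsening-based pyramid construction are all illustrative assumptions:

import numpy as np
import xarray as xr

class ImagePyramidIndex:
    # Hypothetical index-like object: it wraps a structure built from
    # the (meta)data -- here, precomputed pyramid levels -- rather than
    # wrapping the data itself, and reuses it for efficient selection.

    def __init__(self, levels):
        self.levels = levels  # maps zoom level -> downsampled Dataset

    @classmethod
    def from_dataset(cls, ds, n_levels=4):
        # each level halves the resolution of the previous one
        levels = {0: ds}
        for i in range(1, n_levels):
            levels[i] = levels[i - 1].coarsen(x=2, y=2, boundary="trim").mean()
        return cls(levels)

    def sel(self, zoom, **indexers):
        # label-based selection delegated to the relevant pyramid level
        return self.levels[zoom].sel(**indexers)

ds = xr.Dataset(
    {"img": (("y", "x"), np.random.rand(1024, 1024))},
    coords={"y": np.arange(1024), "x": np.arange(1024)},
)
index = ImagePyramidIndex.from_dataset(ds)
tile = index.sel(zoom=2, x=slice(0, 255), y=slice(0, 255))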

benbovy (MEMBER) commented at 2021-05-28T10:04:52Z · https://github.com/pydata/xarray/issues/5376#issuecomment-850307092

> I think there's certainly something to be won just by having a data structure which says these arrays/datasets represent a multiscale series.

I agree, but I'm wondering whether the multiscale series couldn't also be viewed as something that can be abstracted away, i.e., the original dataset (level 0) is the "real" dataset, while all other levels are derived datasets that are convenient for some specific applications (e.g., visualization) but not very useful for general use.

Having a single xarray.Dataset with a custom index (+ a custom Dataset extension) taking care of all the multiscale logic may have benefits too. For example, it would be pretty straightforward to reuse a tool like https://github.com/xarray-contrib/xpublish to interactively (pre)fetch data into web-based clients (via some custom API endpoints). More generally, I guess it's easier to integrate with existing tools built on top of Xarray than to add support for a new data structure.
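
For instance, a rough sketch of such a custom endpoint, using xpublish's support for user-provided FastAPI routers (the /levels/{zoom} route and the on-the-fly coarsening are made up for illustration; a real server would precompute or cache the levels):

import numpy as np
import xarray as xr
import xpublish
from fastapi import APIRouter, Depends
from xpublish.dependencies import get_dataset

pyramid_router = APIRouter()

@pyramid_router.get("/levels/{zoom}")
def get_level(zoom: int, ds: xr.Dataset = Depends(get_dataset)):
    # serve (metadata for) a downsampled level of the published dataset
    factor = 2 ** zoom
    level = ds.coarsen(x=factor, y=factor, boundary="trim").mean() if zoom else ds
    return {"zoom": zoom, "sizes": {k: int(v) for k, v in level.sizes.items()}}

ds = xr.Dataset({"img": (("y", "x"), np.random.rand(1024, 1024))})
rest = xpublish.Rest(ds, routers=[pyramid_router])
# rest.serve()  # serve over HTTP, e.g., GET /levels/2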

Some related questions (out of curiosity):

  • Are there cases in practice where on-demand downsampling computation would be preferred over pre-computing and storing all pyramid levels for the full dataset? I admit it's probably a very naive question, since most workflows on the client side would likely start by loading the top-level (lowest-resolution) dataset at full extent, which would require pre-computing the whole thing anyway?
  • Are there cases where it makes sense to pre-compute all the pyramid levels in-memory (could be, e.g., chunked dask arrays persisted on a distributed cluster) without the need to store them? (A rough sketch of this option follows below.)
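
A rough sketch of what that second option could look like, assuming dask (and optionally a distributed cluster) is available; all names are illustrative:

import numpy as np
import xarray as xr

# a chunked (dask-backed) dataset, possibly living on a distributed cluster
ds = xr.Dataset(
    {"img": (("y", "x"), np.random.rand(4096, 4096))}
).chunk({"y": 512, "x": 512})

# pre-compute every pyramid level lazily...
levels = {0: ds}
for i in range(1, 4):
    levels[i] = levels[i - 1].coarsen(y=2, x=2, boundary="trim").mean()

# ...then persist them all in (cluster) memory, without writing to disk
levels = {k: v.persist() for k, v in levels.items()}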

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);