issue_comments

7 rows where user = 17055041 sorted by updated_at descending

issue 2

  • MultiIndex serialization to NetCDF 5
  • Multi-index indexing 2

user 1

  • tippetts · 7

author_association 1

  • NONE 7
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
261106183 https://github.com/pydata/xarray/issues/1077#issuecomment-261106183 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDI2MTEwNjE4Mw== tippetts 17055041 2016-11-16T23:27:05Z 2016-11-16T23:27:05Z NONE

Yes, I suppose it doesn't really need to live in core xarray, unless you did want to allow a Dataset to contain other Datasets.

@benbovy , do you plan to put your DatasetNode code into some other package?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
260167167 https://github.com/pydata/xarray/issues/1077#issuecomment-260167167 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDI2MDE2NzE2Nw== tippetts 17055041 2016-11-13T04:56:43Z 2016-11-13T04:56:43Z NONE

Would it be too simplistic to think that xarray.Dataset (or a subclass of it) could be made to contain other Datasets? That would extend the conceptual map of xarray.Dataset <==> HDF5 group. The contained Datasets would probably also want to have a reference to their parent Dataset, for walking back up the tree.

I think that is similar to what you've done, @benbovy , but with inheritance rather than composition. I understand that is an often disfavored design pattern, but would it make sense in this case and keep the overall xarray interface simple?
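To make the tree idea concrete, here is a minimal sketch of a node that holds child nodes plus a reference back to its parent. Everything in it is an assumption for illustration: `GroupNode` is a hypothetical name (it is not @benbovy's code and not an xarray class), and a plain dict stands in for an xarray.Dataset so the example runs without xarray. It uses composition; the inheritance variant asked about above would subclass Dataset instead.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional


@dataclass(eq=False)
class GroupNode:
    """Hypothetical tree node; `data` is a dict standing in for a Dataset."""

    name: str
    data: Dict[str, Any] = field(default_factory=dict)
    parent: Optional["GroupNode"] = field(default=None, repr=False)
    children: Dict[str, "GroupNode"] = field(default_factory=dict, repr=False)

    def add_child(self, child: "GroupNode") -> "GroupNode":
        # Register the child and give it a reference back to this node.
        child.parent = self
        self.children[child.name] = child
        return child

    @property
    def path(self) -> str:
        # Walk back up the tree to build an HDF5-style group path.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))

    def walk(self) -> Iterator["GroupNode"]:
        # Depth-first traversal: this node first, then each subtree.
        yield self
        for child in self.children.values():
            yield from child.walk()


root = GroupNode("root")
mesh = root.add_child(GroupNode("mesh", {"nodes": [1, 2, 3]}))
region = mesh.add_child(GroupNode("region_a", {"pressure": [0.1, 0.2]}))
paths = [node.path for node in root.walk()]
```

Saving such a tree would then amount to walking it and writing each node's data to the group named by its `path`.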

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
260148070 https://github.com/pydata/xarray/issues/1077#issuecomment-260148070 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDI2MDE0ODA3MA== tippetts 17055041 2016-11-12T21:00:31Z 2016-11-12T21:00:31Z NONE

Here's a new, related question: @shoyer , do you have any interest in adding a class to xarray that contains a hierarchical tree of Datasets, analogous to the groups in a netCDF or HDF5 file? Then opening or saving such an object would be an easy but powerful one-liner.

Or is that something you would rather leave to someone else's module?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
258560862 https://github.com/pydata/xarray/issues/1077#issuecomment-258560862 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDI1ODU2MDg2Mg== tippetts 17055041 2016-11-04T22:14:05Z 2016-11-04T22:14:05Z NONE

So if I'm properly understanding and synthesizing your ( @benbovy and @shoyer ) comments: We want the hybrid format for maximum compatibility, with the MultiIndex split into separate 1D raw value coordinates. Using the example above, these would be [1, 1, 2, 2, 3, 3] and ['a', 'b', 'a', 'b', 'a', 'b']. The information about which coordinates are in a MultiIndex (and their order) gets saved in an attribute on the data in the file, like data.attrs['multiindex_levels'] = 'numbers letters'. So 3rd-party tools (or older xarray) will have the non-MultiIndex coords to use, but newer xarray will see the 'multiindex_levels' and automatically reconstruct the MultiIndex when the file is read.
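As a rough sketch of that summary, the round trip could look like the following. This is only an illustration under stated assumptions: `encode_multiindex` and `decode_multiindex` are hypothetical helper names, plain lists and dicts stand in for the coordinates and `attrs` of a real DataArray, and a list of tuples stands in for a pandas MultiIndex.

```python
from typing import Any, Dict, List, Tuple


def encode_multiindex(
    level_names: List[str], tuples: List[Tuple[Any, ...]]
) -> Tuple[Dict[str, List[Any]], Dict[str, str]]:
    """Split index tuples into one flat 1D coordinate per level."""
    coords = {name: [t[i] for t in tuples] for i, name in enumerate(level_names)}
    # Record which coordinates form the MultiIndex, and in which order.
    attrs = {"multiindex_levels": " ".join(level_names)}
    return coords, attrs


def decode_multiindex(
    coords: Dict[str, List[Any]], attrs: Dict[str, str]
) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Rebuild the index tuples from the flat coordinates and the attribute."""
    names = attrs["multiindex_levels"].split()
    return names, list(zip(*(coords[name] for name in names)))


index = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (3, "a"), (3, "b")]
coords, attrs = encode_multiindex(["numbers", "letters"], index)
names, rebuilt = decode_multiindex(coords, attrs)
```

A reader that does not know about `multiindex_levels` still sees two ordinary 1D coordinates, which is the compatibility argument made above.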

@shoyer , I see what you mean about Variable or future DataArrays not needing a placeholder index. Would that still be backwards-compatible with older xarray versions if a saved DataArray has one dim that is a MultiIndex and other dims that are not?

@benbovy , what does the encoding attribute do? It seems to me that, for a DataArray that's already created or loaded, xarray knows about its MultiIndexes and could do the right thing while writing to the backend file without being told to. Are you referring to the metadata in the file (like 'multiindex_levels') that ensures proper interpretation and automatic reconstruction when reading?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
258351232 https://github.com/pydata/xarray/issues/1077#issuecomment-258351232 https://api.github.com/repos/pydata/xarray/issues/1077 MDEyOklzc3VlQ29tbWVudDI1ODM1MTIzMg== tippetts 17055041 2016-11-04T05:59:37Z 2016-11-04T05:59:37Z NONE

Personally I'd vote for the category-encoded values. If I make files with a newer xarray, I'll be reading them later with the same (or newer) xarray and I'd definitely want the exact MultiIndex back.

I don't want to be too self-centered in my perspective in all of this. But my applications are definitely in the large-scale scientific computing area that seems to be the community norm for xarray, so I would guess many others would have a similar situation.

I generate data that are associated with nodes or elements in a mesh. The mesh is naturally split into named regions. Sometimes I need to operate on the entire dataset (including all regions) and sometimes I want to select one or more regions. So I make a MultiIndex where the first index is the region name strings, and the second index is the node (or element) number inside the region (i.e. starts over counting from 1 for each region).

So the full index is 1e5 to 1e7 long, of which there are only maybe a few hundred unique values in the string column. I would think that would greatly benefit from the category-encoded storage. And fast and reliable reconstruction of the MultiIndex is a big plus. Does this seem like a common user scenario?
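To illustrate why category encoding should help here, a small sketch follows: the long string column is stored once as a short list of unique values plus one integer code per row. The helper names `category_encode` and `category_decode` and the region names are made up for the example, and plain lists stand in for the arrays that would actually be written to the file.

```python
from typing import List, Tuple


def category_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Return the sorted unique values and one integer code per row."""
    levels = sorted(set(values))
    position = {value: i for i, value in enumerate(levels)}
    codes = [position[value] for value in values]
    return levels, codes


def category_decode(levels: List[str], codes: List[int]) -> List[str]:
    """Expand the integer codes back into the original string column."""
    return [levels[code] for code in codes]


# Region name per row; the node number restarts at 1 inside each region.
regions = ["inlet"] * 3 + ["wall"] * 2 + ["outlet"] * 4
node_numbers = [1, 2, 3, 1, 2, 1, 2, 3, 4]

levels, codes = category_encode(regions)
```

With only a few hundred unique region names, `levels` stays tiny while `codes` is a compact integer column, however long the full index gets.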

The one thing I'm wondering is, what happens in an application like this if you select on one index (say, all data rows with region_name='FOOBAR-1') from the HDF5 file before doing anything else? Would it be hard to make the MultiIndex/NetCDF reader smart enough not to reconstruct the whole MultiIndex before picking out the relevant rows? And, related question for us to think about, how would we make this all play nicely with dask?
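One way to picture such a reader, assuming the file stores the unique region names and the per-row integer codes separately: look up the code for the requested name once, then scan only the integer codes, so no string column and no MultiIndex has to be rebuilt. The function name `select_rows` and the sample data are hypothetical, and in-memory lists stand in for what would really be lazily read (or dask-backed) arrays.

```python
from typing import List


def select_rows(levels: List[str], codes: List[int], wanted: str) -> List[int]:
    """Return the row positions whose category code matches `wanted`."""
    try:
        target = levels.index(wanted)
    except ValueError:
        # The requested name is not one of the stored categories.
        return []
    # Compare integers only; the full string column is never materialized.
    return [row for row, code in enumerate(codes) if code == target]


region_levels = ["FOOBAR-1", "FOOBAR-2"]
region_codes = [0, 0, 1, 1, 1, 0]

rows = select_rows(region_levels, region_codes, "FOOBAR-1")
missing = select_rows(region_levels, region_codes, "FOOBAR-9")
```

The resulting row positions could then be used to read just the matching slice of the data and of the remaining index levels.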

Sorry for the long post. I've been very impressed and happy working with xarray, and I'm just eager to get the last bit of features I need so I can really start pushing my colleagues into using it. :)

Nuts and bolts questions: So each of index.levels would be easy to store as its own little DataArray, yeah? Then would each of the index.labels be in its own DataArray, or would you want them all in the same 2D DataArray? And then would the actual data in the original DataArray just have a generic integer index as a placeholder, to be replaced by the MultiIndex?

For these dummy DataArrays and the multiindex_levels metadata attr, how do you feel about using a single leading underscore in the name? If I were to low-level grunge around in the file for some reason, that would indicate to me that they are private-by-convention implementation details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  MultiIndex serialization to NetCDF 187069161
233482724 https://github.com/pydata/xarray/pull/802#issuecomment-233482724 https://api.github.com/repos/pydata/xarray/issues/802 MDEyOklzc3VlQ29tbWVudDIzMzQ4MjcyNA== tippetts 17055041 2016-07-18T22:50:50Z 2016-07-18T22:50:50Z NONE

Mind if I ask if this will get merged into master? It looks like a lot of work went into the pull request, and the discussion + passed checks lead me to believe it could be close to going in. Is there anything a third party can do to push it across the finish line?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-index indexing 143264649
206410319 https://github.com/pydata/xarray/pull/802#issuecomment-206410319 https://api.github.com/repos/pydata/xarray/issues/802 MDEyOklzc3VlQ29tbWVudDIwNjQxMDMxOQ== tippetts 17055041 2016-04-06T14:45:17Z 2016-04-06T14:45:17Z NONE

This will be a great feature. I for one am really looking forward to using it.

Will this work also allow saving to/reading from HDF5 and netCDF files with a MultiIndex? If not, can you give a rough outline of the approach you (Stephan or Benoit) would take? I assume it would involve saving the information about the MultiIndex structure in some transformed way that fits into an HDF5 file, then reconstructing it on read. I might need to hack together something for that before MultiIndex serialization makes it into xarray, but I'd like to make sure I don't veer too far off from the real solution that will ultimately come out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Multi-index indexing 143264649

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
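
For reference, the filter and sort described at the top of this page ("rows where user = 17055041 sorted by updated_at descending") can be sketched against a cut-down copy of this schema with Python's built-in `sqlite3` module. This is an illustration, not the query Datasette actually runs: the table is reduced to four columns, the foreign-key references are dropped, the bodies are left empty, and the third row is made up to show the filter working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE [issue_comments] (
       [id] INTEGER PRIMARY KEY,
       [user] INTEGER,
       [updated_at] TEXT,
       [body] TEXT
    );
    CREATE INDEX [idx_issue_comments_user]
        ON [issue_comments] ([user]);
    """
)

sample_rows = [
    (206410319, 17055041, "2016-04-06T14:45:17Z", ""),
    (261106183, 17055041, "2016-11-16T23:27:05Z", ""),
    (1, 99, "2020-01-01T00:00:00Z", ""),  # made-up row for another user
]
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", sample_rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE [user] = ? ORDER BY updated_at DESC",
    (17055041,),
).fetchall()
conn.close()
```

The index on `[user]` is what keeps this kind of per-user lookup cheap as the table grows.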
Powered by Datasette · Queries took 14.874ms · About: xarray-datasette