issue_comments: 258460719

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/1077#issuecomment-258460719 https://api.github.com/repos/pydata/xarray/issues/1077 258460719 MDEyOklzc3VlQ29tbWVudDI1ODQ2MDcxOQ== 1217238 2016-11-04T15:22:12Z 2016-11-04T15:22:12Z MEMBER

Personally I'd vote for the category encoded values. If I make files with a newer xarray, I'll be reading them later with the same (or newer) xarray and I'd definitely want the exact MultiIndex back.

Point taken -- let's see what others think!

One consideration in favor of this is that it will soon be very easy to switch a MultiIndex back into separate coordinate variables, which could be our recommendation for how to save netCDF files for maximum portability.

The one thing I'm wondering is, what happens in an application like this if you select on one index (say, all data rows with region_name='FOOBAR-1') from the HDF5 file before doing anything else? Would it be hard to make the MultiIndex/NetCDF reader smart enough not to reconstruct the whole MultiIndex before picking out the relevant rows?

We could do this, but note that we are contemplating switching xarray to always load indexes into memory eagerly, which would negate that advantage. See this PR and mailing list discussion: https://github.com/pydata/xarray/pull/1024#issuecomment-256114879 https://groups.google.com/forum/#!topic/xarray/dK2RHUls1nQ
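To make the question above concrete, here is a minimal pure-Python sketch of selecting rows by one level without rebuilding the full MultiIndex. The storage layout (one list of unique values plus one list of integer codes per level), the `select_rows` helper, and the sample data are all hypothetical illustrations, not xarray's or pandas' actual reader.

```python
# Hypothetical on-disk layout: for each level, the unique values and one
# integer code per row pointing into those values.
levels = {"region_name": ["FOOBAR-1", "FOOBAR-2"], "year": [2015, 2016]}
codes = {"region_name": [0, 0, 1, 1], "year": [0, 1, 0, 1]}


def select_rows(levels, codes, name, value):
    """Return the row positions where level `name` equals `value`.

    Only the requested level is read; the other levels are never touched,
    so the full set of index tuples is not reconstructed.
    """
    try:
        target = levels[name].index(value)  # code assigned to `value`
    except ValueError:
        return []  # value does not occur in this level
    return [i for i, code in enumerate(codes[name]) if code == target]


matching = select_rows(levels, codes, "region_name", "FOOBAR-1")
print(matching)
```

If indexes are always loaded eagerly, as contemplated above, this shortcut saves little, since every level would already be in memory.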

Nuts and bolts questions: So each of index.levels would be easy to store as its own little DataArray, yeah? Then would each of the index.labels be in its own DataArray, or would you want them all in the same 2D DataArray?

pandas stores the labels for each level separately, automatically putting each of them in the smallest possible integer dtype (int8, int16, int32 or int64). So we probably also want to store them in separate 1D variables.
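As a rough illustration of the category encoding under discussion, the sketch below stores each level as its unique values plus a separate 1D list of integer codes (the index.labels asked about above), and picks the narrowest signed integer width for each. It is plain Python with hypothetical helper names and sample data. The dtype rule is a simplification and is not meant to reproduce pandas' exact thresholds.

```python
def encode_level(values):
    """Category-encode one level: sorted unique values plus integer codes."""
    level = sorted(set(values))
    position = {value: i for i, value in enumerate(level)}
    return level, [position[value] for value in values]


def smallest_int_dtype(n_categories):
    """Name of the narrowest signed integer type that holds codes 0..n-1."""
    for bits in (8, 16, 32):
        if n_categories <= 2 ** (bits - 1):  # largest code is n_categories - 1
            return "int%d" % bits
    return "int64"


def decode(levels, codes):
    """Rebuild the original index tuples from levels and per-level codes."""
    return [
        tuple(level[code] for level, code in zip(levels, row))
        for row in zip(*codes)
    ]


# Hypothetical two-level index: (region_name, year)
rows = [("FOOBAR-1", 2015), ("FOOBAR-1", 2016), ("FOOBAR-2", 2015)]
columns = list(zip(*rows))  # one tuple of values per level

encoded = [encode_level(column) for column in columns]
levels = [level for level, _ in encoded]
codes = [level_codes for _, level_codes in encoded]  # separate 1D sequences
dtypes = [smallest_int_dtype(len(level)) for level in levels]

assert decode(levels, codes) == rows  # the round trip restores every tuple
```

Keeping each level's codes in its own 1D variable lets every level use its own integer width, which a single 2D array could not do.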

And then would the actual data in the original DataArray just have a generic integer index as a placeholder, to be replaced by the MultiIndex?

Just a note: for interacting with backends, we use Variable objects instead of DataArrays: http://xarray.pydata.org/en/stable/internals.html#variable-objects

This means that we don't need the generic integer placeholder index (which will also be going away shortly in general, see https://github.com/pydata/xarray/pull/1017).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  187069161