issue_comments
8 rows where author_association = "NONE" and issue = 187069161 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1268380486 | https://github.com/pydata/xarray/issues/1077#issuecomment-1268380486 | https://api.github.com/repos/pydata/xarray/issues/1077 | IC_kwDOAMm_X85LmfNG | lucianopaz 5230109 | 2022-10-05T12:38:05Z | 2022-10-05T12:38:05Z | NONE | Hi everyone, first of all, thanks for your amazing work! I came across this issue today because I have a dataset with multiple variables and multiple multi-index dimensions, some of which aren't used by some variables. I had to slightly adapt the workaround posted by @dcherian to get things to work. I'll post it here in case someone else finds the patch useful. I'm not sure if it would be a viable fix for the issue though; let me know if it is and I'll open a PR.
```python
import numpy as np
import pandas as pd


def encode_multiindex(ds, idxname):
    # Replace the MultiIndex with one coordinate per level plus a flat integer index.
    encoded = ds.reset_index(idxname)
    coords = dict(zip(ds.indexes[idxname].names, ds.indexes[idxname].levels))
    for coord in coords:
        encoded[coord] = coords[coord].values
    shape = [encoded.sizes[coord] for coord in coords]
    encoded[idxname] = np.ravel_multi_index(ds.indexes[idxname].codes, shape)
    # Record the level names so the index can be rebuilt on read.
    encoded[idxname].attrs["compress"] = " ".join(ds.indexes[idxname].names)
    return encoded


def decode_to_multiindex(encoded, idxname):
    names = encoded[idxname].attrs["compress"].split(" ")
    shape = [encoded.sizes[dim] for dim in names]
    indices = np.unravel_index(encoded[idxname].values, shape)
    arrays = np.array([encoded[dim].values[index] for dim, index in zip(names, indices)])
    mindex = pd.MultiIndex.from_arrays(arrays, names=names)
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
1101089866 | https://github.com/pydata/xarray/issues/1077#issuecomment-1101089866 | https://api.github.com/repos/pydata/xarray/issues/1077 | IC_kwDOAMm_X85BoUxK | stale[bot] 26384082 | 2022-04-18T04:43:45Z | 2022-04-18T04:43:45Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
285972485 | https://github.com/pydata/xarray/issues/1077#issuecomment-285972485 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI4NTk3MjQ4NQ== | mullenkamp 2656596 | 2017-03-12T20:12:42Z | 2017-03-12T20:12:42Z | NONE | I would love to have this functionality as well. Unfortunately, I'm not knowledgeable enough to help decide on the internal structure for MultiIndexes. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
261106183 | https://github.com/pydata/xarray/issues/1077#issuecomment-261106183 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI2MTEwNjE4Mw== | tippetts 17055041 | 2016-11-16T23:27:05Z | 2016-11-16T23:27:05Z | NONE | Yes, I suppose it doesn't really need to live in core xarray, unless you did want to allow a @benbovy , do you plan to put your |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
260167167 | https://github.com/pydata/xarray/issues/1077#issuecomment-260167167 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI2MDE2NzE2Nw== | tippetts 17055041 | 2016-11-13T04:56:43Z | 2016-11-13T04:56:43Z | NONE | Would it be too simplistic to think that I think that is similar to what you've done, @benbovy, but with inheritance rather than composition. I understand that is an often disfavored design pattern, but would it make sense in this case and keep the overall xarray interface simple? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
260148070 | https://github.com/pydata/xarray/issues/1077#issuecomment-260148070 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI2MDE0ODA3MA== | tippetts 17055041 | 2016-11-12T21:00:31Z | 2016-11-12T21:00:31Z | NONE | Here's a new, related question: @shoyer , do you have any interest in adding a class to xarray that contains a hierarchical tree of Datasets, analogous to the groups in a netCDF or HDF5 file? Then opening or saving such an object would be an easy but powerful one-liner. Or is that something you would rather leave to someone else's module? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
258560862 | https://github.com/pydata/xarray/issues/1077#issuecomment-258560862 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI1ODU2MDg2Mg== | tippetts 17055041 | 2016-11-04T22:14:05Z | 2016-11-04T22:14:05Z | NONE | So if I'm properly understanding and synthesizing your ( @benbovy and @shoyer ) comments: We want the hybrid format for maximum compatibility, with the MultiIndex split into separate 1D raw value coordinates. Using the example above, these would be @shoyer , I see what you mean about @benbovy , what does the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 | |
258351232 | https://github.com/pydata/xarray/issues/1077#issuecomment-258351232 | https://api.github.com/repos/pydata/xarray/issues/1077 | MDEyOklzc3VlQ29tbWVudDI1ODM1MTIzMg== | tippetts 17055041 | 2016-11-04T05:59:37Z | 2016-11-04T05:59:37Z | NONE | Personally I'd vote for the category-encoded values. If I make files with a newer xarray, I'll be reading them later with the same (or newer) xarray and I'd definitely want the exact MultiIndex back. I don't want to be too self-centered in my perspective in all of this. But my applications are definitely in the large-scale scientific computing area that seems to be the community norm for xarray, so I would guess many others would have a similar situation. I generate data that are associated with nodes or elements in a mesh. The mesh is naturally split into named regions. Sometimes I need to operate on the entire dataset (including all regions) and sometimes I want to select one or more regions. So I make a MultiIndex where the first index is the region name strings, and the second index is the node (or element) number inside the region (i.e. it starts over counting from 1 for each region). So the full index is 1e5 to 1e7 long, of which there are only maybe a few hundred unique values in the string column. I would think that would greatly benefit from the category-encoded storage. And fast and reliable reconstruction of the MultiIndex is a big plus. Does this seem like a common user scenario? The one thing I'm wondering is, what happens in an application like this if you select on one index (say, all data rows with region_name='FOOBAR-1') from the HDF5 file before doing anything else? Would it be hard to make the MultiIndex/NetCDF reader smart enough not to reconstruct the whole MultiIndex before picking out the relevant rows? And, related question for us to think about, how would we make this all play nicely with dask? Sorry for the long post. 
I've been very impressed and happy working with xarray, and I'm just eager to get the last bit of features I need so I can really start pushing my colleagues into using it. :) Nuts and bolts questions: So each of index.levels would be easy to store as its own little DataArray, yeah? Then would each of the index.labels be in its own DataArray, or would you want them all in the same 2D DataArray? And then would the actual data in the original DataArray just have a generic integer index as a placeholder, to be replaced by the MultiIndex? For these dummy DataArrays and the multiindex_levels metadata attr, how do you feel about using a single leading underscore in the name? If I were to low-level grunge around in the file for some reason, that would indicate to me that they are private-by-convention implementation details. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
MultiIndex serialization to NetCDF 187069161 |
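The category-encoded layout discussed in the comments above (each level's unique values stored once, plus integer codes per row) can be sketched with pandas and NumPy alone. This is a minimal illustration under stated assumptions, not the xarray implementation: the function names `encode_index` and `decode_index` and the sample region/node data are hypothetical, and the flattening step follows the `np.ravel_multi_index` approach used in the workaround quoted in the first row.

```python
import numpy as np
import pandas as pd


def encode_index(mindex):
    """Split a MultiIndex into per-level values and one flat integer code per row."""
    levels = [np.asarray(level) for level in mindex.levels]
    shape = tuple(len(level) for level in levels)
    # Cast codes to a platform integer type before flattening.
    codes = tuple(np.asarray(c, dtype=np.intp) for c in mindex.codes)
    flat = np.ravel_multi_index(codes, shape)
    return levels, flat, list(mindex.names)


def decode_index(levels, flat, names):
    """Rebuild the MultiIndex from the stored levels and flat codes."""
    shape = tuple(len(level) for level in levels)
    codes = np.unravel_index(flat, shape)
    arrays = [level[code] for level, code in zip(levels, codes)]
    return pd.MultiIndex.from_arrays(arrays, names=names)


# Hypothetical sample data: region names and node numbers restarting per region.
mindex = pd.MultiIndex.from_arrays(
    [["A", "A", "B", "B", "B"], [1, 2, 1, 2, 3]],
    names=["region_name", "node"],
)
levels, flat, names = encode_index(mindex)
restored = decode_index(levels, flat, names)
```

With many rows and only a few hundred distinct region names, the string level is stored once while each row costs a single integer, which is the storage benefit raised in the comment. Note that this sketch assumes the index has no missing values, since pandas marks those with a code of -1 that cannot be flattened this way.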
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
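The filter described at the top of this page (rows where author_association = "NONE" and issue = 187069161, sorted by updated_at descending) can be reproduced against this schema with Python's standard `sqlite3` module. The sketch below is an assumption-laden illustration: it uses an in-memory database, keeps only a subset of the columns, drops the foreign-key clauses, and adds one hypothetical row that is not in the export to show the filter excluding it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the issue_comments schema (subset of columns, no foreign keys).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [updated_at] TEXT,
        [author_association] TEXT,
        [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])"
)

rows = [
    # Two rows taken from the table above.
    (1268380486, 5230109, "2022-10-05T12:38:05Z", "NONE", 187069161),
    (258351232, 17055041, "2016-11-04T05:59:37Z", "NONE", 187069161),
    # Hypothetical row, not from the export, that the filter should exclude.
    (1, 2, "2020-01-01T00:00:00Z", "MEMBER", 187069161),
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?, ?)", rows)

query = """
    SELECT [id] FROM [issue_comments]
    WHERE [author_association] = ? AND [issue] = ?
    ORDER BY [updated_at] DESC
"""
result = [row[0] for row in conn.execute(query, ("NONE", 187069161))]
```

Because the timestamps are ISO 8601 strings in UTC, sorting the TEXT column lexicographically gives chronological order.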