issue_comments: 286176727


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1077#issuecomment-286176727	https://api.github.com/repos/pydata/xarray/issues/1077	286176727	MDEyOklzc3VlQ29tbWVudDI4NjE3NjcyNw==	1217238	2017-03-13T17:14:37Z	2017-03-13T17:14:37Z	MEMBER	Let's recap the options, which I'll illustrate for the second level of my MultiIndex from above (https://github.com/pydata/xarray/issues/1077#issuecomment-258323743): 1. "categories and codes": e.g., `['a', 'b']` and `[0, 1, 0, 1, 0, 1]`. Highest speed, low memory requirements, faithful round-trip to xarray/pandas, less obvious representation. 2. "categories and values": e.g., `['a', 'b']` and `['a', 'b', 'a', 'b', 'a', 'b']`. Moderate speed (need to recreate codes), high memory requirements, faithful round-trip to xarray/pandas, more obvious representation (categories can be safely ignored). 3. "raw values": e.g., `['a', 'b', 'a', 'b', 'a', 'b']`. Moderate speed (only slightly slower than 2), high memory requirements (slightly better than 2), does not support a completely faithful round-trip, most obvious representation. 4. "category codes and values": e.g., `[0, 1]` and `['a', 'b', 'a', 'b', 'a', 'b']`. Moderate speed, high memory requirements, also does not support a faithful round-trip (it's possible for some levels to not be represented in the `MultiIndex` values), more obvious representation (like 2). Option 3 uses only slightly less memory than 2 and can be easily achieved with `reset_index()`, so I don't see a reason to support it for writing (read support would be fine). Option 4 looks like a faithful round-trip, but actually isn't in some rare edge cases. That seems like a recipe for disaster, so it should be ruled out. This leaves 1 and 2. Both are reasonably performant and round-trip xarray objects with complete fidelity, so I would be happy with either of them. In principle we could even support both, with an argument to switch between the modes (one would need to be the default). 
My inclination is to start with only supporting 1, because it has a potentially large advantage from a speed/memory perspective, and it's easy to achieve the "raw values" representation with `.reset_index()` (and convert back with `.set_index()`). If we do this, the documentation for writing netCDF files should definitely include a suggestion to consider using `.reset_index()` when distributing files not intended strictly for use by xarray users.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		187069161
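
To make the trade-off in the comment above concrete, here is a minimal pandas-only sketch of the "categories and codes" and "raw values" representations, plus the edge case where raw values alone do not round-trip. The first level `[1, 2, 3]` and the level names `x` and `y` are assumptions made for illustration; only the second level (`'a'`, `'b'`) comes from the comment. This is not how xarray serializes a MultiIndex, just an illustration of the underlying idea.

```python
import pandas as pd

# Hypothetical MultiIndex: the first level (1, 2, 3) and the names "x"/"y"
# are assumptions; only the second level mirrors the comment.
mi = pd.MultiIndex.from_product([[1, 2, 3], ["a", "b"]], names=["x", "y"])

# "Categories and codes" for the second level.
categories = mi.levels[1].tolist()  # ['a', 'b']
codes = mi.codes[1].tolist()        # [0, 1, 0, 1, 0, 1]

# "Raw values" for the second level.
raw_values = mi.get_level_values("y").tolist()

# Categories and codes are enough to rebuild the index exactly.
rebuilt = pd.MultiIndex(levels=mi.levels, codes=mi.codes, names=mi.names)

# Edge case: after filtering, the category "b" is unused but still stored
# in the levels of the filtered index.
subset = mi[mi.get_level_values("y") == "a"]

# Rebuilding from raw values alone silently drops the unused category.
from_raw = pd.MultiIndex.from_arrays(
    [subset.get_level_values("x"), subset.get_level_values("y")],
    names=subset.names,
)
subset_levels = subset.levels[1].tolist()
from_raw_levels = from_raw.levels[1].tolist()
```

Comparing `subset_levels` with `from_raw_levels` shows why a values-only representation is not a completely faithful round-trip: the index rebuilt from values has lost a level that the original still carried.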