issue_comments: 632924467


html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/pull/4047#issuecomment-632924467	https://api.github.com/repos/pydata/xarray/issues/4047	632924467	MDEyOklzc3VlQ29tbWVudDYzMjkyNDQ2Nw==	1197350	2020-05-22T21:58:19Z	2020-05-22T21:59:09Z	MEMBER	Thanks for the useful questions @DennisHeimbigner. Suppose I am given an array X with shape(10,20,30) and an _ARRAY_DIMENSIONS attribute on X with the contents _ARRAY_DIMENSIONS=["time", "lon", "lat"]. Then this is equivalent to the following partial netcdf CDL: netcdf ... { dims: time=10; lon=20; lat=30; ...} Correct? Yes, correct. I assume that if there are conflicts where two variables end up assigning different sizes to the same named dimension, then that generates an error. Yes, correct as well. Understanding how this works requires me to describe some xarray internals. When decoding a Dataset, each array is decoded as an xarray.Variable. According to those docs "a single Variable object is not fully described outside the context of its parent Dataset". The Zarr decoding process returns a Variable, which is basically a tuple of `dims, data, attributes, encoding`, where `dims` is the list we got from `_ARRAY_DIMENSIONS`. Once the variables have all been decoded, we put them together into a Dataset object. At that point, if there are inconsistent shapes across the different variables, an error will be raised. So far we haven't encountered this situation, because all the Zarr data we read tends to have also been written by Xarray, so it is consistent. But you could definitely manually hack a Zarr store to break this consistency, rendering it un-decodable by Xarray. Finally it is unclear where xarray puts these dimensions. In the closest enclosing Group? Or in the root group? I hoped this was clear in the documentation I wrote, which is now live here: http://xarray.pydata.org/en/latest/internals.html#zarr-encoding-specification. 
What I said was: To accomplish this, Xarray developers decided to define a special Zarr array attribute: `_ARRAY_DIMENSIONS`. The value of this attribute is a list of dimension names (strings), for example `["time", "lon", "lat"]`. When writing data to Zarr, Xarray sets this attribute on all variables based on the variable dimensions. When reading a Zarr group, Xarray looks for this attribute on all arrays, raising an error if it can’t be found. The attribute is used to define the variable dimension names and then removed from the attributes dictionary returned to the user. An "array attribute" has a specific meaning in Zarr: it is the user metadata associated with an individual array. So the `_ARRAY_DIMENSIONS` attribute lives in the `.zattrs` file of each Zarr array. It is not a group-level attribute. As you pointed out on the last call, there are clearly some downsides to having chosen to store this important property with the rest of the user metadata (.zattrs). However, it allowed us to move forward without any changes to the zarr spec.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		614814400
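
The decoding behaviour described in the comment above can be illustrated with a small sketch. This is not xarray's actual implementation: it is a plain-Python approximation, and the helper names `decode_variable` and `collect_dimension_sizes` are hypothetical. It assumes each array is represented only by its shape and its `.zattrs` contents. It shows the three points made in the comment: the `_ARRAY_DIMENSIONS` attribute is read per array, it is removed from the attributes handed back to the user, and an error is raised if two arrays assign different sizes to the same dimension name.

```python
# Simplified illustration only; xarray's real decoding path is more involved.
DIMENSION_KEY = "_ARRAY_DIMENSIONS"


def decode_variable(shape, zattrs):
    """Return (dims, shape, attrs) for one array (hypothetical helper).

    `zattrs` stands in for the contents of the array's `.zattrs` file.
    """
    attrs = dict(zattrs)  # copy so the caller's metadata is left untouched
    if DIMENSION_KEY not in attrs:
        raise KeyError(f"array is missing the {DIMENSION_KEY} attribute")
    # The attribute defines the dimension names and is then removed
    # from the attributes returned to the user.
    dims = tuple(attrs.pop(DIMENSION_KEY))
    if len(dims) != len(shape):
        raise ValueError(f"{len(dims)} dimension names for a {len(shape)}-d array")
    return dims, tuple(shape), attrs


def collect_dimension_sizes(arrays):
    """Combine decoded arrays and check that dimension sizes agree."""
    sizes = {}
    for name, (shape, zattrs) in arrays.items():
        dims, shape, _ = decode_variable(shape, zattrs)
        for dim, size in zip(dims, shape):
            # setdefault records the first size seen for each dimension name.
            if sizes.setdefault(dim, size) != size:
                raise ValueError(
                    f"conflicting sizes for dimension {dim!r}: "
                    f"{sizes[dim]} and {size} (from array {name!r})"
                )
    return sizes


store = {
    "X": ((10, 20, 30), {DIMENSION_KEY: ["time", "lon", "lat"], "units": "K"}),
    "Y": ((10,), {DIMENSION_KEY: ["time"]}),
}
print(collect_dimension_sizes(store))
```

Changing the shape of `Y` to `(11,)` in this sketch makes `collect_dimension_sizes` raise a `ValueError`, which mirrors the case of a manually edited store that can no longer be decoded.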