issue_comments: 528920519

This data as json

html_url	issue_url	id	node_id	user	created_at	updated_at	author_association	body	reactions	performed_via_github_app	issue
https://github.com/pydata/xarray/issues/1378#issuecomment-528920519	https://api.github.com/repos/pydata/xarray/issues/1378	528920519	MDEyOklzc3VlQ29tbWVudDUyODkyMDUxOQ==	47244312	2019-09-06T16:22:12Z	2019-09-06T16:22:12Z	CONTRIBUTOR	I'm not too fond of having multiple dimensions with the same name because, whenever you need to operate on one but not the other, you have little to no choice but revert to positional indexing. Consider also how many methods expect either *kwargs or a dict-like parameter with the dimension or variable names as the keys. I would not be surprised to find that many API design choices fall apart in the face of this use case. Also, having two non positional* (as it should always be in xarray!) dimensions with the same name only makes sense when modelling symmetric N:N relationships. Two good examples are covariance matrices and the weights for a Dijkstra algorithm. The problems start when the object represents an asymmetric relationship, e.g: - Cost (for the purpose of graph resolution, so time/money/other) of transportation via river, where going from A->B (downstream) is cheaper than going back from B->A (upstream) - Currency conversion, where `EUR->USD` is not identical to `1/(USD->EUR)` because of arbitrage and illiquidity - In financial Monte Carlo simulations, I had to deal with credit rating transition matrices which define the probability of a company to change its credit rating. In unfavourable market conditions, the chances of being downgraded from AAA to AA are higher than being promoted from AA to AAA. I could easily come up with many other cases. In case of asymmetric N:N relationships, it is highly desirable to share the same index across multiple dimensions with different names (that would typically convey the direction of the relationship, e.g. "from" and "to"). What if, instead of allowing for duplicate dimensions, we allowed sharing an index across different dimensions? Something like `python river_transport = Dataset( coords={ 'station': ['Kingston', 'Montreal'], 'station_from': ('station', ) 'station_to': ('station', ) }, data_vars={ cost=(('station_from', 'station_to'), [[0, 20], [15, 0]]), } }` or, for DataArrays: `python river_transport = DataArray( [[0, 20], [15, 0]], dims=('station_from', 'station_to'), coords={ 'station': ['Kingston', 'Montreal'], 'station_from': ('station', ) 'station_to': ('station', ) }, }` Note how this syntax doesn't exist as of today: `python 'station_from': ('station', ) 'station_to': ('station', )` From an implementation point of view, I think it could be easily implemented by keeping track of a map of aliases and with some `__geitem__` magic. More effort would be needed to convince DataArrays to accept (and not accidentally drop) a coordinate whose dims don't match any of the data variable's. This design would not resolve the issue of compatibility with NetCDF though. I'd be surprised if the NetCDF designers never came across this - maybe it's a good idea to have a chat with them?	{ "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		222676855