issues: 241290234


id	node_id	number	title	user	state	locked	assignee	milestone	comments	created_at	updated_at	closed_at	author_association	active_lock_reason	draft	pull_request	body	reactions	performed_via_github_app	state_reason	repo	type
241290234	MDU6SXNzdWUyNDEyOTAyMzQ=	1471	sharing dimensions across dataarrays in a dataset	2927161	closed	1			8	2017-07-07T14:58:18Z	2023-09-12T15:51:24Z	2023-09-12T15:51:24Z	NONE				I have two questions regarding the proper implementation of an xarray dataset when defining dimensions.

First, I am wondering whether I can share the same dimension across multiple arrays in a dataset without storing NaN values for coordinates not present in each respective array. As a simple example, I am interested in creating two data arrays that involve the shared dimensions x and y; however, in the first data array I only care about x-coordinates from (0->5), whereas in the second data array I only care about x-coordinates from (10->12).

```
import numpy as np
import pandas as pd
import xarray as xr

vals1 = np.random.normal(size=(6,2))
vals2 = np.random.normal(size=(3,3))

x1 = xr.Dataset(
    {'table1': (['x', 'y'], vals1)},
    coords={
        'x': np.arange(6),
        'y': np.arange(2)
    }
)
x2 = xr.Dataset(
    {'table2': (['x', 'y'], vals2)},
    coords={
        'x': np.arange(10, 10+3),
        'y': np.arange(8, 8+3)
    }
)
```

If I naively merge the two datasets, the dimensions and coordinates get merged correctly, but now each of the data variables within the dataset is much larger than it needs to be (it stores unnecessary NaN values).

```
merged = xr.merge([x1,x2])
merged['table1']

<xarray.DataArray 'table1' (x: 9, y: 5)>
array([[ 0.553098, -1.157813,       nan,       nan,       nan],
       [-0.259999, -0.476526,       nan,       nan,       nan],
       [ 1.650893, -0.364517,       nan,       nan,       nan],
       [ 0.16149 , -0.037587,       nan,       nan,       nan],
       [ 0.799689, -0.128728,       nan,       nan,       nan],
       [-0.613603, -1.410235,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan]])
Coordinates:
  * y        (y) int64 0 1 8 9 10
  * x        (x) int64 0 1 2 3 4 5 10 11 12
```

In my second question, I want to add an extra layer of complexity and add a third variable that uses multi-indexing. Again naively, I would have wanted the multi-index in the third table to share dimensions (x and y) with the previous data variables.

```
# I would have preferred to do this
index=pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)], names=('x', 'y', 'z'))
vals3 = np.random.normal(size=(3,3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}
)

# Except, merging with previous dataset raises an error due to name conflicts
xr.merge([x1, x2, x3])

# ValueError: conflicting MultiIndex level name(s):
# 'y' (multi-index), (y)
# 'x' (multi-index), (x)
```

Currently my solution is just to rename the dimensions in each respective data array so that they do not overlap. While this is not ideal, I can probably get away with it, but since I would prefer the ability to share dimensions without adding in NaN values, is there another way to achieve this? (I'm also assuming that I can still do joins later on using values within different dimension names.)

```
# current solution, merge data arrays but have each dimension be unique
vals1 = np.random.normal(size=(6,2))
vals2 = np.random.normal(size=(3,3))

x1 = xr.Dataset(
    {'table1': (['x1', 'y1'], vals1)},
    coords={
        'x1': np.arange(6),
        'y1': np.arange(2)
    }
)
x2 = xr.Dataset(
    {'table2': (['x2', 'y2'], vals2)},
    coords={
        'x2': np.arange(10, 10+3),
        'y2': np.arange(8, 8+3)
    }
)

index=pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)], names=('x3', 'y3', 'z3'))
vals3 = np.random.normal(size=(3,3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}
)

xr.merge([x1, x2, x3])
```	{ "url": "https://api.github.com/repos/pydata/xarray/issues/1471/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }		completed	13221727	issue
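
The trade-off described in the issue body can be reproduced in a few lines. The following is a minimal sketch, not an official answer from the xarray maintainers: it assumes a reasonably recent xarray in which `xr.merge` accepts the `join` and `compat` keywords, and the variable names `shared` and `separate` are illustrative only. It shows that merging on shared dimension names performs an outer join (so every variable is padded with NaN to the union of the coordinates), while the renaming workaround keeps each variable at its original shape.

```python
import numpy as np
import xarray as xr

# Same shapes and coordinates as in the issue body.
vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))

x1 = xr.Dataset(
    {"table1": (["x", "y"], vals1)},
    coords={"x": np.arange(6), "y": np.arange(2)},
)
x2 = xr.Dataset(
    {"table2": (["x", "y"], vals2)},
    coords={"x": np.arange(10, 13), "y": np.arange(8, 11)},
)

# Shared dimension names: the outer join takes the union of the x and y
# coordinates, so each variable is padded with NaN up to the full grid.
shared = xr.merge([x1, x2], join="outer", compat="no_conflicts")
print(shared["table1"].shape)                 # padded shape
print(int(shared["table1"].notnull().sum()))  # number of real values

# Workaround from the issue: give every table its own dimension names,
# so no alignment (and therefore no padding) takes place.
separate = xr.merge(
    [
        x1.rename({"x": "x1", "y": "y1"}),
        x2.rename({"x": "x2", "y": "y2"}),
    ],
    join="outer",
    compat="no_conflicts",
)
print(separate["table1"].shape, separate["table2"].shape)
```

The padding follows from the data model: all variables in a dataset that use a dimension name share one coordinate index for it, so two variables on the same dimension cannot carry different coordinate values.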
