issues: 241290234
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
241290234 | MDU6SXNzdWUyNDEyOTAyMzQ= | 1471 | sharing dimensions across dataarrays in a dataset | 2927161 | closed | 1 | 8 | 2017-07-07T14:58:18Z | 2023-09-12T15:51:24Z | 2023-09-12T15:51:24Z | NONE | I have two questions regarding proper implementation of an xarray dataset when defining dimensions. First, I am wondering whether I can share the same dimension across multiple arrays in a dataset without storing NaN values for coordinates not present in each respective array. As a simple example, I am interested in creating two data arrays that involve the shared dimensions x and y; however, in the first data array I only care about x-coordinates from 0 to 5, whereas in the second data array I only care about x-coordinates from 10 to 12.
```
merged = xr.merge([x1, x2])
merged['table1']
<xarray.DataArray 'table1' (x: 9, y: 5)>
array([[ 0.553098, -1.157813,       nan,       nan,       nan],
       [-0.259999, -0.476526,       nan,       nan,       nan],
       [ 1.650893, -0.364517,       nan,       nan,       nan],
       [ 0.16149 , -0.037587,       nan,       nan,       nan],
       [ 0.799689, -0.128728,       nan,       nan,       nan],
       [-0.613603, -1.410235,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan]])
Coordinates:
  * y        (y) int64 0 1 8 9 10
  * x        (x) int64 0 1 2 3 4 5 10 11 12
```

In my second question, I want to add an extra layer of complexity and add a third variable that uses multi-indexing. Again naively, I would have wanted the multi-index in the third table to share dimensions (x and y) with the previous data variables.

```
# I would have preferred to do this
index = pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)], names=('x', 'y', 'z'))
vals3 = np.random.normal(size=(3, 3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}  # assumed: this argument is missing from the snippet as captured
)

# Except, merging with the previous datasets raises an error due to name conflicts
xr.merge([x1, x2, x3])
ValueError: conflicting MultiIndex level name(s):
'y' (multi-index), (y)
'x' (multi-index), (x)
```

Currently my solution is just to rename the dimensions in each data array so that they do not overlap. While this is not ideal, I can probably get away with it, but since I would prefer the ability to share dimensions without adding NaN values, is there another way to achieve this? (I'm also assuming that I can still do joins later on using values within different dimension names.)

```
# current solution: merge data arrays but have each dimension be unique
import numpy as np
import pandas as pd
import xarray as xr

vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))
x1 = xr.Dataset(
    {'table1': (['x1', 'y1'], vals1)},
    coords={'x1': np.arange(6), 'y1': np.arange(2)}
)
x2 = xr.Dataset(
    {'table2': (['x2', 'y2'], vals2)},
    coords={'x2': np.arange(10, 10+3), 'y2': np.arange(8, 8+3)}
)
index = pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)], names=('x3', 'y3', 'z3'))
vals3 = np.random.normal(size=(3, 3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}  # assumed: this argument is missing from the snippet as captured
)
xr.merge([x1, x2, x3])
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1471/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
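
The NaN padding described in the issue body can be reproduced with a minimal sketch. The construction of `x1` and `x2` for the shared-dimension case is not shown in the issue, so the shapes and coordinate ranges below are assumptions borrowed from the renamed workaround; they happen to match the printed `(x: 9, y: 5)` result. The `recovered` name and the use of `dropna` to trim the padding are illustrative choices, not something the issue proposes.

```python
import numpy as np
import xarray as xr

# Assumed setup: same shapes and coordinates as the renamed workaround,
# but with both tables sharing the dimension names "x" and "y".
vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))

x1 = xr.Dataset(
    {"table1": (["x", "y"], vals1)},
    coords={"x": np.arange(6), "y": np.arange(2)},
)
x2 = xr.Dataset(
    {"table2": (["x", "y"], vals2)},
    coords={"x": np.arange(10, 13), "y": np.arange(8, 11)},
)

# An outer join takes the union of the coordinates on each shared dimension,
# so every variable is reindexed onto the full (x: 9, y: 5) grid and the
# positions it did not originally cover are filled with NaN.
merged = xr.merge([x1, x2], join="outer")

# Illustrative way to get the original block back: drop rows and columns
# that are entirely NaN for this variable.
recovered = merged["table1"].dropna("x", how="all").dropna("y", how="all")

print(merged["table1"].shape)
print(recovered.shape)
```

Trimming with `dropna` only restores the view of one variable at a time; the merged dataset itself still stores the padded arrays, which is the storage cost the issue asks about.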