issues: 241290234

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
241290234 MDU6SXNzdWUyNDEyOTAyMzQ= 1471 sharing dimensions across dataarrays in a dataset 2927161 closed 1     8 2017-07-07T14:58:18Z 2023-09-12T15:51:24Z 2023-09-12T15:51:24Z NONE      

I have two questions regarding proper implementation of an xarray dataset when defining dimensions. First, I am wondering whether I can share the same dimension across multiple arrays in a dataset without storing NaN values for coordinates not present in each respective array.

As a simple example, I want to create two data arrays that share the dimensions x and y. In the first data array I only care about x-coordinates 0 to 5, whereas in the second I only care about x-coordinates 10 to 12.

    import numpy as np
    import xarray as xr

    vals1 = np.random.normal(size=(6, 2))
    vals2 = np.random.normal(size=(3, 3))

    # table1 covers x = 0..5 and y = 0..1
    x1 = xr.Dataset(
        {'table1': (['x', 'y'], vals1)},
        coords={'x': np.arange(6), 'y': np.arange(2)}
    )
    # table2 covers x = 10..12 and y = 8..10
    x2 = xr.Dataset(
        {'table2': (['x', 'y'], vals2)},
        coords={'x': np.arange(10, 10+3), 'y': np.arange(8, 8+3)}
    )

If I naively merge the two datasets, the dimensions and coordinates are merged correctly, but each data variable in the merged dataset is much larger than it needs to be, because it is padded with unnecessary NaN values:

```
merged = xr.merge([x1, x2])
merged['table1']

<xarray.DataArray 'table1' (x: 9, y: 5)>
array([[ 0.553098, -1.157813,       nan,       nan,       nan],
       [-0.259999, -0.476526,       nan,       nan,       nan],
       [ 1.650893, -0.364517,       nan,       nan,       nan],
       [ 0.16149 , -0.037587,       nan,       nan,       nan],
       [ 0.799689, -0.128728,       nan,       nan,       nan],
       [-0.613603, -1.410235,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan],
       [      nan,       nan,       nan,       nan,       nan]])
Coordinates:
  * y        (y) int64 0 1 8 9 10
  * x        (x) int64 0 1 2 3 4 5 10 11 12
```
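To make the padding concrete, here is a minimal sketch that counts the fill values in the merged variable and then trims it back to its original extent. It assumes the same `x1`/`x2` construction as above; the names `padded`, `n_nan`, and `compact` are illustrative only, and `join="outer"` is passed explicitly so the behavior does not depend on the library's defaults. This is a workaround for reading the data back, not a way to avoid storing the NaN values in the merged dataset.

```python
import numpy as np
import xarray as xr

vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))

x1 = xr.Dataset(
    {"table1": (["x", "y"], vals1)},
    coords={"x": np.arange(6), "y": np.arange(2)},
)
x2 = xr.Dataset(
    {"table2": (["x", "y"], vals2)},
    coords={"x": np.arange(10, 13), "y": np.arange(8, 11)},
)

# Outer join: every variable is reindexed onto the union of x and y labels.
merged = xr.merge([x1, x2], join="outer")

# Count how many cells of the padded variable are fill values.
padded = merged["table1"]
n_nan = int(padded.isnull().sum())

# Drop the labels along each dimension where every value is NaN.
compact = padded.dropna("x", how="all").dropna("y", how="all")

print(padded.shape, n_nan, compact.shape)
```

Note that `dropna(..., how="all")` would also discard any row or column of the original data that happened to be entirely NaN, so this only recovers the original array when the source data has no such rows or columns.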

My second question adds a layer of complexity: a third variable that uses a MultiIndex. Again naively, I would have wanted the MultiIndex levels of the third table to share the dimensions (x and y) of the previous data variables.

```
import pandas as pd

# I would have preferred to do this
index = pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)],
                                  names=('x', 'y', 'z'))
vals3 = np.random.normal(size=(3, 3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}
)

# Except, merging with the previous datasets raises an error due to name conflicts
xr.merge([x1, x2, x3])

ValueError: conflicting MultiIndex level name(s):
'y' (multi-index), (y)
'x' (multi-index), (x)
```

My current solution is to rename the dimensions of each data array so that they do not overlap. This is not ideal, and although I can probably get away with it, I would prefer to share dimensions without adding NaN values. Is there another way to achieve this? (I'm also assuming that I can still do joins later on using the values stored under the different dimension names.)

```
# current solution, merge data arrays but have each dimension be unique
vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))
x1 = xr.Dataset(
    {'table1': (['x1', 'y1'], vals1)},
    coords={'x1': np.arange(6), 'y1': np.arange(2)}
)
x2 = xr.Dataset(
    {'table2': (['x2', 'y2'], vals2)},
    coords={'x2': np.arange(10, 10+3), 'y2': np.arange(8, 8+3)}
)

index = pd.MultiIndex.from_tuples([(0, 0, 1), (1, 1, 1), (2, 2, 1)],
                                  names=('x3', 'y3', 'z3'))
vals3 = np.random.normal(size=(3, 3))
x3 = xr.Dataset(
    {'table3-multiindex': (['multi-index', 'cols'], vals3)},
    coords={'multi-index': index}
)

xr.merge([x1, x2, x3])
```
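On the assumption about joining later: one possible approach, sketched below, is to keep the unique dimension names in the stored dataset and rename them back to shared names only at the point where a join is needed. The variable names `ds`, `t1`, `t2`, `a1`, and `a2` are illustrative, and the MultiIndex table is left out to keep the sketch short. This is one way to do it under those assumptions, not necessarily the recommended xarray pattern.

```python
import numpy as np
import xarray as xr

vals1 = np.random.normal(size=(6, 2))
vals2 = np.random.normal(size=(3, 3))

x1 = xr.Dataset(
    {"table1": (["x1", "y1"], vals1)},
    coords={"x1": np.arange(6), "y1": np.arange(2)},
)
x2 = xr.Dataset(
    {"table2": (["x2", "y2"], vals2)},
    coords={"x2": np.arange(10, 13), "y2": np.arange(8, 11)},
)

# No dimension is shared, so nothing is reindexed and no NaN padding appears.
ds = xr.merge([x1, x2], join="outer")

# Rename to common dimension names only when a join is actually needed.
t1 = ds["table1"].rename({"x1": "x", "y1": "y"})
t2 = ds["table2"].rename({"x2": "x", "y2": "y"})

# Align on the shared labels; the padding exists only in these temporaries.
a1, a2 = xr.align(t1, t2, join="outer")

print(ds["table1"].shape, ds["table2"].shape, a1.shape, a2.shape)
```

With the disjoint coordinates used here, `join="inner"` would give empty arrays, so the outer join is shown; the stored dataset `ds` keeps each table at its original size either way.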

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1471/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed 13221727 issue
