home / github / issue_comments

Menu
  • Search all tables
  • GraphQL API

issue_comments: 579203497

This data as json

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/3545#issuecomment-579203497 https://api.github.com/repos/pydata/xarray/issues/3545 579203497 MDEyOklzc3VlQ29tbWVudDU3OTIwMzQ5Nw== 5821660 2020-01-28T11:33:41Z 2020-01-28T11:52:02Z MEMBER

@scottcha @shoyer below is a minimal example where one variable is missing in each file.

```python import random random.seed(123) random.randint(0, 10)

create var names list with one missing value

orig = [f'd{i:02}' for i in range(10)] datasets = [] for i in range(1, 9): l1 = orig.copy() l1.remove(f'd{i:02}') datasets.append(l1)

create files

for i, dsl in enumerate(datasets): foo_data = np.arange(24).reshape(2, 3, 4) with nc.Dataset(f'test{i:02}.nc', 'w') as ds: ds.createDimension('x', size=2) ds.createDimension('y', size=3) ds.createDimension('z', size=4) for k in dsl: ds.createVariable(k, int, ('x', 'y', 'z')) ds.variables[k][:] = foo_data

flist = glob.glob('test*.nc') dslist = [] for f in flist: dslist.append(xr.open_dataset(f))

ds2 = xr.concat(dslist, dim='time') ds2 ``` Output:

<xarray.Dataset> Dimensions: (time: 8, x: 2, y: 3, z: 4) Dimensions without coordinates: time, x, y, z Data variables: d01 (x, y, z) int64 0 1 2 3 4 5 6 7 8 9 ... 15 16 17 18 19 20 21 22 23 d00 (time, x, y, z) int64 0 1 2 3 4 5 6 7 8 ... 16 17 18 19 20 21 22 23 d02 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d03 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d04 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d05 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d06 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d07 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... 20.0 21.0 22.0 23.0 d08 (time, x, y, z) float64 0.0 1.0 2.0 3.0 4.0 ... nan nan nan nan nan d09 (time, x, y, z) int64 0 1 2 3 4 5 6 7 8 ... 16 17 18 19 20 21 22 23

Three cases here:

  • d00 and d09 are available in all datasets, and they are concatenated correctly (keeping dtype)
  • d02 to d08 are missing in one dataset and are filled with the created dummy variable, but the dtype is converted to float64
  • d01 is not handled properly, because it is missing in the first dataset, this is due to checking only variables of first dataset in _calc_concat_over

python elif opt == "all": concat_over.update( set(getattr(datasets[0], subset)) - set(datasets[0].dims) ) and from putting d01 in result_vars before iterating to find missing variables.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  524043729
Powered by Datasette · Queries took 0.54ms · About: xarray-datasette