
issue_comments: 693385409


html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/issues/4428#issuecomment-693385409 https://api.github.com/repos/pydata/xarray/issues/4428 693385409 MDEyOklzc3VlQ29tbWVudDY5MzM4NTQwOQ== 6582745 2020-09-16T12:54:39Z 2020-09-16T12:54:39Z NONE

Finally managed to reproduce. Here it is:

```python
import xarray
import dask.array as da
import numpy as np

if __name__ == "__main__":

    data = da.random.random([10000, 16, 4], chunks=(10000, 16, 4))

    dtype = np.float32

    xds = xarray.Dataset(
        data_vars={"DATA1": (("x", "y", "z"), data.astype(dtype))})

    upsample_factor = 1024 // xds.dims["y"]

    # Create a selection which will upsample the y axis.
    selection = np.repeat(np.arange(xds.dims["y"]), upsample_factor)

    print("xarray.Dataset prior to resampling:\n", xds)

    xds = xds.sel({"y": selection})

    print("xarray.Dataset post resampling:\n", xds)
```

With dask==2.25.0 this gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) float32 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) float32 dask.array<chunksize=(10000, 1024, 4), meta=np.ndarray>

With dask==2.26.0 this gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) float32 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) float32 dask.array<chunksize=(10000, 512, 4), meta=np.ndarray>

And finally, the most distressing part - changing the dtype changes the chunking! With dtype = np.complex64, dask==2.26.0 gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) complex64 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions: (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1 (x, y, z) complex64 dask.array<chunksize=(10000, 342, 4), meta=np.ndarray>
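
A possible explanation for the dtype dependence, offered as a guess rather than anything confirmed here: the 512 and 342 chunk sizes are what you would get if dask 2.26.0 splits the indexed axis evenly so that no output chunk exceeds a byte limit. The sketch below assumes that limit is dask's default `array.chunk-size` of 128 MiB. The names `CHUNK_LIMIT_BYTES` and `expected_y_chunks` are hypothetical and not part of dask or xarray. It is plain arithmetic on the shapes from the reproducer and does not call dask at all.

```python
import math

# Assumption: dask's default "array.chunk-size" limit of 128 MiB.
CHUNK_LIMIT_BYTES = 128 * 2**20


def expected_y_chunks(itemsize, n_y=1024, other_numel=10000 * 4,
                      limit=CHUNK_LIMIT_BYTES):
    """Hypothetical helper: split n_y indices evenly under a byte limit.

    itemsize    -- bytes per element (4 for float32, 8 for complex64)
    n_y         -- length of the upsampled y axis
    other_numel -- number of elements along the other axes (x * z)
    """
    # Longest run of y indices that still fits within the byte limit.
    max_len = math.ceil(limit / (other_numel * itemsize))
    # Number of roughly equal pieces needed to stay under that length.
    n_splits = math.ceil(n_y / max_len)
    base, extra = divmod(n_y, n_splits)
    return [base + 1] * extra + [base] * (n_splits - extra)


print(expected_y_chunks(4))  # float32
print(expected_y_chunks(8))  # complex64
```

Under these assumptions the largest y chunk comes out as 512 for float32 and 342 for complex64, matching the reprs above. That would make the dtype dependence a side effect of the itemsize entering a byte-based limit.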

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  702646191