issue_comments: 460694818
html_url | issue_url | id | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
https://github.com/pydata/xarray/issues/2745#issuecomment-460694818 | https://api.github.com/repos/pydata/xarray/issues/2745 | 460694818 | MDEyOklzc3VlQ29tbWVudDQ2MDY5NDgxOA== | 1217238 | 2019-02-05T16:03:40Z | 2019-02-05T16:03:40Z | MEMBER | To understand what's going on here, it may help to look at the task graph dask builds:
```
In [16]: x = np.arange(5)

In [17]: da = xr.DataArray(np.ones(5), coords=[('x', x)]).chunk(-1)

In [18]: da
Out[18]:
<xarray.DataArray (x: 5)>
dask.array<shape=(5,), dtype=float64, chunksize=(5,)>
Coordinates:
  * x        (x) int64 0 1 2 3 4

In [19]: da.reindex({'x': np.arange(20)})
Out[19]:
<xarray.DataArray (x: 20)>
dask.array<shape=(20,), dtype=float64, chunksize=(20,)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

In [20]: da.reindex({'x': np.arange(20)}).data.dask
Out[20]: <dask.sharedict.ShareDict at 0x3201d72e8>

In [21]: dict(da.reindex({'x': np.arange(20)}).data.dask)
Out[21]:
{('where-8e0018fae0773d202c09fde132189347', 0): (subgraph_callable,
  ('eq-8167293bb8136be2934a8bf111095d8f', 0),
  array(nan),
  ('getitem-0eab360ba0dee5a5c3fbded0fdfd70e3', 0)),
 ('eq-8167293bb8136be2934a8bf111095d8f', 0): (subgraph_callable,
  ('array-5ddc8bae2e6cf87c0bac846c6da4d27f', 0),
  -1),
 ('array-5ddc8bae2e6cf87c0bac846c6da4d27f', 0): array([ 0, 1, 2, 3, 4, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]),
 ('xarray-<this-array>-765734894ab8f05a57335ea1064da549', 0): (<function dask.array.core.getter(a, b, asarray=True, lock=None)>,
  'xarray-<this-array>-765734894ab8f05a57335ea1064da549',
  (slice(0, 5, None),)),
 'xarray-<this-array>-765734894ab8f05a57335ea1064da549': ImplicitToExplicitIndexingAdapter(array=NumpyIndexingAdapter(array=array([1., 1., 1., 1., 1.]))),
 ('getitem-0eab360ba0dee5a5c3fbded0fdfd70e3', 0): (<function _operator.getitem(a, b, /)>,
  ('xarray-<this-array>-765734894ab8f05a57335ea1064da549', 0),
  (array([0, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]),))}
```
Xarray isn't controlling chunk sizes directly, but it turns The alternative design would be to append an array of all NaNs along one axis, but on average I think the current implementation is faster and results in more contiguous chunks: it's quite common to intersperse missing indices with `reindex()`, and alternating indexed/missing values can result in tiny chunks. Even then I think you would probably run into performance issues -- I don't think We could also conceivably put some heuristics to control chunking for this in xarray, but I'd rather do it upstream in dask.array, if possible (xarray tries to avoid thinking about chunks). |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
406812274 |
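The graph in the comment shows a "take then mask" pattern: an integer indexer with `-1` for missing labels (the `array-...` task), a `getitem` whose indexer has every `-1` replaced by the last valid position (the repeated `4`s), and an `eq`/`where` pair that writes NaN wherever the indexer was `-1`. The NumPy sketch below reproduces that pattern on in-memory arrays. It is an illustration under stated assumptions, not xarray's actual code: the function name `reindex_take_then_mask` is hypothetical, labels are assumed unique, and the original index is assumed non-empty.

```python
import numpy as np


def reindex_take_then_mask(values, old_index, new_index, fill_value=np.nan):
    """Hypothetical sketch mirroring the getitem/eq/where tasks in the graph.

    Assumes old_index has unique labels and is non-empty.
    Returns (reindexed_values, indexer).
    """
    old_index = np.asarray(old_index)
    new_index = np.asarray(new_index)

    # Step 1: map each new label to its old position, -1 if it is missing
    # (corresponds to the 'array-...' task).
    lookup = {label: pos for pos, label in enumerate(old_index.tolist())}
    indexer = np.array([lookup.get(label, -1) for label in new_index.tolist()])

    # Step 2: replace -1 with the last valid position so the take is always
    # in bounds (matches the repeated 4s in the 'getitem-...' indexer).
    safe_indexer = np.where(indexer == -1, len(old_index) - 1, indexer)
    taken = np.asarray(values)[safe_indexer]

    # Step 3: overwrite the positions that were missing with fill_value
    # (corresponds to the 'eq-...' and 'where-...' tasks).
    return np.where(indexer == -1, fill_value, taken), indexer


if __name__ == "__main__":
    result, indexer = reindex_take_then_mask(np.ones(5), np.arange(5), np.arange(20))
    print(indexer)
    print(result)
```

Because the take and the mask are each a single elementwise operation over the whole output, this pattern keeps the result in one contiguous block regardless of how missing labels are interspersed.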
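The comment argues that the alternative design (append an all-NaN array along the axis and select from the combined result) tends to fragment the output when indexed and missing values alternate. One rough way to see why is to count the maximal runs of "present" versus "missing" entries in the reindexing indexer: each run would be a separate contiguous selection from either the original data or the NaN block. The helper `count_source_fill_runs` below is hypothetical and only a proxy; it does not reproduce how dask actually assigns chunk boundaries.

```python
import numpy as np


def count_source_fill_runs(indexer):
    """Count maximal runs of consecutive present (>= 0) or missing (-1)
    entries in an integer reindexing indexer.

    Hypothetical proxy for fragmentation under a concatenate-with-NaNs
    design; not dask's real chunking logic.
    """
    missing = np.asarray(indexer) == -1
    if missing.size == 0:
        return 0
    # A new run starts wherever 'missing' flips between neighbouring entries.
    return 1 + int(np.count_nonzero(missing[1:] != missing[:-1]))


if __name__ == "__main__":
    # Missing labels appended at the end, as in the comment's example: 2 runs.
    print(count_source_fill_runs([0, 1, 2, 3, 4] + [-1] * 15))
    # Alternating indexed/missing labels: every entry is its own run.
    print(count_source_fill_runs([0, -1, 1, -1, 2, -1]))
```

With the comment's example the indexer splits into just two runs, but an alternating pattern yields one run per element, which is the tiny-chunk case the comment describes.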