issue_comments

4 rows where author_association = "NONE" and user = 6582745 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1102354051 https://github.com/pydata/xarray/issues/5115#issuecomment-1102354051 https://api.github.com/repos/pydata/xarray/issues/5115 IC_kwDOAMm_X85BtJaD JSKenyon 6582745 2022-04-19T09:09:57Z 2022-04-19T09:09:57Z NONE

No problem!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr()` dramatically alters dask graph 851391441
814132352 https://github.com/pydata/xarray/issues/5115#issuecomment-814132352 https://api.github.com/repos/pydata/xarray/issues/5115 MDEyOklzc3VlQ29tbWVudDgxNDEzMjM1Mg== JSKenyon 6582745 2021-04-06T13:45:55Z 2021-04-06T13:45:55Z NONE

Having done some digging, this seems to be a result of using the dask `store` function. It performs some graph optimisation, which is ultimately what changes the dask graph. I am not sure whether it is worth moving this discussion to dask, but it seems to me that having graph optimisations happen outside a compute call is a little dangerous.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `to_zarr()` dramatically alters dask graph 851391441
693552440 https://github.com/pydata/xarray/issues/4428#issuecomment-693552440 https://api.github.com/repos/pydata/xarray/issues/4428 MDEyOklzc3VlQ29tbWVudDY5MzU1MjQ0MA== JSKenyon 6582745 2020-09-16T17:31:54Z 2020-09-16T17:31:54Z NONE

Thanks! I will definitely give that a go when I am back at my work PC. My personal take is that this level of automated rechunking is dangerous. I have constructed the chunking in my code with great care and for a reason. Having it changed "invisibly" by operations which didn't have this behaviour previously seems problematic to me.
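One way to make such a change visible rather than "invisible" is to compare the number of blocks per axis before and after an operation. The following is only a minimal sketch: the helper names are hypothetical, and the chunk tuples are written by hand in the tuple-of-tuples layout that a dask array's `.chunks` attribute uses, with values taken from the reproduction reported in the same issue.

```python
def blocks_per_axis(chunks):
    """Count the blocks along each axis of a dask-style chunks tuple."""
    return tuple(len(axis_chunks) for axis_chunks in chunks)


def check_block_layout(before, after):
    """Raise if an operation changed the number of blocks along any axis."""
    b, a = blocks_per_axis(before), blocks_per_axis(after)
    if b != a:
        raise ValueError(f"chunk layout changed: {b} blocks per axis became {a}")


# Hand-written chunk tuples (assumed values, not read from a live array).
before = ((10000,), (16,), (4,))
unchanged = ((10000,), (1024,), (4,))   # one block per axis, y upsampled
split = ((10000,), (512, 512), (4,))    # y axis split into two blocks

check_block_layout(before, unchanged)   # passes silently

try:
    check_block_layout(before, split)
except ValueError as err:
    message = str(err)
    print(message)
```

A check like this does not prevent rechunking; it only turns a silent change into an explicit error at the point where it happens.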

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 702646191
693385409 https://github.com/pydata/xarray/issues/4428#issuecomment-693385409 https://api.github.com/repos/pydata/xarray/issues/4428 MDEyOklzc3VlQ29tbWVudDY5MzM4NTQwOQ== JSKenyon 6582745 2020-09-16T12:54:39Z 2020-09-16T12:54:39Z NONE

Finally managed to reproduce. Here it is:

```python
import xarray
import dask.array as da
import numpy as np

if __name__ == "__main__":

    # A single chunk spanning the whole array.
    data = da.random.random([10000, 16, 4], chunks=(10000, 16, 4))

    dtype = np.float32

    xds = xarray.Dataset(
        data_vars={"DATA1": (("x", "y", "z"), data.astype(dtype))})

    upsample_factor = 1024//xds.dims["y"]

    # Create a selection which will upsample the y axis.
    selection = np.repeat(np.arange(xds.dims["y"]), upsample_factor)

    print("xarray.Dataset prior to resampling:\n", xds)

    xds = xds.sel({"y": selection})

    print("xarray.Dataset post resampling:\n", xds)
```

With dask==2.25.0 this gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) float32 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) float32 dask.array<chunksize=(10000, 1024, 4), meta=np.ndarray>

With dask==2.26.0 this gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) float32 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) float32 dask.array<chunksize=(10000, 512, 4), meta=np.ndarray>

And finally, the most distressing part - changing the dtype changes the chunking! With dtype = np.complex64, dask==2.26.0 gives:

    xarray.Dataset prior to resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 16, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) complex64 dask.array<chunksize=(10000, 16, 4), meta=np.ndarray>
    xarray.Dataset post resampling:
     <xarray.Dataset>
    Dimensions:  (x: 10000, y: 1024, z: 4)
    Dimensions without coordinates: x, y, z
    Data variables:
        DATA1    (x, y, z) complex64 dask.array<chunksize=(10000, 342, 4), meta=np.ndarray>
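The 512 and 342 figures are consistent with a split driven by a byte limit, which would also explain the dtype dependence. The following is a rough sketch of that arithmetic, not dask's actual code. It assumes dask's default `array.chunk-size` of 128MiB, and it assumes that an oversized chunk is divided into the smallest number of near-equal pieces that fit under that limit. The function name is hypothetical.

```python
import math


def largest_chunk_along_axis(axis_len, other_numel, itemsize,
                             limit_bytes=128 * 2**20):
    """Estimate the largest chunk length along the indexed axis.

    axis_len    -- length of the axis after selection (1024 here)
    other_numel -- product of the other axis lengths (10000 * 4 here)
    itemsize    -- bytes per element (4 for float32, 8 for complex64)
    limit_bytes -- assumed byte limit per chunk (128 MiB by default)
    """
    # Longest run along the axis that still fits under the byte limit.
    max_len = math.ceil(limit_bytes / (other_numel * itemsize))
    if axis_len <= max_len:
        return axis_len
    # Split into the fewest near-equal pieces that respect the limit.
    n_pieces = math.ceil(axis_len / max_len)
    return math.ceil(axis_len / n_pieces)


other = 10000 * 4
float32_chunk = largest_chunk_along_axis(1024, other, itemsize=4)
complex64_chunk = largest_chunk_along_axis(1024, other, itemsize=8)
print(float32_chunk, complex64_chunk)  # 512 342
```

Under these assumptions a float32 chunk of shape (10000, 1024, 4) is about 156 MiB and is split in two, while the complex64 equivalent is about 313 MiB and is split in three, which matches the reported chunk sizes.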

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 702646191

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);