issue_comments

4 rows where author_association = "MEMBER", issue = 138332032 and user = 1217238 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, performed_via_github_app, issue
id: 193576856 · html_url: https://github.com/pydata/xarray/issues/783#issuecomment-193576856 · issue_url: https://api.github.com/repos/pydata/xarray/issues/783 · node_id: MDEyOklzc3VlQ29tbWVudDE5MzU3Njg1Ng== · user: shoyer (1217238) · created_at: 2016-03-08T02:56:28Z · updated_at: 2016-03-08T02:56:49Z · author_association: MEMBER

As expected, the following all-dask.array solution triggers this:

```python
import pandas as pd
import dask.array as da

dates = pd.date_range('2001-01-01', freq='D', periods=1000)
sizes = pd.Series(dates, dates).resample('1M', how='count').values
chunks = (tuple(sizes), (100,))
x = da.ones((3630, 100), chunks=chunks)
assert x[240:270].shape == x[240:270].compute().shape
# AssertionError
```

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Array size changes following loading of numpy array (138332032)
id: 193527245 · html_url: https://github.com/pydata/xarray/issues/783#issuecomment-193527245 · issue_url: https://api.github.com/repos/pydata/xarray/issues/783 · node_id: MDEyOklzc3VlQ29tbWVudDE5MzUyNzI0NQ== · user: shoyer (1217238) · created_at: 2016-03-08T00:36:14Z · updated_at: 2016-03-08T00:36:14Z · author_association: MEMBER

Something like this might work to generate pathological chunks for dask.array:

```python
import pandas

dates = pandas.date_range('2000-01-01', freq='D', periods=1000)
sizes = pandas.Series(dates, dates).resample('1M', how='count').values
chunks = (tuple(sizes), (100,))
```

(I don't have xarray or dask installed on my work computer, but I could check this later)

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Array size changes following loading of numpy array (138332032)
id: 193521326 · html_url: https://github.com/pydata/xarray/issues/783#issuecomment-193521326 · issue_url: https://api.github.com/repos/pydata/xarray/issues/783 · node_id: MDEyOklzc3VlQ29tbWVudDE5MzUyMTMyNg== · user: shoyer (1217238) · created_at: 2016-03-08T00:17:00Z · updated_at: 2016-03-08T00:17:00Z · author_association: MEMBER

If you don't specify a chunksize, xarray should use each file as a full "chunk". So it would probably be useful to know the shapes of each array you are loading with open_mfdataset. My guess is that this issue only arises when indexing arrays consisting of differently sized chunks, which is exactly why using .chunk to set a fixed chunk size resolves this issue.
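A minimal sketch of this workaround, assuming a handful of NetCDF files concatenated along a hypothetical "time" dimension; the file names and chunk size below are illustrative only, not from the original comment:

```python
import xarray as xr

# With no chunks argument, open_mfdataset uses each file as one dask chunk,
# so files of different lengths produce unevenly sized chunks.
ds = xr.open_mfdataset(["part-01.nc", "part-02.nc", "part-03.nc"])

# Rechunking to a fixed size gives uniform dask chunks, the workaround
# mentioned above for the indexing mismatch.
ds = ds.chunk({"time": 100})
print(ds.chunks)
```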

To be clear, all the logic implementing the chunking and indexing code for xarray objects containing dask arrays lives inside dask.array itself, not in our xarray wrapper (which is pretty thin). This doesn't make this any less of an issue for you, but I'm pretty sure (and I think @mrocklin agrees) that the bug here is probably in the dask.array layer.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Array size changes following loading of numpy array (138332032)
id: 192048274 · html_url: https://github.com/pydata/xarray/issues/783#issuecomment-192048274 · issue_url: https://api.github.com/repos/pydata/xarray/issues/783 · node_id: MDEyOklzc3VlQ29tbWVudDE5MjA0ODI3NA== · user: shoyer (1217238) · created_at: 2016-03-04T01:28:23Z · updated_at: 2016-03-04T01:28:23Z · author_association: MEMBER

This does look very strange. I'm guessing it's a dask.array bug (cc @mrocklin).

Can you make a reproducible example? If so, we'll probably be able to figure this out. How do you make this data?

Tracking this sort of thing down is a good motivation for an eager-evaluation mode in dask.array... (https://github.com/dask/dask/issues/292)

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Array size changes following loading of numpy array (138332032)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
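
For reference, a rough sketch of how the filtered view at the top of this page maps onto a plain SQLite query against this schema; the local database filename (github.db) is an assumption:

```python
import sqlite3

# Hypothetical local copy of the Datasette database; the filename is an assumption.
conn = sqlite3.connect("github.db")

# Reproduce the filtered view above: MEMBER comments by user 1217238 (shoyer)
# on issue 138332032, newest update first. The idx_issue_comments_issue and
# idx_issue_comments_user indexes cover the issue and user filters.
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'MEMBER'
      AND issue = 138332032
      AND user = 1217238
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, created_at, updated_at, body in rows:
    print(comment_id, updated_at, body[:60])
```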