
issue_comments


6 rows where user = 1530840 sorted by updated_at descending


issue 4

  • DataArray to_dict() without converting with numpy tolist() 2
  • zarr and xarray chunking compatibility and `to_zarr` performance 2
  • fix empty dataset from_dict 1
  • `AttributeError: 'DataArray' object has no attribute 'ravel'` when using `np.intersect1d(..., assume_unique=True)` 1

user 1

  • chrisbarber · 6

author_association 1

  • NONE 6
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
431993989 https://github.com/pydata/xarray/issues/2371#issuecomment-431993989 https://api.github.com/repos/pydata/xarray/issues/2371 MDEyOklzc3VlQ29tbWVudDQzMTk5Mzk4OQ== chrisbarber 1530840 2018-10-22T21:27:22Z 2018-10-22T21:27:22Z NONE

Fixed by https://github.com/numpy/numpy/pull/11777 and released in https://github.com/numpy/numpy/releases/tag/v1.15.1
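
For reference, a minimal hedged sketch of the call from the issue title; the example arrays are illustrative, and with a numpy release containing the fix (>= 1.15.1) it should run without the AttributeError.

```python
import numpy as np
import xarray as xr

# Hedged sketch of the failing call from the issue title; with numpy >= 1.15.1
# (which includes the linked fix) the DataArray inputs are converted and no
# AttributeError about 'ravel' is raised. The example arrays are illustrative.
a = xr.DataArray(np.array([1, 2, 3]))
b = xr.DataArray(np.array([2, 3, 4]))
print(np.intersect1d(a, b, assume_unique=True))  # expected: [2 3]
```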

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `AttributeError: 'DataArray' object has no attribute 'ravel'` when using `np.intersect1d(..., assume_unique=True)` 351343574
406732486 https://github.com/pydata/xarray/issues/2300#issuecomment-406732486 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjczMjQ4Ng== chrisbarber 1530840 2018-07-20T21:33:08Z 2018-07-20T21:33:08Z NONE

I took a closer look and noticed my one-dimensional fields of size 505359 were reporting a chunksize of 63170. Turns out that's enough to come up with a minimal repro:

```python
>>> xr.__version__
'0.10.8'
>>> ds = xr.Dataset({'foo': (['bar'], np.zeros((505359,)))})
>>> ds.to_zarr('test.zarr')
<xarray.backends.zarr.ZarrStore object at 0x7fd9680f7fd0>
>>> ds2 = xr.open_zarr('test.zarr')
>>> ds2
<xarray.Dataset>
Dimensions:  (bar: 505359)
Dimensions without coordinates: bar
Data variables:
    foo      (bar) float64 dask.array<shape=(505359,), chunksize=(63170,)>
>>> ds2.foo.encoding
{'chunks': (63170,), 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': nan, 'dtype': dtype('float64')}
>>> ds2.to_zarr('test2.zarr')  # raises:
NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((63170, 63170, 63170, 63170, 63170, 63170, 63170, 63169),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding.
```
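
For what it's worth, a hedged sketch of the "specifying different chunks in encoding" route that the error message suggests; the chunk size and target path are arbitrary choices, not from the thread.

```python
import numpy as np
import xarray as xr

# Hedged sketch: pass explicit zarr chunks through `encoding` at write time,
# one of the two options the NotImplementedError suggests. The chunk size
# (50000) and the 'test3.zarr' path are illustrative, not from the thread.
ds = xr.Dataset({'foo': (['bar'], np.zeros((505359,)))})
ds.to_zarr('test3.zarr', encoding={'foo': {'chunks': (50000,)}})
```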

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
406705740 https://github.com/pydata/xarray/issues/2300#issuecomment-406705740 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDQwNjcwNTc0MA== chrisbarber 1530840 2018-07-20T19:36:08Z 2018-07-20T19:38:03Z NONE

Ah, that's great. I do see some improvement. Specifically, I can now set chunks using xarray, successfully write to zarr, and reopen it. However, when reopening it I find that the chunks have been inconsistently applied (some fields have the expected chunksize whereas some small fields have the entire variable in one chunk). Furthermore, trying to write a second time with to_zarr leads to: NotImplementedError: Specified zarr chunks (100,) would overlap multiple dask chunks ((100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 4),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding. Trying to reapply the original chunks with xr.Dataset.chunk succeeds, and ds.chunks no longer reports "inconsistent chunks", but trying to write still produces the same error.

I also tried loading my entire dataset into memory, allowing the initial to_zarr to default to zarr's chunking heuristics. Trying to read and write a second time again results in the same error: NotImplementedError: Specified zarr chunks (63170,) would overlap multiple dask chunks ((63170, 63170, 63170, 63170, 63170, 63170, 63170, 63169),). This is not implemented in xarray yet. Consider rechunking the data using chunk() or specifying different chunks in encoding. I tried this round-tripping experiment with my monkey patches, and it works for a sequence of read/write/read/write... without any intervention in between. This only works for default zarr chunking, however, since the patch to xr.backends.zarr._determine_zarr_chunks overrides whatever chunks are on the originating dataset.
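
A hedged workaround sketch for this round-tripping error (not from the thread): drop the stale 'chunks' entry that open_zarr attaches to each variable's encoding, so the second to_zarr derives chunks from the current dask chunking instead.

```python
import xarray as xr

# Hedged workaround sketch: clear the zarr chunk encoding carried over by
# open_zarr so it cannot conflict with the dask chunks on the second write.
# The file names follow the repro above and are otherwise illustrative.
ds2 = xr.open_zarr('test.zarr')
for var in ds2.variables.values():
    var.encoding.pop('chunks', None)
ds2.to_zarr('test2.zarr')
```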

Curious: is there any downside in xarray to using datasets with inconsistent chunks? I take it this is a supported configuration, since xarray allows it to happen and only reports that error when calling ds.chunks, which seems to be just a convenience for viewing the chunks across a whole dataset that happens to have consistent chunks?

One other thing to add: it might be nice to have an option to allow zarr auto-chunking even when chunks != {}. I don't know how sensitive zarr performance is to chunk sizes, but it would be nice to have some form of sane auto-chunking available when you don't want to bother choosing them manually.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
389041983 https://github.com/pydata/xarray/pull/1702#issuecomment-389041983 https://api.github.com/repos/pydata/xarray/issues/1702 MDEyOklzc3VlQ29tbWVudDM4OTA0MTk4Mw== chrisbarber 1530840 2018-05-15T04:51:03Z 2018-05-15T04:51:03Z NONE

Just doing some garbage collection; it looks like this was somehow fixed. This works in 0.10.0 and 0.10.3:

```
>>> ds = xr.Dataset({'a': ('b', [])})
>>> xr.Dataset.equals(ds, xr.Dataset.from_dict(ds.to_dict()))
True
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix empty dataset from_dict 272325640
333247131 https://github.com/pydata/xarray/issues/1599#issuecomment-333247131 https://api.github.com/repos/pydata/xarray/issues/1599 MDEyOklzc3VlQ29tbWVudDMzMzI0NzEzMQ== chrisbarber 1530840 2017-09-29T21:48:58Z 2017-09-29T21:48:58Z NONE

@nicain, for sure. Probably best for the API's sake to stick to the simplicity of a flag.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray to_dict() without converting with numpy tolist() 261727170
333240395 https://github.com/pydata/xarray/issues/1599#issuecomment-333240395 https://api.github.com/repos/pydata/xarray/issues/1599 MDEyOklzc3VlQ29tbWVudDMzMzI0MDM5NQ== chrisbarber 1530840 2017-09-29T21:12:15Z 2017-09-29T21:12:15Z NONE

Could have a callable serializer kwarg that defaults to np.ndarray.tolist. I have a use case where I would pass in np.ndarray.tobytes for this. But then again, I could just use numpy=True or tolist=False and then walk the dict myself.
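
A minimal sketch of that "walk the dict myself" fallback with a pluggable serializer; serialize_arrays is a hypothetical helper (not an xarray API) and assumes a to_dict variant that leaves numpy arrays in place.

```python
import numpy as np

# Hedged sketch of walking the dict yourself with a pluggable serializer
# (np.ndarray.tobytes here). `serialize_arrays` is a hypothetical helper,
# not an xarray API; it assumes the input still contains numpy arrays.
def serialize_arrays(obj, serializer=np.ndarray.tobytes):
    if isinstance(obj, np.ndarray):
        return serializer(obj)
    if isinstance(obj, dict):
        return {k: serialize_arrays(v, serializer) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(serialize_arrays(v, serializer) for v in obj)
    return obj
```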

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray to_dict() without converting with numpy tolist() 261727170


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
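
For reference, a sketch of reproducing the view above (user = 1530840, sorted by updated_at descending) with plain sqlite3; the github.db filename is an assumption about how this Datasette instance stores its data.

```python
import sqlite3

# Hedged sketch: query the issue_comments table defined above. The
# 'github.db' filename is an assumption; adjust to the actual database path.
conn = sqlite3.connect('github.db')
rows = conn.execute(
    'SELECT id, html_url, created_at, updated_at FROM issue_comments '
    'WHERE user = 1530840 ORDER BY updated_at DESC'
).fetchall()
for comment_id, url, created, updated in rows:
    print(comment_id, updated, url)
conn.close()
```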