issue_comments

4 rows where user = 668201 sorted by updated_at descending


Facets:

issue (2 values)
  • Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions: 2
  • Writing a netCDF file is unexpectedly slow: 2

user (1 value)
  • fsteinmetz: 4

author_association (1 value)
  • NONE: 4
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort: descending), author_association, body, reactions, performed_via_github_app, issue
id: 542369777
html_url: https://github.com/pydata/xarray/issues/2912#issuecomment-542369777
issue_url: https://api.github.com/repos/pydata/xarray/issues/2912
node_id: MDEyOklzc3VlQ29tbWVudDU0MjM2OTc3Nw==
user: fsteinmetz (668201)
created_at: 2019-10-15T19:32:50Z
updated_at: 2019-10-15T19:32:50Z
author_association: NONE

Thanks for the explanations @jhamman and @shoyer :) Actually it turns out that I was not using particularly small chunks, but the filesystem for /tmp was faulty... After trying on a reliable filesystem, the results are much more reasonable.

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing a netCDF file is unexpectedly slow (435535284)
id: 533801682
html_url: https://github.com/pydata/xarray/issues/2912#issuecomment-533801682
issue_url: https://api.github.com/repos/pydata/xarray/issues/2912
node_id: MDEyOklzc3VlQ29tbWVudDUzMzgwMTY4Mg==
user: fsteinmetz (668201)
created_at: 2019-09-21T14:21:17Z
updated_at: 2019-09-21T14:21:17Z
author_association: NONE

"There are ways to side step some of these challenges (save_mfdataset and the distributed dask scheduler)"

@jhamman Could you elaborate on these ways?
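
For context, a minimal sketch of what those two suggestions might look like, assuming a dask-backed dataset with a time coordinate; the file names, chunk size, and per-year split are illustrative assumptions, not code from this thread:

import xarray as xr
from dask.distributed import Client

# Use the distributed scheduler; its dashboard makes it easier to see
# where the write actually spends its time.
client = Client()

# A dask-backed dataset; the input file name and chunk size are placeholders.
ds = xr.open_dataset("input.nc", chunks={"time": 100})

# save_mfdataset: split the output across several files written in parallel.
# The per-year split below is purely illustrative and assumes a "time" coordinate.
years, datasets = zip(*ds.groupby("time.year"))
paths = [f"out_{year}.nc" for year in years]
xr.save_mfdataset(datasets, paths)

# Alternatively, build a delayed single-file write and compute it on the cluster.
delayed_write = ds.to_netcdf("out.nc", compute=False)
delayed_write.compute()

Splitting the output with save_mfdataset lets several files be written at once, and running under the distributed scheduler makes it easier to see where the time is actually going.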

I am having severe slowdowns when writing Datasets block by block (backed by dask). I have also noticed that the slowdowns do not occur when writing to a ramdisk. Here are the timings of to_netcdf, using the default engine and encoding (the nc file is 4.3 GB):

  • When writing to ramdisk (/dev/shm/): 2min 1s
  • When writing to /tmp/: 27min 28s
  • When writing to /tmp/ after .load(), as suggested here: 34s (.load() takes 1min 43s)

The workaround suggested here works, but the datasets may not always fit in memory, and it defeats the essential purpose of dask...

Note: I am using dask 2.3.0 and xarray 0.12.3
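
For reference, a sketch of the two code paths being timed here; the file names are placeholders and the timings are simply the ones reported in this comment:

import xarray as xr

# Dask-backed dataset; the input file and chunk size are placeholders.
ds = xr.open_dataset("input.nc", chunks={"time": 100})

# Direct write: chunks are computed and written to disk as the file is built.
ds.to_netcdf("/tmp/out_direct.nc")         # the ~27min case reported above

# Workaround: pull everything into memory first, then write.
# It avoids the slowdown but requires the whole dataset to fit in RAM.
ds.load().to_netcdf("/tmp/out_loaded.nc")  # the ~34s case (after a ~1min 43s load)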

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing a netCDF file is unexpectedly slow (435535284)
id: 295657656
html_url: https://github.com/pydata/xarray/issues/1378#issuecomment-295657656
issue_url: https://api.github.com/repos/pydata/xarray/issues/1378
node_id: MDEyOklzc3VlQ29tbWVudDI5NTY1NzY1Ng==
user: fsteinmetz (668201)
created_at: 2017-04-20T09:50:19Z
updated_at: 2017-04-20T09:53:33Z
author_association: NONE

"I cannot see a use case in which repeated dims actually make sense."

In my case this situation originates from h5 files which indeed contain repeated dimensions (variables(dimensions): uint16 B0(phony_dim_0,phony_dim_0), ..., uint8 VAA(phony_dim_1,phony_dim_1)), so xarray is not to blame here. These are "dummy" dimensions, not associated with physical values. What we do to circumvent this problem is to "re-dimension" all variables. Maybe a safe approach would be for open_dataset to raise a warning by default when encountering such variables, possibly with an option to perform automatic or custom dimension naming to avoid repeated dims. I also agree with @shoyer that failing loudly when operating on such DataArrays, instead of providing confusing results, would be an improvement.
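
Purely as an illustration of the "re-dimension" step mentioned above, one way to rebuild such variables after opening the file; the helper name and the suffix scheme are invented here, and it assumes the phony dimensions carry no coordinate variables:

import xarray as xr

def dedupe_dims(ds):
    """Rebuild every variable whose dimension names repeat, appending a
    counter so that each axis ends up with a unique name."""
    new_vars = {}
    for name, var in ds.variables.items():
        counts = {}
        new_dims = []
        for dim in var.dims:
            counts[dim] = counts.get(dim, 0) + 1
            new_dims.append(dim if counts[dim] == 1 else f"{dim}_{counts[dim]}")
        if tuple(new_dims) != var.dims:
            new_vars[name] = xr.Variable(new_dims, var.values, attrs=var.attrs)
        else:
            new_vars[name] = var
    return xr.Dataset(new_vars, attrs=ds.attrs)

# ds = xr.open_dataset("scene.h5")   # hypothetical file with repeated phony dims
# ds = dedupe_dims(ds)

Once each offending variable has unique dimension names, the usual concat/stack/sortby machinery should behave normally again.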

reactions:
{
    "total_count": 5,
    "+1": 1,
    "-1": 4,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions (222676855)
id: 295593740
html_url: https://github.com/pydata/xarray/issues/1378#issuecomment-295593740
issue_url: https://api.github.com/repos/pydata/xarray/issues/1378
node_id: MDEyOklzc3VlQ29tbWVudDI5NTU5Mzc0MA==
user: fsteinmetz (668201)
created_at: 2017-04-20T06:11:02Z
updated_at: 2017-04-20T06:11:02Z
author_association: NONE

Right, also positional indexing works unexpectedly in this case, though I understand it's tricky and should probably be discouraged:

A[0,:]  # returns A
A[:,0]  # returns A.isel(dim0=0)

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Many methods are broken (e.g., concat/stack/sortby) when using repeated dimensions (222676855)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
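
For reference, a sketch of reproducing the filtered view on this page directly against the underlying SQLite database; the database file name is an assumption:

import sqlite3

# The database file name is an assumption about this Datasette deployment.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, issue_url, created_at, updated_at, body
    FROM issue_comments
    WHERE [user] = 668201
    ORDER BY updated_at DESC
    """
).fetchall()
for comment_id, issue_url, created_at, updated_at, body in rows:
    print(comment_id, updated_at, issue_url)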