issue_comments

14 rows where issue = 307318224 sorted by updated_at descending

user 4

  • WeatherGod 6
  • shoyer 4
  • jswhit 3
  • dcherian 1

author_association 3

  • CONTRIBUTOR 6
  • MEMBER 5
  • NONE 3

issue 1

  • Slicing DataArray can take longer than not slicing · 14
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
738189796 https://github.com/pydata/xarray/issues/2004#issuecomment-738189796 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDczODE4OTc5Ng== WeatherGod 291576 2020-12-03T18:15:35Z 2020-12-03T18:15:35Z CONTRIBUTOR

I think so, at least in terms of my original problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
738183069 https://github.com/pydata/xarray/issues/2004#issuecomment-738183069 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDczODE4MzA2OQ== dcherian 2448579 2020-12-03T18:03:29Z 2020-12-03T18:03:29Z MEMBER

Can this be closed?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
460881018 https://github.com/pydata/xarray/issues/2004#issuecomment-460881018 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDQ2MDg4MTAxOA== shoyer 1217238 2019-02-06T02:32:46Z 2019-02-06T02:32:46Z MEMBER

The performance difference here does indeed appear to have been fixed with netCDF-C 4.6.2 (but see also https://github.com/pydata/xarray/issues/2747).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
396317995 https://github.com/pydata/xarray/issues/2004#issuecomment-396317995 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM5NjMxNzk5NQ== jswhit 579593 2018-06-11T17:16:43Z 2018-06-11T17:16:43Z NONE

netcdf-c master now includes the same mechanism for strided access of HDF5 files as h5py. If netcdf4-python is linked against netcdf-c >= 4.6.2, performance for strided access should be greatly improved.
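To check whether a given installation should be on the fast path described above, one can compare the linked netCDF-C version against 4.6.2. This is a minimal sketch: the helper names are hypothetical, and it assumes the version string begins with dotted integers (development builds may carry a suffix such as `-development`).

```python
import re

# Threshold from the comment above: strided reads should be fast when
# netcdf4-python is linked against netCDF-C 4.6.2 or newer.
FAST_STRIDED_MIN = (4, 6, 2)


def parse_lib_version(version):
    """Extract leading dotted integers, e.g. '4.9.2-development' -> (4, 9, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fast_strided_access(version):
    """Return True if the netCDF-C version is at least FAST_STRIDED_MIN."""
    parsed = parse_lib_version(version)
    # Pad short versions such as '4.7' to three components before comparing.
    parsed += (0,) * (3 - len(parsed))
    return parsed >= FAST_STRIDED_MIN
```

With netcdf4-python installed, the linked library version is exposed as `netCDF4.__netcdf4libversion__`, so `has_fast_strided_access(netCDF4.__netcdf4libversion__)` would perform the check.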

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375102231 https://github.com/pydata/xarray/issues/2004#issuecomment-375102231 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTEwMjIzMQ== jswhit 579593 2018-03-21T21:29:34Z 2018-03-21T21:29:34Z NONE

Confirmed that the slow performance of netcdf4-python on strided access is due to the way that netcdf-c calls HDF5. There's now an issue on the netcdf-c issue tracker to implement fast strided access for HDF5 files (https://github.com/Unidata/netcdf-c/issues/908).

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375067743 https://github.com/pydata/xarray/issues/2004#issuecomment-375067743 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA2Nzc0Mw== shoyer 1217238 2018-03-21T19:29:51Z 2018-03-21T19:29:51Z MEMBER

H5py is doing all the hard work for this in h5netcdf.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375056363 https://github.com/pydata/xarray/issues/2004#issuecomment-375056363 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA1NjM2Mw== WeatherGod 291576 2018-03-21T18:50:58Z 2018-03-21T18:50:58Z CONTRIBUTOR

Ah, never mind, I see that our examples only had one greater-than-one stride.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375056077 https://github.com/pydata/xarray/issues/2004#issuecomment-375056077 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA1NjA3Nw== WeatherGod 291576 2018-03-21T18:50:01Z 2018-03-21T18:50:01Z CONTRIBUTOR

Dunno. I can't seem to get that engine working on my system.

Reading through that thread, I wonder if the optimization they added only applies if there is only one stride greater than one?
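Testing that hypothesis against a particular selection comes down to counting how many axes are read with a step greater than one. A minimal sketch, assuming the selection is a tuple of Python `slice` objects; `strided_axes` is a hypothetical helper, not part of netcdf4-python or xarray.

```python
def strided_axes(key, shape):
    """Return the indices of axes whose slice step is greater than one.

    `key` is a tuple of slice objects, one per axis of an array of `shape`.
    """
    axes = []
    for axis, (sl, size) in enumerate(zip(key, shape)):
        # slice.indices() fills in default start/stop/step for this axis length
        _, _, step = sl.indices(size)
        if step > 1:
            axes.append(axis)
    return axes


shape = (100, 12000)
one_stride = strided_axes((slice(None, None, 1), slice(None, None, 10)), shape)
two_strides = strided_axes((slice(None, None, 2), slice(None, None, 10)), shape)
```

Under the hypothesis, only selections where `len(strided_axes(key, shape)) == 1` (like `[::1, ::10]`) would benefit from the optimization.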

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375054212 https://github.com/pydata/xarray/issues/2004#issuecomment-375054212 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA1NDIxMg== jswhit 579593 2018-03-21T18:44:14Z 2018-03-21T18:44:14Z NONE

netcdf4-python does `reopened[::1, ::10]` by making a bunch of calls to the C library routine `nc_get_vara`. As pointed out in Unidata/netcdf4-python#680, this is faster than a single call to `nc_get_vars` (which does strided access, but is very slow). Note that `reopened[::1, ::1][:, ::10]` is very fast, but you have to have enough memory to hold the entire array. I wonder how h5netcdf is reading the data: is it pulling the entire array into memory and then selecting a subset?
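The idea of replacing one strided `nc_get_vars` read with many contiguous `nc_get_vara` reads can be illustrated in pure Python. This is a simplified, hypothetical decomposition (one single-element run per selected index on each strided axis, one full run on each step-1 axis), not netcdf4-python's actual internal logic; it only shows how the number of (start, count) reads scales with the selection.

```python
from itertools import product


def strided_to_vara_calls(shape, key):
    """Split a strided hyperslab into contiguous (start, count) reads.

    Hypothetical illustration: step-1 axes become one full-extent run,
    strided axes become one single-element run per selected index.
    """
    per_axis = []
    for size, sl in zip(shape, key):
        start, stop, step = sl.indices(size)
        if step == 1:
            # Contiguous along this axis: a single run covering [start, stop)
            per_axis.append([(start, max(stop - start, 0))])
        else:
            # Strided along this axis: one single-element run per index
            per_axis.append([(i, 1) for i in range(start, stop, step)])
    calls = []
    for combo in product(*per_axis):
        starts = tuple(s for s, _ in combo)
        counts = tuple(c for _, c in combo)
        calls.append((starts, counts))
    return calls


calls = strided_to_vara_calls((100, 12000), (slice(None), slice(None, None, 10)))
```

For `[::1, ::10]` on a 100 × 12000 array this yields 1200 reads, each covering one column of 100 values; making a second axis strided would multiply the number of reads again.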

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375036951 https://github.com/pydata/xarray/issues/2004#issuecomment-375036951 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAzNjk1MQ== WeatherGod 291576 2018-03-21T17:51:54Z 2018-03-21T17:51:54Z CONTRIBUTOR

This might be relevant: https://github.com/Unidata/netcdf4-python/issues/680

Still reading through the thread.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375034973 https://github.com/pydata/xarray/issues/2004#issuecomment-375034973 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAzNDk3Mw== WeatherGod 291576 2018-03-21T17:46:09Z 2018-03-21T17:46:09Z CONTRIBUTOR

My bet is on netCDF4-python, though I don't want to write up the C code to confirm it. Sigh... this isn't going to be a fun one to track down. Shall I open a bug report over there?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375020977 https://github.com/pydata/xarray/issues/2004#issuecomment-375020977 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAyMDk3Nw== shoyer 1217238 2018-03-21T17:08:15Z 2018-03-21T17:08:15Z MEMBER

The culprit appears to be netCDF4-python and/or netCDF-C:

```
import netCDF4

f = netCDF4.Dataset('test.nc')
%time f['__xarray_dataarray_variable__'][:, ::10]
# CPU times: user 313 ms, sys: 1.23 s, total: 1.54 s
```

When I try doing the same operation with h5netcdf, it runs very quickly:

```python
import xarray as xr

reopened = xr.open_dataarray('test.nc', engine='h5netcdf')
%time reopened[::1, ::10].compute()
# CPU times: user 6.11 ms, sys: 3.63 ms, total: 9.74 ms
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375014480 https://github.com/pydata/xarray/issues/2004#issuecomment-375014480 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAxNDQ4MA== WeatherGod 291576 2018-03-21T16:50:59Z 2018-03-21T16:56:13Z CONTRIBUTOR

Yeah, good example. It eliminates a lot of possible variables, such as problems with netCDF4 compression. We should probably check whether it also happens in v0.10.0, to see if the changes to the indexing system caused this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375010010 https://github.com/pydata/xarray/issues/2004#issuecomment-375010010 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAxMDAxMA== shoyer 1217238 2018-03-21T16:38:59Z 2018-03-21T16:38:59Z MEMBER

Here's a simpler case that gets at the essence of the problem:

```python
import xarray as xr
import numpy as np

source = xr.DataArray(np.zeros((100, 12000)), dims=['time', 'x'])
source.to_netcdf('test.nc', format='NETCDF4')
reopened = xr.open_dataarray('test.nc')

%time reopened[::1, ::1].compute()
# CPU times: user 1.35 ms, sys: 6.77 ms, total: 8.12 ms

%time reopened[::1, ::10].compute()
# CPU times: user 371 ms, sys: 1.33 s, total: 1.7 s
```
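For scale, here is the arithmetic behind the memory caveat raised in the thread about reading `reopened[::1, ::1]` first and then subsampling, applied to this example. `np.zeros` defaults to float64, so 8 bytes per element; the helper names are hypothetical.

```python
from math import prod


def nbytes(shape, itemsize=8):
    """Bytes for a dense array of `shape`; itemsize=8 matches float64."""
    return prod(shape) * itemsize


def selected_shape(shape, key):
    """Shape produced by indexing an array of `shape` with a tuple of slices."""
    return tuple(len(range(*sl.indices(size))) for size, sl in zip(shape, key))


shape = (100, 12000)
key = (slice(None, None, 1), slice(None, None, 10))
full_bytes = nbytes(shape)                          # whole array read into memory
strided_bytes = nbytes(selected_shape(shape, key))  # only the selected columns
```

Here the full read needs 9.6 MB versus 0.96 MB for the strided result, so the read-then-subsample workaround is cheap for this test file, but its cost scales with the full array rather than with the selection.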

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
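The selection shown on this page (rows for issue 307318224, newest `updated_at` first) can be reproduced against the schema above with Python's built-in `sqlite3` module. A minimal sketch using an in-memory database and three of the rows listed above; the query is an assumption about what the page does, not the exact SQL Datasette generates. Because `updated_at` is stored as ISO 8601 UTC text, ordering the strings lexicographically matches chronological order.

```python
import sqlite3

# Schema copied from the CREATE TABLE / CREATE INDEX statements above.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""


def comments_for_issue(conn, issue_id):
    """Return (id, user, updated_at) rows for one issue, newest update first."""
    return conn.execute(
        "SELECT id, user, updated_at FROM issue_comments "
        "WHERE issue = ? ORDER BY updated_at DESC",
        (issue_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# The users/issues tables are not created here, so keep FK checks off.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(SCHEMA)
# Three rows taken from the table above; other columns are left NULL.
conn.executemany(
    "INSERT INTO issue_comments (id, user, updated_at, issue) VALUES (?, ?, ?, ?)",
    [
        (375010010, 1217238, "2018-03-21T16:38:59Z", 307318224),
        (738189796, 291576, "2020-12-03T18:15:35Z", 307318224),
        (460881018, 1217238, "2019-02-06T02:32:46Z", 307318224),
    ],
)
rows = comments_for_issue(conn, 307318224)
```

The index on `issue` lets SQLite find the matching rows without scanning the whole table; the sort on `updated_at` is then applied to that subset.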
Powered by Datasette · About: xarray-datasette