issue_comments

7 rows where author_association = "CONTRIBUTOR" and user = 5179430 sorted by updated_at descending
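
The filter above corresponds to a simple query over the issue_comments table (schema at the bottom of this page). A minimal sketch of running it locally with Python's sqlite3, assuming the database behind this page has been saved as github.db (hypothetical filename):

```python
import sqlite3

# github.db: hypothetical local copy of the database behind this page.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = 'CONTRIBUTOR' AND [user] = 5179430
    ORDER BY updated_at DESC
    """
).fetchall()
print(len(rows))  # expected: 7
```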

Facets:

issue (5)
  • Update contains_cftime_datetimes to avoid loading entire variable array (3)
  • CFTimeIndex (1)
  • implement interp() (1)
  • Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed (1)
  • Opening datasets with large object dtype arrays is very slow (1)

user (1)
  • agoodm (7)

author_association (1)
  • CONTRIBUTOR (7)
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sort: descending), author_association, body, reactions, performed_via_github_app, issue
id 1458453929 · html_url https://github.com/pydata/xarray/pull/7494#issuecomment-1458453929 · issue_url https://api.github.com/repos/pydata/xarray/issues/7494 · node_id IC_kwDOAMm_X85W7j2p · user agoodm (5179430) · created_at 2023-03-07T16:22:21Z · updated_at 2023-03-07T16:22:21Z · author_association CONTRIBUTOR

Thanks @Illviljan and @dcherian for helping to see this through.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Update contains_cftime_datetimes to avoid loading entire variable array (1563270549)
id 1411206291 · html_url https://github.com/pydata/xarray/pull/7494#issuecomment-1411206291 · issue_url https://api.github.com/repos/pydata/xarray/issues/7494 · node_id IC_kwDOAMm_X85UHUyT · user agoodm (5179430) · created_at 2023-01-31T23:17:38Z · updated_at 2023-01-31T23:17:38Z · author_association CONTRIBUTOR

@Illviljan I gave your update a quick test; it seems to work well enough and still maintains the performance improvement. It looks fine to me, though it seems you still need to fix the failing mypy checks now?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Update contains_cftime_datetimes to avoid loading entire variable array (1563270549)
id 1410253782 · html_url https://github.com/pydata/xarray/pull/7494#issuecomment-1410253782 · issue_url https://api.github.com/repos/pydata/xarray/issues/7494 · node_id IC_kwDOAMm_X85UDsPW · user agoodm (5179430) · created_at 2023-01-31T12:22:02Z · updated_at 2023-01-31T12:26:37Z · author_association CONTRIBUTOR

Thanks for the PR. However, does that actually make a difference? To me it looks like _contains_cftime_datetimes also only considers one element of the array.

https://github.com/pydata/xarray/blob/b4515582ffc8b7f63632bfccd109d19889d00384/xarray/core/common.py#L1779-L1780

This isn't actually the line of code causing the performance bottleneck; it's the access to var.data in the function call that is problematic, as I explained in the issue thread. You can verify this yourself by running this simple example before and after applying the changes in this PR:

```python
import numpy as np
import xarray as xr

str_array = np.arange(100000000).astype(str)
ds = xr.DataArray(dims=('x',), data=str_array).to_dataset(name='str_array')
ds = ds.chunk(x=10000)
ds['str_array'] = ds.str_array.astype('O')  # Needs to actually be object dtype to show the problem
ds.to_zarr('str_array.zarr')

%time xr.open_zarr('str_array.zarr')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Update contains_cftime_datetimes to avoid loading entire variable array (1563270549)
id 1409299311 · html_url https://github.com/pydata/xarray/issues/7484#issuecomment-1409299311 · issue_url https://api.github.com/repos/pydata/xarray/issues/7484 · node_id IC_kwDOAMm_X85UADNv · user agoodm (5179430) · created_at 2023-01-30T20:36:46Z · updated_at 2023-01-30T20:36:46Z · author_association CONTRIBUTOR

Great, thanks! It's actually the var.data attribute access itself that's triggering the loading, which is why I needed to put the change there, but I see your point that I should also update contains_cftime_datetimes, since selecting the first element again is stylistically redundant. In any case, I'll go ahead and get to work on preparing a PR for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Opening datasets with large object dtype arrays is very slow (1561508426)
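
A minimal sketch of the approach described in the comment above, with a hypothetical helper name (an illustration of the idea, not the actual xarray patch): rule out non-object dtypes from metadata alone, so lazily loaded variables are never materialized, and touch only a single element when the dtype is object.

```python
import numpy as np

def first_value_is_cftime(arr) -> bool:
    # Hypothetical helper, not xarray's code. The dtype check costs nothing:
    # only object-dtype arrays can hold cftime datetime objects at all, and
    # inspecting dtype never triggers a load of the underlying data.
    if arr.dtype != np.dtype("O") or arr.size == 0:
        return False
    # Only now touch the data, and only one element of it.
    sample = arr[(0,) * arr.ndim]
    return type(sample).__module__.split(".")[0] == "cftime"
```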
id 413658420 · html_url https://github.com/pydata/xarray/issues/2370#issuecomment-413658420 · issue_url https://api.github.com/repos/pydata/xarray/issues/2370 · node_id MDEyOklzc3VlQ29tbWVudDQxMzY1ODQyMA== · user agoodm (5179430) · created_at 2018-08-16T19:28:42Z · updated_at 2018-08-16T19:28:42Z · author_association CONTRIBUTOR

Perhaps we could make it possible to set the ops engine (either numpy or bottleneck) and dtype (float32, float64) via set_options()? Right now bottleneck is automatically chosen if it is installed, which is rather annoying since the xarray recipe on conda-forge ships with bottleneck even though it should be an optional dependency. Maybe that's something I should take up with the feedstock maintainers, but at the very least xarray should make its inclusion less rigid in light of these issues.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Inconsistent results when calculating sums on float32 arrays w/ bottleneck installed (351000813)
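
As a historical footnote, xarray did later grow exactly this kind of switch: set_options() accepts a use_bottleneck flag in recent releases. A minimal sketch, assuming xarray >= 0.19 with bottleneck installed:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.default_rng(0).random(1_000_000).astype("float32"))

# Opt out of bottleneck's single-pass float32 reductions for this block;
# numpy's pairwise summation is used instead, trading speed for accuracy.
with xr.set_options(use_bottleneck=False):
    print(da.sum())
```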
id 389667265 · html_url https://github.com/pydata/xarray/pull/2104#issuecomment-389667265 · issue_url https://api.github.com/repos/pydata/xarray/issues/2104 · node_id MDEyOklzc3VlQ29tbWVudDM4OTY2NzI2NQ== · user agoodm (5179430) · created_at 2018-05-16T21:11:52Z · updated_at 2018-05-16T21:11:52Z · author_association CONTRIBUTOR

Very nice! I noticed that in this PR the interpolation is performed along dimensions rather than coordinates. The limitation is that interpolation to/from curvilinear grids is not supported, which I think is a common enough use case and would be extremely nice to have. I'm pretty sure scipy's interpolation tools work out of the box with curvilinear grids. Is an updated interface that works on coordinate variables rather than dimensions planned?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: implement interp() (320275317)
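
To make the distinction concrete, a sketch against the interp() API as it eventually shipped: 1-D dimension coordinates are the supported case, while 2-D lat/lon coordinate variables (a curvilinear grid) are the gap this comment points out. Assumes scipy is installed:

```python
import numpy as np
import xarray as xr

# Supported: interpolation along 1-D dimension coordinates.
da = xr.DataArray(
    np.arange(12.0).reshape(3, 4),
    dims=("y", "x"),
    coords={"y": [0.0, 1.0, 2.0], "x": [0.0, 1.0, 2.0, 3.0]},
)
print(da.interp(x=[0.5, 1.5], y=[0.25, 1.75]))  # scipy does the work

# Not covered: a curvilinear grid, where lat and lon are themselves 2-D
# coordinate variables over (y, x) rather than dimension coordinates.
```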
id 380580598 · html_url https://github.com/pydata/xarray/pull/1252#issuecomment-380580598 · issue_url https://api.github.com/repos/pydata/xarray/issues/1252 · node_id MDEyOklzc3VlQ29tbWVudDM4MDU4MDU5OA== · user agoodm (5179430) · created_at 2018-04-11T20:11:44Z · updated_at 2018-04-11T20:14:43Z · author_association CONTRIBUTOR

Hi all, any updates on the current status of this? This will be a big help for me as well, in particular for processing daily CMIP5 netCDF files. I have been following this thread as well as the original issue and really appreciate this work. One other question: this PR doesn't allow resampling on non-standard calendars as is, but I remember @shoyer mentioning that a workaround using pandas Grouper objects would exist. Would someone be able to explain how this would work? Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: CFTimeIndex (205473898)
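
For readers arriving later: resampling on non-standard calendars did eventually work directly once CFTimeIndex gained resample support. A minimal sketch, assuming a recent xarray with cftime installed:

```python
import numpy as np
import xarray as xr

# Two years of daily data on a 365-day (noleap) calendar, as in CMIP5 output.
times = xr.cftime_range("2000-01-01", periods=730, freq="D", calendar="noleap")
da = xr.DataArray(np.arange(730.0), dims="time", coords={"time": times})

# Monthly means via resample, no pandas Grouper workaround required.
print(da.resample(time="MS").mean())
```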

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);