
issue_comments


6 rows where author_association = "MEMBER", issue = 233350060 and user = 2443309 sorted by updated_at descending


user 1

  • jhamman · 6 ✖

issue 1

  • If a NetCDF file is chunked on disk, open it with compatible dask chunks · 6 ✖

author_association 1

  • MEMBER · 6 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1470801895 https://github.com/pydata/xarray/issues/1440#issuecomment-1470801895 https://api.github.com/repos/pydata/xarray/issues/1440 IC_kwDOAMm_X85Xqqfn jhamman 2443309 2023-03-15T20:33:53Z 2023-03-15T20:34:39Z MEMBER

@lskopintseva - This feature has not been implemented in Xarray (yet). In the meantime, you might find something like this helpful:

```python
import xarray as xr

ds = xr.open_dataset("dataset.nc")
for v in ds.data_vars:
    # get the variable's on-disk chunk sizes (None if not recorded in the encoding)
    chunksizes = ds[v].encoding.get('chunksizes', None)
    if chunksizes is not None:
        # map each dimension name to its on-disk chunk size
        chunks = dict(zip(ds[v].dims, chunksizes))
        ds[v] = ds[v].chunk(chunks)  # chunk the array using the underlying chunksizes
```

FWIW, I think this would be a nice feature to add to the netcdf4 and h5netcdf backends in Xarray. Contributions welcome!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
358829682 https://github.com/pydata/xarray/issues/1440#issuecomment-358829682 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDM1ODgyOTY4Mg== jhamman 2443309 2018-01-19T00:38:16Z 2018-01-19T00:38:16Z MEMBER

cc @kmpaul who wanted to review this conversation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
318433236 https://github.com/pydata/xarray/issues/1440#issuecomment-318433236 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMxODQzMzIzNg== jhamman 2443309 2017-07-27T17:37:39Z 2017-07-27T17:37:39Z MEMBER

@Zac-HD - We merged #1457 yesterday, which should give us a platform to test any improvements we make related to this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
310733017 https://github.com/pydata/xarray/issues/1440#issuecomment-310733017 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMxMDczMzAxNw== jhamman 2443309 2017-06-23T17:59:07Z 2017-06-23T17:59:07Z MEMBER

@Zac-HD - thanks for your detailed report.

Ping me again when you get started on some benchmarking, and feel free to chime in further on #1457.

> No block should include data from multiple files (near-absolute, due to locking - though concurrent read is supported on lower levels?)

Hopefully we can find some optimizations that help with this. I routinely want to do this, though I understand why it's not always a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
308879158 https://github.com/pydata/xarray/issues/1440#issuecomment-308879158 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwODg3OTE1OA== jhamman 2443309 2017-06-15T22:07:33Z 2017-06-16T00:12:43Z MEMBER

@Zac-HD - I'm about to put up a PR with some initial benchmarking functionality (#1457). Are you open to putting together a PR for the features you've described above? Hopefully, these two can work together.

As for the API changes related to this issue, I'd propose the following:

Use the chunks keyword to support 3 additional options

```python
def open_dataset(filename_or_obj, ..., chunks=None, ...):
    """Load and decode a dataset from a file or file-like object.

    Parameters
    ----------
    ....
    chunks : int or dict or set or 'auto' or 'disk', optional
        If chunks is provided, it is used to load the new dataset into dask
        arrays. ``chunks={}`` loads the dataset with dask using a single
        chunk for all arrays.
    ...
    """
```

  • int: chunk each dimension by chunks
  • dict: Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset
  • set (or list or tuple) of str: chunk the dimension(s) provided by some heuristic, try to keep the chunk shape/size compatible with the storage of the data on disk and for use with dask
  • 'auto' (str): chunk the array(s) using some auto-magical heuristic that is compatible with the storage of the data on disk and is semi-optimized (in size) for use with dask
  • 'disk' (str): use the chunksize of the netCDF variable directly.
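
To make the proposal above a little more concrete, here is a minimal sketch of how the `int`, `dict` and `'disk'` options might be turned into per-dimension chunk sizes. The helper name `resolve_chunks` and its arguments are hypothetical and not part of Xarray. The `set` and `'auto'` options are left unimplemented because the heuristic they rely on is not specified in this thread.

```python
def resolve_chunks(chunks, dims, shape, disk_chunksizes=None):
    """Hypothetical helper: turn a ``chunks`` argument into a {dim: size} dict.

    dims            -- tuple of dimension names for one variable
    shape           -- tuple of dimension lengths, same order as dims
    disk_chunksizes -- on-disk chunk shape, or None if none is recorded
    """
    if chunks is None:
        return None  # no dask chunking requested
    if isinstance(chunks, int):
        # same chunk length along every dimension, capped at the dimension length
        return {d: min(chunks, n) for d, n in zip(dims, shape)}
    if isinstance(chunks, dict):
        # dimensions that are not mentioned stay in a single chunk
        return {d: min(chunks.get(d, n), n) for d, n in zip(dims, shape)}
    if chunks == "disk":
        if disk_chunksizes is None:
            # assumption: fall back to one chunk per dimension
            return dict(zip(dims, shape))
        return dict(zip(dims, disk_chunksizes))
    # set / 'auto' would need the (so far unspecified) heuristic
    raise NotImplementedError(f"no heuristic sketched for chunks={chunks!r}")


print(resolve_chunks("disk", ("time", "x"), (100, 5), disk_chunksizes=(1, 5)))
```

The resulting dict has the same form as the argument accepted by `.chunk()`, so it could be applied per variable in the same way as the workaround posted in the 2023 comment.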
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
306587426 https://github.com/pydata/xarray/issues/1440#issuecomment-306587426 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNjU4NzQyNg== jhamman 2443309 2017-06-06T19:10:27Z 2017-06-06T19:10:27Z MEMBER

I'd certainly support a warning when dask chunks do not align with the on-disk chunks.
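
As a rough illustration of such a warning, the sketch below flags every dimension whose requested dask chunk length is not a whole multiple of the on-disk chunk length. The function name `warn_if_misaligned` and the dict-based inputs are assumptions made for this example, not an existing Xarray API.

```python
import warnings


def warn_if_misaligned(dask_chunks, disk_chunks):
    """Hypothetical check: both arguments map dimension name -> chunk length.

    Returns the list of dimensions whose dask chunk length is not an
    integer multiple of the on-disk chunk length, and warns if it is non-empty.
    """
    misaligned = [
        dim
        for dim, size in dask_chunks.items()
        if dim in disk_chunks and size % disk_chunks[dim] != 0
    ]
    if misaligned:
        warnings.warn(
            "dask chunks do not align with on-disk chunks along "
            f"{misaligned}; each dask chunk may read partial disk chunks",
            stacklevel=2,
        )
    return misaligned


# 10 is not a multiple of 4, so 'time' is reported; 'x' has no disk chunking info
print(warn_if_misaligned({"time": 10, "x": 5}, {"time": 4}))
```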

Beyond that, I think we could work on a utility for automatically determining chunk sizes for xarray using some heuristics. Before we go there though, I think we really should develop some performance benchmarks. We're starting to get a lot of questions/issues about performance, and it seems like we need some benchmarking to happen before we can really start fixing the underlying issues.
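
One possible shape for such a heuristic, purely as a sketch: start from the on-disk chunk shape and keep doubling chunk lengths, one dimension at a time, while the chunk stays under a target size in bytes. The function name `grow_chunks`, the doubling strategy and the 128 MiB default are all assumptions for illustration; nothing here has been benchmarked.

```python
def grow_chunks(disk_chunks, shape, itemsize, target_bytes=2**27):
    """Hypothetical heuristic: enlarge on-disk chunks towards a target size.

    Each returned chunk length is either a multiple of the on-disk chunk
    length or the full dimension length, so chunk boundaries stay aligned.
    """
    chunks = [min(c, n) for c, n in zip(disk_chunks, shape)]

    def nbytes(chunk_shape):
        total = itemsize
        for c in chunk_shape:
            total *= c
        return total

    grew = True
    while grew:
        grew = False
        for i, n in enumerate(shape):
            if chunks[i] >= n:
                continue  # this dimension is already a single chunk
            candidate = list(chunks)
            candidate[i] = min(chunks[i] * 2, n)
            if nbytes(candidate) <= target_bytes:
                chunks = candidate
                grew = True
    return tuple(chunks)


# float64 data stored on disk as one time step per chunk
print(grow_chunks((1, 10, 10), (100, 10, 10), itemsize=8, target_bytes=12800))
```

Whether a rule like this actually helps is exactly what the benchmarks mentioned above would need to show.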

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);