
issue_comments


7 rows where author_association = "NONE" and issue = 233350060 (If a NetCDF file is chunked on disk, open it with compatible dask chunks) sorted by updated_at descending


1466954335 · lskopintseva (67558326) · 2023-03-13T21:04:01Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-1466954335

I have a netCDF file whose variables are saved on disk in chunks, and I would like to read the file with xr.open_dataset using the original chunks. Is there a way to do this in xarray? Since xarray is built on the netCDF4 library, I would expect this feature to be present in xarray as well.
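
A minimal sketch of one way to do this (not from the original comment; the file path and variable name are hypothetical): read the on-disk chunk sizes with netCDF4, then pass them to xarray as dask chunks.

    # Hedged sketch: open a netCDF file with dask chunks matching its on-disk
    # chunking. netCDF4's Variable.chunking() returns a list of chunk sizes
    # per dimension, or the string 'contiguous' if the variable is not chunked.
    import netCDF4
    import xarray as xr

    path = 'data.nc'      # hypothetical file
    varname = 'tas'       # hypothetical variable

    with netCDF4.Dataset(path) as nc:
        var = nc.variables[varname]
        disk_chunks = var.chunking()
        dims = var.dimensions

    if disk_chunks == 'contiguous':
        chunks = {}       # no on-disk chunking; fall back to xarray's default
    else:
        chunks = dict(zip(dims, disk_chunks))

    ds = xr.open_dataset(path, chunks=chunks)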

1112717271 · stale[bot] (26384082) · 2022-04-28T22:37:45Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-1112717271

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

573292963 · stale[bot] (26384082) · 2020-01-11T07:55:24Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-573292963

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

309006084 · matt-long (9341267) · 2017-06-16T11:49:38Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-309006084

@Zac-HD: Thanks, I submitted a separate issue: #1458

308821670 · matt-long (9341267) · 2017-06-15T18:01:51Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-308821670

I have encountered a related issue here. When I read a file with netCDF4 compression into a Dataset, a subsequent call to write the dataset using to_netcdf fails.

For instance, using data from the POP model, I can convert output to netCDF4 using NCO:

    $ ncks --netcdf4 --deflate 1 $file nc4-test.nc

Then in Python:

    ds = xr.open_dataset('nc4-test.nc', decode_times=False, decode_coords=False)
    ds.to_netcdf('test-out.nc')

The write fails with: RuntimeError: NetCDF: Bad chunk sizes.

If I include format='NETCDF3_64BIT', the write completes.

This seems like a bug.
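
A workaround sometimes suggested for this error (not from this thread, so treat it as an unverified sketch) is to drop the chunk-size encoding that open_dataset carries over from the compressed file before rewriting:

    # Hedged sketch: clear stale netCDF4 chunk/shape encoding before writing.
    # "chunksizes" and "original_shape" are encoding keys xarray's netCDF4
    # backend records on open; removing them lets the writer pick valid chunks.
    for var in ds.variables.values():
        var.encoding.pop("chunksizes", None)
        var.encoding.pop("original_shape", None)
    ds.to_netcdf('test-out.nc')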

307070835 · JanisGailis (9655353) · 2017-06-08T10:59:45Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-307070835

I quite like the approach you're suggesting! What I dislike most about our current approach is that a single netCDF chunk can fall into multiple dask chunks, and we don't control for that in any way. I'd happily swap our approach out for the more general one you suggest.

This does of course invite input on the API constraints: would it be a good idea to add more kwargs to the open functions for the chunk-size threshold and edge ratio?
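
As a concrete illustration of the alignment concern (my sketch, not code from the thread): along one dimension, dask chunks avoid splitting netCDF chunks exactly when every dask block size is a whole multiple of the on-disk chunk size, except possibly the final remainder block.

    # Hedged sketch: check that dask chunks along one dimension do not split
    # on-disk netCDF chunks. dask_chunks is a tuple of block sizes (as in
    # dask.array.Array.chunks[axis]); disk_chunk is the on-disk chunk size.
    def chunks_aligned(dask_chunks, disk_chunk):
        # Every block except the trailing remainder must span a whole number
        # of disk chunks, so no disk chunk straddles two dask blocks.
        return all(size % disk_chunk == 0 for size in dask_chunks[:-1])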

306814837 · JanisGailis (9655353) · 2017-06-07T14:37:14Z · NONE
https://github.com/pydata/xarray/issues/1440#issuecomment-306814837

We had a similar issue some time ago. We use xr.open_mfdataset to open long time series of data, where each time slice is a single file. In this case each file becomes a single dask chunk, which is appropriate for most of the data we have to work with (ESA CCI datasets).

We encountered a problem, however, with a few datasets that had very significant compression levels, such that a single file would fit in memory on a consumer-ish laptop, but a few of them would not. So the machine would quickly run out of memory when working with the opened dataset.

As we have to be able to open all ESA CCI datasets 'automatically', manually specifying the chunk sizes was not an option, so we explored a few ways to do this. Aligning the dask chunk sizes with the NetCDF chunking was not a great idea for the reason shoyer mentions above: for some datasets the chunks would be too small, and the bottleneck moves from memory consumption to the number of read/write operations.

We eventually figured out (with help from shoyer :)) that the chunks should be small enough to fit in memory on an average user's laptop, yet as big as possible, to maximize the number of NetCDF chunks that fall neatly inside each dask chunk. The shape of the dask chunk also matters for this. We figured it's a good guess to divide both the lat and lon dimensions by the same divisor, as that's also how NetCDF files are often chunked.

So, we open the first file, determine its 'uncompressed' size, and then figure out whether we should chunk it as 1, 2x2, 3x3, etc. It's far from a perfect solution, but it works in our case. Here's how we have implemented this: https://github.com/CCI-Tools/cate-core/blob/master/cate/core/ds.py#L506
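
A rough sketch of that divisor heuristic (my reconstruction, not the linked ds.py code; the memory target is a made-up number):

    # Hedged sketch: pick a common divisor n for lat and lon (1, 2x2, 3x3, ...)
    # so that one chunk of a single time slice fits under a memory target.
    import math

    def divisor_chunks(lat_size, lon_size, slice_nbytes,
                       target_nbytes=250 * 2**20):  # 250 MiB, an assumption
        n = 1
        while slice_nbytes / (n * n) > target_nbytes:
            n += 1
        return {'lat': math.ceil(lat_size / n), 'lon': math.ceil(lon_size / n)}

    # e.g. a global 0.25-degree float64 field, one time slice:
    # divisor_chunks(720, 1440, 8 * 720 * 1440)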


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
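
The schema above makes it straightforward to reproduce this page's query against a local copy of the database (a hedged sketch; the filename github.db is an assumption):

    # Hedged sketch: rerun the page's filter with the stdlib sqlite3 module.
    import sqlite3

    conn = sqlite3.connect('github.db')
    rows = conn.execute(
        "SELECT id, user, created_at, body FROM issue_comments "
        "WHERE author_association = 'NONE' AND issue = 233350060 "
        "ORDER BY updated_at DESC"
    ).fetchall()
    for comment_id, user_id, created_at, body in rows:
        print(comment_id, user_id, created_at, body[:60])
    conn.close()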