issue_comments

12 rows where author_association = "MEMBER" and issue = 233350060 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1470801895 https://github.com/pydata/xarray/issues/1440#issuecomment-1470801895 https://api.github.com/repos/pydata/xarray/issues/1440 IC_kwDOAMm_X85Xqqfn jhamman 2443309 2023-03-15T20:33:53Z 2023-03-15T20:34:39Z MEMBER

@lskopintseva - This feature has not been implemented in Xarray (yet). In the meantime, you might find something like this helpful:

```python
import xarray as xr

ds = xr.open_dataset("dataset.nc")
for v in ds.data_vars:
    # get variable chunksizes
    chunksizes = ds[v].encoding.get('chunksizes', None)
    if chunksizes is not None:
        chunks = dict(zip(ds[v].dims, chunksizes))
        # chunk the array using the underlying chunksizes
        ds[v] = ds[v].chunk(chunks)
```

FWIW, I think this would be a nice feature to add to the netcdf4 and h5netcdf backends in Xarray. Contributions welcome!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
632294837 https://github.com/pydata/xarray/issues/1440#issuecomment-632294837 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDYzMjI5NDgzNw== rabernat 1197350 2020-05-21T19:19:50Z 2020-05-21T19:19:50Z MEMBER

> It seems to me that there are lots of "layers" of "chunking", especially when you are talking about chunking an entire dataset.

To simplify a little bit, here we are only talking about reading a single store, i.e. one netcdf file or one zarr group. Also out of scope is the underlying storage medium (e.g. block size).

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
632266536 https://github.com/pydata/xarray/issues/1440#issuecomment-632266536 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDYzMjI2NjUzNg== rabernat 1197350 2020-05-21T18:23:13Z 2020-05-21T18:23:13Z MEMBER

> Can we overload the chunks argument in open_xxx to do this? We are already adding support for chunks="auto" ...

This gets tricky, because we may want slightly different behavior depending on whether the underlying array store is chunked.
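
For illustration, a minimal sketch of the two behaviours (the helper name is hypothetical; `chunksizes` and `chunks` are the encoding keys the netCDF4/h5netcdf and zarr backends record):

```python
# Sketch only, not xarray backend code: if the store reports native chunks,
# start from them; otherwise there is nothing to align with and dask decides.
def choose_chunks(var):
    disk = var.encoding.get("chunksizes") or var.encoding.get("chunks")
    if disk is not None:
        # chunked store (netCDF4/HDF5 or zarr): align with the on-disk chunks
        return dict(zip(var.dims, disk))
    # unchunked store: fall back to dask's automatic chunking
    return "auto"
```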

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
632222508 https://github.com/pydata/xarray/issues/1440#issuecomment-632222508 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDYzMjIyMjUwOA== dcherian 2448579 2020-05-21T16:56:02Z 2020-05-21T16:56:02Z MEMBER

> should we have an option like chunk_size='native', or chunk_size='100MB'

Can we overload the chunks argument in open_xxx to do this? We are already adding support for chunks="auto" ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
632183683 https://github.com/pydata/xarray/issues/1440#issuecomment-632183683 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDYzMjE4MzY4Mw== rabernat 1197350 2020-05-21T16:13:46Z 2020-05-21T16:14:08Z MEMBER

We discussed this issue today in our pangeo coffee break. We think the following plan would be good:

  • [ ] Write a function called auto_chunk(variable) which examines a variable for the presence of a chunks attribute in encoding or within the data itself, and returns a new variable with chunked data (a rough sketch follows below).
  • [ ] Refactor open_zarr to call this function
  • [ ] Add it also to open_dataset to enable auto-chunking of netCDF and geotiff data

Should we have an option like chunk_size='native', or chunk_size='100MB', with chunks chosen to align with source chunks?
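
A rough sketch of what such an auto_chunk helper might look like (hypothetical helper, not part of xarray):

```python
def auto_chunk(variable):
    # Sketch only: look for native chunk information in the variable's
    # encoding and return a dask-chunked copy if any is found.
    chunks = variable.encoding.get("chunks")          # zarr stores
    if chunks is None:
        chunks = variable.encoding.get("chunksizes")  # netCDF4 / h5netcdf
    if chunks is None:
        # no chunk information available; return the variable unchanged
        return variable
    return variable.chunk(dict(zip(variable.dims, chunks)))
```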

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
358829682 https://github.com/pydata/xarray/issues/1440#issuecomment-358829682 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDM1ODgyOTY4Mg== jhamman 2443309 2018-01-19T00:38:16Z 2018-01-19T00:38:16Z MEMBER

cc @kmpaul who wanted to review this conversation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
318433236 https://github.com/pydata/xarray/issues/1440#issuecomment-318433236 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMxODQzMzIzNg== jhamman 2443309 2017-07-27T17:37:39Z 2017-07-27T17:37:39Z MEMBER

@Zac-HD - We merged #1457 yesterday, which should give us a platform to test any improvements we make related to this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
310733017 https://github.com/pydata/xarray/issues/1440#issuecomment-310733017 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMxMDczMzAxNw== jhamman 2443309 2017-06-23T17:59:07Z 2017-06-23T17:59:07Z MEMBER

@Zac-HD - thanks for your detailed report.

Ping me again when you get started on some benchmarking, and feel free to chime in further on #1457.

> No block should include data from multiple files (near-absolute, due to locking - though concurrent read is supported on lower levels?)

Hopefully we can find some optimizations that help with this. I routinely want to do this, though I understand why it's not always a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
308879158 https://github.com/pydata/xarray/issues/1440#issuecomment-308879158 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwODg3OTE1OA== jhamman 2443309 2017-06-15T22:07:33Z 2017-06-16T00:12:43Z MEMBER

@Zac-HD - I'm about to put up a PR with some initial benchmarking functionality (#1457). Are you open to putting together a PR for the features you've described above? Hopefully, the two can work together.

As for the API changes related to this issue, I'd propose the following:

Use the chunks keyword to support 3 additional options (a rough dispatch sketch follows the option list):

```python
def open_dataset(filename_or_obj, ..., chunks=None, ...):
    """Load and decode a dataset from a file or file-like object.

    Parameters
    ----------
    ....
    chunks : int or dict or set or 'auto' or 'disk', optional
        If chunks is provided, it is used to load the new dataset into dask
        arrays. ``chunks={}`` loads the dataset with dask using a single
        chunk for all arrays.
    ...
    """
```

  • int: chunk each dimension by ``chunks``.
  • dict: Dictionary with keys given by dimension names and values given by chunk sizes. In general, these should divide the dimensions of each dataset.
  • set (or list or tuple) of str: chunk the provided dimension(s) using some heuristic, trying to keep the chunk shape/size compatible with how the data is stored on disk and suitable for use with dask.
  • 'auto' (str): chunk the array(s) using some auto-magical heuristic that is compatible with the storage of the data on disk and is semi-optimized (in size) for use with dask.
  • 'disk' (str): use the chunksize of the netCDF variable directly.
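
For concreteness, a hypothetical sketch of how the proposed values could be dispatched per variable (names and details assumed, not xarray's implementation):

```python
def resolve_chunks(var, chunks):
    """Sketch only: map the proposed ``chunks`` values onto per-variable chunking."""
    disk = var.encoding.get("chunksizes")  # set for chunked netCDF4/HDF5 data
    if chunks == "disk":
        # use the on-disk chunk shape directly
        return dict(zip(var.dims, disk)) if disk is not None else None
    if chunks == "auto":
        # a real implementation would merge disk chunks up to a dask-friendly
        # size; this sketch just reuses the disk chunks or defers to dask
        return dict(zip(var.dims, disk)) if disk is not None else "auto"
    if isinstance(chunks, (set, frozenset, list, tuple)) and all(
        isinstance(c, str) for c in chunks
    ):
        # chunk only the named dimensions, keep the others whole
        return {d: ("auto" if d in chunks else -1) for d in var.dims}
    # int and dict keep the existing open_dataset behaviour
    return chunks
```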
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
306617217 https://github.com/pydata/xarray/issues/1440#issuecomment-306617217 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNjYxNzIxNw== shoyer 1217238 2017-06-06T21:05:56Z 2017-06-06T21:05:56Z MEMBER

I think it's unavoidable that users understand how their data will be processed (e.g., whether operations will be mapped over time or space). But maybe some sort of heuristics (if not a fully automated solution) are possible.

For example, maybe chunks={'time'} (note the set rather than a dict) could indicate "divide me into automatically chosen chunks over the time dimension". It's still explicit about how chunking is being done, but comes closer to expressing the intent rather than the details.
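
As a toy illustration of that reading (assumed semantics, not an existing xarray feature), chunks={'time'} would amount to letting dask pick a chunk size along time while every other dimension stays whole:

```python
import numpy as np
import xarray as xr

# small synthetic dataset just to make the example runnable
ds = xr.Dataset({"t2m": (("time", "lat", "lon"), np.zeros((1000, 90, 180)))})

# assumed expansion of chunks={'time'}: automatic chunks along 'time',
# single whole chunks along every other dimension
ds_chunked = ds.chunk({"time": "auto", "lat": -1, "lon": -1})
print(ds_chunked["t2m"].chunks)
```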

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
306587426 https://github.com/pydata/xarray/issues/1440#issuecomment-306587426 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNjU4NzQyNg== jhamman 2443309 2017-06-06T19:10:27Z 2017-06-06T19:10:27Z MEMBER

I'd certainly support a warning when dask chunks do not align with the on-disk chunks.

Beyond that, I think we could work on a utility for automatically determining chunk sizes for xarray using some heuristics. Before we go there, though, I think we really should develop some performance benchmarks. We're starting to get a lot of questions/issues about performance, and it seems like we need some benchmarking to happen before we can really start fixing the underlying issues.
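
A minimal sketch of the kind of alignment check such a warning could be based on (helper name and exact rule assumed): every dask chunk, except possibly the last along a dimension, should span a whole number of on-disk chunks.

```python
import warnings

def warn_on_misaligned_chunks(dims, dask_chunks, disk_chunksizes):
    # Sketch only: dask_chunks is a tuple of per-dimension chunk tuples
    # (e.g. ((100, 100, 40),)) and disk_chunksizes comes from
    # variable.encoding['chunksizes'].
    for dim, dchunks, disk in zip(dims, dask_chunks, disk_chunksizes):
        if any(c % disk for c in dchunks[:-1]):
            warnings.warn(
                f"dask chunks {dchunks} along {dim!r} do not align with the "
                f"on-disk chunk size {disk}; some disk chunks will be read "
                "by more than one task"
            )
```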

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
306009664 https://github.com/pydata/xarray/issues/1440#issuecomment-306009664 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNjAwOTY2NA== shoyer 1217238 2017-06-04T00:28:19Z 2017-06-04T00:28:19Z MEMBER

My main concern is that netCDF4 chunk sizes (e.g., ~10-100KB in that blog post) are often much smaller than well-sized dask chunks (10-100MB, per the Dask FAQ).

I do think it would be appropriate to issue a warning if you are making dask chunks that don't line up nicely with chunks on disk, to avoid performance issues (in general, each chunk on disk should usually end up in only one chunk in dask). But there are lots of options for aggregating to larger chunks, and it's hard to choose the best way to do that without knowing how the data will be used.
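
A back-of-the-envelope illustration of that size gap (numbers assumed, not from the thread): merging disk chunks by an integer factor keeps each disk chunk inside a single dask chunk while reaching a dask-friendly size.

```python
disk_chunk = (128, 128)   # assumed on-disk chunk shape for a 2D float64 variable
itemsize = 8              # bytes per element (float64)
target_bytes = 100e6      # aim for ~100 MB dask chunks

disk_chunk_bytes = disk_chunk[0] * disk_chunk[1] * itemsize   # ~131 kB per disk chunk
n_disk_chunks = int(target_bytes // disk_chunk_bytes)         # ~762 disk chunks per dask chunk
per_dim = int(n_disk_chunks ** 0.5)                           # merge ~27 disk chunks per axis

dask_chunks = (disk_chunk[0] * per_dim, disk_chunk[1] * per_dim)
print(dask_chunks)  # (3456, 3456) -> roughly 95 MB per dask chunk
```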

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);