github: issue_comments: 9 rows where author_association = "MEMBER" and issue = 345715825 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
415177535	https://github.com/pydata/xarray/issues/2329#issuecomment-415177535	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQxNTE3NzUzNQ==	shoyer 1217238	2018-08-22T20:58:36Z	2018-08-22T20:58:36Z	MEMBER	This might be worth testing with the changes from https://github.com/pydata/xarray/pull/2261, which refactors xarray's IO handling.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
409238042	https://github.com/pydata/xarray/issues/2329#issuecomment-409238042	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwOTIzODA0Mg==	fmaussion 10050469	2018-07-31T14:20:06Z	2018-07-31T14:20:06Z	MEMBER	I updated my example above to show that the chunking over the last dimension is ridiculously slow.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
409172635	https://github.com/pydata/xarray/issues/2329#issuecomment-409172635	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwOTE3MjYzNQ==	fmaussion 10050469	2018-07-31T10:25:16Z	2018-07-31T14:18:29Z	MEMBER	Sorry for the confusion: I had an obvious mistake in my timing experiment above (forgot to do the actual computations...). The dimension order does make a difference:
```python
import dask.array as da
import xarray as xr

d = xr.DataArray(da.zeros((1000, 721, 1440), chunks=(10, 721, 1440)),
                 dims=('z', 'y', 'x'))
d.to_netcdf('da.nc')  # 8.3 GB

# %timeit is an IPython/Jupyter magic; the timings below are its reported output.
with xr.open_dataarray('da.nc', chunks={'z': 10}) as d:
    %timeit d.sum().load()
# 3.94 s ± 95.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10}) as d:
    %timeit d.sum().load()
# 4.15 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'x': 10}) as d:
    %timeit d.sum().load()
# 1min 54s ± 1.43 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

with xr.open_dataarray('da.nc', chunks={'y': 10, 'x': 10}) as d:
    %timeit d.sum().load()
# 2min 23s ± 215 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```	
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
409168605	https://github.com/pydata/xarray/issues/2329#issuecomment-409168605	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwOTE2ODYwNQ==	fmaussion 10050469	2018-07-31T10:09:36Z	2018-07-31T13:21:34Z	MEMBER	Those chunksizes are the opposite of what I was expecting... The `chunksizes` in `encoding` are ignored in your case; dask still uses your user-provided encoding. Can you still try to chunk along one dimension only, e.g. `chunks={'time': 200}`?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
409165114	https://github.com/pydata/xarray/issues/2329#issuecomment-409165114	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwOTE2NTExNA==	fmaussion 10050469	2018-07-31T09:56:54Z	2018-07-31T10:20:32Z	MEMBER	[EDIT]: forgot the load ... <s> forget my comment about chunks - I thought this would make a difference but it's actually the opposite (to my surprise): </s>	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
409159969	https://github.com/pydata/xarray/issues/2329#issuecomment-409159969	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwOTE1OTk2OQ==	fmaussion 10050469	2018-07-31T09:38:37Z	2018-07-31T10:19:37Z	MEMBER	Out of curiosity:
- why do you chunk over lats and lons rather than time? The order of dimensions in your DataArray suggests that chunking over time could be more efficient.
- can you show the output of `ds.mtpr` and `ds.mtpr.encoding`?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
408928221	https://github.com/pydata/xarray/issues/2329#issuecomment-408928221	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwODkyODIyMQ==	rabernat 1197350	2018-07-30T16:37:05Z	2018-07-30T16:37:23Z	MEMBER	Can you forget about zarr for a moment and just do a reduction on your dataset? For example: `ds.sum().load()`. Keep the same chunk arguments you are currently using. This will help us understand if the problem is with reading the files. Is it your intention to chunk the files contiguously in time? Depending on the underlying structure of the data within the netCDF file, this could amount to a complete transposition of the data, which could be very slow / expensive. This could have some parallels with #2004.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
408925488	https://github.com/pydata/xarray/issues/2329#issuecomment-408925488	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwODkyNTQ4OA==	rabernat 1197350	2018-07-30T16:28:31Z	2018-07-30T16:28:31Z	MEMBER	"I was somehow expecting that each worker will read a chunk and then write it to zarr, streamlined." Yes, this is what we want!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
408860643	https://github.com/pydata/xarray/issues/2329#issuecomment-408860643	https://api.github.com/repos/pydata/xarray/issues/2329	MDEyOklzc3VlQ29tbWVudDQwODg2MDY0Mw==	rabernat 1197350	2018-07-30T13:20:59Z	2018-07-30T13:20:59Z	MEMBER	@lrntct - this sounds like a reasonable way to use zarr. We routinely do this sort of transcoding and it works reasonably well. Unfortunately, something clearly isn't working right in your case. These things can be hard to debug, but we will try to help you. You might want to start by reviewing the guide I wrote for Pangeo on preparing zarr datasets.

It would also be good to see a bit more detail. You posted a function `netcdf2zarr` that converts a single netCDF file to a single zarr file. How are you invoking that function? Are you trying to create one zarr store for each netCDF file? How many netCDF files are there? If there are many (e.g. one per timestep), my recommendation is to create only one zarr store for the whole dataset. Open the netCDF files using `open_mfdataset`.

If instead you have just one big netCDF file, as in the example you posted above, I think I see your problem: you are calling `.chunk()` after calling `open_dataset()`, rather than calling `open_dataset(nc_path, chunks=chunks)`. This probably means that you are loading the whole dataset in a single task and then re-chunking. That could be the source of the inefficiency.

More ideas:
- explicitly specify the chunks (rather than using `'auto'`)
- eliminate the negative number in your chunk sizes
- make sure you really need `clevel=9`

Another useful piece of advice would be to use the dask distributed dashboard to monitor what is happening under the hood. You can do this by running
```python
from dask.distributed import Client
client = Client()
client
```
In a notebook, this should give you a link to the scheduler dashboard. Once you call `ds.to_zarr()`, watch the task stream in the dashboard to see what is happening. Hopefully these ideas can help you move forward.	
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Out-of-core processing with dask not working properly? 345715825
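
The timings in comment 409172635 can be rationalized with a rough count of disk reads. The sketch below assumes that `da.nc` stores the (1000, 721, 1440) float64 variable contiguously in C (row-major) order, without netCDF4-level chunking or compression; that is an assumption, since the real on-disk layout depends on the variable's `encoding`. The helper `chunk_stats` is hypothetical: it only counts dask chunks and contiguous byte runs, and does not model caching or read coalescing in the netCDF/HDF5 library.

```python
import math

SHAPE = (1000, 721, 1440)  # (z, y, x), as in the example in comment 409172635
DIMS = ("z", "y", "x")
ITEMSIZE = 8               # bytes per float64 element


def chunk_stats(chunks):
    """Hypothetical helper. Assuming a C-ordered, unchunked on-disk array,
    return (n_dask_chunks, total_contiguous_runs, bytes_per_run)."""
    sizes = [min(chunks.get(d, n), n) for d, n in zip(DIMS, SHAPE)]
    n_chunks = math.prod(math.ceil(n / s) for n, s in zip(SHAPE, sizes))
    split = [i for i, (n, s) in enumerate(zip(SHAPE, sizes)) if s < n]
    if not split:  # one chunk covering the whole array: a single run
        return n_chunks, 1, math.prod(SHAPE) * ITEMSIZE
    k = split[-1]  # innermost axis that is split into chunks
    # Axes after k are read whole, so one run spans sizes[k] times their extent
    # (for full-size chunks; a ragged last chunk gives shorter runs).
    bytes_per_run = sizes[k] * math.prod(SHAPE[k + 1:]) * ITEMSIZE
    # Every index combination on the axes before k starts a new run,
    # once per chunk along axis k.
    total_runs = math.prod(SHAPE[:k]) * math.ceil(SHAPE[k] / sizes[k])
    return n_chunks, total_runs, bytes_per_run


for chunks in ({"z": 10}, {"y": 10}, {"x": 10}, {"y": 10, "x": 10}):
    n, runs, nbytes = chunk_stats(chunks)
    print(f"{chunks}: {n} chunks, {runs:,} contiguous reads of {nbytes:,} bytes")
```

Under these assumptions, chunking over `z` needs 100 reads of about 83 MB and chunking over `y` needs 73,000 reads of about 115 kB, while any split along `x` (alone or together with `y`) needs 103,824,000 reads of 80 bytes each. That gap is at least consistent with the roughly 4 s versus 2 min timings reported above, though the real cost also depends on caching and compression.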

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
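
The page title describes a filter over the `issue_comments` table defined above. A minimal sketch of that filter as SQL, run against an in-memory SQLite copy of the schema with Python's standard `sqlite3` module, is shown below. Only two rows from the table above are inserted, with most columns left NULL, and the helper `member_comments` is hypothetical; this illustrates the filter and is not necessarily the query the export tool itself issues. Sorting the TEXT column `updated_at` works here because every timestamp uses the same ISO 8601 `...Z` format, so lexical order matches chronological order.

```python
import sqlite3

# Schema copied from the CREATE TABLE / CREATE INDEX statements above.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

QUERY = """
SELECT [id], [user], [updated_at]
FROM [issue_comments]
WHERE [author_association] = ? AND [issue] = ?
ORDER BY [updated_at] DESC
"""


def member_comments(conn, issue_id):
    """Hypothetical helper: rows matching the filter in this page's title."""
    return conn.execute(QUERY, ("MEMBER", issue_id)).fetchall()


conn = sqlite3.connect(":memory:")
# [users] and [issues] are not created here, so keep foreign keys unenforced.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(SCHEMA)
# Two rows taken from the table above; other columns are left NULL.
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], "
    "[author_association], [issue]) VALUES (?, ?, ?, ?, ?)",
    [
        (409238042, 10050469, "2018-07-31T14:20:06Z", "MEMBER", 345715825),
        (415177535, 1217238, "2018-08-22T20:58:36Z", "MEMBER", 345715825),
    ],
)
for row in member_comments(conn, 345715825):
    print(row)
```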