
issue_comments


6 rows where issue = 627735640 sorted by updated_at descending


user 4

  • fujiisoup 2
  • Paul-Aime 2
  • max-sixty 1
  • stale[bot] 1

author_association 2

  • MEMBER 3
  • NONE 3

issue 1

  • xarray.DataArray.stack load data into memory · 6
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
id: 1102835310 · user: max-sixty (5635139) · author_association: MEMBER
created_at: 2022-04-19T16:10:25Z · updated_at: 2022-04-19T16:10:25Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-1102835310
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: IC_kwDOAMm_X85Bu-5u

Closing as mostly resolved, I think? Please reopen if not.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)
id: 1102492660 · user: stale[bot] (26384082) · author_association: NONE
created_at: 2022-04-19T10:43:48Z · updated_at: 2022-04-19T10:43:48Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-1102492660
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: IC_kwDOAMm_X85BtrP0

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be closed automatically.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)
id: 636849900 · user: Paul-Aime (36678697) · author_association: NONE
created_at: 2020-06-01T13:06:02Z · updated_at: 2020-06-01T13:06:02Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636849900
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: MDEyOklzc3VlQ29tbWVudDYzNjg0OTkwMA==

> I think it depends on the chunk size.

Yes, I'm not very familiar with chunks; it seems it's not good to have too many of them.

> I am not sure where 512 comes from in your example (maybe dask does something).

Sorry, it should have been (100, 2048); it comes from the second stacking dimension (explained below). My screenshot was for .stack(px=("y", "x")), my bad.

> If I work with chunks=dict(x=128, y=128), the chunk size after stacking was (100, 16384), which is reasonable (z=100, px=(128, 128)).

Yes, after some more experiments I found that the chunk size after stacking is (100, X), where X is a multiple of the size of the second stacking dimension (here "y"); that's why it works in your case (128 * 128 == 2048 * 8).

The formula for X is something like:

shape[1] * ( (x_chunk * y_chunk) // shape[1] + bool((x_chunk * y_chunk) % shape[1]) )

So the minimum value for X is shape[1] (the size of the "y" dim), hence my case with small values for x_chunk and y_chunk.
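
As a quick sanity check (not from the original thread), the formula above reproduces both chunk widths mentioned in this discussion; the helper name stacked_chunk_width is made up for illustration:

```python
# Hypothetical helper that just evaluates the formula above.
def stacked_chunk_width(second_dim_size, x_chunk, y_chunk):
    n = x_chunk * y_chunk
    # shape[1] * ceil(n / shape[1]), written as in the comment
    return second_dim_size * (n // second_dim_size + bool(n % second_dim_size))

print(stacked_chunk_width(2048, 128, 128))  # 16384 -> stacked chunks (100, 16384)
print(stacked_chunk_width(2048, 1, 1))      # 2048  -> stacked chunks (100, 2048)
```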

That's why I was saying that "chunks along the second stacking dimension seem to be merged". This might be normal, just unexpected, and it is still quite obscure to me.

And it must be happening on the dask side anyway. Thanks a lot for your insights.

> You can do reset_index before saving it to netCDF, but it requires another computation when re-creating the MultiIndex after loading.

Ah yes, thanks! I thought reset_index was similar to unstack for indexes created with stack.
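
For concreteness, here is a minimal sketch of that reset_index round trip (not taken from the thread; the array shape, file name, and variable names are illustrative, and details may vary across xarray versions):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(4, 6, 3), dims=("x", "y", "z"), name="da")
mda = da.stack(px=("x", "y"))

# The MultiIndex itself cannot be written to netCDF, so drop it back to
# plain coordinate variables before saving...
mda.reset_index("px").to_netcdf("stacked.nc")

# ...and rebuild it after loading (this is the extra computation mentioned above).
loaded = xr.open_dataarray("stacked.nc").set_index(px=["x", "y"])
```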

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)
id: 636619598 · user: fujiisoup (6815844) · author_association: MEMBER
created_at: 2020-06-01T05:24:35Z · updated_at: 2020-06-01T05:24:35Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636619598
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: MDEyOklzc3VlQ29tbWVudDYzNjYxOTU5OA==

> Reading with chunks loads more memory than reading without chunks, though not an amount equal to the size of the array (300 MB for an 800 MB array in the example below). And by the way, memory also goes up a bit more when stacking.

I think it depends on the chunk size. If I use chunks=dict(x=128, y=128), the memory usage is:

```
RAM: 118.14 MB
 da: 800.0 MB
RAM: 119.14 MB
RAM: 125.59 MB
RAM: 943.79 MB
```

> When stacking a chunked array, only chunks along the first stacking dimension are preserved, and chunks along the second stacking dimension seem to be merged.

I am not sure where 512 comes from in your example (maybe dask does something). If I work with chunks=dict(x=128, y=128), the chunk size after stacking was (100, 16384), which is reasonable (z=100, px=(128, 128)).
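
As a reference for how such chunk shapes can be inspected (this snippet is not from the thread; it builds the 800 MB array lazily with dask so nothing is actually allocated, and the expected stacked chunk size is the one reported above):

```python
import dask.array
import xarray as xr

# Lazily create the (512, 2048, 100) array from the example, chunked as above.
da = xr.DataArray(
    dask.array.zeros((512, 2048, 100), chunks=(128, 128, 100)),
    dims=("x", "y", "z"),
)

mda = da.stack(px=("x", "y"))
print(da.data.chunksize)   # (128, 128, 100)
print(mda.data.chunksize)  # reported as (100, 16384) in this thread
```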

> A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF".

You can do reset_index before saving it to netCDF, but it requires another computation when re-creating the MultiIndex after loading.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)
id: 636491064 · user: Paul-Aime (36678697) · author_association: NONE
created_at: 2020-05-31T16:04:39Z · updated_at: 2020-05-31T16:04:39Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636491064
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: MDEyOklzc3VlQ29tbWVudDYzNjQ5MTA2NA==

Thanks for the answer.

I tried some experiments with chunked reading via dask, but I have some observations I don't fully understand:

1) Still loading memory

Reading with chunks loads more memory than reading without chunks, though not an amount equal to the size of the array (300 MB for an 800 MB array in the example below). And by the way, memory also goes up a bit more when stacking.

But I think this may be normal, perhaps because the dask machinery itself is loaded into memory, and I will see the full benefits when working on bigger data. Am I right?

2) Stacking breaks the chunks

When stacking a chunked array, only chunks along the first stacking dimension are preserved, and chunks along the second stacking dimension seem to be merged.

I think this has something to do with the very nature of indexes, but I am not sure.

3) Rechunking loads the data into memory

A workaround to 2) could have been to re-chunk as desired after stacking, but then the data is fully loaded.

Example

(Consider the following as a replacement for the main() function of the script in the original post.)

```python
def main():
    fname = "da.nc"
    shape = 512, 2048, 100  # 800 MB

    xr.DataArray(
        np.random.randn(*shape),
        dims=("x", "y", "z"),
    ).to_netcdf(fname)
    print_ram_state()

    da = xr.open_dataarray(fname, chunks=dict(x=1, y=1))
    print(f" da: {mb(da.nbytes)} MB")
    print_ram_state()

    mda = da.stack(px=("x", "y"))
    print_ram_state()

    mda = mda.chunk(dict(px=1))
    print_ram_state()
```

which outputs something like:

```
RAM: 94.52 MB
 da: 800.0 MB
RAM: 398.83 MB
RAM: 589.05 MB
RAM: 1409.11 MB
```
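
The print_ram_state and mb helpers come from the script in the original post, which isn't reproduced here; a minimal sketch of equivalent helpers, assuming psutil is available, might look like:

```python
import os
import psutil

def mb(nbytes):
    # Bytes to megabytes, rounded for display.
    return round(nbytes / 1024 ** 2, 2)

def print_ram_state():
    # Resident set size of the current Python process.
    rss = psutil.Process(os.getpid()).memory_info().rss
    print(f"RAM: {mb(rss)} MB")
```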

Chunks as displayed by the Jupyter notebook visualization: screenshots showed the chunk layout before and after stacking.

A workaround could have been to save the data already stacked, but "MultiIndex cannot yet be serialized to netCDF".

Maybe there is another workaround?

(Sorry for the long post)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)
id: 636418772 · user: fujiisoup (6815844) · author_association: MEMBER
created_at: 2020-05-31T04:21:29Z · updated_at: 2020-05-31T04:21:29Z
html_url: https://github.com/pydata/xarray/issues/4113#issuecomment-636418772
issue_url: https://api.github.com/repos/pydata/xarray/issues/4113
node_id: MDEyOklzc3VlQ29tbWVudDYzNjQxODc3Mg==

Thank you for raising an issue. I confirmed that this problem is reproducible.

Since our lazy array does not support reshaping, it loads the data automatically. This automatic loading happens in many other operations as well.

For example, multiplying your array by a scalar,

```python
mda = da * 2
```

also loads the data into memory. Maybe we should improve the documentation.

FYI, using dask arrays may solve this problem. To open the file with dask, you can add the chunks keyword:

```python
da = xr.open_dataarray("da.nc", chunks={'x': 16, 'y': 16})
```

Then the reshape will be a lazy operation too.
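
A small sketch of what that looks like in practice (assuming the da.nc file from the original post exists; the checks below are illustrative, not from the thread):

```python
import xarray as xr

# Open lazily with dask chunks; nothing is read into memory yet.
da = xr.open_dataarray("da.nc", chunks={"x": 16, "y": 16})

# The reshape is only recorded in the dask task graph.
mda = da.stack(px=("x", "y"))
print(type(mda.data))  # dask array, i.e. still lazy
print(mda.chunks)      # chunk layout of the stacked array

# Chunks are loaded only when something is actually computed.
subset = mda.isel(px=slice(0, 10)).compute()
```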

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: xarray.DataArray.stack load data into memory (627735640)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
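
For reference, the filter behind this page (issue = 627735640, sorted by updated_at descending) can be run directly against the underlying SQLite database; a sketch assuming a local github.db file:

```python
import sqlite3

# Path is illustrative; point this at the Datasette SQLite file.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, author_association, body
    FROM issue_comments
    WHERE issue = 627735640
    ORDER BY updated_at DESC
    """
).fetchall()

for comment_id, user, created, updated, assoc, body in rows:
    print(comment_id, updated, assoc)
```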