issue_comments
14 rows where issue = 944996552 sorted by updated_at descending
issue: Extremely Large Memory usage for a very small variable (14 comments)
id | user | created_at | author_association | html_url
892159149 | hansukyang 11863789 | 2021-08-03T20:54:30Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-892159149

I don't know if this is related, but recent releases of Dask (after the 2021.03 version) have very large memory usage that I'm not sure is being addressed yet (https://github.com/dask/dask/issues/7583).
884197067 | areichmuth 25606497 | 2021-07-21T13:38:22Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-884197067

Hi there, I have a very similar problem, and before I open another issue I would rather share my example here.

Minimal Complete Verifiable Example (this little computation uses >500 MB of memory even though the file is only 154 MB on disk):

```python
import xarray as xr

with xr.open_dataset(climdata + 'tavg_subset.nc',  # climdata: data directory
                     chunks={"latitude": 300, "longitude": 300}) as ds:
    print(ds)
```

My problem is that the original files are each >120 GB in size, and I run into an out-of-memory error on our HPC (asking for 10 CPUs with 16 GB each). I thought xarray processes everything in chunks to avoid overusing memory, but something seems really wrong here!?
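Not part of the comment above, but a minimal sketch of the pattern it is reaching for: keep the computation lazy so dask streams the 300×300 chunks through memory rather than materializing the whole array. The variable name `tavg`, the presence of a `time` coordinate, and the output path are assumptions.

```python
import xarray as xr

# Sketch, assuming the file has a "tavg" variable and a "time" coordinate.
ds = xr.open_dataset("tavg_subset.nc",
                     chunks={"latitude": 300, "longitude": 300})

monthly = ds["tavg"].groupby("time.month").mean()  # still lazy (dask graph)
monthly.to_netcdf("tavg_monthly.nc")  # computes and writes chunk by chunk
```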
881641897 | max-sixty 5635139 | 2021-07-16T18:36:45Z | MEMBER | https://github.com/pydata/xarray/issues/5604#issuecomment-881641897

The memory usage does seem high. Not having the indexes aligned makes this an expensive operation, and I would vote to have that fail by default (ref: https://github.com/pydata/xarray/discussions/5499#discussioncomment-929765). Can the input files be aligned before attempting to combine the data, or are you not in control of the input files?

To debug the memory, you probably need to do something like use …
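The fail-instead-of-reindex behavior suggested here can be approximated with the `join` argument that `open_mfdataset` forwards to the combine step. A hedged sketch with placeholder paths:

```python
import xarray as xr

files = ["his_0001.nc", "his_0002.nc"]  # placeholder paths
try:
    # join="exact" raises instead of reindexing when the files'
    # coordinate indexes are not already aligned.
    ds = xr.open_mfdataset(files, combine="by_coords", join="exact")
except ValueError as err:
    print("inputs are misaligned:", err)
```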
881131177 | tommy307507 49487505 | 2021-07-16T02:29:33Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-881131177

Sorry, I think the 1000x was confusion on my part from not reading the numbers correctly, or from a poor understanding of how memory units work, but I will explain it again. In the top command, the process draws all 100 GiB of memory and starts using swap, which causes the system to automatically kill it. The ubar variable should only need 59 × 1100 × 1249 × 8 = 648,480,800 bytes of memory, which is only about 0.648 GB. However, top shows it using 92.5 GiB of memory plus all 16 GiB of swap, so the process actually draws about 109 GiB (because that's all that is available before it gets automatically killed), which is in fact only about 168x what's really needed.
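The arithmetic above, spelled out (the 109 GiB figure is taken from the comment; note the ratio lands closer to 180x than 168x once both figures are kept in bytes, since the comment divides a GiB figure by a GB figure):

```python
# Expected in-memory size of ubar: shape (59, 1100, 1249) of float64.
expected_bytes = 59 * 1100 * 1249 * 8
print(f"{expected_bytes:,} bytes")          # 648,480,800
print(f"{expected_bytes / 1e9:.3f} GB")     # 0.648 (decimal gigabytes)
print(f"{expected_bytes / 2**30:.3f} GiB")  # 0.604 (binary gibibytes)

observed_bytes = 109 * 2**30                # ~109 GiB before the OOM kill
print(f"{observed_bytes / expected_bytes:.0f}x expected")  # ~180x
```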
881111321 | max-sixty 5635139 | 2021-07-16T01:29:19Z | MEMBER | https://github.com/pydata/xarray/issues/5604#issuecomment-881111321

Again: where are you seeing this 1000 GB or 1000x number? (Also, have a look at the GitHub docs on how to format code.)
881106553 | tommy307507 49487505 | 2021-07-16T01:13:11Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-881106553

For ubar it says:

```
dask.array<where, shape=(59, 1100, 1249), dtype=float64, chunksize=(59, 1100, 1249), chunktype=numpy.ndarray>
```

But for u it says:

```
dask.array<concatenate, shape=(59, 35, 1100, 1249), dtype=float64, chunksize=(1, 1, 1100, 1249), chunktype=numpy.ndarray>
```

Those are very different operations; is that the reason for the 1000 GB consumption?
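For readers following along, reprs like those above can be pulled out of any opened dataset; a sketch with placeholder paths (the chunk shapes in the comments are the ones quoted in the comment, not guaranteed outputs):

```python
import glob
import xarray as xr

ds = xr.open_mfdataset(sorted(glob.glob("his_*.nc")))  # placeholder paths

# ubar ends up as a single block spanning all 59 time steps,
# while u stays chunked one time step at a time.
print(ds["ubar"].data)  # dask.array<where, ..., chunksize=(59, 1100, 1249)>
print(ds["u"].data)     # dask.array<concatenate, ..., chunksize=(1, 1, 1100, 1249)>
```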
881100884 | tommy307507 49487505 | 2021-07-16T00:55:25Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-881100884

Trying this gives me "conflicting values for variable 'ubar' on objects to be combined." Actually, that makes sense, as `identical` requires the values to be the same, right?
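For context, the `compat` argument controls that check: `compat="identical"` compares variable values (and attributes) across files and raises the "conflicting values" error quoted above, while `compat="override"` skips the comparison and keeps the copy from the first file. A hedged sketch with placeholder paths and a single concat dimension for brevity:

```python
import xarray as xr

files = ["his_0001.nc", "his_0002.nc"]  # placeholder paths

# compat="override" takes ubar from the first file instead of
# erroring when its values differ slightly between files.
ds = xr.open_mfdataset(files, combine="nested",
                       concat_dim="v3d_time", compat="override")
```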
880857434 | tommy307507 49487505 | 2021-07-15T16:49:33Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-880857434

Thanks for your quick reply, but I am not at work right now, as it's 1 am over here. I might test the limits of this tomorrow; I am trying to merge 59 files right now, so I might try fewer files to find the lower limit, as passing 20 GB of files around would be quite hard.
880854744 | tommy307507 49487505 | 2021-07-15T16:45:17Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-880854744

My temporary bypass is to call open_dataset on all of the files, store u and ubar in two separate lists, and save to file after doing an xr.concat on each of them. They can be concatenated just fine, and the file is about the expected size of 23 GB. The operation also takes up a similar amount of memory.
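A sketch of that bypass, assuming a file pattern and using the dimension names mentioned elsewhere in the thread (both are placeholders):

```python
import glob
import xarray as xr

files = sorted(glob.glob("his_*.nc"))           # placeholder pattern
datasets = [xr.open_dataset(f) for f in files]  # lazy, one file at a time

# Concatenate each variable along its own time dimension, then save.
xr.concat([ds["u"] for ds in datasets], dim="v3d_time").to_netcdf("u.nc")
xr.concat([ds["ubar"] for ds in datasets], dim="v2d_time").to_netcdf("ubar.nc")
```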
880853826 | TomNicholas 35968931 | 2021-07-15T16:44:32Z | MEMBER | https://github.com/pydata/xarray/issues/5604#issuecomment-880853826

An example which we can reproduce locally would be the most helpful, if possible!
880851062 | tommy307507 49487505 | 2021-07-15T16:42:18Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-880851062

Yes, I will try the above tomorrow and post back here. I did try passing `concat_dim=["v2d_time", "v3d_time"]`, but that still causes the problem.
880809255 | TomNicholas 35968931 | 2021-07-15T15:51:18Z | MEMBER | https://github.com/pydata/xarray/issues/5604#issuecomment-880809255

Is …
880500336 | max-sixty 5635139 | 2021-07-15T08:24:12Z | MEMBER | https://github.com/pydata/xarray/issues/5604#issuecomment-880500336

This will likely need much more detail. Though to start: what's the source of the 1000x number?

What happens if you pass …
880410159 | tommy307507 49487505 | 2021-07-15T05:36:50Z | NONE | https://github.com/pydata/xarray/issues/5604#issuecomment-880410159

The variables can be combined using xr.concat if I open the individual files with xr.open_dataset, and that takes only 1.1 GB of memory, so I think the issue is somewhere inside open_mfdataset. I also don't understand why the chunksize of v2d_time is 59 instead of 1.
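One knob worth trying for the 59-step chunk observed here is the `chunks` argument, which `open_mfdataset` forwards to each underlying `open_dataset`. A sketch with placeholder paths:

```python
import glob
import xarray as xr

# Ask dask for one time step per chunk instead of letting ubar
# collapse into a single 59-step block.
ds = xr.open_mfdataset(sorted(glob.glob("his_*.nc")),
                       chunks={"v2d_time": 1, "v3d_time": 1})
print(ds["ubar"].data.chunksize)  # ideally (1, 1100, 1249)
```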