issues: 223231729
field | value
---|---
id | 223231729
node_id | MDU6SXNzdWUyMjMyMzE3Mjk=
number | 1379
title | xr.concat consuming too much resources
user | 7799184
state | open
locked | 0
assignee | 
milestone | 
comments | 4
created_at | 2017-04-20T23:33:52Z
updated_at | 2021-07-08T17:42:18Z
closed_at | 
author_association | CONTRIBUTOR
active_lock_reason | 
draft | 
pull_request | 
body | Hi, I am reading several (~1000) small ASCII files into Dataset objects and trying to concatenate them over one specific dimension, but I eventually blow up my memory. The file glob is not huge (~700M; my computer has ~16G), and I can do it fine if I only read in the Datasets and append them to a list without concatenating (my memory increases by only about 5% by the time I have read them all). However, when I try to concatenate each file into one single Dataset as it is read in a loop, processing speed drops drastically before I have read about 10% of the files, and memory usage keeps growing until it eventually blows up before I have read and concatenated 30% of them (the screenshot below was taken before it blew up; memory usage was under 20% at the start of the processing). I was wondering if this is expected, or if there is something that could be improved to make this work more efficiently. I'm changing my approach now by extracting numpy arrays from the individual Datasets, concatenating these numpy arrays, and defining the final Dataset only at the end. Thanks.
reactions | { "url": "https://api.github.com/repos/pydata/xarray/issues/1379/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }
performed_via_github_app | 
state_reason | 
repo | 13221727
type | issue
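Below is a minimal sketch of the two patterns described in the issue body, assuming each ASCII file parses into a small Dataset along a shared `time` dimension. The file pattern, variable name, and `read_one` reader are illustrative placeholders, not taken from the issue. It contrasts concatenating inside the read loop with collecting the Datasets first and calling `xr.concat` once, and also shows the numpy-based workaround the reporter mentions.

```python
# Sketch only: paths, variable names, and the reader are assumptions.
import glob

import numpy as np
import xarray as xr


def read_one(path):
    """Hypothetical reader: turn one small ASCII file into a Dataset."""
    data = np.atleast_1d(np.loadtxt(path))
    return xr.Dataset({"value": ("time", data)})


paths = sorted(glob.glob("ascii_files/*.txt"))

# Pattern that scales poorly: concatenating inside the loop re-copies the
# whole accumulated result on every iteration (quadratic copying overall).
# combined = read_one(paths[0])
# for path in paths[1:]:
#     combined = xr.concat([combined, read_one(path)], dim="time")

# Cheaper pattern: collect the Datasets in a list, then concatenate once.
datasets = [read_one(path) for path in paths]
combined = xr.concat(datasets, dim="time")

# Workaround described in the issue body: concatenate plain numpy arrays
# and build the final Dataset only at the end.
arrays = [np.atleast_1d(np.loadtxt(path)) for path in paths]
combined_np = xr.Dataset({"value": ("time", np.concatenate(arrays))})
```

The key difference is that a single `xr.concat` call (or a single `np.concatenate`) allocates the output once, whereas concatenating inside the loop rebuilds and copies the growing result on every iteration, which matches the slowdown and memory growth reported above.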