issues: 2230680765
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2230680765 | I_kwDOAMm_X86E9Xy9 | 8919 | Using the xarray.Dataset.where() function takes up a lot of memory | 69391863 | closed | 0 | 4 | 2024-04-08T09:15:49Z | 2024-04-09T02:45:09Z | 2024-04-09T02:45:08Z | NONE | What is your issue?

My Python script was killed because it used too much memory. After checking, I found that the problem is the ds.where() function. The original netCDF file on disk is about 10 MB, but after I mask out the data that falls outside the target latitude and longitude range, the variable ds takes up about a dozen GB of memory. When I delete this variable with del ds, the script's memory usage immediately returns to normal.

```
# Open this netcdf file.
ds = xr.open_dataset(track)

# If longitude range is [-180, 180], then convert to [0, 360].
if np.any(ds[var_lon] < 0):
    ds[var_lon] = ds[var_lon] % 360

# Extract data by longitude and latitude.
ds = ds.where((ds[var_lon] >= region[0]) & (ds[var_lon] <= region[1]) & (ds[var_lat] >= region[2]) & (ds[var_lat] <= region[3]))

# Select data by range and value of some variables.
for key, value in range_select.items():
    ds = ds.where((ds[key] >= value[0]) & (ds[key] <= value[1]))
for key, value in value_select.items():
    ds = ds.where(ds[key].isin(value))
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8919/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | 13221727 | issue |
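
The row above does not record what caused the memory growth. One plausible explanation, offered here only as an assumption, is broadcasting: `Dataset.where(cond)` is applied to every data variable, so a variable that lacks some of the condition's dimensions is expanded to the condition's full shape, and integer variables are promoted to a floating-point type so that they can hold NaN. The sketch below reproduces that effect on a hypothetical toy dataset; the variable names (`flag`, `lat`, `lon`), the dimension names, and the sizes are invented for illustration and are not taken from the reporter's file.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: a small 1-D integer variable along "time" and
# 2-D latitude/longitude variables on ("time", "pixel").
n_time, n_pixel = 1000, 500
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "flag": ("time", np.zeros(n_time, dtype="int8")),
        "lat": (("time", "pixel"), rng.uniform(-90, 90, (n_time, n_pixel))),
        "lon": (("time", "pixel"), rng.uniform(0, 360, (n_time, n_pixel))),
    }
)

# A 2-D boolean condition, analogous to the region mask in the issue.
cond = (
    (ds["lon"] >= 100) & (ds["lon"] <= 120)
    & (ds["lat"] >= 0) & (ds["lat"] <= 30)
)

# Masking the whole dataset broadcasts the 1-D "flag" variable against the
# 2-D condition and promotes it to a floating-point dtype.
masked = ds.where(cond)
print("flag before:", ds["flag"].dims, ds["flag"].dtype, ds["flag"].nbytes)
print("flag after: ", masked["flag"].dims, masked["flag"].dtype, masked["flag"].nbytes)
print("dataset bytes before/after:", ds.nbytes, masked.nbytes)

# One way to avoid the expansion: mask only the variables whose dimensions
# already match the condition, and leave the others untouched.
same_shape = [name for name, var in ds.data_vars.items() if var.dims == cond.dims]
masked_subset = ds[same_shape].where(cond)
print("subset bytes before/after:", ds[same_shape].nbytes, masked_subset.nbytes)
```

If this is what happened in the report, a file with many low-dimensional variables and a high-dimensional condition could grow by orders of magnitude in memory even though it is small on disk. Whether that applies to the reporter's data cannot be confirmed from this row alone.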