id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
291332965,MDU6SXNzdWUyOTEzMzI5NjU=,1854,Drop coordinates on loading large dataset.,1797906,closed,0,,,22,2018-01-24T19:35:46Z,2020-02-15T14:49:53Z,2020-02-15T14:49:53Z,NONE,,,,"I've been struggling for quite a while to load a large dataset, so I thought it best to ask as I think I'm missing a trick. I've also looked through the existing issues but, even though there are a fair few questions that seemed promising, none of them quite covered this.

I have a number of `*.nc` files with variables across the coordinates `latitude`, `longitude` and `time`. Each file has the data for all the latitudes and longitudes of the world over some period of time - about two months. The goal is to go through that data and get the full history of a single latitude/longitude coordinate, instead of the data for all latitudes and longitudes over small periods.

This is my current few lines of script:

```python
import numpy as np
import xarray as xr

# 127 is normally the size of the time dimension in each file
ds = xr.open_mfdataset('path/to/ncs/*.nc', chunks={'time': 127})
recs = ds.sel(latitude=10, longitude=10).to_dataframe().to_records()
np.savez('location.npz', recs)
```

However, this blows out the memory on my machine on the `open_mfdataset` call when I use the full dataset. I've tried a bunch of different ways of chunking the data (like `'latitude': 1, 'longitude': 1`) but have not been able to get past this stage.

I was wondering if there's a way to either determine a good chunk size, or to tell `open_mfdataset` to only keep values from the lat/lng coordinates I care about (the `coords` kwarg looked like it could have been it).

I'm using version `0.10.0` of xarray.

Would very much appreciate any help.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1854/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
257400162,MDU6SXNzdWUyNTc0MDAxNjI=,1572,Modifying data set resulting in much larger file size,1797906,closed,0,,,7,2017-09-13T14:24:06Z,2017-09-18T08:59:24Z,2017-09-13T17:12:28Z,NONE,,,,"I'm loading a 130MB `nc` file and applying a `where` mask to it to remove a significant number of the floating-point values, replacing them with `nan`. However, when I go to save this file it has grown to over 500MB. If I load the original dataset and immediately save it, the file stays roughly the same size.

Here's how I'm applying the mask:

```python
import os

import xarray as xr

fp = 'ERA20c/swh_2010_01_05_05.nc'

ds = xr.open_dataset(fp)
ds = ds.where(ds.latitude > 50)

head, ext = os.path.splitext(fp)

# save an untouched copy alongside the masked one for a size comparison
xr.open_dataset(fp).to_netcdf('{}-duplicate{}'.format(head, ext))
ds.to_netcdf('{}-masked{}'.format(head, ext))
```

Is there a way to reduce the file size of the masked dataset? I'd expect it to be roughly the same size or smaller.

Thanks.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1572/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
255997962,MDU6SXNzdWUyNTU5OTc5NjI=,1561,exit code 137 when using xarray.open_mfdataset,1797906,closed,0,,,3,2017-09-07T16:31:50Z,2017-09-13T14:16:07Z,2017-09-13T14:16:06Z,NONE,,,,"While using `xarray.open_mfdataset` I get an `exit code 137 (SIGKILL 9)` killing my process.

I do not get this while using a subset of the data, though. I'm also providing a `chunks` argument.

Does anyone know what might be causing this? Could it be that the computer is completely running out of memory (RAM + SWAP + HDD)? I'm unsure what's causing this, as I get no stack trace, just the `SIGKILL`.

Thanks.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1561/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
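On the point-extraction problem described in issues 1854 and 1561 above, a minimal sketch of one possible approach, assuming the hypothetical file pattern, (10, 10) point and output path from the example in 1854: subset each file to the single coordinate of interest through `open_mfdataset`'s `preprocess` hook, so that only that point's time series is ever concatenated and held in memory.

```python
# Sketch only: the glob pattern, the (10, 10) point and the output file
# are placeholders carried over from the example in issue 1854.
import numpy as np
import xarray as xr

def select_point(ds):
    # Reduce each file to one grid point before the files are combined;
    # method='nearest' guards against the point not matching a coordinate
    # value exactly.
    return ds.sel(latitude=10, longitude=10, method='nearest')

# preprocess is applied to each file as it is opened, so the combined
# dataset only holds the single point's time series.
ds = xr.open_mfdataset('path/to/ncs/*.nc', preprocess=select_point)

recs = ds.to_dataframe().to_records()
np.savez('location.npz', recs)
```

Exit code 137 (128 + 9, i.e. `SIGKILL`) in issue 1561 is usually the kernel's out-of-memory killer, so reducing how much of each file survives the open step is the main lever; whether this particular sketch is enough depends on the files themselves.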