html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2501#issuecomment-768470505,https://api.github.com/repos/pydata/xarray/issues/2501,768470505,MDEyOklzc3VlQ29tbWVudDc2ODQ3MDUwNQ==,2448579,2021-01-27T18:06:16Z,2021-01-27T18:06:16Z,MEMBER,I think this is stale now. See https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets for the latest guidance on reading such datasets. Please open a new issue if you are still having trouble with `open_mfdataset`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-512663861,https://api.github.com/repos/pydata/xarray/issues/2501,512663861,MDEyOklzc3VlQ29tbWVudDUxMjY2Mzg2MQ==,7799184,2019-07-18T04:51:06Z,2019-07-18T04:52:17Z,CONTRIBUTOR,"Hi guys, I'm having an issue that looks similar to @rsignell-usgs's. I'm trying to open 413 netCDF files using `open_mfdataset` with `parallel=True`. The dataset (successfully opened with `parallel=False`) is ~300 GB on disk and looks like: ```ipython In [1]: import xarray as xr In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=False) In [3]: dset Out[3]: Dimensions: (latitude: 190, longitude: 289, time: 302092) Coordinates: * longitude (longitude) float32 70.0 70.4 70.8 71.2 ... 184.4 184.8 185.2 * latitude (latitude) float32 -55.6 -55.2 -54.8 -54.4 ... 19.2 19.6 20.0 * time (time) datetime64[ns] 1979-01-01 ... 2013-05-31T23:00:00.000013440 Data variables: hs (time, latitude, longitude) float32 dask.array fp (time, latitude, longitude) float32 dask.array dp (time, latitude, longitude) float32 dask.array wl (time, latitude, longitude) float32 dask.array U10 (time, latitude, longitude) float32 dask.array V10 (time, latitude, longitude) float32 dask.array hs1 (time, latitude, longitude) float32 dask.array hs2 (time, latitude, longitude) float32 dask.array tp1 (time, latitude, longitude) float32 dask.array tp2 (time, latitude, longitude) float32 dask.array lp0 (time, latitude, longitude) float32 dask.array lp1 (time, latitude, longitude) float32 dask.array lp2 (time, latitude, longitude) float32 dask.array th0 (time, latitude, longitude) float32 dask.array th1 (time, latitude, longitude) float32 dask.array th2 (time, latitude, longitude) float32 dask.array hs0 (time, latitude, longitude) float32 dask.array tp0 (time, latitude, longitude) float32 dask.array ``` Trying to read it in a standard Python session gives me a core dump: ```ipython In [1]: import xarray as xr In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True) Bus error (core dumped) ``` Trying to read it on a dask cluster, I get: ```ipython In [1]: from dask.distributed import Client In [2]: import xarray as xr In [3]: client = Client() In [4]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitud ...: e': 100}, parallel=True) free(): double free detected in tcache 2free(): double free detected in tcache 2 free(): double free detected in tcache 2 distributed.nanny - WARNING - Worker process 18744 was killed by signal 11 distributed.nanny - WARNING - Restarting worker distributed.nanny - WARNING - Worker process 18740 was killed by signal 6 distributed.nanny - WARNING - Restarting worker distributed.nanny -
WARNING - Worker process 18742 was killed by signal 7 distributed.nanny - WARNING - Worker process 18738 was killed by signal 6 distributed.nanny - WARNING - Restarting worker distributed.nanny - WARNING - Restarting worker free(): double free detected in tcache 2munmap_chunk(): invalid pointer free(): double free detected in tcache 2 free(): double free detected in tcache 2 distributed.nanny - WARNING - Worker process 19082 was killed by signal 6 distributed.nanny - WARNING - Restarting worker distributed.nanny - WARNING - Worker process 19073 was killed by signal 6 distributed.nanny - WARNING - Restarting worker --------------------------------------------------------------------------- KilledWorker Traceback (most recent call last) in () ----> 1 dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True) /usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, **kwargs) 772 # calling compute here will return the datasets/file_objs lists, 773 # the underlying datasets will still be stored as dask arrays --> 774 datasets, file_objs = dask.compute(datasets, file_objs) 775 776 # Combine all datasets, closing them in case of a ValueError /usr/local/lib/python3.7/dist-packages/dask/base.py in compute(*args, **kwargs) 444 keys = [x.__dask_keys__() for x in collections] 445 postcomputes = [x.__dask_postcompute__() for x in collections] --> 446 results = schedule(dsk, keys, **kwargs) 447 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) 448 /home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs) 2525 should_rejoin = False 2526 try: -> 2527 results = self.gather(packed, asynchronous=asynchronous, direct=direct) 2528 finally: 2529 for f in futures.values(): /home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous) 1821 direct=direct, 1822 local_worker=local_worker, -> 1823 asynchronous=asynchronous, 1824 ) 1825 /home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs) 761 else: 762 return sync( --> 763 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs 764 ) 765 /home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs) 330 e.wait(10) 331 if error[0]: --> 332 six.reraise(*error[0]) 333 else: 334 return result[0] /usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb) 691 if value.__traceback__ is not tb: 692 raise value.with_traceback(tb) --> 693 raise value 694 finally: 695 value = None /home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in f() 315 if callback_timeout is not None: 316 future = gen.with_timeout(timedelta(seconds=callback_timeout), future) --> 317 result[0] = yield future 318 except Exception as exc: 319 error[0] = sys.exc_info() /home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self) 733 734 try: --> 735 value = future.result() 736 except Exception: 737 exc_info = sys.exc_info() /home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self) 740 if exc_info is not None: 741 try: --> 742 yielded = 
self.gen.throw(*exc_info) # type: ignore 743 finally: 744 # Break up a reference to itself /home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker) 1678 exc = CancelledError(key) 1679 else: -> 1680 six.reraise(type(exception), exception, traceback) 1681 raise exc 1682 if errors == ""skip"": /usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb) 691 if value.__traceback__ is not tb: 692 raise value.with_traceback(tb) --> 693 raise value 694 finally: 695 value = None KilledWorker: ('open_dataset-e7916acb-6d9f-4532-ab76-5b9c1b1a39c2', ) ``` Is there anything obviously wrong with what I'm trying here? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-510144707,https://api.github.com/repos/pydata/xarray/issues/2501,510144707,MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw==,1872600,2019-07-10T16:59:12Z,2019-07-11T11:47:02Z,NONE,"@TomAugspurger , I sat down here at SciPy with @rabernat and he instantly realized that we needed to drop the `feature_id` coordinate to prevent `open_mfdataset` from trying to harmonize that coordinate across all the files. So if I use this code, the `open_mfdataset` command finishes: ```python def drop_coords(ds): ds = ds.drop(['reference_time','feature_id']) return ds.reset_coords(drop=True) ``` and I can then add back in the dropped coordinate values at the end: ```python dsets = [xr.open_dataset(f) for f in files[:3]] ds.coords['feature_id'] = dsets[0].coords['feature_id'] ``` I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-510217080,https://api.github.com/repos/pydata/xarray/issues/2501,510217080,MDEyOklzc3VlQ29tbWVudDUxMDIxNzA4MA==,1312546,2019-07-10T20:30:41Z,2019-07-10T20:30:41Z,MEMBER,"Yep, that’s my suspicion as well. I’m still plugging away at it. Currently the pausing logic isn’t quite working well. > On Jul 10, 2019, at 12:10, Ryan Abernathey wrote: > > I believe that the memory issue is basically the same as dask/distributed#2602. > > The graphs look like: read --> rechunk --> write. > > Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-510169853,https://api.github.com/repos/pydata/xarray/issues/2501,510169853,MDEyOklzc3VlQ29tbWVudDUxMDE2OTg1Mw==,1197350,2019-07-10T18:10:37Z,2019-07-10T18:10:37Z,MEMBER,"I believe that the memory issue is basically the same as https://github.com/dask/distributed/issues/2602. The graphs look like: `read --> rechunk --> write`. Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.
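One possible mitigation (a sketch only, untested here) is to force the writes to happen in explicit time batches, so each round of loaded chunks is flushed to disk before the next is read. This assumes an xarray version whose `to_zarr` supports `append_dim`; `ds` stands for the lazily-opened, rechunked dataset from Rich's snippet further down this thread, and the batch size and store path are illustrative:

```python
import xarray as xr

# Write the store in slabs of hourly steps so workers can release memory
# between batches instead of loading the whole year before writing.
batch = 168 * 4  # four weeks of hourly data per write; tune to worker memory
for i, start in enumerate(range(0, ds.sizes["time"], batch)):
    piece = ds.isel(time=slice(start, start + batch))
    if i == 0:
        piece.to_zarr("zarr/2009", mode="w")           # create the store
    else:
        piece.to_zarr("zarr/2009", append_dim="time")  # extend along time
```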
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-510167911,https://api.github.com/repos/pydata/xarray/issues/2501,510167911,MDEyOklzc3VlQ29tbWVudDUxMDE2NzkxMQ==,1312546,2019-07-10T18:05:07Z,2019-07-10T18:05:07Z,MEMBER,"Great, thanks. I’ll look into the memory issue when writing. We may already have an issue for it. > On Jul 10, 2019, at 10:59, Rich Signell wrote: > > @TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks. > > So if I use this code, the open_mdfdataset command finishes: > > def drop_coords(ds): > ds = ds.drop(['reference_time','feature_id']) > return ds.reset_coords(drop=True) > and I can then add back in the dropped coordinate values at the end: > > dsets = [xr.open_dataset(f) for f in files[:3]] > ds.coords['feature_id'] = dsets[0].coords['feature_id'] > I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right? > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub, or mute the thread. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509379294,https://api.github.com/repos/pydata/xarray/issues/2501,509379294,MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA==,1872600,2019-07-08T20:28:48Z,2019-07-08T20:29:20Z,NONE,"@TomAugspurger , I thought @rabernat's suggestion of implementing ```python def drop_coords(ds): return ds.reset_coords(drop=True) ``` would avoid this checking. Did I understand or implement this incorrectly?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509346055,https://api.github.com/repos/pydata/xarray/issues/2501,509346055,MDEyOklzc3VlQ29tbWVudDUwOTM0NjA1NQ==,1312546,2019-07-08T18:46:58Z,2019-07-08T18:46:58Z,MEMBER,"@rsignell-usgs very helpful, thanks. I'd noticed that there was a pause after the open_dataset tasks finish, indicating that either the scheduler or (more likely) the client was doing work rather than the cluster. Most likely @rabernat's guess > In open_mfdataset, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset. is correct. 
Verifying all that now, and looking into whether / how that can be done on the workers.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509341467,https://api.github.com/repos/pydata/xarray/issues/2501,509341467,MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw==,1872600,2019-07-08T18:34:02Z,2019-07-08T18:34:02Z,NONE,"@rabernat , to answer your question, if I open just two files: ``` ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True) ``` the resulting dataset is: ``` Dimensions: (feature_id: 2729077, reference_time: 1, time: 2) Coordinates: * reference_time (reference_time) datetime64[ns] 2009-01-01 * feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804 * time (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00 Data variables: streamflow (time, feature_id) float64 dask.array q_lateral (time, feature_id) float64 dask.array velocity (time, feature_id) float64 dask.array qSfcLatRunoff (time, feature_id) float64 dask.array qBucket (time, feature_id) float64 dask.array qBtmVertRunoff (time, feature_id) float64 dask.array Attributes: featureType: timeSeries proj4: +proj=longlat +datum=NAD83 +no_defs model_initialization_time: 2009-01-01_00:00:00 station_dimension: feature_id model_output_valid_time: 2009-01-01_00:00:00 stream_order_output: 1 cdm_datatype: Station esri_pe_string: GEOGCS[GCS_North_American_1983,DATUM[D_North_... Conventions: CF-1.6 model_version: NWM 1.2 dev_OVRTSWCRT: 1 dev_NOAH_TIMESTEP: 3600 dev_channel_only: 0 dev_channelBucket_only: 0 dev: dev_ prefix indicates development/internal me... ``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509340139,https://api.github.com/repos/pydata/xarray/issues/2501,509340139,MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ==,1872600,2019-07-08T18:30:18Z,2019-07-08T18:30:18Z,NONE,"@TomAugspurger, okay, I just ran the above code again and here's what happens: The `open_mfdataset` call proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 `open_dataset` tasks in about 10 minutes. One interesting thing is that the number of tasks keeps dropping as time goes on. Not sure why that would be: ![2019-07-08_13-40-09](https://user-images.githubusercontent.com/1872600/60832559-2d5ae080-a18a-11e9-9b0d-e7e39196412d.png) ![2019-07-08_13-42-21](https://user-images.githubusercontent.com/1872600/60832572-3481ee80-a18a-11e9-8bba-e9ee783894da.png) ![2019-07-08_13-43-15](https://user-images.githubusercontent.com/1872600/60832578-377cdf00-a18a-11e9-9b89-0d80353a62c9.png) ![2019-07-08_13-43-58](https://user-images.githubusercontent.com/1872600/60832589-3cda2980-a18a-11e9-989c-0a95754e9e46.png) ![2019-07-08_13-49-57](https://user-images.githubusercontent.com/1872600/60832613-4d8a9f80-a18a-11e9-8c54-7029a3cfd08c.png) The memory usage on the workers seems okay during this process: ![2019-07-08_13-38-52](https://user-images.githubusercontent.com/1872600/60832649-66935080-a18a-11e9-8075-dc2fca79f830.png) Then, despite the dashboard showing the tasks as completed, the `open_mfdataset` command does not return; nothing has died, and I'm not sure what's happening.
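(Aside: one way to see whether the hang is in the parallel opens or in the client-side combine is to run the two stages explicitly. A rough sketch, with `files` and `drop_coords` as defined in my code elsewhere in this thread, and `xr.concat` standing in for xarray's real combine logic:)

```python
import dask
import xarray as xr

# Stage 1: open and preprocess every file on the cluster.
delayed = [dask.delayed(drop_coords)(dask.delayed(xr.open_dataset)(f)) for f in files]
datasets, = dask.compute(delayed)

# Stage 2: combine on the client. If the wall-clock time goes here, the
# stall is in the coordinate comparison, not in the opens.
combined = xr.concat(datasets, dim="time")
```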
I check `top` and get this: ![2019-07-08_13-51-13](https://user-images.githubusercontent.com/1872600/60832847-eb7e6a00-a18a-11e9-84cc-18e8796fede9.png) then after about 10 more minutes, I get these warnings: ![2019-07-08_13-56-19](https://user-images.githubusercontent.com/1872600/60832800-c853ba80-a18a-11e9-839a-487fd1276460.png) and then the errors: ```python-traceback distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': ... ``` Full error listing here: https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6 Does this help? I'd be happy to screenshare if that would be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509307081,https://api.github.com/repos/pydata/xarray/issues/2501,509307081,MDEyOklzc3VlQ29tbWVudDUwOTMwNzA4MQ==,1312546,2019-07-08T16:57:15Z,2019-07-08T16:57:15Z,MEMBER,"I'm looking into it today. Can you clarify > The memory use kept growing until the process died. by ""process"" do you mean a dask worker process, or just the main python process executing the `ds = xr.open_mfdataset(...)` code?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509282831,https://api.github.com/repos/pydata/xarray/issues/2501,509282831,MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ==,1872600,2019-07-08T15:51:23Z,2019-07-08T15:51:23Z,NONE,"@TomAugspurger, I'm back from vacation now and ready to attack this again. Any updates on your end? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506497180,https://api.github.com/repos/pydata/xarray/issues/2501,506497180,MDEyOklzc3VlQ29tbWVudDUwNjQ5NzE4MA==,1312546,2019-06-27T20:24:26Z,2019-06-27T20:24:26Z,MEMBER,"> The datasets in our cloud datastore are designed explicitly to avoid this problem! Good to know! FYI, https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 was user error (I can access it, but need to specify the us-east-1 region). 
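For anyone hitting the same rclone error shown below, a sketch of a remote definition that pins the region (a hypothetical `aws-east` remote matching the commands in this thread; option names follow rclone's S3 backend, and anonymous-access settings vary by rclone version):

```
# ~/.config/rclone/rclone.conf
[aws-east]
type = s3
provider = AWS
region = us-east-1
```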
Taking a look now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506482057,https://api.github.com/repos/pydata/xarray/issues/2501,506482057,MDEyOklzc3VlQ29tbWVudDUwNjQ4MjA1Nw==,1197350,2019-06-27T19:36:51Z,2019-06-27T19:36:51Z,MEMBER,"@rsignell-usgs Can you post the xarray repr of two sample files after the pre-processing function has been applied?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506481845,https://api.github.com/repos/pydata/xarray/issues/2501,506481845,MDEyOklzc3VlQ29tbWVudDUwNjQ4MTg0NQ==,1197350,2019-06-27T19:36:11Z,2019-06-27T19:36:11Z,MEMBER,"> Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? The datasets in our cloud datastore are designed explicitly to avoid this problem!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506478508,https://api.github.com/repos/pydata/xarray/issues/2501,506478508,MDEyOklzc3VlQ29tbWVudDUwNjQ3ODUwOA==,1312546,2019-06-27T19:25:05Z,2019-06-27T19:25:05Z,MEMBER,"Thanks, will take a look this afternoon. Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? I may not have access to the bucket (or I'm misusing `rclone`) ``` 2019/06/27 14:23:50 NOTICE: Config file ""/Users/taugspurger/.config/rclone/rclone.conf"" not found - using defaults 2019/06/27 14:23:50 Failed to create file system for ""aws-east:nwm-archive/2009"": didn't find section in config file ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506475819,https://api.github.com/repos/pydata/xarray/issues/2501,506475819,MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ==,1872600,2019-06-27T19:16:28Z,2019-06-27T19:24:31Z,NONE,"I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netCDF files with `open_mfdataset` looks like this: ```python import xarray as xr from dask.distributed import Client, progress, LocalCluster cluster = LocalCluster() client = Client(cluster) import pandas as pd dates = pd.date_range(start='2009-01-01 00:00',end='2009-12-31 23:00', freq='1h') files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'),date.strftime('%Y%m%d%H%M')) for date in dates] def drop_coords(ds): return ds.reset_coords(drop=True) ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True) ds1 = ds.chunk(chunks={'time':168, 'feature_id':209929}) import numcodecs numcodecs.blosc.use_threads = False ds1.to_zarr('zarr/2009', mode='w', consolidated=True) ``` I transferred the netCDF files from AWS S3 to my local disk to run this, using this command: ``` rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 .
--checksum --fast-list --transfers 16 ``` @TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-503641038,https://api.github.com/repos/pydata/xarray/issues/2501,503641038,MDEyOklzc3VlQ29tbWVudDUwMzY0MTAzOA==,1197350,2019-06-19T16:48:29Z,2019-06-19T16:48:29Z,MEMBER,"Try writing a preprocessor function that drops all coordinates ```python def drop_coords(ds): return ds.reset_coords(drop=True) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-497381301,https://api.github.com/repos/pydata/xarray/issues/2501,497381301,MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ==,1872600,2019-05-30T15:55:56Z,2019-05-30T15:58:48Z,NONE,"I'm also hitting memory issues when using `open_mfdataset` with a cluster. Specifically, I'm trying to open 8760 NetCDF files with an 8-node, 40-CPU LocalCluster. When I issue: ``` ds = xr.open_mfdataset(files, parallel=True) ``` all looks good on the Dask dashboard: ![2019-05-30_9-55-05](https://user-images.githubusercontent.com/1872600/58641001-51442000-82c8-11e9-81e0-9580ec2271b1.png) ![2019-05-30_9-54-49](https://user-images.githubusercontent.com/1872600/58641007-530de380-82c8-11e9-9c1f-46e5fca187da.png) and the tasks complete with no errors in about 4 minutes. Then 4 more minutes go by before I get a bunch of errors like: ``` distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal distributed.nanny - WARNING - Restarting worker ``` and my cell doesn't complete. Any suggestions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-432546977,https://api.github.com/repos/pydata/xarray/issues/2501,432546977,MDEyOklzc3VlQ29tbWVudDQzMjU0Njk3Nw==,1492047,2018-10-24T07:38:31Z,2018-10-24T07:38:31Z,CONTRIBUTOR,"Thank you for looking into this. I just want to point out that I'm not so much concerned with the ""slow performance"" as with the memory consumption and the limitations it implies. ```python from glob import glob import xarray as xr all_files = glob('...*TP110*.nc') display(xr.open_dataset(all_files[0])) display(xr.open_dataset(all_files[1])) ``` ``` Dimensions: (meas_ind: 40, time: 2871, wvf_ind: 128) Coordinates: * time (time) datetime64[ns] 2017-06-19T14:24:20.792036992 ... 2017-06-19T15:14:38.491743104 * meas_ind (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39 * wvf_ind (wvf_ind) int8 0 1 2 3 ... 125 126 127 lat (time) float64 ... lon (time) float64 ... lon_40hz (time, meas_ind) float64 ... lat_40hz (time, meas_ind) float64 ... Data variables: time_40hz (time, meas_ind) datetime64[ns] ... surface_type (time) float32 ... rad_surf_type (time) float32 ... qual_alt_1hz_range (time) float32 ... qual_alt_1hz_swh (time) float32 ... qual_alt_1hz_sig0 (time) float32 ... qual_alt_1hz_off_nadir_angle_wf (time) float32 ... qual_inst_corr_1hz_range (time) float32 ... qual_inst_corr_1hz_swh (time) float32 ... qual_inst_corr_1hz_sig0 (time) float32 ...
qual_rad_1hz_tb_k (time) float32 ... qual_rad_1hz_tb_ka (time) float32 ... alt_state_flag_acq_mode_40hz (time, meas_ind) float32 ... alt_state_flag_tracking_mode_40hz (time, meas_ind) float32 ... orb_state_flag_diode (time) float32 ... orb_state_flag_rest (time) float32 ... ecmwf_meteo_map_avail (time) float32 ... trailing_edge_variation_flag (time) float32 ... trailing_edge_variation_flag_40hz (time, meas_ind) float32 ... ice_flag (time) float32 ... interp_flag_mean_sea_surface (time) float32 ... interp_flag_mdt (time) float32 ... interp_flag_ocean_tide_sol1 (time) float32 ... interp_flag_ocean_tide_sol2 (time) float32 ... interp_flag_meteo (time) float32 ... alt (time) float64 ... alt_40hz (time, meas_ind) float64 ... orb_alt_rate (time) float32 ... range (time) float64 ... range_40hz (time, meas_ind) float64 ... range_used_40hz (time, meas_ind) float32 ... range_rms (time) float32 ... range_numval (time) float32 ... number_of_iterations (time, meas_ind) float32 ... net_instr_corr_range (time) float64 ... model_dry_tropo_corr (time) float32 ... model_wet_tropo_corr (time) float32 ... rad_wet_tropo_corr (time) float32 ... iono_corr_gim (time) float32 ... sea_state_bias (time) float32 ... swh (time) float32 ... swh_40hz (time, meas_ind) float32 ... swh_used_40hz (time, meas_ind) float32 ... swh_rms (time) float32 ... swh_numval (time) float32 ... net_instr_corr_swh (time) float32 ... sig0 (time) float32 ... sig0_40hz (time, meas_ind) float32 ... sig0_used_40hz (time, meas_ind) float32 ... sig0_rms (time) float32 ... sig0_numval (time) float32 ... agc (time) float32 ... agc_rms (time) float32 ... agc_numval (time) float32 ... net_instr_corr_sig0 (time) float32 ... atmos_corr_sig0 (time) float32 ... off_nadir_angle_wf (time) float32 ... off_nadir_angle_wf_40hz (time, meas_ind) float32 ... tb_k (time) float32 ... tb_ka (time) float32 ... mean_sea_surface (time) float64 ... mean_topography (time) float64 ... geoid (time) float64 ... bathymetry (time) float64 ... inv_bar_corr (time) float32 ... hf_fluctuations_corr (time) float32 ... ocean_tide_sol1 (time) float64 ... ocean_tide_sol2 (time) float64 ... ocean_tide_equil (time) float32 ... ocean_tide_non_equil (time) float32 ... load_tide_sol1 (time) float32 ... load_tide_sol2 (time) float32 ... solid_earth_tide (time) float32 ... pole_tide (time) float32 ... wind_speed_model_u (time) float32 ... wind_speed_model_v (time) float32 ... wind_speed_alt (time) float32 ... rad_water_vapor (time) float32 ... rad_liquid_water (time) float32 ... ice1_range_40hz (time, meas_ind) float64 ... ice1_sig0_40hz (time, meas_ind) float32 ... ice1_qual_flag_40hz (time, meas_ind) float32 ... seaice_range_40hz (time, meas_ind) float64 ... seaice_sig0_40hz (time, meas_ind) float32 ... seaice_qual_flag_40hz (time, meas_ind) float32 ... ice2_range_40hz (time, meas_ind) float64 ... ice2_le_sig0_40hz (time, meas_ind) float32 ... ice2_sig0_40hz (time, meas_ind) float32 ... ice2_sigmal_40hz (time, meas_ind) float32 ... ice2_slope1_40hz (time, meas_ind) float64 ... ice2_slope2_40hz (time, meas_ind) float64 ... ice2_mqe_40hz (time, meas_ind) float32 ... ice2_qual_flag_40hz (time, meas_ind) float32 ... mqe_40hz (time, meas_ind) float32 ... peakiness_40hz (time, meas_ind) float32 ... ssha (time) float32 ... tracker_40hz (time, meas_ind) float64 ... tracker_used_40hz (time, meas_ind) float32 ... tracker_diode_40hz (time, meas_ind) float64 ... pri_counter_40hz (time, meas_ind) float64 ... qual_alt_1hz_off_nadir_angle_pf (time) float32 ... off_nadir_angle_pf (time) float32 ... 
off_nadir_angle_rain_40hz (time, meas_ind) float32 ... uso_corr (time) float64 ... internal_path_delay_corr (time) float64 ... modeled_instr_corr_range (time) float32 ... doppler_corr (time) float32 ... cog_corr (time) float32 ... modeled_instr_corr_swh (time) float32 ... internal_corr_sig0 (time) float32 ... modeled_instr_corr_sig0 (time) float32 ... agc_40hz (time, meas_ind) float32 ... agc_corr_40hz (time, meas_ind) float32 ... scaling_factor_40hz (time, meas_ind) float64 ... epoch_40hz (time, meas_ind) float64 ... width_leading_edge_40hz (time, meas_ind) float64 ... amplitude_40hz (time, meas_ind) float64 ... thermal_noise_40hz (time, meas_ind) float64 ... seaice_epoch_40hz (time, meas_ind) float64 ... seaice_amplitude_40hz (time, meas_ind) float64 ... ice2_epoch_40hz (time, meas_ind) float64 ... ice2_amplitude_40hz (time, meas_ind) float64 ... ice2_mean_amplitude_40hz (time, meas_ind) float64 ... ice2_thermal_noise_40hz (time, meas_ind) float64 ... ice2_slope_40hz (time, meas_ind) float64 ... signal_to_noise_ratio (time) float32 ... waveforms_40hz (time, meas_ind, wvf_ind) float32 ... Attributes: Conventions: CF-1.1 title: GDR - Expertise dataset institution: CNES source: radar altimeter history: 2017-07-21 08:25:07 : Creation contact: CNES aviso@oceanobs.com, EUMETSAT ops@... references: L1 library=V4.5p1, L2 library=V5.5p2, ... processing_center: SALP reference_document: SARAL/ALTIKA Products Handbook, SALP-M... mission_name: SARAL altimeter_sensor_name: ALTIKA radiometer_sensor_name: ALTIKA_RAD doris_sensor_name: DGXX cycle_number: 110 absolute_rev_number: 22545 pass_number: 1 absolute_pass_number: 109219 equator_time: 2017-06-19 14:49:32.128000 equator_longitude: 227.77 first_meas_time: 2017-06-19 14:24:20.792037 last_meas_time: 2017-06-19 15:14:38.491743 xref_altimeter_level1: ALK_ALT_1PaS20170619_154722_20170619_1... xref_radiometer_level1: ALK_RAD_1PaS20170619_154643_20170619_1... xref_altimeter_characterisation: ALK_CHA_AXVCNE20131115_120000_20100101... xref_radiometer_characterisation: ALK_CHR_AXVCNE20110207_180000_20110101... xref_altimeter_ltm: ALK_CAL_AXXCNE20170720_110014_20130102... xref_doris_uso: SRL_OS1_AXXCNE20170720_083800_20130226... xref_orbit_data: SRL_VOR_AXVCNE20170720_111700_20170618... xref_pf_data: SRL_VPF_AXVCNE20170720_111800_20170618... xref_pole_location: SMM_POL_AXXCNE20170721_071500_19870101... xref_gim_data: SRL_ION_AXPCNE20170620_074756_20170619... xref_mog2d_data: SMM_MOG_AXVCNE20170709_191501_20170619... xref_orf_data: SRL_ORF_AXXCNE20170720_083800_20160704... xref_meteorological_files: SMM_APA_AXVCNE20170619_170611_20170619... ellipsoid_axis: 6378136.3 ellipsoid_flattening: 0.0033528131778969 Dimensions: (meas_ind: 40, time: 2779, wvf_ind: 128) Coordinates: * time (time) datetime64[ns] 2017-06-19T15:14:39.356848 ... 2017-06-19T16:04:56.808873920 * meas_ind (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39 * wvf_ind (wvf_ind) int8 0 1 2 3 ... 125 126 127 lat (time) float64 ... lon (time) float64 ... lon_40hz (time, meas_ind) float64 ... lat_40hz (time, meas_ind) float64 ... Data variables: time_40hz (time, meas_ind) datetime64[ns] ... surface_type (time) float32 ... rad_surf_type (time) float32 ... qual_alt_1hz_range (time) float32 ... qual_alt_1hz_swh (time) float32 ... qual_alt_1hz_sig0 (time) float32 ... qual_alt_1hz_off_nadir_angle_wf (time) float32 ... qual_inst_corr_1hz_range (time) float32 ... qual_inst_corr_1hz_swh (time) float32 ... qual_inst_corr_1hz_sig0 (time) float32 ... qual_rad_1hz_tb_k (time) float32 ... 
qual_rad_1hz_tb_ka (time) float32 ... alt_state_flag_acq_mode_40hz (time, meas_ind) float32 ... alt_state_flag_tracking_mode_40hz (time, meas_ind) float32 ... orb_state_flag_diode (time) float32 ... orb_state_flag_rest (time) float32 ... ecmwf_meteo_map_avail (time) float32 ... trailing_edge_variation_flag (time) float32 ... trailing_edge_variation_flag_40hz (time, meas_ind) float32 ... ice_flag (time) float32 ... interp_flag_mean_sea_surface (time) float32 ... interp_flag_mdt (time) float32 ... interp_flag_ocean_tide_sol1 (time) float32 ... interp_flag_ocean_tide_sol2 (time) float32 ... interp_flag_meteo (time) float32 ... alt (time) float64 ... alt_40hz (time, meas_ind) float64 ... orb_alt_rate (time) float32 ... range (time) float64 ... range_40hz (time, meas_ind) float64 ... range_used_40hz (time, meas_ind) float32 ... range_rms (time) float32 ... range_numval (time) float32 ... number_of_iterations (time, meas_ind) float32 ... net_instr_corr_range (time) float64 ... model_dry_tropo_corr (time) float32 ... model_wet_tropo_corr (time) float32 ... rad_wet_tropo_corr (time) float32 ... iono_corr_gim (time) float32 ... sea_state_bias (time) float32 ... swh (time) float32 ... swh_40hz (time, meas_ind) float32 ... swh_used_40hz (time, meas_ind) float32 ... swh_rms (time) float32 ... swh_numval (time) float32 ... net_instr_corr_swh (time) float32 ... sig0 (time) float32 ... sig0_40hz (time, meas_ind) float32 ... sig0_used_40hz (time, meas_ind) float32 ... sig0_rms (time) float32 ... sig0_numval (time) float32 ... agc (time) float32 ... agc_rms (time) float32 ... agc_numval (time) float32 ... net_instr_corr_sig0 (time) float32 ... atmos_corr_sig0 (time) float32 ... off_nadir_angle_wf (time) float32 ... off_nadir_angle_wf_40hz (time, meas_ind) float32 ... tb_k (time) float32 ... tb_ka (time) float32 ... mean_sea_surface (time) float64 ... mean_topography (time) float64 ... geoid (time) float64 ... bathymetry (time) float64 ... inv_bar_corr (time) float32 ... hf_fluctuations_corr (time) float32 ... ocean_tide_sol1 (time) float64 ... ocean_tide_sol2 (time) float64 ... ocean_tide_equil (time) float32 ... ocean_tide_non_equil (time) float32 ... load_tide_sol1 (time) float32 ... load_tide_sol2 (time) float32 ... solid_earth_tide (time) float32 ... pole_tide (time) float32 ... wind_speed_model_u (time) float32 ... wind_speed_model_v (time) float32 ... wind_speed_alt (time) float32 ... rad_water_vapor (time) float32 ... rad_liquid_water (time) float32 ... ice1_range_40hz (time, meas_ind) float64 ... ice1_sig0_40hz (time, meas_ind) float32 ... ice1_qual_flag_40hz (time, meas_ind) float32 ... seaice_range_40hz (time, meas_ind) float64 ... seaice_sig0_40hz (time, meas_ind) float32 ... seaice_qual_flag_40hz (time, meas_ind) float32 ... ice2_range_40hz (time, meas_ind) float64 ... ice2_le_sig0_40hz (time, meas_ind) float32 ... ice2_sig0_40hz (time, meas_ind) float32 ... ice2_sigmal_40hz (time, meas_ind) float32 ... ice2_slope1_40hz (time, meas_ind) float64 ... ice2_slope2_40hz (time, meas_ind) float64 ... ice2_mqe_40hz (time, meas_ind) float32 ... ice2_qual_flag_40hz (time, meas_ind) float32 ... mqe_40hz (time, meas_ind) float32 ... peakiness_40hz (time, meas_ind) float32 ... ssha (time) float32 ... tracker_40hz (time, meas_ind) float64 ... tracker_used_40hz (time, meas_ind) float32 ... tracker_diode_40hz (time, meas_ind) float64 ... pri_counter_40hz (time, meas_ind) float64 ... qual_alt_1hz_off_nadir_angle_pf (time) float32 ... off_nadir_angle_pf (time) float32 ... 
off_nadir_angle_rain_40hz (time, meas_ind) float32 ... uso_corr (time) float64 ... internal_path_delay_corr (time) float64 ... modeled_instr_corr_range (time) float32 ... doppler_corr (time) float32 ... cog_corr (time) float32 ... modeled_instr_corr_swh (time) float32 ... internal_corr_sig0 (time) float32 ... modeled_instr_corr_sig0 (time) float32 ... agc_40hz (time, meas_ind) float32 ... agc_corr_40hz (time, meas_ind) float32 ... scaling_factor_40hz (time, meas_ind) float64 ... epoch_40hz (time, meas_ind) float64 ... width_leading_edge_40hz (time, meas_ind) float64 ... amplitude_40hz (time, meas_ind) float64 ... thermal_noise_40hz (time, meas_ind) float64 ... seaice_epoch_40hz (time, meas_ind) float64 ... seaice_amplitude_40hz (time, meas_ind) float64 ... ice2_epoch_40hz (time, meas_ind) float64 ... ice2_amplitude_40hz (time, meas_ind) float64 ... ice2_mean_amplitude_40hz (time, meas_ind) float64 ... ice2_thermal_noise_40hz (time, meas_ind) float64 ... ice2_slope_40hz (time, meas_ind) float64 ... signal_to_noise_ratio (time) float32 ... waveforms_40hz (time, meas_ind, wvf_ind) float32 ... Attributes: Conventions: CF-1.1 title: GDR - Expertise dataset institution: CNES source: radar altimeter history: 2017-07-21 08:25:19 : Creation contact: CNES aviso@oceanobs.com, EUMETSAT ops@... references: L1 library=V4.5p1, L2 library=V5.5p2, ... processing_center: SALP reference_document: SARAL/ALTIKA Products Handbook, SALP-M... mission_name: SARAL altimeter_sensor_name: ALTIKA radiometer_sensor_name: ALTIKA_RAD doris_sensor_name: DGXX cycle_number: 110 absolute_rev_number: 22546 pass_number: 2 absolute_pass_number: 109220 equator_time: 2017-06-19 15:39:46.492000 equator_longitude: 35.21 first_meas_time: 2017-06-19 15:14:39.356848 last_meas_time: 2017-06-19 16:04:56.808874 xref_altimeter_level1: ALK_ALT_1PaS20170619_154722_20170619_1... xref_radiometer_level1: ALK_RAD_1PaS20170619_154643_20170619_1... xref_altimeter_characterisation: ALK_CHA_AXVCNE20131115_120000_20100101... xref_radiometer_characterisation: ALK_CHR_AXVCNE20110207_180000_20110101... xref_altimeter_ltm: ALK_CAL_AXXCNE20170720_110014_20130102... xref_doris_uso: SRL_OS1_AXXCNE20170720_083800_20130226... xref_orbit_data: SRL_VOR_AXVCNE20170720_111700_20170618... xref_pf_data: SRL_VPF_AXVCNE20170720_111800_20170618... xref_pole_location: SMM_POL_AXXCNE20170721_071500_19870101... xref_gim_data: SRL_ION_AXPCNE20170620_074756_20170619... xref_mog2d_data: SMM_MOG_AXVCNE20170709_191501_20170619... xref_orf_data: SRL_ORF_AXXCNE20170720_083800_20160704... xref_meteorological_files: SMM_APA_AXVCNE20170619_170611_20170619... ellipsoid_axis: 6378136.3 ellipsoid_flattening: 0.0033528131778969```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-432342306,https://api.github.com/repos/pydata/xarray/issues/2501,432342306,MDEyOklzc3VlQ29tbWVudDQzMjM0MjMwNg==,1197350,2018-10-23T17:27:50Z,2018-10-23T17:27:50Z,MEMBER,"^ I'm assuming you're in a notebook. 
If not, call `print` instead of `display`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-432342180,https://api.github.com/repos/pydata/xarray/issues/2501,432342180,MDEyOklzc3VlQ29tbWVudDQzMjM0MjE4MA==,1197350,2018-10-23T17:27:30Z,2018-10-23T17:27:30Z,MEMBER,"In `open_mfdataset`, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with `open_mfdataset`. To help us help you debug, please provide more information about the files you are opening. Specifically, please call `open_dataset()` directly on the first two files and copy and paste the output here. That is, do something like this: ```python from glob import glob import xarray as xr all_files = glob('*1002*.nc') display(xr.open_dataset(all_files[0])) display(xr.open_dataset(all_files[1])) ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
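As the first comment in this file notes, the xarray docs on reading multi-file datasets are now the best reference for this problem. A hedged sketch of the pattern they describe for many files whose non-concatenated coordinates are identical (kwarg availability depends on the xarray version; the glob is illustrative, modeled on the NWM example above):

```python
import xarray as xr

ds = xr.open_mfdataset(
    "./nc/2009/*.CHRTOUT_DOMAIN1.comp",  # illustrative glob
    combine="by_coords",
    data_vars="minimal",  # only concatenate variables that vary along time
    coords="minimal",     # don't concatenate coordinates that don't vary
    compat="override",    # skip equality checks; take values from the first file
    parallel=True,
)
```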