html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2501#issuecomment-768470505,https://api.github.com/repos/pydata/xarray/issues/2501,768470505,MDEyOklzc3VlQ29tbWVudDc2ODQ3MDUwNQ==,2448579,2021-01-27T18:06:16Z,2021-01-27T18:06:16Z,MEMBER,I think this is stale now. See https://xarray.pydata.org/en/stable/io.html#reading-multi-file-datasets for latest guidance on reading such datasets. Please open a new issue if you are still having trouble with `open_mfdataset`,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-512663861,https://api.github.com/repos/pydata/xarray/issues/2501,512663861,MDEyOklzc3VlQ29tbWVudDUxMjY2Mzg2MQ==,7799184,2019-07-18T04:51:06Z,2019-07-18T04:52:17Z,CONTRIBUTOR,"Hi guys, I'm having some issue that looks similar to @rsignell-usgs. Trying to open 413 netcdf files using `open_mfdataset` with `parallel=True`. The dataset (successfully opened with `parallel=False`) has ~300G on disk and looks like:

```ipython
In [1] import xarray as xr

In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=False)

In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (latitude: 190, longitude: 289, time: 302092)
Coordinates:
  * longitude  (longitude) float32 70.0 70.4 70.8 71.2 ... 184.4 184.8 185.2
  * latitude   (latitude) float32 -55.6 -55.2 -54.8 -54.4 ... 19.2 19.6 20.0
  * time       (time) datetime64[ns] 1979-01-01 ... 2013-05-31T23:00:00.000013440
Data variables:
    hs         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    fp         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    dp         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    wl         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    U10        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    V10        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
```

Trying to read it on a standard python session gives me core dumped:

```ipython
In [1]: import xarray as xr

In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)
Bus error (core dumped)
```

Trying to read it on a dask cluster I get:

```ipython
In [1]: from dask.distributed import Client

In [2]: import xarray as xr

In [3]: client = Client()

In [4]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitud
   ...: e': 100}, parallel=True)
free(): double free detected in tcache 2free(): double free detected in tcache 2

free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 18744 was killed by signal 11
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18740 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18742 was killed by signal 7
distributed.nanny - WARNING - Worker process 18738 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
free(): double free detected in tcache 2munmap_chunk(): invalid pointer

free(): double free detected in tcache 2
free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 19082 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 19073 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
---------------------------------------------------------------------------
KilledWorker                              Traceback (most recent call last)
<ipython-input-4-740561b80fec> in <module>()
----> 1 dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)

/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, **kwargs)
    772         # calling compute here will return the datasets/file_objs lists,
    773         # the underlying datasets will still be stored as dask arrays
--> 774         datasets, file_objs = dask.compute(datasets, file_objs)
    775 
    776     # Combine all datasets, closing them in case of a ValueError

/usr/local/lib/python3.7/dist-packages/dask/base.py in compute(*args, **kwargs)
    444     keys = [x.__dask_keys__() for x in collections]
    445     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 446     results = schedule(dsk, keys, **kwargs)
    447     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    448 

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2525                     should_rejoin = False
   2526             try:
-> 2527                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2528             finally:
   2529                 for f in futures.values():

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1821                 direct=direct,
   1822                 local_worker=local_worker,
-> 1823                 asynchronous=asynchronous,
   1824             )
   1825 

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    761         else:
    762             return sync(
--> 763                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    764             )
    765 

/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    330             e.wait(10)
    331     if error[0]:
--> 332         six.reraise(*error[0])
    333     else:
    334         return result[0]

/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in f()
    315             if callback_timeout is not None:
    316                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317             result[0] = yield future
    318         except Exception as exc:
    319             error[0] = sys.exc_info()

/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733 
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
    740                     if exc_info is not None:
    741                         try:
--> 742                             yielded = self.gen.throw(*exc_info)  # type: ignore
    743                         finally:
    744                             # Break up a reference to itself

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1678                             exc = CancelledError(key)
   1679                         else:
-> 1680                             six.reraise(type(exception), exception, traceback)
   1681                         raise exc
   1682                     if errors == ""skip"":

/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

KilledWorker: ('open_dataset-e7916acb-6d9f-4532-ab76-5b9c1b1a39c2', <Worker 'tcp://10.240.0.5:36019', memory: 0, processing: 63>)
```

Is there anything obviously wrong I'm trying here please?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-510144707,https://api.github.com/repos/pydata/xarray/issues/2501,510144707,MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw==,1872600,2019-07-10T16:59:12Z,2019-07-11T11:47:02Z,NONE,"@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the `feature_id` coordinate to prevent `open_mfdataset` from trying to harmonize that coordinate from all the chunks. 

So if I use this code, the `open_mfdataset` command finishes:
```python
def drop_coords(ds):
    ds = ds.drop(['reference_time','feature_id'])
    return ds.reset_coords(drop=True)
```
and I can then add back in the dropped coordinate values at the end:
```python
dsets = [xr.open_dataset(f) for f in files[:3]]
ds.coords['feature_id'] = dsets[0].coords['feature_id']
```

I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-510217080,https://api.github.com/repos/pydata/xarray/issues/2501,510217080,MDEyOklzc3VlQ29tbWVudDUxMDIxNzA4MA==,1312546,2019-07-10T20:30:41Z,2019-07-10T20:30:41Z,MEMBER,"Yep, that’s my suspicion as well. I’m still plugging away at it. Currently the pausing logic isn’t quite working well. 

> On Jul 10, 2019, at 12:10, Ryan Abernathey <notifications@github.com> wrote:
> 
> I believe that the memory issue is basically the same as dask/distributed#2602.
> 
> The graphs look like: read --> rechunk --> write.
> 
> Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-510169853,https://api.github.com/repos/pydata/xarray/issues/2501,510169853,MDEyOklzc3VlQ29tbWVudDUxMDE2OTg1Mw==,1197350,2019-07-10T18:10:37Z,2019-07-10T18:10:37Z,MEMBER,"I believe that the memory issue is basically the same as https://github.com/dask/distributed/issues/2602.

The graphs look like: `read --> rechunk --> write`.

Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-510167911,https://api.github.com/repos/pydata/xarray/issues/2501,510167911,MDEyOklzc3VlQ29tbWVudDUxMDE2NzkxMQ==,1312546,2019-07-10T18:05:07Z,2019-07-10T18:05:07Z,MEMBER,"Great, thanks. I’ll look into the memory issue when writing. We may already have an issue for it. 

> On Jul 10, 2019, at 10:59, Rich Signell <notifications@github.com> wrote:
> 
> @TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.
> 
> So if I use this code, the open_mdfdataset command finishes:
> 
> def drop_coords(ds):
>     ds = ds.drop(['reference_time','feature_id'])
>     return ds.reset_coords(drop=True)
> and I can then add back in the dropped coordinate values at the end:
> 
> dsets = [xr.open_dataset(f) for f in files[:3]]
> ds.coords['feature_id'] = dsets[0].coords['feature_id']
> I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?
> 
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub, or mute the thread.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509379294,https://api.github.com/repos/pydata/xarray/issues/2501,509379294,MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA==,1872600,2019-07-08T20:28:48Z,2019-07-08T20:29:20Z,NONE,"@TomAugspurger , I thought @rabernat's suggestion of implementing
```python
def drop_coords(ds):
    return ds.reset_coords(drop=True)
```
would avoid this checking.  Did I understand or implement this incorrectly?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509346055,https://api.github.com/repos/pydata/xarray/issues/2501,509346055,MDEyOklzc3VlQ29tbWVudDUwOTM0NjA1NQ==,1312546,2019-07-08T18:46:58Z,2019-07-08T18:46:58Z,MEMBER,"@rsignell-usgs very helpful, thanks. I'd noticed that there was a pause after the open_dataset tasks finish, indicating that either the scheduler or (more likely) the client was doing work rather than the cluster. Most likely @rabernat's guess

> In open_mfdataset, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset.

is correct. Verifying all that now, and looking into if / how that can be done on the workers.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509341467,https://api.github.com/repos/pydata/xarray/issues/2501,509341467,MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw==,1872600,2019-07-08T18:34:02Z,2019-07-08T18:34:02Z,NONE,"@rabernat , to answer your question, if I open just two files:
```
ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True)
```
the resulting dataset is:
```
<xarray.Dataset>
Dimensions:         (feature_id: 2729077, reference_time: 1, time: 2)
Coordinates:
  * reference_time  (reference_time) datetime64[ns] 2009-01-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
  * time            (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00
Data variables:
    streamflow      (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    q_lateral       (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    velocity        (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qSfcLatRunoff   (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBucket         (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBtmVertRunoff  (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
Attributes:
    featureType:                timeSeries
    proj4:                      +proj=longlat +datum=NAD83 +no_defs
    model_initialization_time:  2009-01-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    2009-01-01_00:00:00
    stream_order_output:        1
    cdm_datatype:               Station
    esri_pe_string:             GEOGCS[GCS_North_American_1983,DATUM[D_North_...
    Conventions:                CF-1.6
    model_version:              NWM 1.2
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509340139,https://api.github.com/repos/pydata/xarray/issues/2501,509340139,MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ==,1872600,2019-07-08T18:30:18Z,2019-07-08T18:30:18Z,NONE,"@TomAugspurger, okay, I just ran the above code again and here's what happens:

The `open_mfdataset` proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 `open_dataset` tasks in about 10 minutes.   One interesting thing is that the number of tasks keep dropping as time goes on.  Not sure why that would be:
![2019-07-08_13-40-09](https://user-images.githubusercontent.com/1872600/60832559-2d5ae080-a18a-11e9-9b0d-e7e39196412d.png)
![2019-07-08_13-42-21](https://user-images.githubusercontent.com/1872600/60832572-3481ee80-a18a-11e9-8bba-e9ee783894da.png)
![2019-07-08_13-43-15](https://user-images.githubusercontent.com/1872600/60832578-377cdf00-a18a-11e9-9b89-0d80353a62c9.png)
![2019-07-08_13-43-58](https://user-images.githubusercontent.com/1872600/60832589-3cda2980-a18a-11e9-989c-0a95754e9e46.png)
![2019-07-08_13-49-57](https://user-images.githubusercontent.com/1872600/60832613-4d8a9f80-a18a-11e9-8c54-7029a3cfd08c.png)
The memory usage on the workers seems okay during this process:
![2019-07-08_13-38-52](https://user-images.githubusercontent.com/1872600/60832649-66935080-a18a-11e9-8075-dc2fca79f830.png)

Then, despite the tasks showing on the dashboard being completed, the `open_mfdataset` command does not complete, but nothing has died, and I'm not sure what's happening.   I check `top` and get this:
![2019-07-08_13-51-13](https://user-images.githubusercontent.com/1872600/60832847-eb7e6a00-a18a-11e9-84cc-18e8796fede9.png)
  
then after about 10 more minutes, I get these warnings:
![2019-07-08_13-56-19](https://user-images.githubusercontent.com/1872600/60832800-c853ba80-a18a-11e9-839a-487fd1276460.png)

and then the errors:
```python-traceback
distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': 
...
```
Full error listing here:
https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6

Does this help?    I'd be happy to screenshare if that would be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509307081,https://api.github.com/repos/pydata/xarray/issues/2501,509307081,MDEyOklzc3VlQ29tbWVudDUwOTMwNzA4MQ==,1312546,2019-07-08T16:57:15Z,2019-07-08T16:57:15Z,MEMBER,"I'm looking into it today. Can you clarify

> The memory use kept growing until the process died.

by ""process"" do you mean a dask worker process, or just the main python process executing the `ds = xr.open_mfdataset(...)` code?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-509282831,https://api.github.com/repos/pydata/xarray/issues/2501,509282831,MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ==,1872600,2019-07-08T15:51:23Z,2019-07-08T15:51:23Z,NONE,"@TomAugspurger, I'm back from vacation now and ready to attack this again.   Any updates on your end?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-506497180,https://api.github.com/repos/pydata/xarray/issues/2501,506497180,MDEyOklzc3VlQ29tbWVudDUwNjQ5NzE4MA==,1312546,2019-06-27T20:24:26Z,2019-06-27T20:24:26Z,MEMBER,"> The datasets in our cloud datastore are designed explicitly to avoid this problem!


Good to know!

FYI, https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 was user error (I can access it, but need to specify the us-east-1 region). Taking a look now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-506482057,https://api.github.com/repos/pydata/xarray/issues/2501,506482057,MDEyOklzc3VlQ29tbWVudDUwNjQ4MjA1Nw==,1197350,2019-06-27T19:36:51Z,2019-06-27T19:36:51Z,MEMBER,"@rsignell-usgs 

Can you post the xarray repr of two sample files post pre-processing function?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-506481845,https://api.github.com/repos/pydata/xarray/issues/2501,506481845,MDEyOklzc3VlQ29tbWVudDUwNjQ4MTg0NQ==,1197350,2019-06-27T19:36:11Z,2019-06-27T19:36:11Z,MEMBER,"> Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior?

The datasets in our cloud datastore are designed explicitly to avoid this problem!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-506478508,https://api.github.com/repos/pydata/xarray/issues/2501,506478508,MDEyOklzc3VlQ29tbWVudDUwNjQ3ODUwOA==,1312546,2019-06-27T19:25:05Z,2019-06-27T19:25:05Z,MEMBER,"Thanks, will take a look this afternoon. Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? I may not have access to the bucket (or I'm misusing `rclone`)

```
2019/06/27 14:23:50 NOTICE: Config file ""/Users/taugspurger/.config/rclone/rclone.conf"" not found - using defaults
2019/06/27 14:23:50 Failed to create file system for ""aws-east:nwm-archive/2009"": didn't find section in config file
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-506475819,https://api.github.com/repos/pydata/xarray/issues/2501,506475819,MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ==,1872600,2019-06-27T19:16:28Z,2019-06-27T19:24:31Z,NONE,"I tried this, and either I didn't apply it right, or it didn't work.   The memory use kept growing until the process died.   My code to process the 8760 netcdf files with `open_mfdataset` looks like this:

```python
import xarray as xr
from dask.distributed import Client, progress, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

import pandas as pd

dates = pd.date_range(start='2009-01-01 00:00',end='2009-12-31 23:00', freq='1h')
files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'),date.strftime('%Y%m%d%H%M')) for date in dates]

def drop_coords(ds):
    return ds.reset_coords(drop=True)

ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True)
ds1 = ds.chunk(chunks={'time':168, 'feature_id':209929})

import numcodecs
numcodecs.blosc.use_threads = False
ds1.to_zarr('zarr/2009', mode='w', consolidated=True)
```

I transfered the netcdf files from AWS S3 to my local disk to run this, using this command:

```
rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . --checksum --fast-list --transfers 16
```
@TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-503641038,https://api.github.com/repos/pydata/xarray/issues/2501,503641038,MDEyOklzc3VlQ29tbWVudDUwMzY0MTAzOA==,1197350,2019-06-19T16:48:29Z,2019-06-19T16:48:29Z,MEMBER,"Try writing a preprocessor function that drops all coordinates
```python
def drop_coords(ds):
    return ds.reset_coords(drop=True)
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-497381301,https://api.github.com/repos/pydata/xarray/issues/2501,497381301,MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ==,1872600,2019-05-30T15:55:56Z,2019-05-30T15:58:48Z,NONE,"I'm hitting some memory issues with using `open_mfdataset` with a cluster also.   

Specifically, I'm trying to open 8760 NetCDF files with an 8 node, 40 cpu LocalCluster.   

When I issue: 
```
ds = xr.open_mfdataset(files, parallel=True)
```
all looks good on the Dask dashboard:
![2019-05-30_9-55-05](https://user-images.githubusercontent.com/1872600/58641001-51442000-82c8-11e9-81e0-9580ec2271b1.png)
![2019-05-30_9-54-49](https://user-images.githubusercontent.com/1872600/58641007-530de380-82c8-11e9-9c1f-46e5fca187da.png)
and the tasks complete with no errors in about 4 minutes.  

Then 4 more minutes go by before I get a bunch of errors like:
```
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
```
and my cell doesn't complete.   

Any suggestions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-432546977,https://api.github.com/repos/pydata/xarray/issues/2501,432546977,MDEyOklzc3VlQ29tbWVudDQzMjU0Njk3Nw==,1492047,2018-10-24T07:38:31Z,2018-10-24T07:38:31Z,CONTRIBUTOR,"Thank you for looking into this.

I just want to point out that I'm not that much concerned with the ""slow performance"" but much more with the memory consumption and the limitation it implies.

```python
from glob import glob
import xarray as xr

all_files = glob('...*TP110*.nc')
display(xr.open_dataset(all_files[0]))
display(xr.open_dataset(all_files[1]))
```

```
<xarray.Dataset>
Dimensions:                            (meas_ind: 40, time: 2871, wvf_ind: 128)
Coordinates:
  * time                               (time) datetime64[ns] 2017-06-19T14:24:20.792036992 ... 2017-06-19T15:14:38.491743104
  * meas_ind                           (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39
  * wvf_ind                            (wvf_ind) int8 0 1 2 3 ... 125 126 127
    lat                                (time) float64 ...
    lon                                (time) float64 ...
    lon_40hz                           (time, meas_ind) float64 ...
    lat_40hz                           (time, meas_ind) float64 ...
Data variables:
    time_40hz                          (time, meas_ind) datetime64[ns] ...
    surface_type                       (time) float32 ...
    rad_surf_type                      (time) float32 ...
    qual_alt_1hz_range                 (time) float32 ...
    qual_alt_1hz_swh                   (time) float32 ...
    qual_alt_1hz_sig0                  (time) float32 ...
    qual_alt_1hz_off_nadir_angle_wf    (time) float32 ...
    qual_inst_corr_1hz_range           (time) float32 ...
    qual_inst_corr_1hz_swh             (time) float32 ...
    qual_inst_corr_1hz_sig0            (time) float32 ...
    qual_rad_1hz_tb_k                  (time) float32 ...
    qual_rad_1hz_tb_ka                 (time) float32 ...
    alt_state_flag_acq_mode_40hz       (time, meas_ind) float32 ...
    alt_state_flag_tracking_mode_40hz  (time, meas_ind) float32 ...
    orb_state_flag_diode               (time) float32 ...
    orb_state_flag_rest                (time) float32 ...
    ecmwf_meteo_map_avail              (time) float32 ...
    trailing_edge_variation_flag       (time) float32 ...
    trailing_edge_variation_flag_40hz  (time, meas_ind) float32 ...
    ice_flag                           (time) float32 ...
    interp_flag_mean_sea_surface       (time) float32 ...
    interp_flag_mdt                    (time) float32 ...
    interp_flag_ocean_tide_sol1        (time) float32 ...
    interp_flag_ocean_tide_sol2        (time) float32 ...
    interp_flag_meteo                  (time) float32 ...
    alt                                (time) float64 ...
    alt_40hz                           (time, meas_ind) float64 ...
    orb_alt_rate                       (time) float32 ...
    range                              (time) float64 ...
    range_40hz                         (time, meas_ind) float64 ...
    range_used_40hz                    (time, meas_ind) float32 ...
    range_rms                          (time) float32 ...
    range_numval                       (time) float32 ...
    number_of_iterations               (time, meas_ind) float32 ...
    net_instr_corr_range               (time) float64 ...
    model_dry_tropo_corr               (time) float32 ...
    model_wet_tropo_corr               (time) float32 ...
    rad_wet_tropo_corr                 (time) float32 ...
    iono_corr_gim                      (time) float32 ...
    sea_state_bias                     (time) float32 ...
    swh                                (time) float32 ...
    swh_40hz                           (time, meas_ind) float32 ...
    swh_used_40hz                      (time, meas_ind) float32 ...
    swh_rms                            (time) float32 ...
    swh_numval                         (time) float32 ...
    net_instr_corr_swh                 (time) float32 ...
    sig0                               (time) float32 ...
    sig0_40hz                          (time, meas_ind) float32 ...
    sig0_used_40hz                     (time, meas_ind) float32 ...
    sig0_rms                           (time) float32 ...
    sig0_numval                        (time) float32 ...
    agc                                (time) float32 ...
    agc_rms                            (time) float32 ...
    agc_numval                         (time) float32 ...
    net_instr_corr_sig0                (time) float32 ...
    atmos_corr_sig0                    (time) float32 ...
    off_nadir_angle_wf                 (time) float32 ...
    off_nadir_angle_wf_40hz            (time, meas_ind) float32 ...
    tb_k                               (time) float32 ...
    tb_ka                              (time) float32 ...
    mean_sea_surface                   (time) float64 ...
    mean_topography                    (time) float64 ...
    geoid                              (time) float64 ...
    bathymetry                         (time) float64 ...
    inv_bar_corr                       (time) float32 ...
    hf_fluctuations_corr               (time) float32 ...
    ocean_tide_sol1                    (time) float64 ...
    ocean_tide_sol2                    (time) float64 ...
    ocean_tide_equil                   (time) float32 ...
    ocean_tide_non_equil               (time) float32 ...
    load_tide_sol1                     (time) float32 ...
    load_tide_sol2                     (time) float32 ...
    solid_earth_tide                   (time) float32 ...
    pole_tide                          (time) float32 ...
    wind_speed_model_u                 (time) float32 ...
    wind_speed_model_v                 (time) float32 ...
    wind_speed_alt                     (time) float32 ...
    rad_water_vapor                    (time) float32 ...
    rad_liquid_water                   (time) float32 ...
    ice1_range_40hz                    (time, meas_ind) float64 ...
    ice1_sig0_40hz                     (time, meas_ind) float32 ...
    ice1_qual_flag_40hz                (time, meas_ind) float32 ...
    seaice_range_40hz                  (time, meas_ind) float64 ...
    seaice_sig0_40hz                   (time, meas_ind) float32 ...
    seaice_qual_flag_40hz              (time, meas_ind) float32 ...
    ice2_range_40hz                    (time, meas_ind) float64 ...
    ice2_le_sig0_40hz                  (time, meas_ind) float32 ...
    ice2_sig0_40hz                     (time, meas_ind) float32 ...
    ice2_sigmal_40hz                   (time, meas_ind) float32 ...
    ice2_slope1_40hz                   (time, meas_ind) float64 ...
    ice2_slope2_40hz                   (time, meas_ind) float64 ...
    ice2_mqe_40hz                      (time, meas_ind) float32 ...
    ice2_qual_flag_40hz                (time, meas_ind) float32 ...
    mqe_40hz                           (time, meas_ind) float32 ...
    peakiness_40hz                     (time, meas_ind) float32 ...
    ssha                               (time) float32 ...
    tracker_40hz                       (time, meas_ind) float64 ...
    tracker_used_40hz                  (time, meas_ind) float32 ...
    tracker_diode_40hz                 (time, meas_ind) float64 ...
    pri_counter_40hz                   (time, meas_ind) float64 ...
    qual_alt_1hz_off_nadir_angle_pf    (time) float32 ...
    off_nadir_angle_pf                 (time) float32 ...
    off_nadir_angle_rain_40hz          (time, meas_ind) float32 ...
    uso_corr                           (time) float64 ...
    internal_path_delay_corr           (time) float64 ...
    modeled_instr_corr_range           (time) float32 ...
    doppler_corr                       (time) float32 ...
    cog_corr                           (time) float32 ...
    modeled_instr_corr_swh             (time) float32 ...
    internal_corr_sig0                 (time) float32 ...
    modeled_instr_corr_sig0            (time) float32 ...
    agc_40hz                           (time, meas_ind) float32 ...
    agc_corr_40hz                      (time, meas_ind) float32 ...
    scaling_factor_40hz                (time, meas_ind) float64 ...
    epoch_40hz                         (time, meas_ind) float64 ...
    width_leading_edge_40hz            (time, meas_ind) float64 ...
    amplitude_40hz                     (time, meas_ind) float64 ...
    thermal_noise_40hz                 (time, meas_ind) float64 ...
    seaice_epoch_40hz                  (time, meas_ind) float64 ...
    seaice_amplitude_40hz              (time, meas_ind) float64 ...
    ice2_epoch_40hz                    (time, meas_ind) float64 ...
    ice2_amplitude_40hz                (time, meas_ind) float64 ...
    ice2_mean_amplitude_40hz           (time, meas_ind) float64 ...
    ice2_thermal_noise_40hz            (time, meas_ind) float64 ...
    ice2_slope_40hz                    (time, meas_ind) float64 ...
    signal_to_noise_ratio              (time) float32 ...
    waveforms_40hz                     (time, meas_ind, wvf_ind) float32 ...
Attributes:
    Conventions:                       CF-1.1
    title:                             GDR - Expertise dataset
    institution:                       CNES
    source:                            radar altimeter
    history:                           2017-07-21 08:25:07 : Creation
    contact:                           CNES aviso@oceanobs.com, EUMETSAT ops@...
    references:                        L1 library=V4.5p1, L2 library=V5.5p2, ...
    processing_center:                 SALP
    reference_document:                SARAL/ALTIKA Products Handbook, SALP-M...
    mission_name:                      SARAL
    altimeter_sensor_name:             ALTIKA
    radiometer_sensor_name:            ALTIKA_RAD
    doris_sensor_name:                 DGXX
    cycle_number:                      110
    absolute_rev_number:               22545
    pass_number:                       1
    absolute_pass_number:              109219
    equator_time:                      2017-06-19 14:49:32.128000
    equator_longitude:                 227.77
    first_meas_time:                   2017-06-19 14:24:20.792037
    last_meas_time:                    2017-06-19 15:14:38.491743
    xref_altimeter_level1:             ALK_ALT_1PaS20170619_154722_20170619_1...
    xref_radiometer_level1:            ALK_RAD_1PaS20170619_154643_20170619_1...
    xref_altimeter_characterisation:   ALK_CHA_AXVCNE20131115_120000_20100101...
    xref_radiometer_characterisation:  ALK_CHR_AXVCNE20110207_180000_20110101...
    xref_altimeter_ltm:                ALK_CAL_AXXCNE20170720_110014_20130102...
    xref_doris_uso:                    SRL_OS1_AXXCNE20170720_083800_20130226...
    xref_orbit_data:                   SRL_VOR_AXVCNE20170720_111700_20170618...
    xref_pf_data:                      SRL_VPF_AXVCNE20170720_111800_20170618...
    xref_pole_location:                SMM_POL_AXXCNE20170721_071500_19870101...
    xref_gim_data:                     SRL_ION_AXPCNE20170620_074756_20170619...
    xref_mog2d_data:                   SMM_MOG_AXVCNE20170709_191501_20170619...
    xref_orf_data:                     SRL_ORF_AXXCNE20170720_083800_20160704...
    xref_meteorological_files:         SMM_APA_AXVCNE20170619_170611_20170619...
    ellipsoid_axis:                    6378136.3
    ellipsoid_flattening:              0.0033528131778969
<xarray.Dataset>
Dimensions:                            (meas_ind: 40, time: 2779, wvf_ind: 128)
Coordinates:
  * time                               (time) datetime64[ns] 2017-06-19T15:14:39.356848 ... 2017-06-19T16:04:56.808873920
  * meas_ind                           (meas_ind) int8 0 1 2 3 4 ... 36 37 38 39
  * wvf_ind                            (wvf_ind) int8 0 1 2 3 ... 125 126 127
    lat                                (time) float64 ...
    lon                                (time) float64 ...
    lon_40hz                           (time, meas_ind) float64 ...
    lat_40hz                           (time, meas_ind) float64 ...
Data variables:
    time_40hz                          (time, meas_ind) datetime64[ns] ...
    surface_type                       (time) float32 ...
    rad_surf_type                      (time) float32 ...
    qual_alt_1hz_range                 (time) float32 ...
    qual_alt_1hz_swh                   (time) float32 ...
    qual_alt_1hz_sig0                  (time) float32 ...
    qual_alt_1hz_off_nadir_angle_wf    (time) float32 ...
    qual_inst_corr_1hz_range           (time) float32 ...
    qual_inst_corr_1hz_swh             (time) float32 ...
    qual_inst_corr_1hz_sig0            (time) float32 ...
    qual_rad_1hz_tb_k                  (time) float32 ...
    qual_rad_1hz_tb_ka                 (time) float32 ...
    alt_state_flag_acq_mode_40hz       (time, meas_ind) float32 ...
    alt_state_flag_tracking_mode_40hz  (time, meas_ind) float32 ...
    orb_state_flag_diode               (time) float32 ...
    orb_state_flag_rest                (time) float32 ...
    ecmwf_meteo_map_avail              (time) float32 ...
    trailing_edge_variation_flag       (time) float32 ...
    trailing_edge_variation_flag_40hz  (time, meas_ind) float32 ...
    ice_flag                           (time) float32 ...
    interp_flag_mean_sea_surface       (time) float32 ...
    interp_flag_mdt                    (time) float32 ...
    interp_flag_ocean_tide_sol1        (time) float32 ...
    interp_flag_ocean_tide_sol2        (time) float32 ...
    interp_flag_meteo                  (time) float32 ...
    alt                                (time) float64 ...
    alt_40hz                           (time, meas_ind) float64 ...
    orb_alt_rate                       (time) float32 ...
    range                              (time) float64 ...
    range_40hz                         (time, meas_ind) float64 ...
    range_used_40hz                    (time, meas_ind) float32 ...
    range_rms                          (time) float32 ...
    range_numval                       (time) float32 ...
    number_of_iterations               (time, meas_ind) float32 ...
    net_instr_corr_range               (time) float64 ...
    model_dry_tropo_corr               (time) float32 ...
    model_wet_tropo_corr               (time) float32 ...
    rad_wet_tropo_corr                 (time) float32 ...
    iono_corr_gim                      (time) float32 ...
    sea_state_bias                     (time) float32 ...
    swh                                (time) float32 ...
    swh_40hz                           (time, meas_ind) float32 ...
    swh_used_40hz                      (time, meas_ind) float32 ...
    swh_rms                            (time) float32 ...
    swh_numval                         (time) float32 ...
    net_instr_corr_swh                 (time) float32 ...
    sig0                               (time) float32 ...
    sig0_40hz                          (time, meas_ind) float32 ...
    sig0_used_40hz                     (time, meas_ind) float32 ...
    sig0_rms                           (time) float32 ...
    sig0_numval                        (time) float32 ...
    agc                                (time) float32 ...
    agc_rms                            (time) float32 ...
    agc_numval                         (time) float32 ...
    net_instr_corr_sig0                (time) float32 ...
    atmos_corr_sig0                    (time) float32 ...
    off_nadir_angle_wf                 (time) float32 ...
    off_nadir_angle_wf_40hz            (time, meas_ind) float32 ...
    tb_k                               (time) float32 ...
    tb_ka                              (time) float32 ...
    mean_sea_surface                   (time) float64 ...
    mean_topography                    (time) float64 ...
    geoid                              (time) float64 ...
    bathymetry                         (time) float64 ...
    inv_bar_corr                       (time) float32 ...
    hf_fluctuations_corr               (time) float32 ...
    ocean_tide_sol1                    (time) float64 ...
    ocean_tide_sol2                    (time) float64 ...
    ocean_tide_equil                   (time) float32 ...
    ocean_tide_non_equil               (time) float32 ...
    load_tide_sol1                     (time) float32 ...
    load_tide_sol2                     (time) float32 ...
    solid_earth_tide                   (time) float32 ...
    pole_tide                          (time) float32 ...
    wind_speed_model_u                 (time) float32 ...
    wind_speed_model_v                 (time) float32 ...
    wind_speed_alt                     (time) float32 ...
    rad_water_vapor                    (time) float32 ...
    rad_liquid_water                   (time) float32 ...
    ice1_range_40hz                    (time, meas_ind) float64 ...
    ice1_sig0_40hz                     (time, meas_ind) float32 ...
    ice1_qual_flag_40hz                (time, meas_ind) float32 ...
    seaice_range_40hz                  (time, meas_ind) float64 ...
    seaice_sig0_40hz                   (time, meas_ind) float32 ...
    seaice_qual_flag_40hz              (time, meas_ind) float32 ...
    ice2_range_40hz                    (time, meas_ind) float64 ...
    ice2_le_sig0_40hz                  (time, meas_ind) float32 ...
    ice2_sig0_40hz                     (time, meas_ind) float32 ...
    ice2_sigmal_40hz                   (time, meas_ind) float32 ...
    ice2_slope1_40hz                   (time, meas_ind) float64 ...
    ice2_slope2_40hz                   (time, meas_ind) float64 ...
    ice2_mqe_40hz                      (time, meas_ind) float32 ...
    ice2_qual_flag_40hz                (time, meas_ind) float32 ...
    mqe_40hz                           (time, meas_ind) float32 ...
    peakiness_40hz                     (time, meas_ind) float32 ...
    ssha                               (time) float32 ...
    tracker_40hz                       (time, meas_ind) float64 ...
    tracker_used_40hz                  (time, meas_ind) float32 ...
    tracker_diode_40hz                 (time, meas_ind) float64 ...
    pri_counter_40hz                   (time, meas_ind) float64 ...
    qual_alt_1hz_off_nadir_angle_pf    (time) float32 ...
    off_nadir_angle_pf                 (time) float32 ...
    off_nadir_angle_rain_40hz          (time, meas_ind) float32 ...
    uso_corr                           (time) float64 ...
    internal_path_delay_corr           (time) float64 ...
    modeled_instr_corr_range           (time) float32 ...
    doppler_corr                       (time) float32 ...
    cog_corr                           (time) float32 ...
    modeled_instr_corr_swh             (time) float32 ...
    internal_corr_sig0                 (time) float32 ...
    modeled_instr_corr_sig0            (time) float32 ...
    agc_40hz                           (time, meas_ind) float32 ...
    agc_corr_40hz                      (time, meas_ind) float32 ...
    scaling_factor_40hz                (time, meas_ind) float64 ...
    epoch_40hz                         (time, meas_ind) float64 ...
    width_leading_edge_40hz            (time, meas_ind) float64 ...
    amplitude_40hz                     (time, meas_ind) float64 ...
    thermal_noise_40hz                 (time, meas_ind) float64 ...
    seaice_epoch_40hz                  (time, meas_ind) float64 ...
    seaice_amplitude_40hz              (time, meas_ind) float64 ...
    ice2_epoch_40hz                    (time, meas_ind) float64 ...
    ice2_amplitude_40hz                (time, meas_ind) float64 ...
    ice2_mean_amplitude_40hz           (time, meas_ind) float64 ...
    ice2_thermal_noise_40hz            (time, meas_ind) float64 ...
    ice2_slope_40hz                    (time, meas_ind) float64 ...
    signal_to_noise_ratio              (time) float32 ...
    waveforms_40hz                     (time, meas_ind, wvf_ind) float32 ...
Attributes:
    Conventions:                       CF-1.1
    title:                             GDR - Expertise dataset
    institution:                       CNES
    source:                            radar altimeter
    history:                           2017-07-21 08:25:19 : Creation
    contact:                           CNES aviso@oceanobs.com, EUMETSAT ops@...
    references:                        L1 library=V4.5p1, L2 library=V5.5p2, ...
    processing_center:                 SALP
    reference_document:                SARAL/ALTIKA Products Handbook, SALP-M...
    mission_name:                      SARAL
    altimeter_sensor_name:             ALTIKA
    radiometer_sensor_name:            ALTIKA_RAD
    doris_sensor_name:                 DGXX
    cycle_number:                      110
    absolute_rev_number:               22546
    pass_number:                       2
    absolute_pass_number:              109220
    equator_time:                      2017-06-19 15:39:46.492000
    equator_longitude:                 35.21
    first_meas_time:                   2017-06-19 15:14:39.356848
    last_meas_time:                    2017-06-19 16:04:56.808874
    xref_altimeter_level1:             ALK_ALT_1PaS20170619_154722_20170619_1...
    xref_radiometer_level1:            ALK_RAD_1PaS20170619_154643_20170619_1...
    xref_altimeter_characterisation:   ALK_CHA_AXVCNE20131115_120000_20100101...
    xref_radiometer_characterisation:  ALK_CHR_AXVCNE20110207_180000_20110101...
    xref_altimeter_ltm:                ALK_CAL_AXXCNE20170720_110014_20130102...
    xref_doris_uso:                    SRL_OS1_AXXCNE20170720_083800_20130226...
    xref_orbit_data:                   SRL_VOR_AXVCNE20170720_111700_20170618...
    xref_pf_data:                      SRL_VPF_AXVCNE20170720_111800_20170618...
    xref_pole_location:                SMM_POL_AXXCNE20170721_071500_19870101...
    xref_gim_data:                     SRL_ION_AXPCNE20170620_074756_20170619...
    xref_mog2d_data:                   SMM_MOG_AXVCNE20170709_191501_20170619...
    xref_orf_data:                     SRL_ORF_AXXCNE20170720_083800_20160704...
    xref_meteorological_files:         SMM_APA_AXVCNE20170619_170611_20170619...
    ellipsoid_axis:                    6378136.3
    ellipsoid_flattening:              0.0033528131778969```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-432342306,https://api.github.com/repos/pydata/xarray/issues/2501,432342306,MDEyOklzc3VlQ29tbWVudDQzMjM0MjMwNg==,1197350,2018-10-23T17:27:50Z,2018-10-23T17:27:50Z,MEMBER,"^ I'm assuming you're in a notebook. If not, call `print` instead of `display`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/2501#issuecomment-432342180,https://api.github.com/repos/pydata/xarray/issues/2501,432342180,MDEyOklzc3VlQ29tbWVudDQzMjM0MjE4MA==,1197350,2018-10-23T17:27:30Z,2018-10-23T17:27:30Z,MEMBER,"In `open_mfdataset`, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset.

To help us help you debug, please provide more information about the files your are opening. Specifically, please call `open_dataset()` directly on the first two files and copy and paste the output here. Specifically, do something like this
```python
from glob import glob
import xarray as xr
all_files = glob('*1002*.nc')
display(xr.open_dataset(all_files[0]))
display(xr.open_dataset(all_files[1]))
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074