html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2233#issuecomment-1078439763,https://api.github.com/repos/pydata/xarray/issues/2233,1078439763,IC_kwDOAMm_X85AR69T,1872600,2022-03-24T22:26:07Z,2023-07-16T15:13:39Z,NONE,"https://github.com/pydata/xarray/issues/2233#issuecomment-397602084 Would the [new xarray index/coordinate internal refactoring](https://twitter.com/xarray_dev/status/1506004653594488833) now allow us to address this issue? cc @kthyng","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,332471780 https://github.com/pydata/xarray/issues/6318#issuecomment-1056917100,https://api.github.com/repos/pydata/xarray/issues/6318,1056917100,IC_kwDOAMm_X84-_0Zs,1872600,2022-03-02T13:13:24Z,2022-03-02T13:14:40Z,NONE,"While I was typing this, @keewis provided a workaround here: https://github.com/fsspec/kerchunk/issues/130#issuecomment-1056897730 ! Leaving this open until I know whether this is something best left for users to implement or something to be handled in xarray. #6318 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1157163377 https://github.com/pydata/xarray/pull/4140#issuecomment-985769385,https://api.github.com/repos/pydata/xarray/issues/4140,985769385,IC_kwDOAMm_X846waWp,1872600,2021-12-03T19:22:13Z,2021-12-03T19:22:13Z,NONE,"Thanks @snowman2 ! 
Done in https://github.com/corteva/rioxarray/issues/440 ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,636451398 https://github.com/pydata/xarray/pull/4140#issuecomment-985530331,https://api.github.com/repos/pydata/xarray/issues/4140,985530331,IC_kwDOAMm_X846vf_b,1872600,2021-12-03T13:41:35Z,2021-12-03T13:43:33Z,NONE,"I'd like to use this cool new rasterio/fsspec functionality in xarray! I must be doing something wrong here in cell [5]: https://nbviewer.org/gist/rsignell-usgs/dbf3d8e952895ca255f300790759c60f ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,636451398 https://github.com/pydata/xarray/issues/2697#issuecomment-832761716,https://api.github.com/repos/pydata/xarray/issues/2697,832761716,MDEyOklzc3VlQ29tbWVudDgzMjc2MTcxNg==,1872600,2021-05-05T15:02:55Z,2021-05-05T15:04:59Z,NONE,"It's worth pointing out that you can create [FileReferenceSystem JSON](https://github.com/intake/fsspec-reference-maker#version-1) to accomplish many of the tasks we used to use NcML for: * create a single virtual dataset that points to a collection of files * modify dataset and variable attributes It also has the nice feature that it makes your dataset faster to work with on the cloud because the map to the data is loaded in one shot! 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,401874795 https://github.com/pydata/xarray/pull/4461#issuecomment-741889071,https://api.github.com/repos/pydata/xarray/issues/4461,741889071,MDEyOklzc3VlQ29tbWVudDc0MTg4OTA3MQ==,1872600,2020-12-09T16:31:37Z,2021-01-19T14:46:49Z,NONE,"I'm really looking forward to getting this merged so I can open the National Water Model Zarr I created last week thusly: ```python ds = xr.open_dataset('s3://noaa-nwm-retro-v2.0-zarr-pds', engine='zarr', backend_kwargs={'consolidated':True, ""storage_options"": {'anon':True}}) ``` @martindurant tells me this takes only **3 s** with the new async capability! That would be pretty awesome, because now it takes **1min 15s** to open this dataset!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212 https://github.com/pydata/xarray/issues/4122#issuecomment-745520766,https://api.github.com/repos/pydata/xarray/issues/4122,745520766,MDEyOklzc3VlQ29tbWVudDc0NTUyMDc2Ng==,1872600,2020-12-15T19:39:16Z,2020-12-15T19:39:16Z,NONE,"I'm closing this; the recommended approach for writing NetCDF to object storage is to write locally, then push. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/pull/4461#issuecomment-741942375,https://api.github.com/repos/pydata/xarray/issues/4461,741942375,MDEyOklzc3VlQ29tbWVudDc0MTk0MjM3NQ==,1872600,2020-12-09T17:50:04Z,2020-12-09T17:50:04Z,NONE,"@rabernat , awesome! 
I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,709187212 https://github.com/pydata/xarray/issues/4470#issuecomment-727222443,https://api.github.com/repos/pydata/xarray/issues/4470,727222443,MDEyOklzc3VlQ29tbWVudDcyNzIyMjQ0Mw==,1872600,2020-11-14T15:22:49Z,2020-11-14T15:23:28Z,NONE,"Just a note that the only unstructured grid (triangular mesh) example I have is: http://gallery.pangeo.io/repos/rsignell-usgs/esip-gallery/01_hurricane_ike_water_levels.html I figured out how to make that notebook from the info at: https://earthsim.holoviz.org/user_guide/Visualizing_Meshes.html The ""earthsim"" project was developed by the Holoviz team (@jbednar & co) funded by USACE when @dharhas was there. Would be cool to revive this. The Holoviz team and USACE might not have been aware of the [UGRID conventions](http://ugrid-conventions.github.io/ugrid-conventions/) when they developed that code, so currently it's a bit awkward to go from a UGRID-compliant NetCDF dataset to visualization with Holoviz (as you can see from the Hurricane Ike notebook). That would be low-hanging fruit for any future effort. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,710357592 https://github.com/pydata/xarray/pull/3804#issuecomment-680138664,https://api.github.com/repos/pydata/xarray/issues/3804,680138664,MDEyOklzc3VlQ29tbWVudDY4MDEzODY2NA==,1872600,2020-08-25T16:39:34Z,2020-08-25T17:07:42Z,NONE,"Drumroll.... 
@dcherian, epic cymbal crash?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/4338#issuecomment-673433045,https://api.github.com/repos/pydata/xarray/issues/4338,673433045,MDEyOklzc3VlQ29tbWVudDY3MzQzMzA0NQ==,1872600,2020-08-13T11:54:10Z,2020-08-13T12:04:11Z,NONE,"@nicholaskgeorge your minimal test would be monotonic if `square2` and `square4` had `x` coordinates `[3,4,5]` instead of `[2,3,4]`, but it seems `combine_by_coords` doesn't mind that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,677773328 https://github.com/pydata/xarray/pull/3804#issuecomment-665163886,https://api.github.com/repos/pydata/xarray/issues/3804,665163886,MDEyOklzc3VlQ29tbWVudDY2NTE2Mzg4Ng==,1872600,2020-07-28T17:10:47Z,2020-07-28T17:11:33Z,NONE,"@dcherian , are we just waiting for one more ""+1"" here, or are the failing checks related to this PR? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/4082#issuecomment-642841283,https://api.github.com/repos/pydata/xarray/issues/4082,642841283,MDEyOklzc3VlQ29tbWVudDY0Mjg0MTI4Mw==,1872600,2020-06-11T17:58:30Z,2020-06-11T18:00:28Z,NONE,"@jswhit, do you know if https://github.com/Unidata/netcdf4-python is doing the caching? 
Just to catch you up quickly, we have a workflow that opens a bunch of opendap datasets, and while the default `file_cache_maxsize=128` works on Linux, if this exceeds 25 files on windows it fails: ``` xr.set_options(file_cache_maxsize=25) # works #xr.set_options(file_cache_maxsize=26) # fails ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-641236117,https://api.github.com/repos/pydata/xarray/issues/4082,641236117,MDEyOklzc3VlQ29tbWVudDY0MTIzNjExNw==,1872600,2020-06-09T11:42:38Z,2020-06-09T11:42:38Z,NONE,"@DennisHeimbigner , do you not agree that this issue on windows is related to the number of files cached from OPeNDAP requests? Clearly there are some differences with cache files on windows: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg11190.html ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-640808125,https://api.github.com/repos/pydata/xarray/issues/4082,640808125,MDEyOklzc3VlQ29tbWVudDY0MDgwODEyNQ==,1872600,2020-06-08T18:51:37Z,2020-06-08T18:51:37Z,NONE,"@DennisHeimbigner I don't understand how it can be a DAP or code issue since: - it runs on Linux without errors with default `file_cache_maxsize=128`. - it runs on Windows without errors with `file_cache_maxsize=25` Right? Or am I missing something?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-640590247,https://api.github.com/repos/pydata/xarray/issues/4082,640590247,MDEyOklzc3VlQ29tbWVudDY0MDU5MDI0Nw==,1872600,2020-06-08T13:05:28Z,2020-06-08T13:05:28Z,NONE,"Or perhaps Unidata's @WardF, who leads NetCDF development. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4122#issuecomment-640548620,https://api.github.com/repos/pydata/xarray/issues/4122,640548620,MDEyOklzc3VlQ29tbWVudDY0MDU0ODYyMA==,1872600,2020-06-08T11:36:14Z,2020-06-08T11:37:21Z,NONE,"@martindurant, I asked @ajelenak offline and he reminded me that: > File metadata are dispersed throughout an HDF5 [and NetCDF4] file in order to support writing and modifying array sizes at any time of execution Looking forward to `simplecache::` for writing in `fsspec=0.7.5`!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/issues/4122#issuecomment-639771646,https://api.github.com/repos/pydata/xarray/issues/4122,639771646,MDEyOklzc3VlQ29tbWVudDYzOTc3MTY0Ng==,1872600,2020-06-05T20:08:37Z,2020-06-05T20:54:36Z,NONE,"Okay @scottyhq, I tried setting `engine='h5netcdf'`, but still got: ``` OSError: Seek only available in read mode ``` Thinking about this a little more, it's pretty clear why writing NetCDF to S3 would require seek mode. I asked @martindurant about supporting seek for writing in `fsspec` and he said that would be pretty hard. And in fact, the performance probably would be pretty terrible as lots of little writes would be required. So maybe it's best just to write netcdf files locally and then push them to S3. 
And to facilitate that, @martindurant [merged a PR yesterday](https://github.com/intake/filesystem_spec/pull/309) to enable `simplecache` for writing in `fsspec`, so after doing: ``` pip install git+https://github.com/intake/filesystem_spec.git ``` in my environment, this now works: ```python import xarray as xr import fsspec ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC' '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf') outfile = fsspec.open('simplecache::s3://chs-pangeo-data-bucket/rsignell/foo2.nc', mode='wb', s3=dict(profile='default')) with outfile as f: ds.to_netcdf(f) ``` (Here I'm telling `fsspec` to use the AWS credentials in my ""default"" profile) Thanks Martin!!!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/issues/4082#issuecomment-639450932,https://api.github.com/repos/pydata/xarray/issues/4082,639450932,MDEyOklzc3VlQ29tbWVudDYzOTQ1MDkzMg==,1872600,2020-06-05T12:26:14Z,2020-06-05T12:26:14Z,NONE,"@shoyer, unfortunately these opendap datasets contain only 1 time record (1 daily value) each. And it works fine on Linux with `file_cache_maxsize=128`, so it must be some Windows cache thing right? So since I just picked `file_cache_maxsize=10` arbitrarily, I thought it would be useful to see what the maximum value was. Using the good old bi-section method, I determined that (for this case anyway), the maximum size that works is 25. 
In other words: ``` xr.set_options(file_cache_maxsize=25) # works #xr.set_options(file_cache_maxsize=26) # fails ``` I would bet money that Unidata's @DennisHeimbigner knows what's going on here!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-639111588,https://api.github.com/repos/pydata/xarray/issues/4082,639111588,MDEyOklzc3VlQ29tbWVudDYzOTExMTU4OA==,1872600,2020-06-04T20:55:49Z,2020-06-04T20:55:49Z,NONE,"@EliT1626 , I confirmed that this problem exists on Windows, but not on Linux. The error: ``` IOError: [Errno -37] NetCDF: Write to read only: 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201703/oisst-avhrr-v02r01.20170304.nc' ``` suggested some kind of cache problem, and as you noted it always fails after a certain number of dates, so I tried increasing the number of cached files from the default 128 to 256: ``` xr.set_options(file_cache_maxsize=256) ``` but that had no effect. Just to see if it would fail earlier, I then tried *decreasing* the number of cached files: ``` xr.set_options(file_cache_maxsize=10) ``` and to my surprise, it ran all the way through: https://nbviewer.jupyter.org/gist/rsignell-usgs/c52fadd8626734bdd32a432279bc6779 I'm hoping someone who worked on the caching (@shoyer?) might have some idea of what is going on, but at least you can execute your workflow now on windows! 
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/pull/3804#issuecomment-592094766,https://api.github.com/repos/pydata/xarray/issues/3804,592094766,MDEyOklzc3VlQ29tbWVudDU5MjA5NDc2Ng==,1872600,2020-02-27T17:59:13Z,2020-02-27T17:59:13Z,NONE,This PR is motivated by the work described in this [Medium blog post](https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/3339#issuecomment-534722389,https://api.github.com/repos/pydata/xarray/issues/3339,534722389,MDEyOklzc3VlQ29tbWVudDUzNDcyMjM4OQ==,1872600,2019-09-24T19:56:17Z,2019-09-24T19:56:17Z,NONE,"Yep, upgrading to dask=2.4.0 fixed the problem! Phew. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,497823072 https://github.com/pydata/xarray/issues/3339#issuecomment-534710770,https://api.github.com/repos/pydata/xarray/issues/3339,534710770,MDEyOklzc3VlQ29tbWVudDUzNDcxMDc3MA==,1872600,2019-09-24T19:23:25Z,2019-09-24T19:23:25Z,NONE,"@shoyer , indeed, while I have the same xarray=0.13 and numpy=1.17.2 as @jhamman, he has dask=2.4.0 and I have dask=2.2.0. 
I'll try upgrading and will report back.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,497823072 https://github.com/pydata/xarray/issues/2501#issuecomment-510144707,https://api.github.com/repos/pydata/xarray/issues/2501,510144707,MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw==,1872600,2019-07-10T16:59:12Z,2019-07-11T11:47:02Z,NONE,"@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the `feature_id` coordinate to prevent `open_mfdataset` from trying to harmonize that coordinate from all the chunks. So if I use this code, the `open_mfdataset` command finishes: ```python def drop_coords(ds): ds = ds.drop(['reference_time','feature_id']) return ds.reset_coords(drop=True) ``` and I can then add back in the dropped coordinate values at the end: ```python dsets = [xr.open_dataset(f) for f in files[:3]] ds.coords['feature_id'] = dsets[0].coords['feature_id'] ``` I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509379294,https://api.github.com/repos/pydata/xarray/issues/2501,509379294,MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA==,1872600,2019-07-08T20:28:48Z,2019-07-08T20:29:20Z,NONE,"@TomAugspurger , I thought @rabernat's suggestion of implementing ```python def drop_coords(ds): return ds.reset_coords(drop=True) ``` would avoid this checking. 
Did I understand or implement this incorrectly?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509341467,https://api.github.com/repos/pydata/xarray/issues/2501,509341467,MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw==,1872600,2019-07-08T18:34:02Z,2019-07-08T18:34:02Z,NONE,"@rabernat , to answer your question, if I open just two files: ``` ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True) ``` the resulting dataset is: ``` Dimensions: (feature_id: 2729077, reference_time: 1, time: 2) Coordinates: * reference_time (reference_time) datetime64[ns] 2009-01-01 * feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804 * time (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00 Data variables: streamflow (time, feature_id) float64 dask.array q_lateral (time, feature_id) float64 dask.array velocity (time, feature_id) float64 dask.array qSfcLatRunoff (time, feature_id) float64 dask.array qBucket (time, feature_id) float64 dask.array qBtmVertRunoff (time, feature_id) float64 dask.array Attributes: featureType: timeSeries proj4: +proj=longlat +datum=NAD83 +no_defs model_initialization_time: 2009-01-01_00:00:00 station_dimension: feature_id model_output_valid_time: 2009-01-01_00:00:00 stream_order_output: 1 cdm_datatype: Station esri_pe_string: GEOGCS[GCS_North_American_1983,DATUM[D_North_... Conventions: CF-1.6 model_version: NWM 1.2 dev_OVRTSWCRT: 1 dev_NOAH_TIMESTEP: 3600 dev_channel_only: 0 dev_channelBucket_only: 0 dev: dev_ prefix indicates development/internal me... 
``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509340139,https://api.github.com/repos/pydata/xarray/issues/2501,509340139,MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ==,1872600,2019-07-08T18:30:18Z,2019-07-08T18:30:18Z,NONE,"@TomAugspurger, okay, I just ran the above code again and here's what happens: The `open_mfdataset` proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 `open_dataset` tasks in about 10 minutes. One interesting thing is that the number of tasks keep dropping as time goes on. Not sure why that would be: ![2019-07-08_13-40-09](https://user-images.githubusercontent.com/1872600/60832559-2d5ae080-a18a-11e9-9b0d-e7e39196412d.png) ![2019-07-08_13-42-21](https://user-images.githubusercontent.com/1872600/60832572-3481ee80-a18a-11e9-8bba-e9ee783894da.png) ![2019-07-08_13-43-15](https://user-images.githubusercontent.com/1872600/60832578-377cdf00-a18a-11e9-9b89-0d80353a62c9.png) ![2019-07-08_13-43-58](https://user-images.githubusercontent.com/1872600/60832589-3cda2980-a18a-11e9-989c-0a95754e9e46.png) ![2019-07-08_13-49-57](https://user-images.githubusercontent.com/1872600/60832613-4d8a9f80-a18a-11e9-8c54-7029a3cfd08c.png) The memory usage on the workers seems okay during this process: ![2019-07-08_13-38-52](https://user-images.githubusercontent.com/1872600/60832649-66935080-a18a-11e9-8075-dc2fca79f830.png) Then, despite the tasks showing on the dashboard being completed, the `open_mfdataset` command does not complete, but nothing has died, and I'm not sure what's happening. 
I check `top` and get this: ![2019-07-08_13-51-13](https://user-images.githubusercontent.com/1872600/60832847-eb7e6a00-a18a-11e9-84cc-18e8796fede9.png) then after about 10 more minutes, I get these warnings: ![2019-07-08_13-56-19](https://user-images.githubusercontent.com/1872600/60832800-c853ba80-a18a-11e9-839a-487fd1276460.png) and then the errors: ```python-traceback distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': ... ``` Full error listing here: https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6 Does this help? I'd be happy to screenshare if that would be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509282831,https://api.github.com/repos/pydata/xarray/issues/2501,509282831,MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ==,1872600,2019-07-08T15:51:23Z,2019-07-08T15:51:23Z,NONE,"@TomAugspurger, I'm back from vacation now and ready to attack this again. Any updates on your end? 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506475819,https://api.github.com/repos/pydata/xarray/issues/2501,506475819,MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ==,1872600,2019-06-27T19:16:28Z,2019-06-27T19:24:31Z,NONE,"I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netcdf files with `open_mfdataset` looks like this: ```python import xarray as xr from dask.distributed import Client, progress, LocalCluster cluster = LocalCluster() client = Client(cluster) import pandas as pd dates = pd.date_range(start='2009-01-01 00:00',end='2009-12-31 23:00', freq='1h') files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'),date.strftime('%Y%m%d%H%M')) for date in dates] def drop_coords(ds): return ds.reset_coords(drop=True) ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True) ds1 = ds.chunk(chunks={'time':168, 'feature_id':209929}) import numcodecs numcodecs.blosc.use_threads = False ds1.to_zarr('zarr/2009', mode='w', consolidated=True) ``` I transfered the netcdf files from AWS S3 to my local disk to run this, using this command: ``` rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . 
--checksum --fast-list --transfers 16 ``` @TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-497381301,https://api.github.com/repos/pydata/xarray/issues/2501,497381301,MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ==,1872600,2019-05-30T15:55:56Z,2019-05-30T15:58:48Z,NONE,"I'm hitting some memory issues with using `open_mfdataset` with a cluster also. Specifically, I'm trying to open 8760 NetCDF files with an 8 node, 40 cpu LocalCluster. When I issue: ``` ds = xr.open_mfdataset(files, parallel=True) ``` all looks good on the Dask dashboard: ![2019-05-30_9-55-05](https://user-images.githubusercontent.com/1872600/58641001-51442000-82c8-11e9-81e0-9580ec2271b1.png) ![2019-05-30_9-54-49](https://user-images.githubusercontent.com/1872600/58641007-530de380-82c8-11e9-9c1f-46e5fca187da.png) and the tasks complete with no errors in about 4 minutes. Then 4 more minutes go by before I get a bunch of errors like: ``` distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal distributed.nanny - WARNING - Restarting worker ``` and my cell doesn't complete. 
Any suggestions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2368#issuecomment-443227318,https://api.github.com/repos/pydata/xarray/issues/2368,443227318,MDEyOklzc3VlQ29tbWVudDQ0MzIyNzMxOA==,1872600,2018-11-30T14:53:13Z,2018-11-30T14:53:13Z,NONE,"@nordam , can you provide an example?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,350899839 https://github.com/pydata/xarray/issues/2503#issuecomment-432743208,https://api.github.com/repos/pydata/xarray/issues/2503,432743208,MDEyOklzc3VlQ29tbWVudDQzMjc0MzIwOA==,1872600,2018-10-24T17:02:34Z,2018-10-24T17:02:34Z,NONE,"The version that is working in [@rabernat's esgf binder env](https://github.com/rabernat/pangeo_esgf_demo/blob/master/binder/environment.yml) is: ``` libnetcdf 4.6.1 h9cd6fdc_11 conda-forge ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432706068,https://api.github.com/repos/pydata/xarray/issues/2503,432706068,MDEyOklzc3VlQ29tbWVudDQzMjcwNjA2OA==,1872600,2018-10-24T15:27:33Z,2018-10-24T15:27:33Z,NONE,"I fired up my notebook on @rabernat's binder env and it worked fine also: https://nbviewer.jupyter.org/gist/rsignell-usgs/aebdac44a1d773b99673cb132c2ef5eb","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432416114,https://api.github.com/repos/pydata/xarray/issues/2503,432416114,MDEyOklzc3VlQ29tbWVudDQzMjQxNjExNA==,1872600,2018-10-23T20:55:42Z,2018-10-23T20:55:42Z,NONE,"@lesserwhirls , is this the issue you are referring to? 
https://github.com/Unidata/netcdf4-python/issues/836","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432415704,https://api.github.com/repos/pydata/xarray/issues/2503,432415704,MDEyOklzc3VlQ29tbWVudDQzMjQxNTcwNA==,1872600,2018-10-23T20:54:24Z,2018-10-23T20:54:24Z,NONE,"@jhamman, doesn't this dask status plot tell us that multiple workers are connecting and getting data? ![2018-10-23_16-53-20](https://user-images.githubusercontent.com/1872600/47390007-4ac34980-d6e4-11e8-8f54-b8f7b6d0c25d.png) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432389980,https://api.github.com/repos/pydata/xarray/issues/2503,432389980,MDEyOklzc3VlQ29tbWVudDQzMjM4OTk4MA==,1872600,2018-10-23T19:39:09Z,2018-10-23T19:39:09Z,NONE,"Perhaps it's also worth mentioning that I don't see any errors on the THREDDS server side on either the tomcat catalina or thredds threddsServlet logs. @lesserwhirls, any ideas?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432374559,https://api.github.com/repos/pydata/xarray/issues/2503,432374559,MDEyOklzc3VlQ29tbWVudDQzMjM3NDU1OQ==,1872600,2018-10-23T18:53:28Z,2018-10-23T19:39:08Z,NONE,"FWIW, in my workflow there was nothing fundamentally wrong, meaning that the requests worked for a while, but eventually would die with the `NetCDF: Malformed or inaccessible DAP DDS` message. 
So for just a short time period (in this case 50 time steps, 2 chunks in time), it would usually work: https://nbviewer.jupyter.org/gist/rsignell-usgs/1155c76ed3440858ced8132e4cd81df4 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432367931,https://api.github.com/repos/pydata/xarray/issues/2503,432367931,MDEyOklzc3VlQ29tbWVudDQzMjM2NzkzMQ==,1872600,2018-10-23T18:34:48Z,2018-10-23T19:18:52Z,NONE,"I tried a similar workflow last week with an AWS kubernetes cluster with opendap endpoints and it also failed: https://nbviewer.jupyter.org/gist/rsignell-usgs/8583ea8f8b5e1c926b0409bd536095a9 I thought it was likely some intermittent problem that wasn't handled well. In my case after a while I get: ``` distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(, encoded_fill_values={1e+37}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(375, 400, None), slice(0, 7, None), slice(0, 670, None), slice(0, 300, None))) kwargs: {} Exception: OSError(-72, 'NetCDF: Malformed or inaccessible DAP DDS') ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 
https://github.com/pydata/xarray/issues/2323#issuecomment-408606913,https://api.github.com/repos/pydata/xarray/issues/2323,408606913,MDEyOklzc3VlQ29tbWVudDQwODYwNjkxMw==,1872600,2018-07-28T13:07:39Z,2018-07-28T13:07:39Z,NONE,"@shoyer, if we had a `znetcdf` library like `h5netcdf` we could get `mf_dataset` ""for free"" though, right? Zarr definitely has more and different compression options than NetCDF -- does that make this concept problematic?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,345354038 https://github.com/pydata/xarray/issues/2233#issuecomment-397596002,https://api.github.com/repos/pydata/xarray/issues/2233,397596002,MDEyOklzc3VlQ29tbWVudDM5NzU5NjAwMg==,1872600,2018-06-15T11:44:35Z,2018-06-15T11:44:35Z,NONE,"@rabernat , this unstructured grid model output follows the [UGRID Conventions](http://ugrid-conventions.github.io/ugrid-conventions/), which layer on top of the CF Conventions. The issue Xarray is having here is with the vertical coordinate however, so this issue could arise with any CF convention model where the vertical stretching function varies over the domain. 
As requested, here is the ncdump of this URL: ``` jovyan@jupyter-rsignell-2dusgs:~$ ncdump -h http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc netcdf NECOFS_GOM3_FORECAST { dimensions: time = UNLIMITED ; // (145 currently) maxStrlen64 = 64 ; nele = 99137 ; node = 53087 ; siglay = 40 ; three = 3 ; variables: float lon(node) ; lon:long_name = ""nodal longitude"" ; lon:standard_name = ""longitude"" ; lon:units = ""degrees_east"" ; float lat(node) ; lat:long_name = ""nodal latitude"" ; lat:standard_name = ""latitude"" ; lat:units = ""degrees_north"" ; float xc(nele) ; xc:long_name = ""zonal x-coordinate"" ; xc:units = ""meters"" ; float yc(nele) ; yc:long_name = ""zonal y-coordinate"" ; yc:units = ""meters"" ; float lonc(nele) ; lonc:long_name = ""zonal longitude"" ; lonc:standard_name = ""longitude"" ; lonc:units = ""degrees_east"" ; float latc(nele) ; latc:long_name = ""zonal latitude"" ; latc:standard_name = ""latitude"" ; latc:units = ""degrees_north"" ; float siglay(siglay, node) ; siglay:long_name = ""Sigma Layers"" ; siglay:standard_name = ""ocean_sigma_coordinate"" ; siglay:positive = ""up"" ; siglay:valid_min = -1. ; siglay:valid_max = 0. 
; siglay:formula_terms = ""sigma: siglay eta: zeta depth: h"" ; float h(node) ; h:long_name = ""Bathymetry"" ; h:standard_name = ""sea_floor_depth_below_geoid"" ; h:units = ""m"" ; h:coordinates = ""lat lon"" ; h:type = ""data"" ; h:mesh = ""fvcom_mesh"" ; h:location = ""node"" ; int nv(three, nele) ; nv:long_name = ""nodes surrounding element"" ; nv:cf_role = ""face_node_connnectivity"" ; nv:start_index = 1 ; float time(time) ; time:long_name = ""time"" ; time:units = ""days since 1858-11-17 00:00:00"" ; time:format = ""modified julian day (MJD)"" ; time:time_zone = ""UTC"" ; time:standard_name = ""time"" ; float zeta(time, node) ; zeta:long_name = ""Water Surface Elevation"" ; zeta:units = ""meters"" ; zeta:standard_name = ""sea_surface_height_above_geoid"" ; zeta:coordinates = ""time lat lon"" ; zeta:type = ""data"" ; zeta:missing_value = -999. ; zeta:field = ""elev, scalar"" ; zeta:coverage_content_type = ""modelResult"" ; zeta:mesh = ""fvcom_mesh"" ; zeta:location = ""node"" ; int nbe(three, nele) ; nbe:long_name = ""elements surrounding each element"" ; float u(time, siglay, nele) ; u:long_name = ""Eastward Water Velocity"" ; u:units = ""meters s-1"" ; u:type = ""data"" ; u:missing_value = -999. ; u:field = ""ua, scalar"" ; u:coverage_content_type = ""modelResult"" ; u:standard_name = ""eastward_sea_water_velocity"" ; u:coordinates = ""time siglay latc lonc"" ; u:mesh = ""fvcom_mesh"" ; u:location = ""face"" ; float v(time, siglay, nele) ; v:long_name = ""Northward Water Velocity"" ; v:units = ""meters s-1"" ; v:type = ""data"" ; v:missing_value = -999. 
; v:field = ""va, scalar"" ; v:coverage_content_type = ""modelResult"" ; v:standard_name = ""northward_sea_water_velocity"" ; v:coordinates = ""time siglay latc lonc"" ; v:mesh = ""fvcom_mesh"" ; v:location = ""face"" ; float ww(time, siglay, nele) ; ww:long_name = ""Upward Water Velocity"" ; ww:units = ""meters s-1"" ; ww:type = ""data"" ; ww:coverage_content_type = ""modelResult"" ; ww:standard_name = ""upward_sea_water_velocity"" ; ww:coordinates = ""time siglay latc lonc"" ; ww:mesh = ""fvcom_mesh"" ; ww:location = ""face"" ; float ua(time, nele) ; ua:long_name = ""Vertically Averaged x-velocity"" ; ua:units = ""meters s-1"" ; ua:type = ""data"" ; ua:missing_value = -999. ; ua:field = ""ua, scalar"" ; ua:coverage_content_type = ""modelResult"" ; ua:standard_name = ""barotropic_eastward_sea_water_velocity"" ; ua:coordinates = ""time latc lonc"" ; ua:mesh = ""fvcom_mesh"" ; ua:location = ""face"" ; float va(time, nele) ; va:long_name = ""Vertically Averaged y-velocity"" ; va:units = ""meters s-1"" ; va:type = ""data"" ; va:missing_value = -999. 
; va:field = ""va, scalar"" ; va:coverage_content_type = ""modelResult"" ; va:standard_name = ""barotropic_northward_sea_water_velocity"" ; va:coordinates = ""time latc lonc"" ; va:mesh = ""fvcom_mesh"" ; va:location = ""face"" ; float temp(time, siglay, node) ; temp:long_name = ""temperature"" ; temp:standard_name = ""sea_water_potential_temperature"" ; temp:units = ""degrees_C"" ; temp:coordinates = ""time siglay lat lon"" ; temp:type = ""data"" ; temp:coverage_content_type = ""modelResult"" ; temp:mesh = ""fvcom_mesh"" ; temp:location = ""node"" ; float salinity(time, siglay, node) ; salinity:long_name = ""salinity"" ; salinity:standard_name = ""sea_water_salinity"" ; salinity:units = ""0.001"" ; salinity:coordinates = ""time siglay lat lon"" ; salinity:type = ""data"" ; salinity:coverage_content_type = ""modelResult"" ; salinity:mesh = ""fvcom_mesh"" ; salinity:location = ""node"" ; int fvcom_mesh ; fvcom_mesh:cf_role = ""mesh_topology"" ; fvcom_mesh:topology_dimension = 2 ; fvcom_mesh:node_coordinates = ""lon lat"" ; fvcom_mesh:face_coordinates = ""lonc latc"" ; fvcom_mesh:face_node_connectivity = ""nv"" ; // global attributes: :title = ""NECOFS GOM3 (FVCOM) - Northeast US - Latest Forecast"" ; :institution = ""School for Marine Science and Technology"" ; :source = ""FVCOM_3.0"" ; :Conventions = ""CF-1.0, UGRID-1.0"" ; :summary = ""Latest forecast from the FVCOM Northeast Coastal Ocean Forecast System using an newer, higher-resolution GOM3 mesh (GOM2 was the preceding mesh)"" ; ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,332471780 https://github.com/pydata/xarray/pull/2131#issuecomment-395535173,https://api.github.com/repos/pydata/xarray/issues/2131,395535173,MDEyOklzc3VlQ29tbWVudDM5NTUzNTE3Mw==,1872600,2018-06-07T19:20:24Z,2018-06-07T19:20:24Z,NONE,Sounds good. 
Thanks @shoyer!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395524953,https://api.github.com/repos/pydata/xarray/issues/2131,395524953,MDEyOklzc3VlQ29tbWVudDM5NTUyNDk1Mw==,1872600,2018-06-07T18:45:42Z,2018-06-07T18:45:42Z,NONE,Might this PR warrant a new minor release?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395476675,https://api.github.com/repos/pydata/xarray/issues/2131,395476675,MDEyOklzc3VlQ29tbWVudDM5NTQ3NjY3NQ==,1872600,2018-06-07T16:07:14Z,2018-06-07T16:11:08Z,NONE,"@jhamman woohoo! Cell [20] completes nicely now: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 I'm getting some errors in Cell [20], but I think those are unrelated and didn't affect the successful completion of the tasks, right? (this is on an HPC system)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395447613,https://api.github.com/repos/pydata/xarray/issues/2131,395447613,MDEyOklzc3VlQ29tbWVudDM5NTQ0NzYxMw==,1872600,2018-06-07T14:46:21Z,2018-06-07T14:47:07Z,NONE,"@jhamman , although I'm getting distributed workers to compute the mean from a bunch of images, I'm getting a ""Failed to Serialize"" error in cell [23] of this notebook: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 If this is a bug, I think it was there before the recent updates. You should be able to run this notebook without modification. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-394887291,https://api.github.com/repos/pydata/xarray/issues/2131,394887291,MDEyOklzc3VlQ29tbWVudDM5NDg4NzI5MQ==,1872600,2018-06-05T23:00:51Z,2018-06-05T23:13:08Z,NONE,"@jhamman , still very much interested in this -- could the existing functionality be merged and enhanced later?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-389330810,https://api.github.com/repos/pydata/xarray/issues/2131,389330810,MDEyOklzc3VlQ29tbWVudDM4OTMzMDgxMA==,1872600,2018-05-15T22:15:22Z,2018-05-15T22:15:22Z,NONE,"It's working for me! https://gist.github.com/rsignell-usgs/ef81fb4306dac3a2406d0adb575b340f","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-389277628,https://api.github.com/repos/pydata/xarray/issues/2131,389277628,MDEyOklzc3VlQ29tbWVudDM4OTI3NzYyOA==,1872600,2018-05-15T19:02:06Z,2018-05-15T19:02:06Z,NONE,@jhamman should I test this out on my original workflow or wait a bit?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/issues/2121#issuecomment-388786292,https://api.github.com/repos/pydata/xarray/issues/2121,388786292,MDEyOklzc3VlQ29tbWVudDM4ODc4NjI5Mg==,1872600,2018-05-14T11:34:45Z,2018-05-14T11:34:45Z,NONE,"@jhamman what kind of expertise would it take to do this job (e.g, it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best for core dev team)? 
And is there any workaround that can be used in the interim?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,322445312 https://github.com/pydata/xarray/pull/1811#issuecomment-382466626,https://api.github.com/repos/pydata/xarray/issues/1811,382466626,MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg==,1872600,2018-04-18T17:30:25Z,2018-04-18T17:32:21Z,NONE,"@jhamman, I was just using `client = Client()`. Should I be using `LocalCluster` instead? (there is no kubernetes on this JupyterHub). Also, is there a better place to have this sort of discussion or is it okay here?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795 https://github.com/pydata/xarray/pull/1811#issuecomment-382421609,https://api.github.com/repos/pydata/xarray/issues/1811,382421609,MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ==,1872600,2018-04-18T15:11:02Z,2018-04-18T15:14:12Z,NONE,"@jhamman, I tried the same code with a single-threaded scheduler: ```python ... delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False) persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync) ``` and it ran to completion with no errors (taking 2 hours for 100GB to Zarr). 
What should I try next?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795 https://github.com/pydata/xarray/pull/1811#issuecomment-381969631,https://api.github.com/repos/pydata/xarray/issues/1811,381969631,MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ==,1872600,2018-04-17T12:12:15Z,2018-04-17T12:15:19Z,NONE,"@jhamman , I'm trying to test out `compute=False` with this code: ```python # Write National Water Model data to Zarr from dask.distributed import Client import pandas as pd import xarray as xr import s3fs import zarr if __name__ == '__main__': client = Client() root = '/projects/water/nwm/data/forcing_short_range/' # Local Files # root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/' # OPenDAP bucket_endpoint='https://s3.us-west-1.amazonaws.com/' # bucket_endpoint='https://iu.jetstream-cloud.org:8080' f_zarr = 'rsignell/nwm/test_week' dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H') urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(root,a.strftime('%Y%m%d'),a.strftime('%H')) for a in dates] ds = xr.open_mfdataset(urls, concat_dim='time', lock=True) ds = ds.drop(['ProjectionCoordinateSystem']) fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint)) d = s3fs.S3Map(f_zarr, s3=fs) compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2) encoding = {vname: {'compressor': compressor} for vname in ds.data_vars} delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False) persist_store = delayed_store.persist(retries=100) ``` and after 20 seconds or so, the process dies with this error: ```python-traceback /home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742: UserWarning: Large object of size 1.23 MB detected in task graph: ( My understanding of CF standard names is that `forecast_period` should be equal to the difference between time and 
`forecast_reference_time`, i.e., `forecast_period` = `time` - `forecast_reference_time`. If you specified your `time_offset` variable with units in the form ""hours"", then it would be decoded to `timedelta64`, along with `datetime64` for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly. On the other hand, you probably don't want to convert the `Tper` variable to `timedelta64`. Technically, it is also a time period, but it's not a variable that makes sense to compare to time. I understand the potential issue here, but I think Xarray should follow [CF conventions for time](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#time-coordinate), and only treat variables as time coordinates if they have valid CF time units (`
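The identity described in the last comment, `forecast_period` = `time` - `forecast_reference_time`, can be illustrated with plain stdlib datetimes. This is only a sketch: xarray would decode to numpy `datetime64`/`timedelta64` instead, and the unit strings, epoch, and values here are made up for the example.

```python
from datetime import datetime, timedelta

def decode_cf_time(values, units):
    """Decode 'hours since <epoch>' values into datetimes (illustrative only)."""
    head, epoch = units.split(" since ")
    assert head == "hours"  # only hours are handled in this sketch
    origin = datetime.strptime(epoch, "%Y-%m-%d %H:%M:%S")
    return [origin + timedelta(hours=v) for v in values]

# A forecast_reference_time (time_run) and two valid times from the same cycle:
reference_time = decode_cf_time([0.0], "hours since 2018-06-15 00:00:00")[0]
times = decode_cf_time([6.0, 12.0], "hours since 2018-06-15 00:00:00")

# Because both decode to datetimes, subtraction yields the forecast_period
# timedeltas directly, satisfying the identity on the fly:
periods = [t - reference_time for t in times]
print(periods[0])  # → 6:00:00
```

This is why decoding the offset variable with units like "hours" (rather than leaving it as raw floats) is useful: any two of `time`, `forecast_reference_time`, and `forecast_period` determine the third through ordinary arithmetic.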