html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue https://github.com/pydata/xarray/issues/2233#issuecomment-1078439763,https://api.github.com/repos/pydata/xarray/issues/2233,1078439763,IC_kwDOAMm_X85AR69T,1872600,2022-03-24T22:26:07Z,2023-07-16T15:13:39Z,NONE,"https://github.com/pydata/xarray/issues/2233#issuecomment-397602084 Would the [new xarray index/coordinate internal refactoring](https://twitter.com/xarray_dev/status/1506004653594488833) now allow us to address this issue? cc @kthyng","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,332471780 https://github.com/pydata/xarray/issues/6318#issuecomment-1056917100,https://api.github.com/repos/pydata/xarray/issues/6318,1056917100,IC_kwDOAMm_X84-_0Zs,1872600,2022-03-02T13:13:24Z,2022-03-02T13:14:40Z,NONE,"While I was typing this, @keewis provided a workaround here: https://github.com/fsspec/kerchunk/issues/130#issuecomment-1056897730 ! Leaving this open until I know whether this is something best left for users to implement or something to be handled in xarray. #6318 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1157163377 https://github.com/pydata/xarray/pull/4140#issuecomment-985769385,https://api.github.com/repos/pydata/xarray/issues/4140,985769385,IC_kwDOAMm_X846waWp,1872600,2021-12-03T19:22:13Z,2021-12-03T19:22:13Z,NONE,"Thanks @snowman2 ! 
Done in https://github.com/corteva/rioxarray/issues/440 ","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,636451398 https://github.com/pydata/xarray/pull/4140#issuecomment-985530331,https://api.github.com/repos/pydata/xarray/issues/4140,985530331,IC_kwDOAMm_X846vf_b,1872600,2021-12-03T13:41:35Z,2021-12-03T13:43:33Z,NONE,"I'd like to use this cool new rasterio/fsspec functionality in xarray! I must be doing something wrong here in cell [5]: https://nbviewer.org/gist/rsignell-usgs/dbf3d8e952895ca255f300790759c60f ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,636451398 https://github.com/pydata/xarray/issues/2697#issuecomment-832761716,https://api.github.com/repos/pydata/xarray/issues/2697,832761716,MDEyOklzc3VlQ29tbWVudDgzMjc2MTcxNg==,1872600,2021-05-05T15:02:55Z,2021-05-05T15:04:59Z,NONE,"It's worth pointing out that you can create [FileReferenceSystem JSON](https://github.com/intake/fsspec-reference-maker#version-1) to accomplish many of the tasks we used to use NcML for: * create a single virtual dataset that points to a collection of files * modify dataset and variable attributes It also has the nice feature that it makes your dataset faster to work with on the cloud because the map to the data is loaded in one shot! 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,401874795 https://github.com/pydata/xarray/pull/4461#issuecomment-741889071,https://api.github.com/repos/pydata/xarray/issues/4461,741889071,MDEyOklzc3VlQ29tbWVudDc0MTg4OTA3MQ==,1872600,2020-12-09T16:31:37Z,2021-01-19T14:46:49Z,NONE,"I'm really looking forward to getting this merged so I can open the National Water Model Zarr I created last week thusly: ```python ds = xr.open_dataset('s3://noaa-nwm-retro-v2.0-zarr-pds', engine='zarr', backend_kwargs={'consolidated':True, ""storage_options"": {'anon':True}}) ``` @martindurant tells me this takes only **3 s** with the new async capability! That would be pretty awesome, because now it takes **1min 15s** to open this dataset!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212 https://github.com/pydata/xarray/issues/4122#issuecomment-745520766,https://api.github.com/repos/pydata/xarray/issues/4122,745520766,MDEyOklzc3VlQ29tbWVudDc0NTUyMDc2Ng==,1872600,2020-12-15T19:39:16Z,2020-12-15T19:39:16Z,NONE,"I'm closing this; the recommended approach for writing NetCDF to object storage is to write locally, then push. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/pull/4461#issuecomment-741942375,https://api.github.com/repos/pydata/xarray/issues/4461,741942375,MDEyOklzc3VlQ29tbWVudDc0MTk0MjM3NQ==,1872600,2020-12-09T17:50:04Z,2020-12-09T17:50:04Z,NONE,"@rabernat , awesome! 
I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?","{""total_count"": 1, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 1, ""eyes"": 0}",,709187212 https://github.com/pydata/xarray/issues/4470#issuecomment-727222443,https://api.github.com/repos/pydata/xarray/issues/4470,727222443,MDEyOklzc3VlQ29tbWVudDcyNzIyMjQ0Mw==,1872600,2020-11-14T15:22:49Z,2020-11-14T15:23:28Z,NONE,"Just a note that the only unstructured grid (triangular mesh) example I have is: http://gallery.pangeo.io/repos/rsignell-usgs/esip-gallery/01_hurricane_ike_water_levels.html I figured out how to make that notebook from the info at: https://earthsim.holoviz.org/user_guide/Visualizing_Meshes.html The ""earthsim"" project was developed by the Holoviz team (@jbednar & co) funded by USACE when @dharhas was there. Would be cool to revive this. The Holoviz team and USACE might not have been aware of the [UGRID conventions](http://ugrid-conventions.github.io/ugrid-conventions/) when they developed that code, so currently it's a bit awkward to go from a UGRID-compliant NetCDF dataset to visualization with Holoviz (as you can see from the Hurricane Ike notebook). That would be low-hanging fruit for any future effort. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,710357592 https://github.com/pydata/xarray/pull/3804#issuecomment-680138664,https://api.github.com/repos/pydata/xarray/issues/3804,680138664,MDEyOklzc3VlQ29tbWVudDY4MDEzODY2NA==,1872600,2020-08-25T16:39:34Z,2020-08-25T17:07:42Z,NONE,"Drumroll.... 
@dcherian, epic cymbal crash?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/4338#issuecomment-673433045,https://api.github.com/repos/pydata/xarray/issues/4338,673433045,MDEyOklzc3VlQ29tbWVudDY3MzQzMzA0NQ==,1872600,2020-08-13T11:54:10Z,2020-08-13T12:04:11Z,NONE,"@nicholaskgeorge your minimal test would be monotonic if `square2` and `square4` had `x` coordinates `[3,4,5]` instead of `[2,3,4]`, but it seems `combine_by_coords` doesn't mind that?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,677773328 https://github.com/pydata/xarray/pull/3804#issuecomment-665163886,https://api.github.com/repos/pydata/xarray/issues/3804,665163886,MDEyOklzc3VlQ29tbWVudDY2NTE2Mzg4Ng==,1872600,2020-07-28T17:10:47Z,2020-07-28T17:11:33Z,NONE,"@dcherian , are we just waiting for one more ""+1"" here, or are the failing checks related to this PR? ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/4082#issuecomment-642841283,https://api.github.com/repos/pydata/xarray/issues/4082,642841283,MDEyOklzc3VlQ29tbWVudDY0Mjg0MTI4Mw==,1872600,2020-06-11T17:58:30Z,2020-06-11T18:00:28Z,NONE,"@jswhit, do you know if https://github.com/Unidata/netcdf4-python is doing the caching? 
Just to catch you up quickly, we have a workflow that opens a bunch of opendap datasets, and while the default `file_cache_maxsize=128` works on Linux, if this exceeds 25 files on windows it fails: ``` xr.set_options(file_cache_maxsize=25) # works #xr.set_options(file_cache_maxsize=26) # fails ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-641236117,https://api.github.com/repos/pydata/xarray/issues/4082,641236117,MDEyOklzc3VlQ29tbWVudDY0MTIzNjExNw==,1872600,2020-06-09T11:42:38Z,2020-06-09T11:42:38Z,NONE,"@DennisHeimbigner , do you not agree that this issue on windows is related to the number of files cached from OPeNDAP requests? Clearly there are some differences with cache files on windows: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg11190.html ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-640808125,https://api.github.com/repos/pydata/xarray/issues/4082,640808125,MDEyOklzc3VlQ29tbWVudDY0MDgwODEyNQ==,1872600,2020-06-08T18:51:37Z,2020-06-08T18:51:37Z,NONE,"@DennisHeimbigner I don't understand how it can be a DAP or code issue since: - it runs on Linux without errors with default `file_cache_maxsize=128`. - it runs on Windows without errors with `file_cache_maxsize=25` Right? Or am I missing something?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-640590247,https://api.github.com/repos/pydata/xarray/issues/4082,640590247,MDEyOklzc3VlQ29tbWVudDY0MDU5MDI0Nw==,1872600,2020-06-08T13:05:28Z,2020-06-08T13:05:28Z,NONE,"Or perhaps Unidata's @WardF, who leads NetCDF development. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4122#issuecomment-640548620,https://api.github.com/repos/pydata/xarray/issues/4122,640548620,MDEyOklzc3VlQ29tbWVudDY0MDU0ODYyMA==,1872600,2020-06-08T11:36:14Z,2020-06-08T11:37:21Z,NONE,"@martindurant, I asked @ajelenak offline and he reminded me that: > File metadata are dispersed throughout an HDF5 [and NetCDF4] file in order to support writing and modifying array sizes at any time of execution Looking forward to `simplecache::` for writing in `fsspec=0.7.5`!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/issues/4122#issuecomment-639771646,https://api.github.com/repos/pydata/xarray/issues/4122,639771646,MDEyOklzc3VlQ29tbWVudDYzOTc3MTY0Ng==,1872600,2020-06-05T20:08:37Z,2020-06-05T20:54:36Z,NONE,"Okay @scottyhq, I tried setting `engine='h5netcdf'`, but still got: ``` OSError: Seek only available in read mode ``` Thinking about this a little more, it's pretty clear why writing NetCDF to S3 would require seek mode. I asked @martindurant about supporting seek for writing in `fsspec` and he said that would be pretty hard. And in fact, the performance probably would be pretty terrible as lots of little writes would be required. So maybe it's best just to write netcdf files locally and then push them to S3. 
And to facilitate that, @martindurant [merged a PR yesterday](https://github.com/intake/filesystem_spec/pull/309) to enable `simplecache` for writing in `fsspec`, so after doing: ``` pip install git+https://github.com/intake/filesystem_spec.git ``` in my environment, this now works: ```python import xarray as xr import fsspec ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC' '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf') outfile = fsspec.open('simplecache::s3://chs-pangeo-data-bucket/rsignell/foo2.nc', mode='wb', s3=dict(profile='default')) with outfile as f: ds.to_netcdf(f) ``` (Here I'm telling `fsspec` to use the AWS credentials in my ""default"" profile) Thanks Martin!!!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,631085856 https://github.com/pydata/xarray/issues/4082#issuecomment-639450932,https://api.github.com/repos/pydata/xarray/issues/4082,639450932,MDEyOklzc3VlQ29tbWVudDYzOTQ1MDkzMg==,1872600,2020-06-05T12:26:14Z,2020-06-05T12:26:14Z,NONE,"@shoyer, unfortunately these opendap datasets contain only 1 time record (1 daily value) each. And it works fine on Linux with `file_cache_maxsize=128`, so it must be some Windows cache thing right? So since I just picked `file_cache_maxsize=10` arbitrarily, I thought it would be useful to see what the maximum value was. Using the good old bi-section method, I determined that (for this case anyway), the maximum size that works is 25. 
In other words: ``` xr.set_options(file_cache_maxsize=25) # works #xr.set_options(file_cache_maxsize=26) # fails ``` I would bet money that Unidata's @DennisHeimbigner knows what's going on here!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/issues/4082#issuecomment-639111588,https://api.github.com/repos/pydata/xarray/issues/4082,639111588,MDEyOklzc3VlQ29tbWVudDYzOTExMTU4OA==,1872600,2020-06-04T20:55:49Z,2020-06-04T20:55:49Z,NONE,"@EliT1626 , I confirmed that this problem exists on Windows, but not on Linux. The error: ``` IOError: [Errno -37] NetCDF: Write to read only: 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201703/oisst-avhrr-v02r01.20170304.nc' ``` suggested some kind of cache problem, and as you noted it always fails after a certain number of dates, so I tried increasing the number of cached files from the default 128 to 256: ``` xr.set_options(file_cache_maxsize=256) ``` but that had no effect. Just to see if it would fail earlier, I then tried *decreasing* the number of cached files: ``` xr.set_options(file_cache_maxsize=10) ``` and to my surprise, it ran all the way through: https://nbviewer.jupyter.org/gist/rsignell-usgs/c52fadd8626734bdd32a432279bc6779 I'm hoping someone who worked on the caching (@shoyer?) might have some idea of what is going on, but at least you can execute your workflow now on windows! 
","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,621177286 https://github.com/pydata/xarray/pull/3804#issuecomment-592094766,https://api.github.com/repos/pydata/xarray/issues/3804,592094766,MDEyOklzc3VlQ29tbWVudDU5MjA5NDc2Ng==,1872600,2020-02-27T17:59:13Z,2020-02-27T17:59:13Z,NONE,This PR is motivated by the work described in this [Medium blog post](https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314),"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,572251686 https://github.com/pydata/xarray/issues/3339#issuecomment-534722389,https://api.github.com/repos/pydata/xarray/issues/3339,534722389,MDEyOklzc3VlQ29tbWVudDUzNDcyMjM4OQ==,1872600,2019-09-24T19:56:17Z,2019-09-24T19:56:17Z,NONE,"Yep, upgrading to dask=2.4.0 fixed the problem! Phew. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,497823072 https://github.com/pydata/xarray/issues/3339#issuecomment-534710770,https://api.github.com/repos/pydata/xarray/issues/3339,534710770,MDEyOklzc3VlQ29tbWVudDUzNDcxMDc3MA==,1872600,2019-09-24T19:23:25Z,2019-09-24T19:23:25Z,NONE,"@shoyer , indeed, while I have the same xarray=0.13 and numpy=1.17.2 as @jhamman, he has dask=2.4.0 and I have dask=2.2.0. 
I'll try upgrading and will report back.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,497823072 https://github.com/pydata/xarray/issues/2501#issuecomment-510144707,https://api.github.com/repos/pydata/xarray/issues/2501,510144707,MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw==,1872600,2019-07-10T16:59:12Z,2019-07-11T11:47:02Z,NONE,"@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the `feature_id` coordinate to prevent `open_mfdataset` from trying to harmonize that coordinate from all the chunks. So if I use this code, the `open_mfdataset` command finishes: ```python def drop_coords(ds): ds = ds.drop(['reference_time','feature_id']) return ds.reset_coords(drop=True) ``` and I can then add back in the dropped coordinate values at the end: ```python dsets = [xr.open_dataset(f) for f in files[:3]] ds.coords['feature_id'] = dsets[0].coords['feature_id'] ``` I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509379294,https://api.github.com/repos/pydata/xarray/issues/2501,509379294,MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA==,1872600,2019-07-08T20:28:48Z,2019-07-08T20:29:20Z,NONE,"@TomAugspurger , I thought @rabernat's suggestion of implementing ```python def drop_coords(ds): return ds.reset_coords(drop=True) ``` would avoid this checking. 
Did I understand or implement this incorrectly?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509341467,https://api.github.com/repos/pydata/xarray/issues/2501,509341467,MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw==,1872600,2019-07-08T18:34:02Z,2019-07-08T18:34:02Z,NONE,"@rabernat , to answer your question, if I open just two files: ``` ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True) ``` the resulting dataset is: ``` Dimensions: (feature_id: 2729077, reference_time: 1, time: 2) Coordinates: * reference_time (reference_time) datetime64[ns] 2009-01-01 * feature_id (feature_id) int32 101 179 181 ... 1180001803 1180001804 * time (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00 Data variables: streamflow (time, feature_id) float64 dask.array q_lateral (time, feature_id) float64 dask.array velocity (time, feature_id) float64 dask.array qSfcLatRunoff (time, feature_id) float64 dask.array qBucket (time, feature_id) float64 dask.array qBtmVertRunoff (time, feature_id) float64 dask.array Attributes: featureType: timeSeries proj4: +proj=longlat +datum=NAD83 +no_defs model_initialization_time: 2009-01-01_00:00:00 station_dimension: feature_id model_output_valid_time: 2009-01-01_00:00:00 stream_order_output: 1 cdm_datatype: Station esri_pe_string: GEOGCS[GCS_North_American_1983,DATUM[D_North_... Conventions: CF-1.6 model_version: NWM 1.2 dev_OVRTSWCRT: 1 dev_NOAH_TIMESTEP: 3600 dev_channel_only: 0 dev_channelBucket_only: 0 dev: dev_ prefix indicates development/internal me... 
``` ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509340139,https://api.github.com/repos/pydata/xarray/issues/2501,509340139,MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ==,1872600,2019-07-08T18:30:18Z,2019-07-08T18:30:18Z,NONE,"@TomAugspurger, okay, I just ran the above code again and here's what happens: The `open_mfdataset` proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 `open_dataset` tasks in about 10 minutes. One interesting thing is that the number of tasks keep dropping as time goes on. Not sure why that would be: ![2019-07-08_13-40-09](https://user-images.githubusercontent.com/1872600/60832559-2d5ae080-a18a-11e9-9b0d-e7e39196412d.png) ![2019-07-08_13-42-21](https://user-images.githubusercontent.com/1872600/60832572-3481ee80-a18a-11e9-8bba-e9ee783894da.png) ![2019-07-08_13-43-15](https://user-images.githubusercontent.com/1872600/60832578-377cdf00-a18a-11e9-9b89-0d80353a62c9.png) ![2019-07-08_13-43-58](https://user-images.githubusercontent.com/1872600/60832589-3cda2980-a18a-11e9-989c-0a95754e9e46.png) ![2019-07-08_13-49-57](https://user-images.githubusercontent.com/1872600/60832613-4d8a9f80-a18a-11e9-8c54-7029a3cfd08c.png) The memory usage on the workers seems okay during this process: ![2019-07-08_13-38-52](https://user-images.githubusercontent.com/1872600/60832649-66935080-a18a-11e9-8075-dc2fca79f830.png) Then, despite the tasks showing on the dashboard being completed, the `open_mfdataset` command does not complete, but nothing has died, and I'm not sure what's happening. 
I check `top` and get this: ![2019-07-08_13-51-13](https://user-images.githubusercontent.com/1872600/60832847-eb7e6a00-a18a-11e9-84cc-18e8796fede9.png) then after about 10 more minutes, I get these warnings: ![2019-07-08_13-56-19](https://user-images.githubusercontent.com/1872600/60832800-c853ba80-a18a-11e9-839a-487fd1276460.png) and then the errors: ```python-traceback distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': ... ``` Full error listing here: https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6 Does this help? I'd be happy to screenshare if that would be useful.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-509282831,https://api.github.com/repos/pydata/xarray/issues/2501,509282831,MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ==,1872600,2019-07-08T15:51:23Z,2019-07-08T15:51:23Z,NONE,"@TomAugspurger, I'm back from vacation now and ready to attack this again. Any updates on your end? 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-506475819,https://api.github.com/repos/pydata/xarray/issues/2501,506475819,MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ==,1872600,2019-06-27T19:16:28Z,2019-06-27T19:24:31Z,NONE,"I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netcdf files with `open_mfdataset` looks like this: ```python import xarray as xr from dask.distributed import Client, progress, LocalCluster cluster = LocalCluster() client = Client(cluster) import pandas as pd dates = pd.date_range(start='2009-01-01 00:00',end='2009-12-31 23:00', freq='1h') files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'),date.strftime('%Y%m%d%H%M')) for date in dates] def drop_coords(ds): return ds.reset_coords(drop=True) ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True) ds1 = ds.chunk(chunks={'time':168, 'feature_id':209929}) import numcodecs numcodecs.blosc.use_threads = False ds1.to_zarr('zarr/2009', mode='w', consolidated=True) ``` I transfered the netcdf files from AWS S3 to my local disk to run this, using this command: ``` rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . 
--checksum --fast-list --transfers 16 ``` @TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2501#issuecomment-497381301,https://api.github.com/repos/pydata/xarray/issues/2501,497381301,MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ==,1872600,2019-05-30T15:55:56Z,2019-05-30T15:58:48Z,NONE,"I'm hitting some memory issues with using `open_mfdataset` with a cluster also. Specifically, I'm trying to open 8760 NetCDF files with an 8 node, 40 cpu LocalCluster. When I issue: ``` ds = xr.open_mfdataset(files, parallel=True) ``` all looks good on the Dask dashboard: ![2019-05-30_9-55-05](https://user-images.githubusercontent.com/1872600/58641001-51442000-82c8-11e9-81e0-9580ec2271b1.png) ![2019-05-30_9-54-49](https://user-images.githubusercontent.com/1872600/58641007-530de380-82c8-11e9-9c1f-46e5fca187da.png) and the tasks complete with no errors in about 4 minutes. Then 4 more minutes go by before I get a bunch of errors like: ``` distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal distributed.nanny - WARNING - Restarting worker ``` and my cell doesn't complete. 
Any suggestions?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074 https://github.com/pydata/xarray/issues/2368#issuecomment-443227318,https://api.github.com/repos/pydata/xarray/issues/2368,443227318,MDEyOklzc3VlQ29tbWVudDQ0MzIyNzMxOA==,1872600,2018-11-30T14:53:13Z,2018-11-30T14:53:13Z,NONE,"@nordam , can you provide an example?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,350899839 https://github.com/pydata/xarray/issues/2503#issuecomment-432743208,https://api.github.com/repos/pydata/xarray/issues/2503,432743208,MDEyOklzc3VlQ29tbWVudDQzMjc0MzIwOA==,1872600,2018-10-24T17:02:34Z,2018-10-24T17:02:34Z,NONE,"The version that is working in [@rabernat's esgf binder env](https://github.com/rabernat/pangeo_esgf_demo/blob/master/binder/environment.yml) is: ``` libnetcdf 4.6.1 h9cd6fdc_11 conda-forge ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432706068,https://api.github.com/repos/pydata/xarray/issues/2503,432706068,MDEyOklzc3VlQ29tbWVudDQzMjcwNjA2OA==,1872600,2018-10-24T15:27:33Z,2018-10-24T15:27:33Z,NONE,"I fired up my notebook on @rabernat's binder env and it worked fine also: https://nbviewer.jupyter.org/gist/rsignell-usgs/aebdac44a1d773b99673cb132c2ef5eb","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432416114,https://api.github.com/repos/pydata/xarray/issues/2503,432416114,MDEyOklzc3VlQ29tbWVudDQzMjQxNjExNA==,1872600,2018-10-23T20:55:42Z,2018-10-23T20:55:42Z,NONE,"@lesserwhirls , is this the issue you are referring to? 
https://github.com/Unidata/netcdf4-python/issues/836","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432415704,https://api.github.com/repos/pydata/xarray/issues/2503,432415704,MDEyOklzc3VlQ29tbWVudDQzMjQxNTcwNA==,1872600,2018-10-23T20:54:24Z,2018-10-23T20:54:24Z,NONE,"@jhamman, doesn't this dask status plot tell us that multiple workers are connecting and getting data? ![2018-10-23_16-53-20](https://user-images.githubusercontent.com/1872600/47390007-4ac34980-d6e4-11e8-8f54-b8f7b6d0c25d.png) ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432389980,https://api.github.com/repos/pydata/xarray/issues/2503,432389980,MDEyOklzc3VlQ29tbWVudDQzMjM4OTk4MA==,1872600,2018-10-23T19:39:09Z,2018-10-23T19:39:09Z,NONE,"Perhaps it's also worth mentioning that I don't see any errors on the THREDDS server side on either the tomcat catalina or thredds threddsServlet logs. @lesserwhirls, any ideas?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432374559,https://api.github.com/repos/pydata/xarray/issues/2503,432374559,MDEyOklzc3VlQ29tbWVudDQzMjM3NDU1OQ==,1872600,2018-10-23T18:53:28Z,2018-10-23T19:39:08Z,NONE,"FWIW, in my workflow there was nothing fundamentally wrong, meaning that the requests worked for a while, but eventually would die with the `NetCDF: Malformed or inaccessible DAP DDS` message. 
So for just a short time period (in this case 50 time steps, 2 chunks in time), it would usually work: https://nbviewer.jupyter.org/gist/rsignell-usgs/1155c76ed3440858ced8132e4cd81df4 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 https://github.com/pydata/xarray/issues/2503#issuecomment-432367931,https://api.github.com/repos/pydata/xarray/issues/2503,432367931,MDEyOklzc3VlQ29tbWVudDQzMjM2NzkzMQ==,1872600,2018-10-23T18:34:48Z,2018-10-23T19:18:52Z,NONE,"I tried a similar workflow last week with an AWS kubernetes cluster with opendap endpoints and it also failed: https://nbviewer.jupyter.org/gist/rsignell-usgs/8583ea8f8b5e1c926b0409bd536095a9 I thought it was likely some intermittent problem that wasn't handled well. In my case after a while I get: ``` distributed.worker - WARNING - Compute Failed Function: getter args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(, encoded_fill_values={1e+37}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(375, 400, None), slice(0, 7, None), slice(0, 670, None), slice(0, 300, None))) kwargs: {} Exception: OSError(-72, 'NetCDF: Malformed or inaccessible DAP DDS') ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,373121666 
https://github.com/pydata/xarray/issues/2323#issuecomment-408606913,https://api.github.com/repos/pydata/xarray/issues/2323,408606913,MDEyOklzc3VlQ29tbWVudDQwODYwNjkxMw==,1872600,2018-07-28T13:07:39Z,2018-07-28T13:07:39Z,NONE,"@shoyer, if we had a `znetcdf` library like `h5netcdf` we could get `mf_dataset` ""for free"" though, right? Zarr definitely has more and different compression options than NetCDF -- does that make this concept problematic?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,345354038 https://github.com/pydata/xarray/issues/2233#issuecomment-397596002,https://api.github.com/repos/pydata/xarray/issues/2233,397596002,MDEyOklzc3VlQ29tbWVudDM5NzU5NjAwMg==,1872600,2018-06-15T11:44:35Z,2018-06-15T11:44:35Z,NONE,"@rabernat , this unstructured grid model output follows the [UGRID Conventions](http://ugrid-conventions.github.io/ugrid-conventions/), which layer on top of the CF Conventions. The issue Xarray is having here is with the vertical coordinate however, so this issue could arise with any CF convention model where the vertical stretching function varies over the domain. 
As requested, here is the ncdump of this URL: ``` jovyan@jupyter-rsignell-2dusgs:~$ ncdump -h http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc netcdf NECOFS_GOM3_FORECAST { dimensions: time = UNLIMITED ; // (145 currently) maxStrlen64 = 64 ; nele = 99137 ; node = 53087 ; siglay = 40 ; three = 3 ; variables: float lon(node) ; lon:long_name = ""nodal longitude"" ; lon:standard_name = ""longitude"" ; lon:units = ""degrees_east"" ; float lat(node) ; lat:long_name = ""nodal latitude"" ; lat:standard_name = ""latitude"" ; lat:units = ""degrees_north"" ; float xc(nele) ; xc:long_name = ""zonal x-coordinate"" ; xc:units = ""meters"" ; float yc(nele) ; yc:long_name = ""zonal y-coordinate"" ; yc:units = ""meters"" ; float lonc(nele) ; lonc:long_name = ""zonal longitude"" ; lonc:standard_name = ""longitude"" ; lonc:units = ""degrees_east"" ; float latc(nele) ; latc:long_name = ""zonal latitude"" ; latc:standard_name = ""latitude"" ; latc:units = ""degrees_north"" ; float siglay(siglay, node) ; siglay:long_name = ""Sigma Layers"" ; siglay:standard_name = ""ocean_sigma_coordinate"" ; siglay:positive = ""up"" ; siglay:valid_min = -1. ; siglay:valid_max = 0. 
; siglay:formula_terms = ""sigma: siglay eta: zeta depth: h"" ; float h(node) ; h:long_name = ""Bathymetry"" ; h:standard_name = ""sea_floor_depth_below_geoid"" ; h:units = ""m"" ; h:coordinates = ""lat lon"" ; h:type = ""data"" ; h:mesh = ""fvcom_mesh"" ; h:location = ""node"" ; int nv(three, nele) ; nv:long_name = ""nodes surrounding element"" ; nv:cf_role = ""face_node_connnectivity"" ; nv:start_index = 1 ; float time(time) ; time:long_name = ""time"" ; time:units = ""days since 1858-11-17 00:00:00"" ; time:format = ""modified julian day (MJD)"" ; time:time_zone = ""UTC"" ; time:standard_name = ""time"" ; float zeta(time, node) ; zeta:long_name = ""Water Surface Elevation"" ; zeta:units = ""meters"" ; zeta:standard_name = ""sea_surface_height_above_geoid"" ; zeta:coordinates = ""time lat lon"" ; zeta:type = ""data"" ; zeta:missing_value = -999. ; zeta:field = ""elev, scalar"" ; zeta:coverage_content_type = ""modelResult"" ; zeta:mesh = ""fvcom_mesh"" ; zeta:location = ""node"" ; int nbe(three, nele) ; nbe:long_name = ""elements surrounding each element"" ; float u(time, siglay, nele) ; u:long_name = ""Eastward Water Velocity"" ; u:units = ""meters s-1"" ; u:type = ""data"" ; u:missing_value = -999. ; u:field = ""ua, scalar"" ; u:coverage_content_type = ""modelResult"" ; u:standard_name = ""eastward_sea_water_velocity"" ; u:coordinates = ""time siglay latc lonc"" ; u:mesh = ""fvcom_mesh"" ; u:location = ""face"" ; float v(time, siglay, nele) ; v:long_name = ""Northward Water Velocity"" ; v:units = ""meters s-1"" ; v:type = ""data"" ; v:missing_value = -999. 
; v:field = ""va, scalar"" ; v:coverage_content_type = ""modelResult"" ; v:standard_name = ""northward_sea_water_velocity"" ; v:coordinates = ""time siglay latc lonc"" ; v:mesh = ""fvcom_mesh"" ; v:location = ""face"" ; float ww(time, siglay, nele) ; ww:long_name = ""Upward Water Velocity"" ; ww:units = ""meters s-1"" ; ww:type = ""data"" ; ww:coverage_content_type = ""modelResult"" ; ww:standard_name = ""upward_sea_water_velocity"" ; ww:coordinates = ""time siglay latc lonc"" ; ww:mesh = ""fvcom_mesh"" ; ww:location = ""face"" ; float ua(time, nele) ; ua:long_name = ""Vertically Averaged x-velocity"" ; ua:units = ""meters s-1"" ; ua:type = ""data"" ; ua:missing_value = -999. ; ua:field = ""ua, scalar"" ; ua:coverage_content_type = ""modelResult"" ; ua:standard_name = ""barotropic_eastward_sea_water_velocity"" ; ua:coordinates = ""time latc lonc"" ; ua:mesh = ""fvcom_mesh"" ; ua:location = ""face"" ; float va(time, nele) ; va:long_name = ""Vertically Averaged y-velocity"" ; va:units = ""meters s-1"" ; va:type = ""data"" ; va:missing_value = -999. 
; va:field = ""va, scalar"" ; va:coverage_content_type = ""modelResult"" ; va:standard_name = ""barotropic_northward_sea_water_velocity"" ; va:coordinates = ""time latc lonc"" ; va:mesh = ""fvcom_mesh"" ; va:location = ""face"" ; float temp(time, siglay, node) ; temp:long_name = ""temperature"" ; temp:standard_name = ""sea_water_potential_temperature"" ; temp:units = ""degrees_C"" ; temp:coordinates = ""time siglay lat lon"" ; temp:type = ""data"" ; temp:coverage_content_type = ""modelResult"" ; temp:mesh = ""fvcom_mesh"" ; temp:location = ""node"" ; float salinity(time, siglay, node) ; salinity:long_name = ""salinity"" ; salinity:standard_name = ""sea_water_salinity"" ; salinity:units = ""0.001"" ; salinity:coordinates = ""time siglay lat lon"" ; salinity:type = ""data"" ; salinity:coverage_content_type = ""modelResult"" ; salinity:mesh = ""fvcom_mesh"" ; salinity:location = ""node"" ; int fvcom_mesh ; fvcom_mesh:cf_role = ""mesh_topology"" ; fvcom_mesh:topology_dimension = 2 ; fvcom_mesh:node_coordinates = ""lon lat"" ; fvcom_mesh:face_coordinates = ""lonc latc"" ; fvcom_mesh:face_node_connectivity = ""nv"" ; // global attributes: :title = ""NECOFS GOM3 (FVCOM) - Northeast US - Latest Forecast"" ; :institution = ""School for Marine Science and Technology"" ; :source = ""FVCOM_3.0"" ; :Conventions = ""CF-1.0, UGRID-1.0"" ; :summary = ""Latest forecast from the FVCOM Northeast Coastal Ocean Forecast System using an newer, higher-resolution GOM3 mesh (GOM2 was the preceding mesh)"" ; ```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,332471780 https://github.com/pydata/xarray/pull/2131#issuecomment-395535173,https://api.github.com/repos/pydata/xarray/issues/2131,395535173,MDEyOklzc3VlQ29tbWVudDM5NTUzNTE3Mw==,1872600,2018-06-07T19:20:24Z,2018-06-07T19:20:24Z,NONE,Sounds good. 
Thanks @shoyer!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395524953,https://api.github.com/repos/pydata/xarray/issues/2131,395524953,MDEyOklzc3VlQ29tbWVudDM5NTUyNDk1Mw==,1872600,2018-06-07T18:45:42Z,2018-06-07T18:45:42Z,NONE,Might this PR warrant a new minor release?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395476675,https://api.github.com/repos/pydata/xarray/issues/2131,395476675,MDEyOklzc3VlQ29tbWVudDM5NTQ3NjY3NQ==,1872600,2018-06-07T16:07:14Z,2018-06-07T16:11:08Z,NONE,"@jhamman woohoo! Cell [20] completes nicely now: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 I'm getting some errors in Cell [20], but I think those are unrelated and didn't affect the successful completion of the tasks, right? (this is on an HPC system)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-395447613,https://api.github.com/repos/pydata/xarray/issues/2131,395447613,MDEyOklzc3VlQ29tbWVudDM5NTQ0NzYxMw==,1872600,2018-06-07T14:46:21Z,2018-06-07T14:47:07Z,NONE,"@jhamman , although I'm getting distributed workers to compute the mean from a bunch of images, I'm getting a ""Failed to Serialize"" error in cell [23] of this notebook: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 If this is a bug, I think it was there before the recent updates. You should be able to run this notebook without modification. 
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-394887291,https://api.github.com/repos/pydata/xarray/issues/2131,394887291,MDEyOklzc3VlQ29tbWVudDM5NDg4NzI5MQ==,1872600,2018-06-05T23:00:51Z,2018-06-05T23:13:08Z,NONE,"@jhamman , still very much interested in this -- could the existing functionality be merged and enhanced later?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-389330810,https://api.github.com/repos/pydata/xarray/issues/2131,389330810,MDEyOklzc3VlQ29tbWVudDM4OTMzMDgxMA==,1872600,2018-05-15T22:15:22Z,2018-05-15T22:15:22Z,NONE,"It's working for me! https://gist.github.com/rsignell-usgs/ef81fb4306dac3a2406d0adb575b340f","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/pull/2131#issuecomment-389277628,https://api.github.com/repos/pydata/xarray/issues/2131,389277628,MDEyOklzc3VlQ29tbWVudDM4OTI3NzYyOA==,1872600,2018-05-15T19:02:06Z,2018-05-15T19:02:06Z,NONE,@jhamman should I test this out on my original workflow or wait a bit?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323017930 https://github.com/pydata/xarray/issues/2121#issuecomment-388786292,https://api.github.com/repos/pydata/xarray/issues/2121,388786292,MDEyOklzc3VlQ29tbWVudDM4ODc4NjI5Mg==,1872600,2018-05-14T11:34:45Z,2018-05-14T11:34:45Z,NONE,"@jhamman what kind of expertise would it take to do this job (e.g, it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best for core dev team)? 
And is there any workaround that can be used in the interim?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,322445312 https://github.com/pydata/xarray/pull/1811#issuecomment-382466626,https://api.github.com/repos/pydata/xarray/issues/1811,382466626,MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg==,1872600,2018-04-18T17:30:25Z,2018-04-18T17:32:21Z,NONE,"@jhamman, I was just using `client = Client()`. Should I be using `LocalCluster` instead? (there is no kubernetes on this JupyterHub). Also, is there a better place to have this sort of discussion or is it okay here?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795 https://github.com/pydata/xarray/pull/1811#issuecomment-382421609,https://api.github.com/repos/pydata/xarray/issues/1811,382421609,MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ==,1872600,2018-04-18T15:11:02Z,2018-04-18T15:14:12Z,NONE,"@jhamman, I tried the same code with a single-threaded scheduler: ```python ... delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False) persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync) ``` and it ran to completion with no errors (taking 2 hours for 100GB to Zarr). 
What should I try next?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,286542795 https://github.com/pydata/xarray/pull/1811#issuecomment-381969631,https://api.github.com/repos/pydata/xarray/issues/1811,381969631,MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ==,1872600,2018-04-17T12:12:15Z,2018-04-17T12:15:19Z,NONE,"@jhamman , I'm trying to test out `compute=False` with this code: ```python # Write National Water Model data to Zarr from dask.distributed import Client import pandas as pd import xarray as xr import s3fs import zarr if __name__ == '__main__': client = Client() root = '/projects/water/nwm/data/forcing_short_range/' # Local Files # root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/' # OPenDAP bucket_endpoint='https://s3.us-west-1.amazonaws.com/' # bucket_endpoint='https://iu.jetstream-cloud.org:8080' f_zarr = 'rsignell/nwm/test_week' dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H') urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(root,a.strftime('%Y%m%d'),a.strftime('%H')) for a in dates] ds = xr.open_mfdataset(urls, concat_dim='time', lock=True) ds = ds.drop(['ProjectionCoordinateSystem']) fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint)) d = s3fs.S3Map(f_zarr, s3=fs) compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2) encoding = {vname: {'compressor': compressor} for vname in ds.data_vars} delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False) persist_store = delayed_store.persist(retries=100) ``` and after 20 seconds or so, the process dies with this error: ```python-traceback /home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742: UserWarning: Large object of size 1.23 MB detected in task graph: ( My understanding of CF standard names is that `forecast_period` should be equal to the difference between time and 
`forecast_reference_time`, i.e., `forecast_period` = `time` - `forecast_reference_time`. If you specified your `time_offset` variable with units in the form ""hours"", then it would be decoded to `timedelta64`, along with `datetime64` for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly. On the other hand, you probably don't want to convert the `Tper` variable to `timedelta64`. Technically, it is also a time period, but it's not a variable that makes sense to compare to time. I understand the potential issue here, but I think Xarray should follow [CF conventions for time](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#time-coordinate), and only treat variables as time coordinates if they have valid CF time units (`
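The identity described in the last comment, `forecast_period` = `time` - `forecast_reference_time`, can be illustrated with plain stdlib datetimes. This is only a sketch: xarray would decode to numpy `datetime64`/`timedelta64` instead, and the unit strings, epoch, and values here are made up for the example.

```python
from datetime import datetime, timedelta

def decode_cf_time(values, units):
    """Decode 'hours since <epoch>' values into datetimes (illustrative only)."""
    head, epoch = units.split(" since ")
    assert head == "hours"  # only hours are handled in this sketch
    origin = datetime.strptime(epoch, "%Y-%m-%d %H:%M:%S")
    return [origin + timedelta(hours=v) for v in values]

# A forecast_reference_time (time_run) and two valid times from the same cycle:
reference_time = decode_cf_time([0.0], "hours since 2018-06-15 00:00:00")[0]
times = decode_cf_time([6.0, 12.0], "hours since 2018-06-15 00:00:00")

# Because both decode to datetimes, subtraction yields the forecast_period
# timedeltas directly, satisfying the identity on the fly:
periods = [t - reference_time for t in times]
print(periods[0])  # → 6:00:00
```

This is why decoding the offset variable with units like "hours" (rather than leaving it as raw floats) is useful: any two of `time`, `forecast_reference_time`, and `forecast_period` determine the third through ordinary arithmetic.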