
issue_comments


59 rows where user = 1872600 sorted by updated_at descending


issue 23

  • Feature/pickle rasterio 7
  • open_mfdataset usage and limitations. 7
  • Problems with distributed and opendap netCDF endpoint 7
  • "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 6
  • WIP: Compute==False for to_zarr and to_netcdf 3
  • Allow chunk_store argument when opening Zarr datasets 3
  • Document writing netcdf from xarray directly to S3 3
  • Best way to find data variables by standard_name 2
  • Complete renaming xray -> xarray 2
  • Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 2
  • Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 2
  • Version 0.13 broke my ufunc 2
  • support file-like objects in xarray.open_rasterio 2
  • Allow fsspec/zarr/mfdataset 2
  • to_netcdf failing for datasets with a single time value 1
  • Add a filter_by_attrs method to Dataset 1
  • rasterio backend should use DataStorePickleMixin (or something similar) 1
  • znetcdf: h5netcdf analog for zarr? 1
  • Let's list all the netCDF files that xarray can't open 1
  • read ncml files to create multifile datasets 1
  • Combining tiled data sets in xarray 1
  • xarray / vtk integration 1
  • 'numpy.datetime64' object has no attribute 'year' when writing to zarr or netcdf 1

user 1

  • rsignell-usgs · 59

author_association 1

  • NONE 59
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1078439763 https://github.com/pydata/xarray/issues/2233#issuecomment-1078439763 https://api.github.com/repos/pydata/xarray/issues/2233 IC_kwDOAMm_X85AR69T rsignell-usgs 1872600 2022-03-24T22:26:07Z 2023-07-16T15:13:39Z NONE

https://github.com/pydata/xarray/issues/2233#issuecomment-397602084 Would the new xarray index/coordinate internal refactoring now allow us to address this issue?

cc @kthyng

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
1056917100 https://github.com/pydata/xarray/issues/6318#issuecomment-1056917100 https://api.github.com/repos/pydata/xarray/issues/6318 IC_kwDOAMm_X84-_0Zs rsignell-usgs 1872600 2022-03-02T13:13:24Z 2022-03-02T13:14:40Z NONE

While I was typing this, @keewis provided a workaround here: https://github.com/fsspec/kerchunk/issues/130#issuecomment-1056897730 ! Leaving this open until I know whether this is something best left for users to implement or something to be handled in xarray. #6318

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'numpy.datetime64' object has no attribute 'year' when writing to zarr or netcdf 1157163377
985769385 https://github.com/pydata/xarray/pull/4140#issuecomment-985769385 https://api.github.com/repos/pydata/xarray/issues/4140 IC_kwDOAMm_X846waWp rsignell-usgs 1872600 2021-12-03T19:22:13Z 2021-12-03T19:22:13Z NONE

Thanks @snowman2 ! Done in https://github.com/corteva/rioxarray/issues/440

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support file-like objects in xarray.open_rasterio 636451398
985530331 https://github.com/pydata/xarray/pull/4140#issuecomment-985530331 https://api.github.com/repos/pydata/xarray/issues/4140 IC_kwDOAMm_X846vf_b rsignell-usgs 1872600 2021-12-03T13:41:35Z 2021-12-03T13:43:33Z NONE

I'd like to use this cool new rasterio/fsspec functionality in xarray!

I must be doing something wrong here in cell [5]: https://nbviewer.org/gist/rsignell-usgs/dbf3d8e952895ca255f300790759c60f
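For reference, the pattern I'm attempting looks roughly like this sketch (the URL is a placeholder, and this assumes the file-like support this PR adds):

```python
import fsspec
import xarray as xr

# Placeholder URL -- any HTTP-accessible GeoTIFF would do.
url = "https://example.com/tiles/image.tif"

# Open the remote file as a file-like object and hand it straight
# to open_rasterio, relying on the file-like support from this PR.
with fsspec.open(url) as f:
    da = xr.open_rasterio(f)
    print(da.dims)
```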

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support file-like objects in xarray.open_rasterio 636451398
832761716 https://github.com/pydata/xarray/issues/2697#issuecomment-832761716 https://api.github.com/repos/pydata/xarray/issues/2697 MDEyOklzc3VlQ29tbWVudDgzMjc2MTcxNg== rsignell-usgs 1872600 2021-05-05T15:02:55Z 2021-05-05T15:04:59Z NONE

It's worth pointing out that you can create ReferenceFileSystem JSON to accomplish many of the tasks we used to use NcML for:

- create a single virtual dataset that points to a collection of files
- modify dataset and variable attributes

It also has the nice feature that it makes your dataset faster to work with on the cloud because the map to the data is loaded in one shot!
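For anyone landing here later, a rough sketch of that workflow using today's kerchunk package (the S3 paths are illustrative):

```python
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# Illustrative paths -- scan each NetCDF4/HDF5 file once to build byte-range references.
urls = ["s3://my-bucket/model/day1.nc", "s3://my-bucket/model/day2.nc"]
refs = []
for u in urls:
    with fsspec.open(u, anon=True) as f:
        refs.append(SingleHdf5ToZarr(f, u).translate())

# Combine the per-file references into one virtual dataset along time.
combined = MultiZarrToZarr(refs, concat_dims=["time"], remote_protocol="s3",
                           remote_options={"anon": True}).translate()

# Open the virtual dataset; the map to the data loads in one shot.
mapper = fsspec.get_mapper("reference://", fo=combined,
                           remote_protocol="s3", remote_options={"anon": True})
ds = xr.open_dataset(mapper, engine="zarr", backend_kwargs={"consolidated": False})
```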

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  read ncml files to create multifile datasets 401874795
741889071 https://github.com/pydata/xarray/pull/4461#issuecomment-741889071 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTg4OTA3MQ== rsignell-usgs 1872600 2020-12-09T16:31:37Z 2021-01-19T14:46:49Z NONE

I'm really looking forward to getting this merged so I can open the National Water Model Zarr I created last week thusly:

```python
ds = xr.open_dataset('s3://noaa-nwm-retro-v2.0-zarr-pds', engine='zarr',
                     backend_kwargs={'consolidated': True, 'storage_options': {'anon': True}})
```

@martindurant tells me this takes only 3 s with the new async capability!

That would be pretty awesome, because now it takes 1 min 15 s to open this dataset!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
745520766 https://github.com/pydata/xarray/issues/4122#issuecomment-745520766 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDc0NTUyMDc2Ng== rsignell-usgs 1872600 2020-12-15T19:39:16Z 2020-12-15T19:39:16Z NONE

I'm closing this; the recommended approach for writing NetCDF to object storage is to write locally, then push.
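Something like this sketch is what I mean (the bucket path is hypothetical; credentials come from your environment):

```python
import numpy as np
import s3fs
import xarray as xr

# Any dataset will do; a tiny one here.
ds = xr.Dataset({"t": ("x", np.arange(3.0))})

# Step 1: write the NetCDF file to local disk (the netCDF libraries need seek()).
ds.to_netcdf("out.nc")

# Step 2: push the finished file to object storage in one shot.
fs = s3fs.S3FileSystem()
fs.put("out.nc", "my-bucket/out.nc")
```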

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
741942375 https://github.com/pydata/xarray/pull/4461#issuecomment-741942375 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTk0MjM3NQ== rsignell-usgs 1872600 2020-12-09T17:50:04Z 2020-12-09T17:50:04Z NONE

@rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
727222443 https://github.com/pydata/xarray/issues/4470#issuecomment-727222443 https://api.github.com/repos/pydata/xarray/issues/4470 MDEyOklzc3VlQ29tbWVudDcyNzIyMjQ0Mw== rsignell-usgs 1872600 2020-11-14T15:22:49Z 2020-11-14T15:23:28Z NONE

Just a note that the only unstructured grid (triangular mesh) example I have is: http://gallery.pangeo.io/repos/rsignell-usgs/esip-gallery/01_hurricane_ike_water_levels.html

I figured out how to make that notebook from the info at: https://earthsim.holoviz.org/user_guide/Visualizing_Meshes.html

The "earthsim" project was developed by the Holoviz team (@jbednar & co) funded by USACE when @dharhas was there. Would be cool to revive this.

The Holoviz team and USACE might not have been aware of the UGRID conventions when they developed that code, so currently it's a bit awkward to go from a UGRID-compliant NetCDF dataset to visualization with Holoviz (as you can see from the Hurricane Ike notebook). That would be low-hanging fruit for any future effort.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray / vtk integration 710357592
680138664 https://github.com/pydata/xarray/pull/3804#issuecomment-680138664 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDY4MDEzODY2NA== rsignell-usgs 1872600 2020-08-25T16:39:34Z 2020-08-25T17:07:42Z NONE

Drumroll.... @dcherian, epic cymbal crash?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
673433045 https://github.com/pydata/xarray/issues/4338#issuecomment-673433045 https://api.github.com/repos/pydata/xarray/issues/4338 MDEyOklzc3VlQ29tbWVudDY3MzQzMzA0NQ== rsignell-usgs 1872600 2020-08-13T11:54:10Z 2020-08-13T12:04:11Z NONE

@nicholaskgeorge your minimal test would be monotonic if square2 and square4 had x coordinates [3,4,5] instead of [2,3,4], but it seems combine_by_coords doesn't mind that?
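To make the overlap concrete, a tiny reproduction sketch (the tile contents are made up; only the coordinates matter):

```python
import numpy as np
import xarray as xr

# The left tile covers x=[0,1,2]; the right tile starts at x=2 instead of 3,
# so the combined x index [0, 1, 2, 2, 3, 4] is not monotonic.
square1 = xr.Dataset({"v": (("y", "x"), np.zeros((3, 3)))},
                     coords={"x": [0, 1, 2], "y": [0, 1, 2]})
square2 = xr.Dataset({"v": (("y", "x"), np.ones((3, 3)))},
                     coords={"x": [2, 3, 4], "y": [0, 1, 2]})

# The question above: does combine_by_coords flag the overlapping x, or not?
print(xr.combine_by_coords([square1, square2]))
```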

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Combining tiled data sets in xarray 677773328
665163886 https://github.com/pydata/xarray/pull/3804#issuecomment-665163886 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDY2NTE2Mzg4Ng== rsignell-usgs 1872600 2020-07-28T17:10:47Z 2020-07-28T17:11:33Z NONE

@dcherian , are we just waiting for one more "+1" here, or are the failing checks related to this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
642841283 https://github.com/pydata/xarray/issues/4082#issuecomment-642841283 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0Mjg0MTI4Mw== rsignell-usgs 1872600 2020-06-11T17:58:30Z 2020-06-11T18:00:28Z NONE

@jswhit, do you know if https://github.com/Unidata/netcdf4-python is doing the caching?

Just to catch you up quickly, we have a workflow that opens a bunch of opendap datasets. While the default file_cache_maxsize=128 works on Linux, on Windows it fails once this exceeds 25 files:

```python
xr.set_options(file_cache_maxsize=25)  # works

xr.set_options(file_cache_maxsize=26)  # fails
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
641236117 https://github.com/pydata/xarray/issues/4082#issuecomment-641236117 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MTIzNjExNw== rsignell-usgs 1872600 2020-06-09T11:42:38Z 2020-06-09T11:42:38Z NONE

@DennisHeimbigner , do you not agree that this issue on windows is related to the number of files cached from OPeNDAP requests? Clearly there are some differences with cache files on windows: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg11190.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640808125 https://github.com/pydata/xarray/issues/4082#issuecomment-640808125 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MDgwODEyNQ== rsignell-usgs 1872600 2020-06-08T18:51:37Z 2020-06-08T18:51:37Z NONE

@DennisHeimbigner I don't understand how it can be a DAP or code issue since:

- it runs on Linux without errors with the default file_cache_maxsize=128
- it runs on Windows without errors with file_cache_maxsize=25

Right? Or am I missing something?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640590247 https://github.com/pydata/xarray/issues/4082#issuecomment-640590247 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MDU5MDI0Nw== rsignell-usgs 1872600 2020-06-08T13:05:28Z 2020-06-08T13:05:28Z NONE

Or perhaps Unidata's @WardF, who leads NetCDF development.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640548620 https://github.com/pydata/xarray/issues/4122#issuecomment-640548620 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDY0MDU0ODYyMA== rsignell-usgs 1872600 2020-06-08T11:36:14Z 2020-06-08T11:37:21Z NONE

@martindurant, I asked @ajelenak offline and he reminded me that:

> File metadata are dispersed throughout an HDF5 [and NetCDF4] file in order to support writing and modifying array sizes at any time of execution

Looking forward to simplecache:: for writing in fsspec=0.7.5!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639771646 https://github.com/pydata/xarray/issues/4122#issuecomment-639771646 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTc3MTY0Ng== rsignell-usgs 1872600 2020-06-05T20:08:37Z 2020-06-05T20:54:36Z NONE

Okay @scottyhq, I tried setting engine='h5netcdf', but still got:

```
OSError: Seek only available in read mode
```

Thinking about this a little more, it's pretty clear why writing NetCDF to S3 would require seek mode.

I asked @martindurant about supporting seek for writing in fsspec and he said that would be pretty hard. And in fact, the performance probably would be pretty terrible as lots of little writes would be required.

So maybe it's best just to write netcdf files locally and then push them to S3.

And to facilitate that, @martindurant merged a PR yesterday to enable simplecache for writing in fsspec, so after doing:

```
pip install git+https://github.com/intake/filesystem_spec.git
```

in my environment, this now works:

```python
import xarray as xr
import fsspec

ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC'
                     '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf')

outfile = fsspec.open('simplecache::s3://chs-pangeo-data-bucket/rsignell/foo2.nc',
                      mode='wb', s3=dict(profile='default'))
with outfile as f:
    ds.to_netcdf(f)
```

(Here I'm telling fsspec to use the AWS credentials in my "default" profile.)

Thanks Martin!!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639450932 https://github.com/pydata/xarray/issues/4082#issuecomment-639450932 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDYzOTQ1MDkzMg== rsignell-usgs 1872600 2020-06-05T12:26:14Z 2020-06-05T12:26:14Z NONE

@shoyer, unfortunately these opendap datasets contain only 1 time record (1 daily value) each. And it works fine on Linux with file_cache_maxsize=128, so it must be some Windows cache thing right?

So since I just picked file_cache_maxsize=10 arbitrarily, I thought it would be useful to see what the maximum value was. Using the good old bi-section method, I determined that (for this case anyway), the maximum size that works is 25.

In other words:

```python
xr.set_options(file_cache_maxsize=25)  # works

xr.set_options(file_cache_maxsize=26)  # fails
```

I would bet money that Unidata's @DennisHeimbigner knows what's going on here!
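The bisection itself is easy to script; here's a sketch where works() is a stand-in for re-running the workflow at a given cache size (the threshold of 25 just mirrors what I found):

```python
def works(n):
    # Stand-in: in reality this would call xr.set_options(file_cache_maxsize=n)
    # and run the opendap workflow, returning True on success.
    return n <= 25

lo, hi = 10, 128  # known-good and known-bad cache sizes
while hi - lo > 1:
    mid = (lo + hi) // 2
    if works(mid):
        lo = mid
    else:
        hi = mid
print(lo)  # 25 -- the largest file_cache_maxsize that works here
```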

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
639111588 https://github.com/pydata/xarray/issues/4082#issuecomment-639111588 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDYzOTExMTU4OA== rsignell-usgs 1872600 2020-06-04T20:55:49Z 2020-06-04T20:55:49Z NONE

@EliT1626 , I confirmed that this problem exists on Windows, but not on Linux.

The error:

```
IOError: [Errno -37] NetCDF: Write to read only: 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201703/oisst-avhrr-v02r01.20170304.nc'
```

suggested some kind of cache problem, and as you noted it always fails after a certain number of dates, so I tried increasing the number of cached files from the default 128 to 256 with xr.set_options(file_cache_maxsize=256), but that had no effect.

Just to see if it would fail earlier, I then tried decreasing the number of cached files: xr.set_options(file_cache_maxsize=10) and to my surprise, it ran all the way through: https://nbviewer.jupyter.org/gist/rsignell-usgs/c52fadd8626734bdd32a432279bc6779

I'm hoping someone who worked on the caching (@shoyer?) might have some idea of what is going on, but at least you can execute your workflow now on windows!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
592094766 https://github.com/pydata/xarray/pull/3804#issuecomment-592094766 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDU5MjA5NDc2Ng== rsignell-usgs 1872600 2020-02-27T17:59:13Z 2020-02-27T17:59:13Z NONE

This PR is motivated by the work described in this Medium blog post

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
534722389 https://github.com/pydata/xarray/issues/3339#issuecomment-534722389 https://api.github.com/repos/pydata/xarray/issues/3339 MDEyOklzc3VlQ29tbWVudDUzNDcyMjM4OQ== rsignell-usgs 1872600 2019-09-24T19:56:17Z 2019-09-24T19:56:17Z NONE

Yep, upgrading to dask=2.4.0 fixed the problem! Phew.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Version 0.13 broke my ufunc 497823072
534710770 https://github.com/pydata/xarray/issues/3339#issuecomment-534710770 https://api.github.com/repos/pydata/xarray/issues/3339 MDEyOklzc3VlQ29tbWVudDUzNDcxMDc3MA== rsignell-usgs 1872600 2019-09-24T19:23:25Z 2019-09-24T19:23:25Z NONE

@shoyer , indeed, while I have the same xarray=0.13 and numpy=1.17.2 as @jhamman, he has dask=2.4.0 and I have dask=2.2.0. I'll try upgrading and will report back.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Version 0.13 broke my ufunc 497823072
510144707 https://github.com/pydata/xarray/issues/2501#issuecomment-510144707 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw== rsignell-usgs 1872600 2019-07-10T16:59:12Z 2019-07-11T11:47:02Z NONE

@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.

So if I use this code, the open_mfdataset command finishes:

```python
def drop_coords(ds):
    ds = ds.drop(['reference_time', 'feature_id'])
    return ds.reset_coords(drop=True)
```

and I can then add back in the dropped coordinate values at the end:

```python
dsets = [xr.open_dataset(f) for f in files[:3]]
ds.coords['feature_id'] = dsets[0].coords['feature_id']
```

I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509379294 https://github.com/pydata/xarray/issues/2501#issuecomment-509379294 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA== rsignell-usgs 1872600 2019-07-08T20:28:48Z 2019-07-08T20:29:20Z NONE

@TomAugspurger , I thought @rabernat's suggestion of implementing

```python
def drop_coords(ds):
    return ds.reset_coords(drop=True)
```

would avoid this checking. Did I understand or implement this incorrectly?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509341467 https://github.com/pydata/xarray/issues/2501#issuecomment-509341467 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw== rsignell-usgs 1872600 2019-07-08T18:34:02Z 2019-07-08T18:34:02Z NONE

@rabernat , to answer your question, if I open just two files:

```python
ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True)
```

the resulting dataset is:

```
<xarray.Dataset>
Dimensions:         (feature_id: 2729077, reference_time: 1, time: 2)
Coordinates:
  * reference_time  (reference_time) datetime64[ns] 2009-01-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
  * time            (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00
Data variables:
    streamflow      (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    q_lateral       (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    velocity        (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qSfcLatRunoff   (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBucket         (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBtmVertRunoff  (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
Attributes:
    featureType:                timeSeries
    proj4:                      +proj=longlat +datum=NAD83 +no_defs
    model_initialization_time:  2009-01-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    2009-01-01_00:00:00
    stream_order_output:        1
    cdm_datatype:               Station
    esri_pe_string:             GEOGCS[GCS_North_American_1983,DATUM[D_North_...
    Conventions:                CF-1.6
    model_version:              NWM 1.2
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509340139 https://github.com/pydata/xarray/issues/2501#issuecomment-509340139 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ== rsignell-usgs 1872600 2019-07-08T18:30:18Z 2019-07-08T18:30:18Z NONE

@TomAugspurger, okay, I just ran the above code again and here's what happens:

The open_mfdataset proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 open_dataset tasks in about 10 minutes. One interesting thing is that the number of tasks keeps dropping as time goes on; not sure why that would be. The memory usage on the workers seems okay during this process.

Then, despite the tasks showing on the dashboard being completed, the open_mfdataset command does not complete, but nothing has died, and I'm not sure what's happening. I check top, and then after about 10 more minutes I get these warnings and then the errors:

```python-traceback
distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': ...
```

Full error listing here: https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6

Does this help? I'd be happy to screenshare if that would be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509282831 https://github.com/pydata/xarray/issues/2501#issuecomment-509282831 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ== rsignell-usgs 1872600 2019-07-08T15:51:23Z 2019-07-08T15:51:23Z NONE

@TomAugspurger, I'm back from vacation now and ready to attack this again. Any updates on your end?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
506475819 https://github.com/pydata/xarray/issues/2501#issuecomment-506475819 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ== rsignell-usgs 1872600 2019-06-27T19:16:28Z 2019-06-27T19:24:31Z NONE

I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netcdf files with open_mfdataset looks like this:

```python
import xarray as xr
from dask.distributed import Client, progress, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

import pandas as pd

dates = pd.date_range(start='2009-01-01 00:00', end='2009-12-31 23:00', freq='1h')
files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'), date.strftime('%Y%m%d%H%M')) for date in dates]

def drop_coords(ds):
    return ds.reset_coords(drop=True)

ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True)
ds1 = ds.chunk(chunks={'time': 168, 'feature_id': 209929})

import numcodecs
numcodecs.blosc.use_threads = False
ds1.to_zarr('zarr/2009', mode='w', consolidated=True)
```

I transferred the netcdf files from AWS S3 to my local disk to run this, using this command:

```
rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . --checksum --fast-list --transfers 16
```

@TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
497381301 https://github.com/pydata/xarray/issues/2501#issuecomment-497381301 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ== rsignell-usgs 1872600 2019-05-30T15:55:56Z 2019-05-30T15:58:48Z NONE

I'm hitting some memory issues with using open_mfdataset with a cluster also.

Specifically, I'm trying to open 8760 NetCDF files with an 8 node, 40 cpu LocalCluster.

When I issue:

```python
ds = xr.open_mfdataset(files, parallel=True)
```

all looks good on the Dask dashboard, and the tasks complete with no errors in about 4 minutes.

Then 4 more minutes go by before I get a bunch of errors like:

```
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
```

and my cell doesn't complete.

Any suggestions?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
443227318 https://github.com/pydata/xarray/issues/2368#issuecomment-443227318 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQ0MzIyNzMxOA== rsignell-usgs 1872600 2018-11-30T14:53:13Z 2018-11-30T14:53:13Z NONE

@nordam , can you provide an example?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
432743208 https://github.com/pydata/xarray/issues/2503#issuecomment-432743208 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjc0MzIwOA== rsignell-usgs 1872600 2018-10-24T17:02:34Z 2018-10-24T17:02:34Z NONE

The version that is working in @rabernat's esgf binder env is: libnetcdf 4.6.1 h9cd6fdc_11 conda-forge

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432706068 https://github.com/pydata/xarray/issues/2503#issuecomment-432706068 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjcwNjA2OA== rsignell-usgs 1872600 2018-10-24T15:27:33Z 2018-10-24T15:27:33Z NONE

I fired up my notebook on @rabernat's binder env and it worked fine also: https://nbviewer.jupyter.org/gist/rsignell-usgs/aebdac44a1d773b99673cb132c2ef5eb

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432416114 https://github.com/pydata/xarray/issues/2503#issuecomment-432416114 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjQxNjExNA== rsignell-usgs 1872600 2018-10-23T20:55:42Z 2018-10-23T20:55:42Z NONE

@lesserwhirls , is this the issue you are referring to? https://github.com/Unidata/netcdf4-python/issues/836

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432415704 https://github.com/pydata/xarray/issues/2503#issuecomment-432415704 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjQxNTcwNA== rsignell-usgs 1872600 2018-10-23T20:54:24Z 2018-10-23T20:54:24Z NONE

@jhamman, doesn't this dask status plot tell us that multiple workers are connecting and getting data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432389980 https://github.com/pydata/xarray/issues/2503#issuecomment-432389980 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM4OTk4MA== rsignell-usgs 1872600 2018-10-23T19:39:09Z 2018-10-23T19:39:09Z NONE

Perhaps it's also worth mentioning that I don't see any errors on the THREDDS server side on either the tomcat catalina or thredds threddsServlet logs. @lesserwhirls, any ideas?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432374559 https://github.com/pydata/xarray/issues/2503#issuecomment-432374559 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM3NDU1OQ== rsignell-usgs 1872600 2018-10-23T18:53:28Z 2018-10-23T19:39:08Z NONE

FWIW, in my workflow there was nothing fundamentally wrong, meaning that the requests worked for a while, but eventually would die with the NetCDF: Malformed or inaccessible DAP DDS message.

So for just a short time period (in this case 50 time steps, 2 chunks in time), it would usually work: https://nbviewer.jupyter.org/gist/rsignell-usgs/1155c76ed3440858ced8132e4cd81df4

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432367931 https://github.com/pydata/xarray/issues/2503#issuecomment-432367931 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM2NzkzMQ== rsignell-usgs 1872600 2018-10-23T18:34:48Z 2018-10-23T19:18:52Z NONE

I tried a similar workflow last week with an AWS kubernetes cluster with opendap endpoints and it also failed: https://nbviewer.jupyter.org/gist/rsignell-usgs/8583ea8f8b5e1c926b0409bd536095a9

I thought it was likely some intermittent problem that wasn't handled well. In my case after a while I get:

```python-traceback
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7ff93cbbd828>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x7ff945421378>, encoded_fill_values={1e+37}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(375, 400, None), slice(0, 7, None), slice(0, 670, None), slice(0, 300, None)))
kwargs: {}
Exception: OSError(-72, 'NetCDF: Malformed or inaccessible DAP DDS')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
408606913 https://github.com/pydata/xarray/issues/2323#issuecomment-408606913 https://api.github.com/repos/pydata/xarray/issues/2323 MDEyOklzc3VlQ29tbWVudDQwODYwNjkxMw== rsignell-usgs 1872600 2018-07-28T13:07:39Z 2018-07-28T13:07:39Z NONE

@shoyer, if we had a znetcdf library like h5netcdf we could get mf_dataset "for free" though, right?
Zarr definitely has more and different compression options than NetCDF -- does that make this concept problematic?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  znetcdf: h5netcdf analog for zarr?  345354038
397596002 https://github.com/pydata/xarray/issues/2233#issuecomment-397596002 https://api.github.com/repos/pydata/xarray/issues/2233 MDEyOklzc3VlQ29tbWVudDM5NzU5NjAwMg== rsignell-usgs 1872600 2018-06-15T11:44:35Z 2018-06-15T11:44:35Z NONE

@rabernat , this unstructured grid model output follows the UGRID Conventions, which layer on top of the CF Conventions. The issue Xarray is having here is with the vertical coordinate however, so this issue could arise with any CF convention model where the vertical stretching function varies over the domain.

As requested, here is the ncdump of this URL:

```
jovyan@jupyter-rsignell-2dusgs:~$ ncdump -h http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc
netcdf NECOFS_GOM3_FORECAST {
dimensions:
    time = UNLIMITED ; // (145 currently)
    maxStrlen64 = 64 ;
    nele = 99137 ;
    node = 53087 ;
    siglay = 40 ;
    three = 3 ;
variables:
    float lon(node) ;
        lon:long_name = "nodal longitude" ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
    float lat(node) ;
        lat:long_name = "nodal latitude" ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
    float xc(nele) ;
        xc:long_name = "zonal x-coordinate" ;
        xc:units = "meters" ;
    float yc(nele) ;
        yc:long_name = "zonal y-coordinate" ;
        yc:units = "meters" ;
    float lonc(nele) ;
        lonc:long_name = "zonal longitude" ;
        lonc:standard_name = "longitude" ;
        lonc:units = "degrees_east" ;
    float latc(nele) ;
        latc:long_name = "zonal latitude" ;
        latc:standard_name = "latitude" ;
        latc:units = "degrees_north" ;
    float siglay(siglay, node) ;
        siglay:long_name = "Sigma Layers" ;
        siglay:standard_name = "ocean_sigma_coordinate" ;
        siglay:positive = "up" ;
        siglay:valid_min = -1. ;
        siglay:valid_max = 0. ;
        siglay:formula_terms = "sigma: siglay eta: zeta depth: h" ;
    float h(node) ;
        h:long_name = "Bathymetry" ;
        h:standard_name = "sea_floor_depth_below_geoid" ;
        h:units = "m" ;
        h:coordinates = "lat lon" ;
        h:type = "data" ;
        h:mesh = "fvcom_mesh" ;
        h:location = "node" ;
    int nv(three, nele) ;
        nv:long_name = "nodes surrounding element" ;
        nv:cf_role = "face_node_connnectivity" ;
        nv:start_index = 1 ;
    float time(time) ;
        time:long_name = "time" ;
        time:units = "days since 1858-11-17 00:00:00" ;
        time:format = "modified julian day (MJD)" ;
        time:time_zone = "UTC" ;
        time:standard_name = "time" ;
    float zeta(time, node) ;
        zeta:long_name = "Water Surface Elevation" ;
        zeta:units = "meters" ;
        zeta:standard_name = "sea_surface_height_above_geoid" ;
        zeta:coordinates = "time lat lon" ;
        zeta:type = "data" ;
        zeta:missing_value = -999. ;
        zeta:field = "elev, scalar" ;
        zeta:coverage_content_type = "modelResult" ;
        zeta:mesh = "fvcom_mesh" ;
        zeta:location = "node" ;
    int nbe(three, nele) ;
        nbe:long_name = "elements surrounding each element" ;
    float u(time, siglay, nele) ;
        u:long_name = "Eastward Water Velocity" ;
        u:units = "meters s-1" ;
        u:type = "data" ;
        u:missing_value = -999. ;
        u:field = "ua, scalar" ;
        u:coverage_content_type = "modelResult" ;
        u:standard_name = "eastward_sea_water_velocity" ;
        u:coordinates = "time siglay latc lonc" ;
        u:mesh = "fvcom_mesh" ;
        u:location = "face" ;
    float v(time, siglay, nele) ;
        v:long_name = "Northward Water Velocity" ;
        v:units = "meters s-1" ;
        v:type = "data" ;
        v:missing_value = -999. ;
        v:field = "va, scalar" ;
        v:coverage_content_type = "modelResult" ;
        v:standard_name = "northward_sea_water_velocity" ;
        v:coordinates = "time siglay latc lonc" ;
        v:mesh = "fvcom_mesh" ;
        v:location = "face" ;
    float ww(time, siglay, nele) ;
        ww:long_name = "Upward Water Velocity" ;
        ww:units = "meters s-1" ;
        ww:type = "data" ;
        ww:coverage_content_type = "modelResult" ;
        ww:standard_name = "upward_sea_water_velocity" ;
        ww:coordinates = "time siglay latc lonc" ;
        ww:mesh = "fvcom_mesh" ;
        ww:location = "face" ;
    float ua(time, nele) ;
        ua:long_name = "Vertically Averaged x-velocity" ;
        ua:units = "meters s-1" ;
        ua:type = "data" ;
        ua:missing_value = -999. ;
        ua:field = "ua, scalar" ;
        ua:coverage_content_type = "modelResult" ;
        ua:standard_name = "barotropic_eastward_sea_water_velocity" ;
        ua:coordinates = "time latc lonc" ;
        ua:mesh = "fvcom_mesh" ;
        ua:location = "face" ;
    float va(time, nele) ;
        va:long_name = "Vertically Averaged y-velocity" ;
        va:units = "meters s-1" ;
        va:type = "data" ;
        va:missing_value = -999. ;
        va:field = "va, scalar" ;
        va:coverage_content_type = "modelResult" ;
        va:standard_name = "barotropic_northward_sea_water_velocity" ;
        va:coordinates = "time latc lonc" ;
        va:mesh = "fvcom_mesh" ;
        va:location = "face" ;
    float temp(time, siglay, node) ;
        temp:long_name = "temperature" ;
        temp:standard_name = "sea_water_potential_temperature" ;
        temp:units = "degrees_C" ;
        temp:coordinates = "time siglay lat lon" ;
        temp:type = "data" ;
        temp:coverage_content_type = "modelResult" ;
        temp:mesh = "fvcom_mesh" ;
        temp:location = "node" ;
    float salinity(time, siglay, node) ;
        salinity:long_name = "salinity" ;
        salinity:standard_name = "sea_water_salinity" ;
        salinity:units = "0.001" ;
        salinity:coordinates = "time siglay lat lon" ;
        salinity:type = "data" ;
        salinity:coverage_content_type = "modelResult" ;
        salinity:mesh = "fvcom_mesh" ;
        salinity:location = "node" ;
    int fvcom_mesh ;
        fvcom_mesh:cf_role = "mesh_topology" ;
        fvcom_mesh:topology_dimension = 2 ;
        fvcom_mesh:node_coordinates = "lon lat" ;
        fvcom_mesh:face_coordinates = "lonc latc" ;
        fvcom_mesh:face_node_connectivity = "nv" ;

// global attributes:
        :title = "NECOFS GOM3 (FVCOM) - Northeast US - Latest Forecast" ;
        :institution = "School for Marine Science and Technology" ;
        :source = "FVCOM_3.0" ;
        :Conventions = "CF-1.0, UGRID-1.0" ;
        :summary = "Latest forecast from the FVCOM Northeast Coastal Ocean Forecast System using an newer, higher-resolution GOM3 mesh (GOM2 was the preceding mesh)" ;
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
395535173 https://github.com/pydata/xarray/pull/2131#issuecomment-395535173 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTUzNTE3Mw== rsignell-usgs 1872600 2018-06-07T19:20:24Z 2018-06-07T19:20:24Z NONE

Sounds good. Thanks @shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395524953 https://github.com/pydata/xarray/pull/2131#issuecomment-395524953 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTUyNDk1Mw== rsignell-usgs 1872600 2018-06-07T18:45:42Z 2018-06-07T18:45:42Z NONE

Might this PR warrant a new minor release?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395476675 https://github.com/pydata/xarray/pull/2131#issuecomment-395476675 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTQ3NjY3NQ== rsignell-usgs 1872600 2018-06-07T16:07:14Z 2018-06-07T16:11:08Z NONE

@jhamman woohoo! Cell [20] completes nicely now: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 I'm getting some errors in Cell [20], but I think those are unrelated and didn't affect the successful completion of the tasks, right? (this is on an HPC system)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395447613 https://github.com/pydata/xarray/pull/2131#issuecomment-395447613 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTQ0NzYxMw== rsignell-usgs 1872600 2018-06-07T14:46:21Z 2018-06-07T14:47:07Z NONE

@jhamman , although I'm getting distributed workers to compute the mean from a bunch of images, I'm getting a "Failed to Serialize" error in cell [23] of this notebook: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 If this is a bug, I think it was there before the recent updates.

You should be able to run this notebook without modification.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
394887291 https://github.com/pydata/xarray/pull/2131#issuecomment-394887291 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NDg4NzI5MQ== rsignell-usgs 1872600 2018-06-05T23:00:51Z 2018-06-05T23:13:08Z NONE

@jhamman , still very much interested in this -- could the existing functionality be merged and enhanced later?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
389330810 https://github.com/pydata/xarray/pull/2131#issuecomment-389330810 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM4OTMzMDgxMA== rsignell-usgs 1872600 2018-05-15T22:15:22Z 2018-05-15T22:15:22Z NONE

It's working for me! https://gist.github.com/rsignell-usgs/ef81fb4306dac3a2406d0adb575b340f

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
389277628 https://github.com/pydata/xarray/pull/2131#issuecomment-389277628 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM4OTI3NzYyOA== rsignell-usgs 1872600 2018-05-15T19:02:06Z 2018-05-15T19:02:06Z NONE

@jhamman should I test this out on my original workflow or wait a bit?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
388786292 https://github.com/pydata/xarray/issues/2121#issuecomment-388786292 https://api.github.com/repos/pydata/xarray/issues/2121 MDEyOklzc3VlQ29tbWVudDM4ODc4NjI5Mg== rsignell-usgs 1872600 2018-05-14T11:34:45Z 2018-05-14T11:34:45Z NONE

@jhamman what kind of expertise would it take to do this job (e.g., is it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best left for the core dev team)?

And is there any workaround that can be used in the interim?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  rasterio backend should use DataStorePickleMixin (or something similar) 322445312
382466626 https://github.com/pydata/xarray/pull/1811#issuecomment-382466626 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg== rsignell-usgs 1872600 2018-04-18T17:30:25Z 2018-04-18T17:32:21Z NONE

@jhamman, I was just using client = Client(). Should I be using LocalCluster instead?
(there is no kubernetes on this JupyterHub).
Also, is there a better place to have this sort of discussion or is it okay here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
382421609 https://github.com/pydata/xarray/pull/1811#issuecomment-382421609 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ== rsignell-usgs 1872600 2018-04-18T15:11:02Z 2018-04-18T15:14:12Z NONE

@jhamman, I tried the same code with a single-threaded scheduler:

```python
...
delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync)
```

and it ran to completion with no errors (taking 2 hours for 100 GB to Zarr). What should I try next?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
381969631 https://github.com/pydata/xarray/pull/1811#issuecomment-381969631 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ== rsignell-usgs 1872600 2018-04-17T12:12:15Z 2018-04-17T12:15:19Z NONE

@jhamman , I'm trying to test out compute=False with this code:

```python
# Write National Water Model data to Zarr

from dask.distributed import Client
import pandas as pd
import xarray as xr
import s3fs
import zarr

if __name__ == '__main__':

    client = Client()

    root = '/projects/water/nwm/data/forcing_short_range/'                    # Local Files
    root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/' # OPenDAP

    bucket_endpoint = 'https://s3.us-west-1.amazonaws.com/'
    bucket_endpoint = 'https://iu.jetstream-cloud.org:8080'

    f_zarr = 'rsignell/nwm/test_week'

    dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H')
    urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(root, a.strftime('%Y%m%d'), a.strftime('%H')) for a in dates]

    ds = xr.open_mfdataset(urls, concat_dim='time', lock=True)
    ds = ds.drop(['ProjectionCoordinateSystem'])

    fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint))
    d = s3fs.S3Map(f_zarr, s3=fs)

    compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
    encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}

    delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
    persist_store = delayed_store.persist(retries=100)
```

and after 20 seconds or so, the process dies with this error:

```python-traceback
/home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742: UserWarning: Large object of size 1.23 MB detected in task graph:

  (<xarray.backends.zarr.ZarrStore object at 0x7f5d8 ... deedecefab224')

Consider scattering large objects ahead of time with client.scatter to reduce scheduler burden and keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good

  % (format_bytes(len(b)), s))
```

Do you have suggestions on how to modify my code?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
339093278 https://github.com/pydata/xarray/issues/1621#issuecomment-339093278 https://api.github.com/repos/pydata/xarray/issues/1621 MDEyOklzc3VlQ29tbWVudDMzOTA5MzI3OA== rsignell-usgs 1872600 2017-10-24T18:50:21Z 2017-10-24T18:50:21Z NONE

I vote for 1 also. How many makes a quorum? :smile_cat:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 264321376
338172936 https://github.com/pydata/xarray/issues/1621#issuecomment-338172936 https://api.github.com/repos/pydata/xarray/issues/1621 MDEyOklzc3VlQ29tbWVudDMzODE3MjkzNg== rsignell-usgs 1872600 2017-10-20T10:46:53Z 2017-10-20T10:50:18Z NONE

On https://stackoverflow.com/a/46675990/2005869, @shoyer explains:

My understanding of CF standard names is that forecast_period should be equal to the difference between time and forecast_reference_time, i.e., forecast_period = time - forecast_reference_time. If you specified your time_offset variable with units in the form "hours", then it would be decoded to timedelta64, along with datetime64 for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly. On the other hand, you probably don't want to convert the Tper variable to timedelta64. Technically, it is also a time period, but it's not a variable that makes sense to compare to time.

I understand the potential issue here, but I think Xarray should follow CF conventions for time, and only treat variables as time coordinates if they have valid CF time units (<time unit> since <date>).

We know of thousands of datasets (every dataset with waves!) where the current Xarray behavior is a problem.
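For anyone reading this later: newer xarray versions let you opt out of this decoding per call via the decode_timedelta flag. A sketch (the variable and file names are made up):

```python
import numpy as np
import xarray as xr

# A wave-period variable whose units are plain "seconds" -- a duration
# under CF, not a "<unit> since <date>" time coordinate.
ds = xr.Dataset({"Tper": ("station", np.array([8.0, 9.5, 11.0]))})
ds["Tper"].attrs["units"] = "seconds"
ds.to_netcdf("waves.nc")

# By default this round-trips as timedelta64; decode_timedelta=False opts out.
raw = xr.open_dataset("waves.nc", decode_timedelta=False)
print(raw["Tper"].dtype)  # float64, as intended
```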

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 264321376
217183543 https://github.com/pydata/xarray/pull/844#issuecomment-217183543 https://api.github.com/repos/pydata/xarray/issues/844 MDEyOklzc3VlQ29tbWVudDIxNzE4MzU0Mw== rsignell-usgs 1872600 2016-05-05T15:19:55Z 2016-05-05T15:19:55Z NONE

It also seems consistent to me to return a Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add a filter_by_attrs method to Dataset 153126324
216944939 https://github.com/pydata/xarray/issues/567#issuecomment-216944939 https://api.github.com/repos/pydata/xarray/issues/567 MDEyOklzc3VlQ29tbWVudDIxNjk0NDkzOQ== rsignell-usgs 1872600 2016-05-04T17:45:06Z 2016-05-04T17:45:06Z NONE

:+1: -- I think this would be super-useful general functionality for the xarray community that doesn't come with any downside.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to find data variables by standard_name 105688738
169553668 https://github.com/pydata/xarray/issues/704#issuecomment-169553668 https://api.github.com/repos/pydata/xarray/issues/704 MDEyOklzc3VlQ29tbWVudDE2OTU1MzY2OA== rsignell-usgs 1872600 2016-01-07T05:19:04Z 2016-01-07T05:19:04Z NONE

I think it would be nicer to get rid of the floating black lines (axis) altogether

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Complete renaming xray -> xarray 124867009
169299086 https://github.com/pydata/xarray/issues/704#issuecomment-169299086 https://api.github.com/repos/pydata/xarray/issues/704 MDEyOklzc3VlQ29tbWVudDE2OTI5OTA4Ng== rsignell-usgs 1872600 2016-01-06T11:10:04Z 2016-01-06T11:10:33Z NONE

Yet another vote for import xarray as xr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Complete renaming xray -> xarray 124867009
139055592 https://github.com/pydata/xarray/issues/567#issuecomment-139055592 https://api.github.com/repos/pydata/xarray/issues/567 MDEyOklzc3VlQ29tbWVudDEzOTA1NTU5Mg== rsignell-usgs 1872600 2015-09-09T21:48:02Z 2015-09-09T21:48:02Z NONE

I was thinking that the data variables that matched a specified standard_name would be a subset of the variables in the data_vars object.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to find data variables by standard_name 105688738
121802784 https://github.com/pydata/xarray/issues/476#issuecomment-121802784 https://api.github.com/repos/pydata/xarray/issues/476 MDEyOklzc3VlQ29tbWVudDEyMTgwMjc4NA== rsignell-usgs 1872600 2015-07-16T02:17:31Z 2015-07-16T02:17:31Z NONE

Indeed, with master, it's working. http://nbviewer.ipython.org/gist/rsignell-usgs/047235496029529585cc

Closing....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf failing for datasets with a single time value 95222803

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);