
issue_comments


59 rows where user = 1872600 sorted by updated_at descending


issue 23

  • Feature/pickle rasterio 7
  • open_mfdataset usage and limitations. 7
  • Problems with distributed and opendap netCDF endpoint 7
  • "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 6
  • WIP: Compute==False for to_zarr and to_netcdf 3
  • Allow chunk_store argument when opening Zarr datasets 3
  • Document writing netcdf from xarray directly to S3 3
  • Best way to find data variables by standard_name 2
  • Complete renaming xray -> xarray 2
  • Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 2
  • Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 2
  • Version 0.13 broke my ufunc 2
  • support file-like objects in xarray.open_rasterio 2
  • Allow fsspec/zarr/mfdataset 2
  • to_netcdf failing for datasets with a single time value 1
  • Add a filter_by_attrs method to Dataset 1
  • rasterio backend should use DataStorePickleMixin (or something similar) 1
  • znetcdf: h5netcdf analog for zarr? 1
  • Let's list all the netCDF files that xarray can't open 1
  • read ncml files to create multifile datasets 1
  • Combining tiled data sets in xarray 1
  • xarray / vtk integration 1
  • 'numpy.datetime64' object has no attribute 'year' when writing to zarr or netcdf 1

user 1

  • rsignell-usgs · 59

author_association 1

  • NONE 59
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1078439763 https://github.com/pydata/xarray/issues/2233#issuecomment-1078439763 https://api.github.com/repos/pydata/xarray/issues/2233 IC_kwDOAMm_X85AR69T rsignell-usgs 1872600 2022-03-24T22:26:07Z 2023-07-16T15:13:39Z NONE

https://github.com/pydata/xarray/issues/2233#issuecomment-397602084 Would the new xarray index/coordinate internal refactoring now allow us to address this issue?

cc @kthyng

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
1056917100 https://github.com/pydata/xarray/issues/6318#issuecomment-1056917100 https://api.github.com/repos/pydata/xarray/issues/6318 IC_kwDOAMm_X84-_0Zs rsignell-usgs 1872600 2022-03-02T13:13:24Z 2022-03-02T13:14:40Z NONE

While I was typing this, @keewis provided a workaround here: https://github.com/fsspec/kerchunk/issues/130#issuecomment-1056897730 ! Leaving this open until I know whether this is something best left for users to implement or something to be handled in xarray. #6318

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'numpy.datetime64' object has no attribute 'year' when writing to zarr or netcdf 1157163377
985769385 https://github.com/pydata/xarray/pull/4140#issuecomment-985769385 https://api.github.com/repos/pydata/xarray/issues/4140 IC_kwDOAMm_X846waWp rsignell-usgs 1872600 2021-12-03T19:22:13Z 2021-12-03T19:22:13Z NONE

Thanks @snowman2 ! Done in https://github.com/corteva/rioxarray/issues/440

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support file-like objects in xarray.open_rasterio 636451398
985530331 https://github.com/pydata/xarray/pull/4140#issuecomment-985530331 https://api.github.com/repos/pydata/xarray/issues/4140 IC_kwDOAMm_X846vf_b rsignell-usgs 1872600 2021-12-03T13:41:35Z 2021-12-03T13:43:33Z NONE

I'd like to use this cool new rasterio/fsspec functionality in xarray!

I must be doing something wrong here in cell [5]: https://nbviewer.org/gist/rsignell-usgs/dbf3d8e952895ca255f300790759c60f
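For reference, the pattern I'm attempting looks roughly like this sketch (the URL is a placeholder, and this assumes the file-like support this PR adds):

```python
import fsspec
import xarray as xr

# Placeholder URL -- any HTTP-accessible GeoTIFF would do.
url = "https://example.com/tiles/image.tif"

# Open the remote file as a file-like object and hand it straight
# to open_rasterio, relying on the file-like support from this PR.
with fsspec.open(url) as f:
    da = xr.open_rasterio(f)
    print(da.dims)
```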

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  support file-like objects in xarray.open_rasterio 636451398
832761716 https://github.com/pydata/xarray/issues/2697#issuecomment-832761716 https://api.github.com/repos/pydata/xarray/issues/2697 MDEyOklzc3VlQ29tbWVudDgzMjc2MTcxNg== rsignell-usgs 1872600 2021-05-05T15:02:55Z 2021-05-05T15:04:59Z NONE

It's worth pointing out that you can create ReferenceFileSystem JSON to accomplish many of the tasks we used to use NcML for:

- create a single virtual dataset that points to a collection of files
- modify dataset and variable attributes

It also has the nice feature that it makes your dataset faster to work with on the cloud because the map to the data is loaded in one shot!
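For anyone landing here later, a rough sketch of that workflow using today's kerchunk package (the S3 paths are illustrative):

```python
import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# Illustrative paths -- scan each NetCDF4/HDF5 file once to build byte-range references.
urls = ["s3://my-bucket/model/day1.nc", "s3://my-bucket/model/day2.nc"]
refs = []
for u in urls:
    with fsspec.open(u, anon=True) as f:
        refs.append(SingleHdf5ToZarr(f, u).translate())

# Combine the per-file references into one virtual dataset along time.
combined = MultiZarrToZarr(refs, concat_dims=["time"], remote_protocol="s3",
                           remote_options={"anon": True}).translate()

# Open the virtual dataset; the map to the data loads in one shot.
mapper = fsspec.get_mapper("reference://", fo=combined,
                           remote_protocol="s3", remote_options={"anon": True})
ds = xr.open_dataset(mapper, engine="zarr", backend_kwargs={"consolidated": False})
```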

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  read ncml files to create multifile datasets 401874795
741889071 https://github.com/pydata/xarray/pull/4461#issuecomment-741889071 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTg4OTA3MQ== rsignell-usgs 1872600 2020-12-09T16:31:37Z 2021-01-19T14:46:49Z NONE

I'm really looking forward to getting this merged so I can open the National Water Model Zarr I created last week thusly:

```python
ds = xr.open_dataset('s3://noaa-nwm-retro-v2.0-zarr-pds', engine='zarr',
                     backend_kwargs={'consolidated': True, 'storage_options': {'anon': True}})
```

@martindurant tells me this takes only 3 s with the new async capability!

That would be pretty awesome, because now it takes 1 min 15 s to open this dataset!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
745520766 https://github.com/pydata/xarray/issues/4122#issuecomment-745520766 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDc0NTUyMDc2Ng== rsignell-usgs 1872600 2020-12-15T19:39:16Z 2020-12-15T19:39:16Z NONE

I'm closing this; the recommended approach for writing NetCDF to object storage is to write locally, then push.
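Something like this sketch is what I mean (the bucket path is hypothetical; credentials come from your environment):

```python
import numpy as np
import s3fs
import xarray as xr

# Any dataset will do; a tiny one here.
ds = xr.Dataset({"t": ("x", np.arange(3.0))})

# Step 1: write the NetCDF file to local disk (the netCDF libraries need seek()).
ds.to_netcdf("out.nc")

# Step 2: push the finished file to object storage in one shot.
fs = s3fs.S3FileSystem()
fs.put("out.nc", "my-bucket/out.nc")
```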

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
741942375 https://github.com/pydata/xarray/pull/4461#issuecomment-741942375 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0MTk0MjM3NQ== rsignell-usgs 1872600 2020-12-09T17:50:04Z 2020-12-09T17:50:04Z NONE

@rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
727222443 https://github.com/pydata/xarray/issues/4470#issuecomment-727222443 https://api.github.com/repos/pydata/xarray/issues/4470 MDEyOklzc3VlQ29tbWVudDcyNzIyMjQ0Mw== rsignell-usgs 1872600 2020-11-14T15:22:49Z 2020-11-14T15:23:28Z NONE

Just a note that the only unstructured grid (triangular mesh) example I have is: http://gallery.pangeo.io/repos/rsignell-usgs/esip-gallery/01_hurricane_ike_water_levels.html

I figured out how to make that notebook from the info at: https://earthsim.holoviz.org/user_guide/Visualizing_Meshes.html

The "earthsim" project was developed by the Holoviz team (@jbednar & co) funded by USACE when @dharhas was there. Would be cool to revive this.

The Holoviz team and USACE might not have been aware of the UGRID conventions when they developed that code, so currently it's a bit awkward to go from a UGRID-compliant NetCDF dataset to visualization with Holoviz (as you can see from the Hurricane Ike notebook). That would be low-hanging fruit for any future effort.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray / vtk integration 710357592
680138664 https://github.com/pydata/xarray/pull/3804#issuecomment-680138664 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDY4MDEzODY2NA== rsignell-usgs 1872600 2020-08-25T16:39:34Z 2020-08-25T17:07:42Z NONE

Drumroll.... @dcherian, epic cymbal crash?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
673433045 https://github.com/pydata/xarray/issues/4338#issuecomment-673433045 https://api.github.com/repos/pydata/xarray/issues/4338 MDEyOklzc3VlQ29tbWVudDY3MzQzMzA0NQ== rsignell-usgs 1872600 2020-08-13T11:54:10Z 2020-08-13T12:04:11Z NONE

@nicholaskgeorge your minimal test would be monotonic if square2 and square4 had x coordinates [3,4,5] instead of [2,3,4], but it seems combine_by_coords doesn't mind that?
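To make the overlap concrete, a tiny reproduction sketch (the tile contents are made up; only the coordinates matter):

```python
import numpy as np
import xarray as xr

# The left tile covers x=[0,1,2]; the right tile starts at x=2 instead of 3,
# so the combined x index [0, 1, 2, 2, 3, 4] is not monotonic.
square1 = xr.Dataset({"v": (("y", "x"), np.zeros((3, 3)))},
                     coords={"x": [0, 1, 2], "y": [0, 1, 2]})
square2 = xr.Dataset({"v": (("y", "x"), np.ones((3, 3)))},
                     coords={"x": [2, 3, 4], "y": [0, 1, 2]})

# The question above: does combine_by_coords flag the overlapping x, or not?
print(xr.combine_by_coords([square1, square2]))
```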

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Combining tiled data sets in xarray 677773328
665163886 https://github.com/pydata/xarray/pull/3804#issuecomment-665163886 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDY2NTE2Mzg4Ng== rsignell-usgs 1872600 2020-07-28T17:10:47Z 2020-07-28T17:11:33Z NONE

@dcherian , are we just waiting for one more "+1" here, or are the failing checks related to this PR?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
642841283 https://github.com/pydata/xarray/issues/4082#issuecomment-642841283 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0Mjg0MTI4Mw== rsignell-usgs 1872600 2020-06-11T17:58:30Z 2020-06-11T18:00:28Z NONE

@jswhit, do you know if https://github.com/Unidata/netcdf4-python is doing the caching?

Just to catch you up quickly, we have a workflow that opens a bunch of opendap datasets. While the default file_cache_maxsize=128 works on Linux, on Windows it fails once this exceeds 25 files:

```python
xr.set_options(file_cache_maxsize=25)  # works

xr.set_options(file_cache_maxsize=26)  # fails
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
641236117 https://github.com/pydata/xarray/issues/4082#issuecomment-641236117 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MTIzNjExNw== rsignell-usgs 1872600 2020-06-09T11:42:38Z 2020-06-09T11:42:38Z NONE

@DennisHeimbigner , do you not agree that this issue on windows is related to the number of files cached from OPeNDAP requests? Clearly there are some differences with cache files on windows: https://www.unidata.ucar.edu/support/help/MailArchives/netcdf/msg11190.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640808125 https://github.com/pydata/xarray/issues/4082#issuecomment-640808125 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MDgwODEyNQ== rsignell-usgs 1872600 2020-06-08T18:51:37Z 2020-06-08T18:51:37Z NONE

@DennisHeimbigner I don't understand how it can be a DAP or code issue since:

- it runs on Linux without errors with the default file_cache_maxsize=128
- it runs on Windows without errors with file_cache_maxsize=25

Right? Or am I missing something?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640590247 https://github.com/pydata/xarray/issues/4082#issuecomment-640590247 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDY0MDU5MDI0Nw== rsignell-usgs 1872600 2020-06-08T13:05:28Z 2020-06-08T13:05:28Z NONE

Or perhaps Unidata's @WardF, who leads NetCDF development.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
640548620 https://github.com/pydata/xarray/issues/4122#issuecomment-640548620 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDY0MDU0ODYyMA== rsignell-usgs 1872600 2020-06-08T11:36:14Z 2020-06-08T11:37:21Z NONE

@martindurant, I asked @ajelenak offline and he reminded me that:

> File metadata are dispersed throughout an HDF5 [and NetCDF4] file in order to support writing and modifying array sizes at any time of execution

Looking forward to simplecache:: for writing in fsspec=0.7.5!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639771646 https://github.com/pydata/xarray/issues/4122#issuecomment-639771646 https://api.github.com/repos/pydata/xarray/issues/4122 MDEyOklzc3VlQ29tbWVudDYzOTc3MTY0Ng== rsignell-usgs 1872600 2020-06-05T20:08:37Z 2020-06-05T20:54:36Z NONE

Okay @scottyhq, I tried setting engine='h5netcdf', but still got:

```
OSError: Seek only available in read mode
```

Thinking about this a little more, it's pretty clear why writing NetCDF to S3 would require seek mode.

I asked @martindurant about supporting seek for writing in fsspec and he said that would be pretty hard. And in fact, the performance probably would be pretty terrible as lots of little writes would be required.

So maybe it's best just to write netcdf files locally and then push them to S3.

And to facilitate that, @martindurant merged a PR yesterday to enable simplecache for writing in fsspec, so after doing:

```
pip install git+https://github.com/intake/filesystem_spec.git
```

in my environment, this now works:

```python
import xarray as xr
import fsspec

ds = xr.open_dataset('http://geoport.usgs.esipfed.org/thredds/dodsC'
                     '/silt/usgs/Projects/stellwagen/CF-1.6/BUZZ_BAY/2651-A.cdf')

outfile = fsspec.open('simplecache::s3://chs-pangeo-data-bucket/rsignell/foo2.nc',
                      mode='wb', s3=dict(profile='default'))
with outfile as f:
    ds.to_netcdf(f)
```

(Here I'm telling fsspec to use the AWS credentials in my "default" profile.)

Thanks Martin!!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Document writing netcdf from xarray directly to S3 631085856
639450932 https://github.com/pydata/xarray/issues/4082#issuecomment-639450932 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDYzOTQ1MDkzMg== rsignell-usgs 1872600 2020-06-05T12:26:14Z 2020-06-05T12:26:14Z NONE

@shoyer, unfortunately these opendap datasets contain only 1 time record (1 daily value) each. And it works fine on Linux with file_cache_maxsize=128, so it must be some Windows cache thing right?

So since I just picked file_cache_maxsize=10 arbitrarily, I thought it would be useful to see what the maximum value was. Using the good old bi-section method, I determined that (for this case anyway), the maximum size that works is 25.

In other words:

```python
xr.set_options(file_cache_maxsize=25)  # works

xr.set_options(file_cache_maxsize=26)  # fails
```

I would bet money that Unidata's @DennisHeimbigner knows what's going on here!
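The bisection itself is easy to script; here's a sketch where works() is a stand-in for re-running the workflow at a given cache size (the threshold of 25 just mirrors what I found):

```python
def works(n):
    # Stand-in: in reality this would call xr.set_options(file_cache_maxsize=n)
    # and run the opendap workflow, returning True on success.
    return n <= 25

lo, hi = 10, 128  # known-good and known-bad cache sizes
while hi - lo > 1:
    mid = (lo + hi) // 2
    if works(mid):
        lo = mid
    else:
        hi = mid
print(lo)  # 25 -- the largest file_cache_maxsize that works here
```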

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
639111588 https://github.com/pydata/xarray/issues/4082#issuecomment-639111588 https://api.github.com/repos/pydata/xarray/issues/4082 MDEyOklzc3VlQ29tbWVudDYzOTExMTU4OA== rsignell-usgs 1872600 2020-06-04T20:55:49Z 2020-06-04T20:55:49Z NONE

@EliT1626 , I confirmed that this problem exists on Windows, but not on Linux.

The error:

```
IOError: [Errno -37] NetCDF: Write to read only: 'https://www.ncei.noaa.gov/thredds/dodsC/OisstBase/NetCDF/V2.1/AVHRR/201703/oisst-avhrr-v02r01.20170304.nc'
```

suggested some kind of cache problem, and as you noted it always fails after a certain number of dates, so I tried increasing the number of cached files from the default 128 to 256 with xr.set_options(file_cache_maxsize=256), but that had no effect.

Just to see if it would fail earlier, I then tried decreasing the number of cached files: xr.set_options(file_cache_maxsize=10) and to my surprise, it ran all the way through: https://nbviewer.jupyter.org/gist/rsignell-usgs/c52fadd8626734bdd32a432279bc6779

I'm hoping someone who worked on the caching (@shoyer?) might have some idea of what is going on, but at least you can execute your workflow now on windows!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  "write to read-only" Error in xarray.open_mfdataset() with opendap datasets 621177286
592094766 https://github.com/pydata/xarray/pull/3804#issuecomment-592094766 https://api.github.com/repos/pydata/xarray/issues/3804 MDEyOklzc3VlQ29tbWVudDU5MjA5NDc2Ng== rsignell-usgs 1872600 2020-02-27T17:59:13Z 2020-02-27T17:59:13Z NONE

This PR is motivated by the work described in this Medium blog post

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow chunk_store argument when opening Zarr datasets 572251686
534722389 https://github.com/pydata/xarray/issues/3339#issuecomment-534722389 https://api.github.com/repos/pydata/xarray/issues/3339 MDEyOklzc3VlQ29tbWVudDUzNDcyMjM4OQ== rsignell-usgs 1872600 2019-09-24T19:56:17Z 2019-09-24T19:56:17Z NONE

Yep, upgrading to dask=2.4.0 fixed the problem! Phew.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Version 0.13 broke my ufunc 497823072
534710770 https://github.com/pydata/xarray/issues/3339#issuecomment-534710770 https://api.github.com/repos/pydata/xarray/issues/3339 MDEyOklzc3VlQ29tbWVudDUzNDcxMDc3MA== rsignell-usgs 1872600 2019-09-24T19:23:25Z 2019-09-24T19:23:25Z NONE

@shoyer , indeed, while I have the same xarray=0.13 and numpy=1.17.2 as @jhamman, he has dask=2.4.0 and I have dask=2.2.0. I'll try upgrading and will report back.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Version 0.13 broke my ufunc 497823072
510144707 https://github.com/pydata/xarray/issues/2501#issuecomment-510144707 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUxMDE0NDcwNw== rsignell-usgs 1872600 2019-07-10T16:59:12Z 2019-07-11T11:47:02Z NONE

@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.

So if I use this code, the open_mfdataset command finishes:

```python
def drop_coords(ds):
    ds = ds.drop(['reference_time', 'feature_id'])
    return ds.reset_coords(drop=True)
```

and I can then add back in the dropped coordinate values at the end:

```python
dsets = [xr.open_dataset(f) for f in files[:3]]
ds.coords['feature_id'] = dsets[0].coords['feature_id']
```

I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509379294 https://github.com/pydata/xarray/issues/2501#issuecomment-509379294 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM3OTI5NA== rsignell-usgs 1872600 2019-07-08T20:28:48Z 2019-07-08T20:29:20Z NONE

@TomAugspurger , I thought @rabernat's suggestion of implementing

```python
def drop_coords(ds):
    return ds.reset_coords(drop=True)
```

would avoid this checking. Did I understand or implement this incorrectly?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509341467 https://github.com/pydata/xarray/issues/2501#issuecomment-509341467 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM0MTQ2Nw== rsignell-usgs 1872600 2019-07-08T18:34:02Z 2019-07-08T18:34:02Z NONE

@rabernat , to answer your question, if I open just two files:

```python
ds = xr.open_mfdataset(files[:2], preprocess=drop_coords, autoclose=True, parallel=True)
```

the resulting dataset is:

```
<xarray.Dataset>
Dimensions:         (feature_id: 2729077, reference_time: 1, time: 2)
Coordinates:
  * reference_time  (reference_time) datetime64[ns] 2009-01-01
  * feature_id      (feature_id) int32 101 179 181 ... 1180001803 1180001804
  * time            (time) datetime64[ns] 2009-01-01 2009-01-01T01:00:00
Data variables:
    streamflow      (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    q_lateral       (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    velocity        (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qSfcLatRunoff   (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBucket         (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
    qBtmVertRunoff  (time, feature_id) float64 dask.array<shape=(2, 2729077), chunksize=(1, 2729077)>
Attributes:
    featureType:                timeSeries
    proj4:                      +proj=longlat +datum=NAD83 +no_defs
    model_initialization_time:  2009-01-01_00:00:00
    station_dimension:          feature_id
    model_output_valid_time:    2009-01-01_00:00:00
    stream_order_output:        1
    cdm_datatype:               Station
    esri_pe_string:             GEOGCS[GCS_North_American_1983,DATUM[D_North_...
    Conventions:                CF-1.6
    model_version:              NWM 1.2
    dev_OVRTSWCRT:              1
    dev_NOAH_TIMESTEP:          3600
    dev_channel_only:           0
    dev_channelBucket_only:     0
    dev:                        dev_ prefix indicates development/internal me...
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509340139 https://github.com/pydata/xarray/issues/2501#issuecomment-509340139 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM0MDEzOQ== rsignell-usgs 1872600 2019-07-08T18:30:18Z 2019-07-08T18:30:18Z NONE

@TomAugspurger, okay, I just ran the above code again and here's what happens:

The open_mfdataset proceeds nicely on my 8 workers with 40 cores, eventually completing the 8760 open_dataset tasks in about 10 minutes. One interesting thing is that the number of tasks keeps dropping as time goes on; not sure why that would be. The memory usage on the workers seems okay during this process.

Then, despite the tasks showing on the dashboard being completed, the open_mfdataset command does not complete, but nothing has died, and I'm not sure what's happening. I check top, and then after about 10 more minutes I get these warnings and then the errors:

```python-traceback
distributed.client - WARNING - Couldn't gather 17520 keys, rescheduling {'getattr-fd038834-befa-4a9b-b78f-51f9aa2b28e5': ('tcp://127.0.0.1:45640',), 'drop_coords-39be9e52-59de-4e1f-b6d8-27e7d931b5af': ('tcp://127.0.0.1:55881',), 'drop_coords-8bd07037-9ca4-4f97-83fb-8b02d7ad0333': ('tcp://127.0.0.1:56164',), 'drop_coords-ca3dd72b-e5af-4099-b593-89dc97717718': ('tcp://127.0.0.1:59961',), 'getattr-c0af8992-e928-4d42-9e64-340303143454': ('tcp://127.0.0.1:42989',), 'drop_coords-8cdfe5fb-7a29-4606-8692-efa747be5bc1': ('tcp://127.0.0.1:35445',), 'getattr-03669206-0d26-46a1-988d-690fe830e52f': ...
```

Full error listing here: https://gist.github.com/rsignell-usgs/3b7101966b8c6d05f48a0e01695f35d6

Does this help? I'd be happy to screenshare if that would be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509282831 https://github.com/pydata/xarray/issues/2501#issuecomment-509282831 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTI4MjgzMQ== rsignell-usgs 1872600 2019-07-08T15:51:23Z 2019-07-08T15:51:23Z NONE

@TomAugspurger, I'm back from vacation now and ready to attack this again. Any updates on your end?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
506475819 https://github.com/pydata/xarray/issues/2501#issuecomment-506475819 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwNjQ3NTgxOQ== rsignell-usgs 1872600 2019-06-27T19:16:28Z 2019-06-27T19:24:31Z NONE

I tried this, and either I didn't apply it right, or it didn't work. The memory use kept growing until the process died. My code to process the 8760 netcdf files with open_mfdataset looks like this:

```python
import xarray as xr
from dask.distributed import Client, progress, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

import pandas as pd

dates = pd.date_range(start='2009-01-01 00:00', end='2009-12-31 23:00', freq='1h')
files = ['./nc/{}/{}.CHRTOUT_DOMAIN1.comp'.format(date.strftime('%Y'), date.strftime('%Y%m%d%H%M')) for date in dates]

def drop_coords(ds):
    return ds.reset_coords(drop=True)

ds = xr.open_mfdataset(files, preprocess=drop_coords, autoclose=True, parallel=True)
ds1 = ds.chunk(chunks={'time': 168, 'feature_id': 209929})

import numcodecs
numcodecs.blosc.use_threads = False
ds1.to_zarr('zarr/2009', mode='w', consolidated=True)
```

I transferred the netcdf files from AWS S3 to my local disk to run this, using this command:

```
rclone sync --include '*.CHRTOUT_DOMAIN1.comp' aws-east:nwm-archive/2009 . --checksum --fast-list --transfers 16
```

@TomAugspurger, if you could take a look, that would be great, and if you have any ideas of how to make this example simpler/more easily reproducible, please let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
497381301 https://github.com/pydata/xarray/issues/2501#issuecomment-497381301 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDQ5NzM4MTMwMQ== rsignell-usgs 1872600 2019-05-30T15:55:56Z 2019-05-30T15:58:48Z NONE

I'm hitting some memory issues with using open_mfdataset with a cluster also.

Specifically, I'm trying to open 8760 NetCDF files with an 8 node, 40 cpu LocalCluster.

When I issue:

```python
ds = xr.open_mfdataset(files, parallel=True)
```

all looks good on the Dask dashboard, and the tasks complete with no errors in about 4 minutes.

Then 4 more minutes go by before I get a bunch of errors like:

```
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
distributed.nanny - WARNING - Worker process 26054 was killed by unknown signal
distributed.nanny - WARNING - Restarting worker
```

and my cell doesn't complete.

Any suggestions?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
443227318 https://github.com/pydata/xarray/issues/2368#issuecomment-443227318 https://api.github.com/repos/pydata/xarray/issues/2368 MDEyOklzc3VlQ29tbWVudDQ0MzIyNzMxOA== rsignell-usgs 1872600 2018-11-30T14:53:13Z 2018-11-30T14:53:13Z NONE

@nordam , can you provide an example?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Let's list all the netCDF files that xarray can't open 350899839
432743208 https://github.com/pydata/xarray/issues/2503#issuecomment-432743208 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjc0MzIwOA== rsignell-usgs 1872600 2018-10-24T17:02:34Z 2018-10-24T17:02:34Z NONE

The version that is working in @rabernat's esgf binder env is: libnetcdf 4.6.1 h9cd6fdc_11 conda-forge

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432706068 https://github.com/pydata/xarray/issues/2503#issuecomment-432706068 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjcwNjA2OA== rsignell-usgs 1872600 2018-10-24T15:27:33Z 2018-10-24T15:27:33Z NONE

I fired up my notebook on @rabernat's binder env and it worked fine also: https://nbviewer.jupyter.org/gist/rsignell-usgs/aebdac44a1d773b99673cb132c2ef5eb

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432416114 https://github.com/pydata/xarray/issues/2503#issuecomment-432416114 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjQxNjExNA== rsignell-usgs 1872600 2018-10-23T20:55:42Z 2018-10-23T20:55:42Z NONE

@lesserwhirls , is this the issue you are referring to? https://github.com/Unidata/netcdf4-python/issues/836

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432415704 https://github.com/pydata/xarray/issues/2503#issuecomment-432415704 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjQxNTcwNA== rsignell-usgs 1872600 2018-10-23T20:54:24Z 2018-10-23T20:54:24Z NONE

@jhamman, doesn't this dask status plot tell us that multiple workers are connecting and getting data?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432389980 https://github.com/pydata/xarray/issues/2503#issuecomment-432389980 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM4OTk4MA== rsignell-usgs 1872600 2018-10-23T19:39:09Z 2018-10-23T19:39:09Z NONE

Perhaps it's also worth mentioning that I don't see any errors on the THREDDS server side on either the tomcat catalina or thredds threddsServlet logs. @lesserwhirls, any ideas?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432374559 https://github.com/pydata/xarray/issues/2503#issuecomment-432374559 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM3NDU1OQ== rsignell-usgs 1872600 2018-10-23T18:53:28Z 2018-10-23T19:39:08Z NONE

FWIW, in my workflow there was nothing fundamentally wrong, meaning that the requests worked for a while, but eventually would die with the NetCDF: Malformed or inaccessible DAP DDS message.

So for just a short time period (in this case 50 time steps, 2 chunks in time), it would usually work: https://nbviewer.jupyter.org/gist/rsignell-usgs/1155c76ed3440858ced8132e4cd81df4

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
432367931 https://github.com/pydata/xarray/issues/2503#issuecomment-432367931 https://api.github.com/repos/pydata/xarray/issues/2503 MDEyOklzc3VlQ29tbWVudDQzMjM2NzkzMQ== rsignell-usgs 1872600 2018-10-23T18:34:48Z 2018-10-23T19:18:52Z NONE

I tried a similar workflow last week with an AWS kubernetes cluster with opendap endpoints and it also failed: https://nbviewer.jupyter.org/gist/rsignell-usgs/8583ea8f8b5e1c926b0409bd536095a9

I thought it was likely some intermittent problem that wasn't handled well. In my case after a while I get:

```python-traceback
distributed.worker - WARNING - Compute Failed
Function: getter
args: (ImplicitToExplicitIndexingAdapter(array=CopyOnWriteArray(array=LazilyOuterIndexedArray(array=_ElementwiseFunctionArray(LazilyOuterIndexedArray(array=<xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7ff93cbbd828>, key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))), func=functools.partial(<function _apply_mask at 0x7ff945421378>, encoded_fill_values={1e+37}, decoded_fill_value=nan, dtype=dtype('float64')), dtype=dtype('float64')), key=BasicIndexer((slice(None, None, None), slice(None, None, None), slice(None, None, None), slice(None, None, None)))))), (slice(375, 400, None), slice(0, 7, None), slice(0, 670, None), slice(0, 300, None)))
kwargs: {}
Exception: OSError(-72, 'NetCDF: Malformed or inaccessible DAP DDS')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problems with distributed and opendap netCDF endpoint 373121666
408606913 https://github.com/pydata/xarray/issues/2323#issuecomment-408606913 https://api.github.com/repos/pydata/xarray/issues/2323 MDEyOklzc3VlQ29tbWVudDQwODYwNjkxMw== rsignell-usgs 1872600 2018-07-28T13:07:39Z 2018-07-28T13:07:39Z NONE

@shoyer, if we had a znetcdf library like h5netcdf we could get mf_dataset "for free" though, right?
Zarr definitely has more and different compression options than NetCDF -- does that make this concept problematic?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  znetcdf: h5netcdf analog for zarr?  345354038
397596002 https://github.com/pydata/xarray/issues/2233#issuecomment-397596002 https://api.github.com/repos/pydata/xarray/issues/2233 MDEyOklzc3VlQ29tbWVudDM5NzU5NjAwMg== rsignell-usgs 1872600 2018-06-15T11:44:35Z 2018-06-15T11:44:35Z NONE

@rabernat , this unstructured grid model output follows the UGRID Conventions, which layer on top of the CF Conventions. The issue Xarray is having here is with the vertical coordinate however, so this issue could arise with any CF convention model where the vertical stretching function varies over the domain.

As requested, here is the ncdump of this URL:

```
jovyan@jupyter-rsignell-2dusgs:~$ ncdump -h http://www.smast.umassd.edu:8080/thredds/dodsC/FVCOM/NECOFS/Forecasts/NECOFS_GOM3_FORECAST.nc
netcdf NECOFS_GOM3_FORECAST {
dimensions:
    time = UNLIMITED ; // (145 currently)
    maxStrlen64 = 64 ;
    nele = 99137 ;
    node = 53087 ;
    siglay = 40 ;
    three = 3 ;
variables:
    float lon(node) ;
        lon:long_name = "nodal longitude" ;
        lon:standard_name = "longitude" ;
        lon:units = "degrees_east" ;
    float lat(node) ;
        lat:long_name = "nodal latitude" ;
        lat:standard_name = "latitude" ;
        lat:units = "degrees_north" ;
    float xc(nele) ;
        xc:long_name = "zonal x-coordinate" ;
        xc:units = "meters" ;
    float yc(nele) ;
        yc:long_name = "zonal y-coordinate" ;
        yc:units = "meters" ;
    float lonc(nele) ;
        lonc:long_name = "zonal longitude" ;
        lonc:standard_name = "longitude" ;
        lonc:units = "degrees_east" ;
    float latc(nele) ;
        latc:long_name = "zonal latitude" ;
        latc:standard_name = "latitude" ;
        latc:units = "degrees_north" ;
    float siglay(siglay, node) ;
        siglay:long_name = "Sigma Layers" ;
        siglay:standard_name = "ocean_sigma_coordinate" ;
        siglay:positive = "up" ;
        siglay:valid_min = -1. ;
        siglay:valid_max = 0. ;
        siglay:formula_terms = "sigma: siglay eta: zeta depth: h" ;
    float h(node) ;
        h:long_name = "Bathymetry" ;
        h:standard_name = "sea_floor_depth_below_geoid" ;
        h:units = "m" ;
        h:coordinates = "lat lon" ;
        h:type = "data" ;
        h:mesh = "fvcom_mesh" ;
        h:location = "node" ;
    int nv(three, nele) ;
        nv:long_name = "nodes surrounding element" ;
        nv:cf_role = "face_node_connnectivity" ;
        nv:start_index = 1 ;
    float time(time) ;
        time:long_name = "time" ;
        time:units = "days since 1858-11-17 00:00:00" ;
        time:format = "modified julian day (MJD)" ;
        time:time_zone = "UTC" ;
        time:standard_name = "time" ;
    float zeta(time, node) ;
        zeta:long_name = "Water Surface Elevation" ;
        zeta:units = "meters" ;
        zeta:standard_name = "sea_surface_height_above_geoid" ;
        zeta:coordinates = "time lat lon" ;
        zeta:type = "data" ;
        zeta:missing_value = -999. ;
        zeta:field = "elev, scalar" ;
        zeta:coverage_content_type = "modelResult" ;
        zeta:mesh = "fvcom_mesh" ;
        zeta:location = "node" ;
    int nbe(three, nele) ;
        nbe:long_name = "elements surrounding each element" ;
    float u(time, siglay, nele) ;
        u:long_name = "Eastward Water Velocity" ;
        u:units = "meters s-1" ;
        u:type = "data" ;
        u:missing_value = -999. ;
        u:field = "ua, scalar" ;
        u:coverage_content_type = "modelResult" ;
        u:standard_name = "eastward_sea_water_velocity" ;
        u:coordinates = "time siglay latc lonc" ;
        u:mesh = "fvcom_mesh" ;
        u:location = "face" ;
    float v(time, siglay, nele) ;
        v:long_name = "Northward Water Velocity" ;
        v:units = "meters s-1" ;
        v:type = "data" ;
        v:missing_value = -999. ;
        v:field = "va, scalar" ;
        v:coverage_content_type = "modelResult" ;
        v:standard_name = "northward_sea_water_velocity" ;
        v:coordinates = "time siglay latc lonc" ;
        v:mesh = "fvcom_mesh" ;
        v:location = "face" ;
    float ww(time, siglay, nele) ;
        ww:long_name = "Upward Water Velocity" ;
        ww:units = "meters s-1" ;
        ww:type = "data" ;
        ww:coverage_content_type = "modelResult" ;
        ww:standard_name = "upward_sea_water_velocity" ;
        ww:coordinates = "time siglay latc lonc" ;
        ww:mesh = "fvcom_mesh" ;
        ww:location = "face" ;
    float ua(time, nele) ;
        ua:long_name = "Vertically Averaged x-velocity" ;
        ua:units = "meters s-1" ;
        ua:type = "data" ;
        ua:missing_value = -999. ;
        ua:field = "ua, scalar" ;
        ua:coverage_content_type = "modelResult" ;
        ua:standard_name = "barotropic_eastward_sea_water_velocity" ;
        ua:coordinates = "time latc lonc" ;
        ua:mesh = "fvcom_mesh" ;
        ua:location = "face" ;
    float va(time, nele) ;
        va:long_name = "Vertically Averaged y-velocity" ;
        va:units = "meters s-1" ;
        va:type = "data" ;
        va:missing_value = -999. ;
        va:field = "va, scalar" ;
        va:coverage_content_type = "modelResult" ;
        va:standard_name = "barotropic_northward_sea_water_velocity" ;
        va:coordinates = "time latc lonc" ;
        va:mesh = "fvcom_mesh" ;
        va:location = "face" ;
    float temp(time, siglay, node) ;
        temp:long_name = "temperature" ;
        temp:standard_name = "sea_water_potential_temperature" ;
        temp:units = "degrees_C" ;
        temp:coordinates = "time siglay lat lon" ;
        temp:type = "data" ;
        temp:coverage_content_type = "modelResult" ;
        temp:mesh = "fvcom_mesh" ;
        temp:location = "node" ;
    float salinity(time, siglay, node) ;
        salinity:long_name = "salinity" ;
        salinity:standard_name = "sea_water_salinity" ;
        salinity:units = "0.001" ;
        salinity:coordinates = "time siglay lat lon" ;
        salinity:type = "data" ;
        salinity:coverage_content_type = "modelResult" ;
        salinity:mesh = "fvcom_mesh" ;
        salinity:location = "node" ;
    int fvcom_mesh ;
        fvcom_mesh:cf_role = "mesh_topology" ;
        fvcom_mesh:topology_dimension = 2 ;
        fvcom_mesh:node_coordinates = "lon lat" ;
        fvcom_mesh:face_coordinates = "lonc latc" ;
        fvcom_mesh:face_node_connectivity = "nv" ;

// global attributes:
        :title = "NECOFS GOM3 (FVCOM) - Northeast US - Latest Forecast" ;
        :institution = "School for Marine Science and Technology" ;
        :source = "FVCOM_3.0" ;
        :Conventions = "CF-1.0, UGRID-1.0" ;
        :summary = "Latest forecast from the FVCOM Northeast Coastal Ocean Forecast System using an newer, higher-resolution GOM3 mesh (GOM2 was the preceding mesh)" ;
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem opening unstructured grid ocean forecasts with 4D vertical coordinates 332471780
395535173 https://github.com/pydata/xarray/pull/2131#issuecomment-395535173 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTUzNTE3Mw== rsignell-usgs 1872600 2018-06-07T19:20:24Z 2018-06-07T19:20:24Z NONE

Sounds good. Thanks @shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395524953 https://github.com/pydata/xarray/pull/2131#issuecomment-395524953 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTUyNDk1Mw== rsignell-usgs 1872600 2018-06-07T18:45:42Z 2018-06-07T18:45:42Z NONE

Might this PR warrant a new minor release?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395476675 https://github.com/pydata/xarray/pull/2131#issuecomment-395476675 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTQ3NjY3NQ== rsignell-usgs 1872600 2018-06-07T16:07:14Z 2018-06-07T16:11:08Z NONE

@jhamman woohoo! Cell [20] completes nicely now: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 I'm getting some errors in Cell [20], but I think those are unrelated and didn't affect the successful completion of the tasks, right? (this is on an HPC system)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
395447613 https://github.com/pydata/xarray/pull/2131#issuecomment-395447613 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NTQ0NzYxMw== rsignell-usgs 1872600 2018-06-07T14:46:21Z 2018-06-07T14:47:07Z NONE

@jhamman , although I'm getting distributed workers to compute the mean from a bunch of images, I'm getting a "Failed to Serialize" error in cell [23] of this notebook: https://gist.github.com/rsignell-usgs/90f15e2da918e3c6ba6ee5bb6095d594 If this is a bug, I think it was there before the recent updates.

You should be able to run this notebook without modification.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
394887291 https://github.com/pydata/xarray/pull/2131#issuecomment-394887291 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM5NDg4NzI5MQ== rsignell-usgs 1872600 2018-06-05T23:00:51Z 2018-06-05T23:13:08Z NONE

@jhamman , still very much interested in this -- could the existing functionality be merged and enhanced later?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
389330810 https://github.com/pydata/xarray/pull/2131#issuecomment-389330810 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM4OTMzMDgxMA== rsignell-usgs 1872600 2018-05-15T22:15:22Z 2018-05-15T22:15:22Z NONE

It's working for me! https://gist.github.com/rsignell-usgs/ef81fb4306dac3a2406d0adb575b340f

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
389277628 https://github.com/pydata/xarray/pull/2131#issuecomment-389277628 https://api.github.com/repos/pydata/xarray/issues/2131 MDEyOklzc3VlQ29tbWVudDM4OTI3NzYyOA== rsignell-usgs 1872600 2018-05-15T19:02:06Z 2018-05-15T19:02:06Z NONE

@jhamman should I test this out on my original workflow or wait a bit?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/pickle rasterio 323017930
388786292 https://github.com/pydata/xarray/issues/2121#issuecomment-388786292 https://api.github.com/repos/pydata/xarray/issues/2121 MDEyOklzc3VlQ29tbWVudDM4ODc4NjI5Mg== rsignell-usgs 1872600 2018-05-14T11:34:45Z 2018-05-14T11:34:45Z NONE

@jhamman what kind of expertise would it take to do this job (e.g., is it just a copy-and-paste with some small changes that a newbie could probably do, or would it be best left for the core dev team)?

And is there any workaround that can be used in the interim?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  rasterio backend should use DataStorePickleMixin (or something similar) 322445312
382466626 https://github.com/pydata/xarray/pull/1811#issuecomment-382466626 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MjQ2NjYyNg== rsignell-usgs 1872600 2018-04-18T17:30:25Z 2018-04-18T17:32:21Z NONE

@jhamman, I was just using client = Client(). Should I be using LocalCluster instead?
(there is no kubernetes on this JupyterHub).
Also, is there a better place to have this sort of discussion or is it okay here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
382421609 https://github.com/pydata/xarray/pull/1811#issuecomment-382421609 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MjQyMTYwOQ== rsignell-usgs 1872600 2018-04-18T15:11:02Z 2018-04-18T15:14:12Z NONE

@jhamman, I tried the same code with a single-threaded scheduler:

```python
...
delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
persist_store = delayed_store.persist(retries=100, get=dask.local.get_sync)
```

and it ran to completion with no errors (taking 2 hours for 100 GB to Zarr). What should I try next?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
381969631 https://github.com/pydata/xarray/pull/1811#issuecomment-381969631 https://api.github.com/repos/pydata/xarray/issues/1811 MDEyOklzc3VlQ29tbWVudDM4MTk2OTYzMQ== rsignell-usgs 1872600 2018-04-17T12:12:15Z 2018-04-17T12:15:19Z NONE

@jhamman , I'm trying to test out compute=False with this code:

```python
# Write National Water Model data to Zarr

from dask.distributed import Client
import pandas as pd
import xarray as xr
import s3fs
import zarr

if __name__ == '__main__':

    client = Client()

    root = '/projects/water/nwm/data/forcing_short_range/'                    # Local Files
    root = 'http://tds.renci.org:8080/thredds/dodsC/nwm/forcing_short_range/' # OPenDAP

    bucket_endpoint = 'https://s3.us-west-1.amazonaws.com/'
    bucket_endpoint = 'https://iu.jetstream-cloud.org:8080'

    f_zarr = 'rsignell/nwm/test_week'

    dates = pd.date_range(start='2018-04-01T00:00', end='2018-04-07T23:00', freq='H')
    urls = ['{}{}/nwm.t{}z.short_range.forcing.f001.conus.nc'.format(root, a.strftime('%Y%m%d'), a.strftime('%H')) for a in dates]

    ds = xr.open_mfdataset(urls, concat_dim='time', lock=True)
    ds = ds.drop(['ProjectionCoordinateSystem'])

    fs = s3fs.S3FileSystem(anon=False, client_kwargs=dict(endpoint_url=bucket_endpoint))
    d = s3fs.S3Map(f_zarr, s3=fs)

    compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
    encoding = {vname: {'compressor': compressor} for vname in ds.data_vars}

    delayed_store = ds.to_zarr(store=d, mode='w', encoding=encoding, compute=False)
    persist_store = delayed_store.persist(retries=100)
```

and after 20 seconds or so, the process dies with this error:

```python-traceback
/home/rsignell/my-conda-envs/zarr/lib/python3.6/site-packages/distributed/worker.py:742: UserWarning: Large object of size 1.23 MB detected in task graph:

  (<xarray.backends.zarr.ZarrStore object at 0x7f5d8 ... deedecefab224')

Consider scattering large objects ahead of time with client.scatter to reduce scheduler burden and keep data on workers

    future = client.submit(func, big_data)    # bad

    big_future = client.scatter(big_data)     # good
    future = client.submit(func, big_future)  # good

  % (format_bytes(len(b)), s))
```

Do you have suggestions on how to modify my code?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Compute==False for to_zarr and to_netcdf 286542795
339093278 https://github.com/pydata/xarray/issues/1621#issuecomment-339093278 https://api.github.com/repos/pydata/xarray/issues/1621 MDEyOklzc3VlQ29tbWVudDMzOTA5MzI3OA== rsignell-usgs 1872600 2017-10-24T18:50:21Z 2017-10-24T18:50:21Z NONE

I vote for 1 also. How many makes a quorum? :smile_cat:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 264321376
338172936 https://github.com/pydata/xarray/issues/1621#issuecomment-338172936 https://api.github.com/repos/pydata/xarray/issues/1621 MDEyOklzc3VlQ29tbWVudDMzODE3MjkzNg== rsignell-usgs 1872600 2017-10-20T10:46:53Z 2017-10-20T10:50:18Z NONE

On https://stackoverflow.com/a/46675990/2005869, @shoyer explains:

My understanding of CF standard names is that forecast_period should be equal to the difference between time and forecast_reference_time, i.e., forecast_period = time - forecast_reference_time. If you specified your time_offset variable with units in the form "hours", then it would be decoded to timedelta64, along with datetime64 for time and time_run, so xarray's arithmetic would actually satisfy this identity. You might find this useful if you only wanted to include two of these variables and wanted to calculate the third on the fly. On the other hand, you probably don't want to convert the Tper variable to timedelta64. Technically, it is also a time period, but it's not a variable that makes sense to compare to time.

I understand the potential issue here, but I think Xarray should follow CF conventions for time, and only treat variables as time coordinates if they have valid CF time units (<time unit> since <date>).

We know of thousands of datasets (every dataset with waves!) where the current Xarray behavior is a problem.
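For anyone reading this later: newer xarray versions let you opt out of this decoding per call via the decode_timedelta flag. A sketch (the variable and file names are made up):

```python
import numpy as np
import xarray as xr

# A wave-period variable whose units are plain "seconds" -- a duration
# under CF, not a "<unit> since <date>" time coordinate.
ds = xr.Dataset({"Tper": ("station", np.array([8.0, 9.5, 11.0]))})
ds["Tper"].attrs["units"] = "seconds"
ds.to_netcdf("waves.nc")

# By default this round-trips as timedelta64; decode_timedelta=False opts out.
raw = xr.open_dataset("waves.nc", decode_timedelta=False)
print(raw["Tper"].dtype)  # float64, as intended
```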

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Undesired decoding to timedelta64 (was: units of "seconds" translated to time coordinate) 264321376
217183543 https://github.com/pydata/xarray/pull/844#issuecomment-217183543 https://api.github.com/repos/pydata/xarray/issues/844 MDEyOklzc3VlQ29tbWVudDIxNzE4MzU0Mw== rsignell-usgs 1872600 2016-05-05T15:19:55Z 2016-05-05T15:19:55Z NONE

It also seems consistent to me to return a Dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add a filter_by_attrs method to Dataset 153126324
216944939 https://github.com/pydata/xarray/issues/567#issuecomment-216944939 https://api.github.com/repos/pydata/xarray/issues/567 MDEyOklzc3VlQ29tbWVudDIxNjk0NDkzOQ== rsignell-usgs 1872600 2016-05-04T17:45:06Z 2016-05-04T17:45:06Z NONE

:+1: -- I think this would be super-useful general functionality for the xarray community that doesn't come with any downside.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to find data variables by standard_name 105688738
169553668 https://github.com/pydata/xarray/issues/704#issuecomment-169553668 https://api.github.com/repos/pydata/xarray/issues/704 MDEyOklzc3VlQ29tbWVudDE2OTU1MzY2OA== rsignell-usgs 1872600 2016-01-07T05:19:04Z 2016-01-07T05:19:04Z NONE

I think it would be nicer to get rid of the floating black lines (axis) altogether

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Complete renaming xray -> xarray 124867009
169299086 https://github.com/pydata/xarray/issues/704#issuecomment-169299086 https://api.github.com/repos/pydata/xarray/issues/704 MDEyOklzc3VlQ29tbWVudDE2OTI5OTA4Ng== rsignell-usgs 1872600 2016-01-06T11:10:04Z 2016-01-06T11:10:33Z NONE

Yet another vote for import xarray as xr

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Complete renaming xray -> xarray 124867009
139055592 https://github.com/pydata/xarray/issues/567#issuecomment-139055592 https://api.github.com/repos/pydata/xarray/issues/567 MDEyOklzc3VlQ29tbWVudDEzOTA1NTU5Mg== rsignell-usgs 1872600 2015-09-09T21:48:02Z 2015-09-09T21:48:02Z NONE

I was thinking that the data variables that matched a specified standard_name would be a subset of the variables in the data_vars object.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Best way to find data variables by standard_name 105688738
121802784 https://github.com/pydata/xarray/issues/476#issuecomment-121802784 https://api.github.com/repos/pydata/xarray/issues/476 MDEyOklzc3VlQ29tbWVudDEyMTgwMjc4NA== rsignell-usgs 1872600 2015-07-16T02:17:31Z 2015-07-16T02:17:31Z NONE

Indeed, with master, it's working. http://nbviewer.ipython.org/gist/rsignell-usgs/047235496029529585cc

Closing....

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf failing for datasets with a single time value 95222803

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);