html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/3306#issuecomment-531617569,https://api.github.com/repos/pydata/xarray/issues/3306,531617569,MDEyOklzc3VlQ29tbWVudDUzMTYxNzU2OQ==,15016780,2019-09-16T01:22:09Z,2019-09-16T01:22:09Z,NONE,"Thanks @rabernat. I tried what you suggested (with a small subset, since the source files are quite large) and it seems to work on smaller subsets when writing locally. That leads me to suspect that running the same process with larger datasets might be overloading memory, but I can't confirm the root cause yet. This isn't blocking my current strategy, so closing for now.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488
https://github.com/pydata/xarray/issues/3306#issuecomment-531499393,https://api.github.com/repos/pydata/xarray/issues/3306,531499393,MDEyOklzc3VlQ29tbWVudDUzMTQ5OTM5Mw==,1197350,2019-09-14T17:47:10Z,2019-09-14T17:47:10Z,MEMBER,"What if you just use a dask local cluster, rather than a distributed cluster? Then you can just write to a local directory. And what if you don’t use a distributed cluster at all, just the threaded scheduler? In my experience with these problems, systematically removing layers of complexity from the scenario often leads to the root of the issue.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488
https://github.com/pydata/xarray/issues/3306#issuecomment-531493820,https://api.github.com/repos/pydata/xarray/issues/3306,531493820,MDEyOklzc3VlQ29tbWVudDUzMTQ5MzgyMA==,15016780,2019-09-14T16:34:56Z,2019-09-14T16:34:56Z,NONE,"I recall this also happening when storing locally, but I can't reproduce that at the moment since the Kubernetes cluster I am using now is not a Pangeo hub and is not set up to use EFS.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488
https://github.com/pydata/xarray/issues/3306#issuecomment-531489772,https://api.github.com/repos/pydata/xarray/issues/3306,531489772,MDEyOklzc3VlQ29tbWVudDUzMTQ4OTc3Mg==,1197350,2019-09-14T15:44:32Z,2019-09-14T15:44:32Z,MEMBER,"Does the problem only arise when writing to s3fs? Or can you reproduce it writing to a local Zarr directory store?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488 https://github.com/pydata/xarray/issues/3306#issuecomment-531486715,https://api.github.com/repos/pydata/xarray/issues/3306,531486715,MDEyOklzc3VlQ29tbWVudDUzMTQ4NjcxNQ==,15016780,2019-09-14T15:03:04Z,2019-09-14T15:03:04Z,NONE,"@rabernat good points. One thing I'm not sure of how to make reproducible is calling a remote file store, since I think it usually requires calling to a write-protected cloud storage provider. Any tips on this? I have what should be an otherwise working example here: https://gist.github.com/abarciauskas-bgse/d0aac2ae9bf0b06f52a577d0a6251b2d - let me know if this is an ok format to share for reproducing the issue. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488 https://github.com/pydata/xarray/issues/3306#issuecomment-531437477,https://api.github.com/repos/pydata/xarray/issues/3306,531437477,MDEyOklzc3VlQ29tbWVudDUzMTQzNzQ3Nw==,1197350,2019-09-14T02:00:54Z,2019-09-14T02:00:54Z,MEMBER,"@aidanheerdegen - thanks so much for posting this issue! I think a lot of people run into these sorts of problems, so it's useful to have an example on the issue tracker. These problems can unfortunately be very hard to debug. If other developers can quickly reproduce your exact same error on their own systems, then we can try to dig deeper. However, I can't run the code you shared. If I paste it into a notebook, I get ``` FileNotFoundError: [Errno 2] No such file or directory: b'/home/jovyan/cmip6-bot/2018/153/20180602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc' ``` Your example requires your files, which I don't have. 
I noticed some later examples point to a podaac opendap server, commented out with the comment
```
# This works fine
# base_url = 'https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/'
```
but I couldn't tell if I was supposed to run that part or not. So unfortunately I have to ask you for some tweaks to your question. Could you either:
1. edit this example so that it points to an opendap server or other globally accessible endpoint on the internet? OR
2. replace the real data with synthetically generated data (e.g. use `dask.random.random` to create arrays instead of loading from disk)

Although 2 is a pain, it actually usually helps surface bugs by removing part of the I/O. Thanks again for your contribution.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488
https://github.com/pydata/xarray/issues/3306#issuecomment-531435069,https://api.github.com/repos/pydata/xarray/issues/3306,531435069,MDEyOklzc3VlQ29tbWVudDUzMTQzNTA2OQ==,15016780,2019-09-14T01:42:22Z,2019-09-14T01:42:22Z,NONE,"Update: I've made some progress on determining the source of this issue. It seems related to the source dataset's variables. When I use 2 opendap urls with 4 parameterized variables, things work fine. Using 2 urls like:
```
https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc?time[0:1:0],lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999],analysis_error[0:1:0][0:1:17998][0:1:35999],mask[0:1:0][0:1:17998][0:1:35999],sea_ice_fraction[0:1:0][0:1:17998][0:1:35999]
```
I get back a dataset:
```
Dimensions: (lat: 17999, lon: 36000, time: 2)
Coordinates:
* lat (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
* lon (lon) float32 -179.99 -179.98 -179.97 ... 
179.99 180.0
* time (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
analysed_sst (time, lat, lon) float32 dask.array
analysis_error (time, lat, lon) float32 dask.array
Attributes:
Conventions: CF-1.5
title: Daily MUR SST, Final product
```
However, if I omit the parameterized data variables, using urls such as:
```
https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc
```
I get back additional variables:
```
Dimensions: (lat: 17999, lon: 36000, time: 2)
Coordinates:
* lat (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
* lon (lon) float32 -179.99 -179.98 -179.97 ... 179.99 180.0
* time (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
analysed_sst (time, lat, lon) float32 dask.array
analysis_error (time, lat, lon) float32 dask.array
mask (time, lat, lon) float32 dask.array
sea_ice_fraction (time, lat, lon) float32 dask.array
dt_1km_data (time, lat, lon) timedelta64[ns] dask.array
Attributes:
Conventions: CF-1.5
title: Daily MUR SST, Final product
```
In the first case (with the parameterized variables) I get the expected result (data is stored on S3). In the second case (no parameterized variables), `store` is never included in the graph and the workers seem to stall.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,493058488