issue_comments


7 rows where issue = 493058488 sorted by updated_at descending


Commenters: abarciauskas-bgse (4), rabernat (3)

Author association: NONE (4), MEMBER (3)

Issue: `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph (493058488)
abarciauskas-bgse (NONE) · 2019-09-16T01:22:09Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531617569

Thanks @rabernat. I tried what you suggested (with a small subset; the source files are quite large) and it seems to work on smaller subsets written locally. This leads me to suspect that running the same process on larger datasets is overloading memory, but I can't confirm the root cause yet. This isn't blocking my current strategy, so I'm closing for now.
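A minimal sketch of that strategy, using a small synthetic dataset in place of the real source files (the variable name and sizes here are hypothetical stand-ins): subset first, verify the write locally, then scale up.

```python
import numpy as np
import xarray as xr

# Hypothetical small stand-in for the large MUR source files.
ds = xr.Dataset(
    {"analysed_sst": (("time", "lat", "lon"),
                      np.zeros((4, 100, 100), dtype="float32"))}
)

# Subset first; if this writes cleanly, grow the subset until memory
# (or the cluster) becomes the bottleneck.
subset = ds.isel(time=slice(0, 1), lat=slice(0, 10), lon=slice(0, 10))
```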

rabernat (MEMBER) · 2019-09-14T17:47:10Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531499393

What if you just use a dask local cluster, rather than a distributed cluster? Then you can just write to a local directory.

And what if you don’t use a distributed cluster at all, just the threaded scheduler?

In my experience with these problems, systematically removing layers of complexity from the scenario often brings us to the root of the issue.
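That ladder of simplification can be sketched with a synthetic dask array (the array and its sizes are stand-ins, not the issue's real data):

```python
import dask
import dask.array as da

# Synthetic chunked array standing in for the real dataset.
arr = da.ones((1000, 1000), chunks=(250, 250))

# 1. Threaded scheduler: runs in-process, no cluster at all.
with dask.config.set(scheduler="threads"):
    total_threads = float(arr.sum().compute())

# 2. Synchronous scheduler: single thread, no concurrency, and the
#    easiest rung to step through in a debugger.
with dask.config.set(scheduler="synchronous"):
    total_sync = float(arr.sum().compute())
```

If the computation succeeds on these lower rungs but fails only under `distributed`, that points the investigation at the cluster rather than at xarray or zarr.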


abarciauskas-bgse (NONE) · 2019-09-14T16:34:56Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531493820

I recall this also happening when storing locally, but I can't reproduce that at the moment since the Kubernetes cluster I'm using now is not a Pangeo hub and isn't set up to use EFS.

rabernat (MEMBER) · 2019-09-14T15:44:32Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531489772

Does the problem only arise when writing to s3fs? Or can you reproduce it writing to a local Zarr directory store?


abarciauskas-bgse (NONE) · 2019-09-14T15:03:04Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531486715

@rabernat good points. One thing I'm not sure how to make reproducible is calling a remote file store, since I think it usually requires a write-protected cloud storage provider. Any tips on this?

I have what should be an otherwise working example here: https://gist.github.com/abarciauskas-bgse/d0aac2ae9bf0b06f52a577d0a6251b2d - let me know if this is an ok format to share for reproducing the issue.

rabernat (MEMBER) · 2019-09-14T02:00:54Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531437477

@aidanheerdegen - thanks so much for posting this issue! I think a lot of people run into these sorts of problems, so it's useful to have an example on the issue tracker.

These problems can unfortunately be very hard to debug. If other developers can quickly reproduce your exact error on their own systems, then we can try to dig deeper. However, I can't run the code you shared. If I paste it into a notebook, I get:

```
FileNotFoundError: [Errno 2] No such file or directory: b'/home/jovyan/cmip6-bot/2018/153/20180602090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc'
```

Your example requires your files, which I don't have. I noticed some later examples point to a PO.DAAC OPeNDAP server, commented out with the note "This works fine":

```
base_url = 'https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/'
```

but I couldn't tell whether I was supposed to run that part or not.

So unfortunately I have to ask you for some tweaks to your question. Could you either:

1. edit the example so that it points to an OPeNDAP server or another globally accessible endpoint on the internet, OR
2. replace the real data with synthetically generated data (e.g. use `dask.array.random.random` to create the arrays instead of loading from disk)?

Although option 2 is a pain, it usually helps surface bugs by removing part of the I/O.
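A sketch of that second option, borrowing the shape and chunking reported elsewhere in this thread (the variable name and sizes come from the MUR data; the construction itself is hypothetical):

```python
import dask.array as da
import xarray as xr

# Synthetic stand-in for the MUR granules: same shape and chunking as the
# real data, but generated lazily instead of loaded from disk.
sst = da.random.random((2, 17999, 36000),
                       chunks=(1, 1000, 1000)).astype("float32")
ds = xr.Dataset({"analysed_sst": (("time", "lat", "lon"), sst)})
```

Because the array is lazy, building this dataset costs almost nothing; only a subsequent `.compute()` or store step materializes data.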

Thanks again for your contribution.

abarciauskas-bgse (NONE) · 2019-09-14T01:42:22Z · https://github.com/pydata/xarray/issues/3306#issuecomment-531435069

Update: I've made some progress on determining the source of this issue. It seems related to the source dataset's variables. When I use 2 OPeNDAP URLs with 4 parameterized variables, things work fine.

Using 2 URLs like:

https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc?time[0:1:0],lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999],analysis_error[0:1:0][0:1:17998][0:1:35999],mask[0:1:0][0:1:17998][0:1:35999],sea_ice_fraction[0:1:0][0:1:17998][0:1:35999]

I get back a dataset:

```
<xarray.Dataset>
Dimensions:         (lat: 17999, lon: 36000, time: 2)
Coordinates:
  * lat             (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
  * lon             (lon) float32 -179.99 -179.98 -179.97 ... 179.99 180.0
  * time            (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
    analysed_sst    (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    analysis_error  (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
Attributes:
    Conventions:  CF-1.5
    title:        Daily MUR SST, Final product
```

However, if I omit the parameterized data variables, using URLs such as:

https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

I get back an additional variable:

```
<xarray.Dataset>
Dimensions:           (lat: 17999, lon: 36000, time: 2)
Coordinates:
  * lat               (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
  * lon               (lon) float32 -179.99 -179.98 -179.97 ... 179.99 180.0
  * time              (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
    analysed_sst      (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    analysis_error    (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    mask              (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    sea_ice_fraction  (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    dt_1km_data       (time, lat, lon) timedelta64[ns] dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
Attributes:
    Conventions:  CF-1.5
    title:        Daily MUR SST, Final product
```

In the first case (with the parameterized variables) I get the expected result: the data is stored on S3. In the second case (no parameterized variables), the `store` task is never included in the graph and the workers seem to stall.
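Given that the only structural difference between the two cases is the extra `timedelta64` variable, one hypothetical next step (not confirmed as the fix) is to drop that variable before writing and see whether the store step then appears in the graph. A toy sketch:

```python
import numpy as np
import xarray as xr

# Toy dataset mirroring the second case: a float variable plus the extra
# timedelta64 variable ("dt_1km_data") that only appears when the URLs
# are not parameterized.
ds = xr.Dataset({
    "analysed_sst": (("time",),
                     np.array([290.0, 291.0], dtype="float32")),
    "dt_1km_data": (("time",),
                    np.array([3600, 7200], dtype="timedelta64[s]")),
})

# Drop the suspect variable before calling to_zarr.
trimmed = ds.drop_vars("dt_1km_data")
```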



Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```
Powered by Datasette · About: xarray-datasette