
issue_comments


18 rows where issue = 129150619 sorted by updated_at descending


user (3 distinct values)

  • mrocklin 8
  • Scheibs 7
  • shoyer 3

author_association (2 distinct values)

  • MEMBER 11
  • NONE 7

issue (1 distinct value)

  • Cannot write dask Dataset to NetCDF file · 18
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
186264260 https://github.com/pydata/xarray/issues/729#issuecomment-186264260 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NjI2NDI2MA== Scheibs 16919188 2016-02-19T15:40:20Z 2016-02-19T15:40:20Z NONE

Thank you for your help on this, I will try myself to improve my knowledge of xarray and dask ;)

184390218 https://github.com/pydata/xarray/issues/729#issuecomment-184390218 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NDM5MDIxOA== shoyer 1217238 2016-02-15T20:57:35Z 2016-02-15T20:57:35Z MEMBER

I just downloaded the data, too, and will see if I can simplify the task graph into something understandable by humans :).

184351600 https://github.com/pydata/xarray/issues/729#issuecomment-184351600 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NDM1MTYwMA== mrocklin 306380 2016-02-15T19:16:26Z 2016-02-15T19:16:26Z MEMBER

Looking at the task graph my first guess is that @shoyer is correct, and that we've found another case that the scheduler should be able to handle well, but doesn't. This hasn't happened in a while, but it always leads to improvements whenever we find such a problem.

For a case this complex I think we either need to reduce it to a particular graph motif on which we schedule poorly or we first need to develop a better way to visualize traces of the scheduler's behavior. I've started a separate dask issue: https://github.com/dask/dask/issues/994

For the near future I don't have a solution to @Scheibs's research problem (sorry!). This will probably require tweaking dask scheduler internals, which probably won't happen on my end in the next couple of weeks. I'm very happy that people brought this to my attention though.

184344834 https://github.com/pydata/xarray/issues/729#issuecomment-184344834 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NDM0NDgzNA== mrocklin 306380 2016-02-15T18:51:13Z 2016-02-15T18:51:13Z MEMBER

Slowly taking a look at this now. Large PDF for the full computation, if anyone is interested:

dask.pdf

184283995 https://github.com/pydata/xarray/issues/729#issuecomment-184283995 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NDI4Mzk5NQ== mrocklin 306380 2016-02-15T16:29:42Z 2016-02-15T16:29:42Z MEMBER

Downloaded

184157315 https://github.com/pydata/xarray/issues/729#issuecomment-184157315 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4NDE1NzMxNQ== Scheibs 16919188 2016-02-15T10:33:36Z 2016-02-15T10:33:36Z NONE

@mrocklin is that new link working? Because it's gonna be closed tomorrow.

182875655 https://github.com/pydata/xarray/issues/729#issuecomment-182875655 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4Mjg3NTY1NQ== Scheibs 16919188 2016-02-11T13:57:51Z 2016-02-11T13:57:51Z NONE

You don't have to apologize for the delay, I'm already grateful for your time on this!

I don't understand why this download won't work for you; I tried with others and everything was OK... Maybe you can try with this link:
https://sharing.oodrive.com/easyshare/fwd/link=BS6TALAs9CiQg.WNotfITA

182663029 https://github.com/pydata/xarray/issues/729#issuecomment-182663029 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4MjY2MzAyOQ== mrocklin 306380 2016-02-11T01:00:45Z 2016-02-11T01:00:45Z MEMBER

My apologies for the slow response (very busy week, lots of exciting stuff, sadly results in poor user response).

I can access that page easily, but the download seems to halt after 11 MB.

181418291 https://github.com/pydata/xarray/issues/729#issuecomment-181418291 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4MTQxODI5MQ== Scheibs 16919188 2016-02-08T15:12:21Z 2016-02-08T15:12:21Z NONE

@mrocklin I finally gave you a free access path on ftp://ftp.irsn.fr/argon/SHARE/

181401562 https://github.com/pydata/xarray/issues/729#issuecomment-181401562 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4MTQwMTU2Mg== mrocklin 306380 2016-02-08T14:42:49Z 2016-02-08T14:42:49Z MEMBER

mrocklin continuum io

On Mon, Feb 8, 2016 at 1:10 AM, Scheibs notifications@github.com wrote:

@mrocklin https://github.com/mrocklin I put my folder on a ftp server, can I have an email address to send you the login information ?

@shoyer https://github.com/shoyer I have tried replacing sum() by dot() but I get an error "Data Array has no attribute 'dot' "

— Reply to this email directly or view it on GitHub https://github.com/pydata/xarray/issues/729#issuecomment-181265026.

181265026 https://github.com/pydata/xarray/issues/729#issuecomment-181265026 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4MTI2NTAyNg== Scheibs 16919188 2016-02-08T09:08:20Z 2016-02-08T13:10:25Z NONE

@mrocklin I put my files on ftp://ftp.irsn.fr/argon/SHARE/, everything is in the DEBUG.zip file.

@shoyer I have tried replacing sum() by dot() but I get an error "Data Array has no attribute 'dot' "

180808528 https://github.com/pydata/xarray/issues/729#issuecomment-180808528 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE4MDgwODUyOA== mrocklin 306380 2016-02-06T16:47:35Z 2016-02-06T16:47:35Z MEMBER

I would generally send such a large file by hosting it at a web-accessible location. Perhaps you are at an institution where you have access to host files online?

178503671 https://github.com/pydata/xarray/issues/729#issuecomment-178503671 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3ODUwMzY3MQ== Scheibs 16919188 2016-02-02T10:45:12Z 2016-02-02T17:07:41Z NONE

@shoyer @mrocklin I can send you my files, with the two netCDF4 source files and the script. I will also attach the Bokeh graph for only one variable, which fits in memory. Do you know how I can send you the ZIP folder? It's 830 MB.

178661780 https://github.com/pydata/xarray/issues/729#issuecomment-178661780 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3ODY2MTc4MA== shoyer 1217238 2016-02-02T16:14:24Z 2016-02-02T16:14:24Z MEMBER

Something like einsum or a broadcasting matrix multiplication could help a little bit here, by replacing (M * FLUX).sum('Paliers') with M.dot(FLUX, dim='Paliers') and thereby reducing peak memory consumption, but even there I'm calculating peak chunk size at 950000 elements. This should be totally fine on most machines.

@mrocklin Unfortunately, I don't know an easy way to create a copy of a netCDF file with random data, but that's a good idea for a little project....
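A minimal NumPy sketch of the contraction shoyer is describing, with made-up shapes (xray's DataArray had no `dot` method at the time, so this only illustrates the idea of contracting over the shared `Paliers` axis without materialising the full broadcast product):

```python
import numpy as np

# Hypothetical sizes, just to make the example concrete.
time, denree, paliers = 10, 19, 23

M = np.random.rand(time, denree, paliers)    # stand-in for the interpolated array
FLUX = np.random.rand(paliers, denree)       # stand-in for the flux array

# Naive form: materialises the full (time, denree, paliers) product, then sums.
naive = (M * FLUX.T).sum(axis=-1)

# einsum form: contracts over the shared axis directly, with a smaller peak footprint.
contracted = np.einsum('tdp,pd->td', M, FLUX)

assert np.allclose(naive, contracted)
```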

177527321 https://github.com/pydata/xarray/issues/729#issuecomment-177527321 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3NzUyNzMyMQ== mrocklin 306380 2016-01-31T15:32:44Z 2016-01-31T15:33:01Z MEMBER

Sorry for the delay in response.

Nothing here seems dangerous to me. @shoyer does the writeup above raise any questions for you?

If convenient, it would be interesting to see the output of a few of the dask profilers:

$ conda install bokeh
$ pip install cachey

```python
import cachey
from dask.diagnostics import CacheProfiler, ResourceProfiler, Profiler, visualize

with Profiler() as prof, CacheProfiler(metric=cachey.nbytes) as cprof, ResourceProfiler() as rprof:
    ...  # call the final dataset.to_netcdf() function here

visualize([prof, cprof, rprof], file_path='profile.html')
```

And then upload that file somewhere, perhaps to a gist. In order to make this run to completion you might have to operate on a subset of the dataset.

Alternatively, is there a way for me to recreate a version of this dataset on my local machine? @shoyer is there a way to capture the metadata of netcdf files and reinstantiate empty copies of them on another machine?
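The metadata question at the end of this comment is not answered with code anywhere in the thread; purely as an illustration of one possible approach (assuming all variables are numeric; the file names are hypothetical), a sketch could look like this:

```python
import numpy as np
import xarray as xr

def make_random_copy(path_in, path_out):
    """Clone the dims/variables/attrs of a netCDF file, filling data with random values.

    Note: builds each variable in memory, so only practical for modest files.
    """
    src = xr.open_dataset(path_in)
    data_vars = {
        name: (var.dims, np.random.rand(*var.shape).astype(var.dtype))
        for name, var in src.data_vars.items()
    }
    xr.Dataset(data_vars, coords=src.coords, attrs=src.attrs).to_netcdf(path_out)

# make_random_copy("original.nc", "random_copy.nc")
```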

176693726 https://github.com/pydata/xarray/issues/729#issuecomment-176693726 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3NjY5MzcyNg== Scheibs 16919188 2016-01-29T11:00:02Z 2016-01-29T11:00:02Z NONE

@mrocklin I have tried your lines; it seemed to work, but in the end I got the memory error message. However, the crash was different from the one I used to get...

My dataset is a set of several variables, which are calculated like this:

I have a DataArray of unitary data for four rainfall heights (0, 5, 10, 15 mm/day; "SYMB") and another DataArray with spatialized rainfall height values ("RAIN").

SYMB <xray.DataArray (Hpluie: 4, time: 203, NIsoSource: 56, Denree: 19, Paliers: 23)> dask.array<xray-Fo..., shape=(4, 203, 56, 19, 23), dtype=float64, chunksize=(4, 50, 10, 19, 5)>

RAIN <xray.DataArray (Paliers: 23, DimK0: 1, DimJ0: 37, DimI0: 15)> dask.array<xray-Ra..., shape=(23, 1, 37, 15), dtype=float64, chunksize=(5, 1, 20, 15)>

Then I created an array which is the result of interpolation between these two arrays. The function is:

```python
def interp(array, pluie):
    p0 = (array.sel(Hpluie=5) - array.sel(Hpluie=0)) / 5
    p1 = (array.sel(Hpluie=10) - array.sel(Hpluie=5)) / 5
    p2 = (array.sel(Hpluie=15) - array.sel(Hpluie=10)) / 5

    interp = (p0 * (pluie.where(pluie < 5) - array.Hpluie.sel(Hpluie=5)) + array.sel(Hpluie=5)).fillna(0) \
        + (p1 * (pluie.where((pluie >= 5) & (pluie <= 10)) - array.Hpluie.sel(Hpluie=5)) + array.sel(Hpluie=5)).fillna(0) \
        + (p2 * (pluie.where(pluie > 10) - array.Hpluie.sel(Hpluie=10)) + array.sel(Hpluie=10)).fillna(0)
    return interp
```

So M = interp(SYMB,RAIN)

M <xray.DataArray (time: 203, NIsoSource: 56, Denree: 19, Paliers: 23, DimK0: 1, DimJ0: 37, DimI0: 15)> dask.array<elemwis..., shape=(203, 56, 19, 23, 1, 37, 15), dtype=float64, chunksize=(50, 10, 19, 5, 1, 20, 15)>

My final variable is the product of this interpolated array and another spatialized variable ("FLUX").

FLUX <xray.DataArray 'Flux_Moyen_Depot_Sec' (Paliers: 23, DimK0: 1, DimJ0: 37, DimI0: 15, NIsoSource: 56)> dask.array<xray-Fl..., shape=(23, 1, 37, 15, 56), dtype=float64, chunksize=(5, 1, 20, 15, 10)>

RES = (M * FLUX).sum("Paliers")

RES = <xray.DataArray (DimK0: 1, DimJ0: 37, DimI0: 15, NIsoSource: 56, time: 203, Denree: 19)> dask.array<p_reduc..., shape=(1, 37, 15, 56, 203, 19), dtype=float64, chunksize=(1, 20, 15, 10, 50, 19)>

I hope this is at least understandable... thanks a lot for your help
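As a rough way to see the size of the graph this produces without the real data, here is a sketch (not from the thread) that uses current xarray/dask names, random data with the shapes and chunk sizes quoted above, and the `interp` function defined earlier in this comment:

```python
import dask.array as da
import xarray as xr

SYMB = xr.DataArray(
    da.random.random((4, 203, 56, 19, 23), chunks=(4, 50, 10, 19, 5)),
    dims=("Hpluie", "time", "NIsoSource", "Denree", "Paliers"),
    coords={"Hpluie": [0, 5, 10, 15]},
)
RAIN = xr.DataArray(
    da.random.random((23, 1, 37, 15), chunks=(5, 1, 20, 15)),
    dims=("Paliers", "DimK0", "DimJ0", "DimI0"),
)
FLUX = xr.DataArray(
    da.random.random((23, 1, 37, 15, 56), chunks=(5, 1, 20, 15, 10)),
    dims=("Paliers", "DimK0", "DimJ0", "DimI0", "NIsoSource"),
)

M = interp(SYMB, RAIN)               # interp as defined earlier in this comment
RES = (M * FLUX).sum("Paliers")

print(RES)                 # still lazy: a dask-backed DataArray
print(RES.data.numblocks)  # chunk grid of the result
```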

176385048 https://github.com/pydata/xarray/issues/729#issuecomment-176385048 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3NjM4NTA0OA== mrocklin 306380 2016-01-28T20:15:33Z 2016-01-28T20:15:33Z MEMBER

@Scheibs can you try calling these lines to remove multi-threading and see if the problem persists?

```python
import dask
dask.set_options(get=dask.async.get_sync)
```

I agree with @shoyer that it would be very useful to see what you're doing that causes this problem.
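dask.set_options and dask.async were removed from later dask releases; purely as a reading aid (not what was run in 2016), the equivalent single-threaded setting on current dask is:

```python
import dask

# Select the single-threaded ("synchronous") scheduler in current dask releases.
dask.config.set(scheduler="synchronous")
```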

176362189 https://github.com/pydata/xarray/issues/729#issuecomment-176362189 https://api.github.com/repos/pydata/xarray/issues/729 MDEyOklzc3VlQ29tbWVudDE3NjM2MjE4OQ== shoyer 1217238 2016-01-28T19:37:17Z 2016-01-28T19:37:17Z MEMBER

This is almost certainly an issue with dask's scheduler. cc @mrocklin

Could you share a summary of how you create this dataset? Is it something that should be possible to calculate in a single pass over the data?



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
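Given this schema, the view at the top of the page ("18 rows where issue = 129150619 sorted by updated_at descending") can be reproduced against a local SQLite copy of the database; the file name below is hypothetical:

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this Datasette database
rows = conn.execute(
    """
    SELECT id, user, created_at, updated_at, body
    FROM issue_comments
    WHERE issue = ?
    ORDER BY updated_at DESC
    """,
    (129150619,),
).fetchall()
print(len(rows))  # expected: 18 for this issue
```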