
issue_comments


284 rows where user = 306380, sorted by updated_at descending


issue (more than 30 distinct issues; top shown by comment count)

  • Integration with dask/distributed (xarray backend design) 23
  • groupby on dask objects doesn't handle chunks well 13
  • Support Dask interface 10
  • dask compute on reduction failes with ValueError 10
  • WIP: Zarr backend 9
  • Hooks for XArray operations 9
  • Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 9
  • Cannot write dask Dataset to NetCDF file 8
  • fix distributed writes 8
  • why time grouping doesn't preserve chunks 8
  • BUG: Dask distributed integration tests failing on Travis 7
  • slow performance when storing datasets in gcsfs-backed zarr stores 6
  • Avoid Adapters in task graphs? 6
  • segmentation fault with `open_mfdataset` 5
  • dask.async.RuntimeError: NetCDF: HDF error on xarray to_netcdf 5
  • Remove caching logic from xarray.Variable 4
  • Add persist method to DataSet 4
  • Windows/Python 2.7 tests of dask-distributed failing on master/v0.10.0 4
  • Fix DataArray.__dask_scheduler__ to point to dask.threaded.get 4
  • Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff? 4
  • added some logic to deal with rasterio objects in addition to filepaths 4
  • Support for duck Dask Arrays 4
  • Support out-of-core computation using dask 3
  • Array size changes following loading of numpy array 3
  • Switch to shared Lock (SerializableLock if possible) for reading/writing 3
  • Sparse arrays 3
  • Automatic parallelization for dask arrays in apply_ufunc 3
  • data_array.<tab> reads data 3
  • Add automatic chunking to open_rasterio 3
  • test_apply_dask_new_output_dimension is broken on master with dask-dev 3
  • …

user 1

  • mrocklin · 284

author_association 1

  • MEMBER 284
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1510444389 https://github.com/pydata/xarray/issues/7716#issuecomment-1510444389 https://api.github.com/repos/pydata/xarray/issues/7716 IC_kwDOAMm_X85aB41l mrocklin 306380 2023-04-16T17:57:26Z 2023-04-16T17:57:26Z MEMBER

That makes sense. Just following up, but this fails today:

```yaml
name: test-1
channels:
  - conda-forge
dependencies:
  - xarray
  - pandas=2
```

It sounds like this will work itself out though, and no further work needs to be done here (unless someone wants to go press some green buttons on conda-forge somewhere).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bad conda solve with pandas 2 1654022522
1510434421 https://github.com/pydata/xarray/issues/7716#issuecomment-1510434421 https://api.github.com/repos/pydata/xarray/issues/7716 IC_kwDOAMm_X85aB2Z1 mrocklin 306380 2023-04-16T17:10:12Z 2023-04-16T17:10:12Z MEMBER

This was the environment, solved on an M1 Mac:

```yaml
name: coiled
channels:
  - conda-forge
  - defaults
dependencies:
  - python==3.10
  - dask
  - dask-ml
  - coiled
  - pyarrow
  - s3fs
  - matplotlib
  - ipykernel
  - dask-labextension
  - xgboost
  - pandas=2
  - optuna
  - xarray
  - geogif
  - zarr
  - pip
  - pip:
      - git+https://github.com/optuna/optuna
```

I can try to minify this in a bit, although I'm on airport wifi right now, and it has started to kick me off, I suspect due to these sorts of activities.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bad conda solve with pandas 2 1654022522
1510432559 https://github.com/pydata/xarray/issues/7716#issuecomment-1510432559 https://api.github.com/repos/pydata/xarray/issues/7716 IC_kwDOAMm_X85aB18v mrocklin 306380 2023-04-16T17:01:50Z 2023-04-16T17:01:50Z MEMBER

I'm still running into this today when using only conda-forge:

```
Encountered problems while solving:
  - package xarray-2023.1.0-pyhd8ed1ab_0 requires pandas >=1.3,<2a0, but none of the providers can be installed
```

When I add defaults the problem goes away

```yaml
channels:
  - conda-forge
  - defaults
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  bad conda solve with pandas 2 1654022522
925015359 https://github.com/pydata/xarray/issues/5648#issuecomment-925015359 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X843Ip0_ mrocklin 306380 2021-09-22T15:01:06Z 2021-09-22T15:01:06Z MEMBER

It looks like there are some other Dask folks participating. I'll step back and let them take over on our end.

On Wed, Sep 22, 2021 at 9:53 AM Hameer Abbasi @.***> wrote:

I would very much prefer not to be recorded.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
924104833 https://github.com/pydata/xarray/issues/5648#issuecomment-924104833 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X843FLiB mrocklin 306380 2021-09-21T15:32:58Z 2021-09-21T15:32:58Z MEMBER

Surprisingly I happen to be free tomorrow at exactly that time. I've blocked it off. If you want to send a calendar invite to mrocklin at coiled that would be welcome.

On Tue, Sep 21, 2021 at 10:27 AM Tom Nicholas @.***> wrote:

TOMORROW: So there is just one slot when essentially everyone said they were free - but it's 11:00am EDT September 22nd, i.e. tomorrow morning.

I appreciate that's late notice - but could we try for that, and if we only get the super keen beans attending this time then we would still be able to have a useful initial discussion about the exact problems that need resolving?

Alternatively if people could react to this comment with thumbs up / down for "that time is still good" / "no I need more notice" then that would be useful.

@mrocklin https://github.com/mrocklin are you or anyone else who is able to speak for dask free? I noticed that @jacobtomlinson https://github.com/jacobtomlinson put down that you were free tomorrow also?


{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
889468552 https://github.com/pydata/xarray/issues/5648#issuecomment-889468552 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X841BDaI mrocklin 306380 2021-07-29T21:21:38Z 2021-07-29T21:21:38Z MEMBER

I would be happy to attend and look forward to what I'm sure will be a vigorous discussion :) Thank you for providing convenient links to reading materials ahead of time.

As a warning, my responsiveness to github comments these days is not what it used to be. If I miss something here then please forgive me.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
856124510 https://github.com/pydata/xarray/issues/5426#issuecomment-856124510 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1NjEyNDUxMA== mrocklin 306380 2021-06-07T17:31:00Z 2021-06-07T17:31:00Z MEMBER

Also cc'ing @gjoseph92

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852685733 https://github.com/pydata/xarray/issues/5426#issuecomment-852685733 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4NTczMw== mrocklin 306380 2021-06-02T03:23:35Z 2021-06-02T03:23:35Z MEMBER

I think that the next thing to do here is to try to replicate this locally and watch the stealing logic to figure out why these tasks aren't moving. At this point we're just guessing. @jrbourbeau can I ask you to add this to the stack of issues to have folks look into?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852683916 https://github.com/pydata/xarray/issues/5426#issuecomment-852683916 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY4MzkxNg== mrocklin 306380 2021-06-02T03:18:37Z 2021-06-02T03:18:37Z MEMBER

Yeah, that size being very small shouldn't be a problem

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852675828 https://github.com/pydata/xarray/issues/5426#issuecomment-852675828 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3NTgyOA== mrocklin 306380 2021-06-02T02:58:13Z 2021-06-02T02:58:13Z MEMBER

Hrm, the root dependency does appear to be of type

xarray.core.indexing.ImplicitToExplicitIndexingAdapter with size 48 B

I'm not sure what's going on with it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852672930 https://github.com/pydata/xarray/issues/5426#issuecomment-852672930 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MjkzMA== mrocklin 306380 2021-06-02T02:50:28Z 2021-06-02T02:50:28Z MEMBER

This is what it looks like in practice for me FWIW

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852671075 https://github.com/pydata/xarray/issues/5426#issuecomment-852671075 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MTA3NQ== mrocklin 306380 2021-06-02T02:45:48Z 2021-06-02T02:45:48Z MEMBER

Ideally Dask would be able to be robust to this kind of mis-assignment of object size, but it's particularly hard in this situation. We can't try to serialize these things because if we're wrong and the size actually is massive then we blow out the worker.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852670723 https://github.com/pydata/xarray/issues/5426#issuecomment-852670723 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY3MDcyMw== mrocklin 306380 2021-06-02T02:44:55Z 2021-06-02T02:44:55Z MEMBER

It may also be that we don't want to inline zarr objects (The graph is likely to be cheaper to move if we don't inline them). However we may want Zarr objects to report themselves as easy to move by defining their approximate size with sizeof. The ideal behavior here is that Dask treats zarr stores (or whatever is at the bottom of this graph) as separate tasks, but also as movable tasks.
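
To make the sizeof suggestion concrete, here is a minimal sketch of registering an approximate size with Dask's `sizeof` dispatch; the handle class and byte count are purely illustrative, not xarray's or zarr's actual code:

```python
from dask.sizeof import sizeof

class StoreHandle:
    """Hypothetical zarr-store-like handle that is tiny in memory."""

@sizeof.register(StoreHandle)
def _sizeof_store_handle(obj):
    # Report a small, fixed footprint so the scheduler treats tasks
    # holding this object as cheap to move between workers.
    return 1024
```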

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852666752 https://github.com/pydata/xarray/issues/5426#issuecomment-852666752 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2Njc1Mg== mrocklin 306380 2021-06-02T02:34:48Z 2021-06-02T02:34:48Z MEMBER

Do you run into poor load balancing as well when using Zarr with Xarray? My guess here is that there are a few tasks in the graph that report multi-TB sizes and so are highly resistant to being moved around. I haven't verified that though

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852656740 https://github.com/pydata/xarray/issues/5426#issuecomment-852656740 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY1Njc0MA== mrocklin 306380 2021-06-02T02:09:50Z 2021-06-02T02:09:50Z MEMBER

Thinking about this some more, it might be some other object, like a Zarr store, that is on only a couple of these machines. I recall that recently we switched Zarr from being in every task to being in only a few tasks. The problem here might be reversed, that we actually want to view Zarr stores in this case as quite cheap.

cc @TomAugspurger who I think was actively making decisions around that time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
755421422 https://github.com/pydata/xarray/pull/4746#issuecomment-755421422 https://api.github.com/repos/pydata/xarray/issues/4746 MDEyOklzc3VlQ29tbWVudDc1NTQyMTQyMg== mrocklin 306380 2021-01-06T16:50:12Z 2021-01-06T16:50:12Z MEMBER

If anyone here has time to review https://github.com/dask/dask/pull/7033 that would be greatly appreciated :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Faster unstacking 777153550
663148752 https://github.com/pydata/xarray/issues/4208#issuecomment-663148752 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzE0ODc1Mg== mrocklin 306380 2020-07-23T17:57:55Z 2020-07-23T17:57:55Z MEMBER

Dask collections tokenize quickly. We just use the name I think.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663123118 https://github.com/pydata/xarray/issues/4208#issuecomment-663123118 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzEyMzExOA== mrocklin 306380 2020-07-23T17:05:30Z 2020-07-23T17:05:30Z MEMBER

That's exactly what's been done in Pint (see hgrecco/pint#1129)! @dcherian's points go beyond just that and address what Pint hasn't covered yet through the standard collection interface.

Ah, great. My bad.

how do we ask a duck dask array to rechunk itself? pint seems to forward the .rechunk call but that isn't formalized anywhere AFAICT.

I think that you would want to make a pint array rechunk method that called down to the dask array rechunk method. My guess is that this might come up in other situations as well.

less important: should duck dask arrays cache their token somewhere? dask.array uses .name to do this and xarray uses that to check equality cheaply. We can use tokenize of course. But I'm wondering if it's worth asking duck dask arrays to cache their token as an optimization.

I think that implementing the dask.base.normalize_token method should be fine. This will probably be very fast because you're probably just returning the name of the underlying dask array as well as the unit of the pint array/quantity. I don't think that caching would be necessary here.

It's also possible that we could look at the __dask_layers__ method to get this information. My memory is a bit fuzzy here though.
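
As a sketch of the `normalize_token` idea, assuming a hypothetical pint-style wrapper around a dask array (this is illustrative, not pint's actual code):

```python
from dask.base import normalize_token

class Quantity:
    """Hypothetical duck dask array: a dask array plus a unit."""
    def __init__(self, data, units):
        self.data = data      # expected to be a dask.array.Array
        self.units = units

@normalize_token.register(Quantity)
def _tokenize_quantity(q):
    # Cheap: reuse the dask array's cached name plus the unit string,
    # rather than hashing any data.
    return (type(q).__name__, q.data.name, str(q.units))
```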

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663119539 https://github.com/pydata/xarray/issues/4208#issuecomment-663119539 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzExOTUzOQ== mrocklin 306380 2020-07-23T16:58:27Z 2020-07-23T16:58:27Z MEMBER

My guess is that we could steal the xarray.DataArray implementations over to Pint without causing harm.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
663119334 https://github.com/pydata/xarray/issues/4208#issuecomment-663119334 https://api.github.com/repos/pydata/xarray/issues/4208 MDEyOklzc3VlQ29tbWVudDY2MzExOTMzNA== mrocklin 306380 2020-07-23T16:58:06Z 2020-07-23T16:58:06Z MEMBER

In Xarray we implemented the Dask collection spec. https://docs.dask.org/en/latest/custom-collections.html#the-dask-collection-interface

We might want to do that with Pint as well, if they're going to contain Dask things. That way Dask operations like dask.persist, dask.visualize, and dask.compute will work normally.
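
A minimal sketch of that interface for a thin wrapper around a single dask array, assuming the wrapper simply delegates to its inner array (the class name and structure are hypothetical):

```python
import dask.array as da
import dask.threaded

class Wrapped:
    def __init__(self, data):
        self.data = data  # a dask.array.Array

    def __dask_graph__(self):
        return self.data.__dask_graph__()

    def __dask_keys__(self):
        return self.data.__dask_keys__()

    __dask_optimize__ = staticmethod(da.Array.__dask_optimize__)
    __dask_scheduler__ = staticmethod(dask.threaded.get)

    def __dask_postcompute__(self):
        # How to assemble concrete chunk results, then rewrap them.
        finalize, args = self.data.__dask_postcompute__()
        return (lambda results, *a: Wrapped(finalize(results, *a))), args

    def __dask_postpersist__(self):
        rebuild, args = self.data.__dask_postpersist__()
        return (lambda dsk, *a: Wrapped(rebuild(dsk, *a))), args

    def __dask_tokenize__(self):
        return (type(self).__name__, self.data.name)
```

With this in place, `dask.compute(wrapped)` and `dask.persist(wrapped)` work on the wrapper directly.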

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for duck Dask Arrays 653430454
617198555 https://github.com/pydata/xarray/pull/3989#issuecomment-617198555 https://api.github.com/repos/pydata/xarray/issues/3989 MDEyOklzc3VlQ29tbWVudDYxNzE5ODU1NQ== mrocklin 306380 2020-04-21T13:59:49Z 2020-04-21T13:59:49Z MEMBER

Yeah, my sense here is that it probably makes sense to relax the assertion that only async def functions are supported. Maybe we should warn for a few releases?
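
A sketch of the relax-then-warn pattern being suggested; the wrapper name and message are hypothetical, not the actual distributed API:

```python
import asyncio
import warnings

def accept_handler(func):
    # Previously this might have been: assert asyncio.iscoroutinefunction(func)
    if not asyncio.iscoroutinefunction(func):
        warnings.warn(
            "Synchronous handlers are deprecated; define the handler "
            "with `async def` instead.",
            FutureWarning,
        )
    return func
```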

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix distributed tests on upstream-dev 603937718
615501070 https://github.com/pydata/xarray/issues/3213#issuecomment-615501070 https://api.github.com/repos/pydata/xarray/issues/3213 MDEyOklzc3VlQ29tbWVudDYxNTUwMTA3MA== mrocklin 306380 2020-04-17T23:08:18Z 2020-04-17T23:08:18Z MEMBER

@amueller have you all connected with @hameerabbasi ? I'm not surprised to hear that there are performance issues with pydata/sparse relative to scipy.sparse, but Hameer has historically been pretty open to working to resolve issues quickly. I'm not sure if there is already an ongoing conversation between the two groups, but I'd recommend replacing "we've chosen not to use pydata/sparse because it isn't feature complete enough for us" with "in order for us to use pydata/sparse we would need the following features".

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  How should xarray use/support sparse arrays? 479942077
603635112 https://github.com/pydata/xarray/issues/2692#issuecomment-603635112 https://api.github.com/repos/pydata/xarray/issues/2692 MDEyOklzc3VlQ29tbWVudDYwMzYzNTExMg== mrocklin 306380 2020-03-25T04:34:26Z 2020-03-25T04:34:26Z MEMBER

Gah!

On Tue, Mar 24, 2020 at 8:17 PM Joe Hamman notifications@github.com wrote:

Irony of ironies. We resubmitted our tutorial proposal this year. It was accepted (yay!) BUT there is a good chance the conference will be rescheduled/canceled/virtual. I'll keep this issue updated as more details become available.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray tutorial at SciPy 2019? 400948664
598800439 https://github.com/pydata/xarray/issues/3791#issuecomment-598800439 https://api.github.com/repos/pydata/xarray/issues/3791 MDEyOklzc3VlQ29tbWVudDU5ODgwMDQzOQ== mrocklin 306380 2020-03-13T16:12:53Z 2020-03-13T16:12:53Z MEMBER

I wonder if there are multi-dimensional analogs that might be interesting.

@eric-czech , if you have time to say a bit more about the data and operation that you're trying to do I think it would be an interesting exercise to see how to do that operation with Xarray's current functionality. I wouldn't be surprised to learn that there was some way to do what you wanted that went under a different name here.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Self joins with non-unique indexes 569176457
561921753 https://github.com/pydata/xarray/pull/3584#issuecomment-561921753 https://api.github.com/repos/pydata/xarray/issues/3584 MDEyOklzc3VlQ29tbWVudDU2MTkyMTc1Mw== mrocklin 306380 2019-12-05T01:16:03Z 2019-12-05T01:16:03Z MEMBER

@mrocklin if you get a chance, can you confirm that the values in HighLevelGraph.dependencies should be a subset of the keys of layers?

That sounds like a reasonable expectation, but honestly it's been a while, so I don't fully trust my knowledge here. It might be worth adding some runtime checks into the HighLevelGraph constructor to see where this might be occurring.
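
A rough sketch of such a runtime check, assuming `layers` and `dependencies` are plain dicts (the function name is hypothetical):

```python
def check_dependencies(layers, dependencies):
    # Every layer named in `dependencies` values should itself be a key
    # of `layers`; anything else indicates a malformed HighLevelGraph.
    known = set(layers)
    for name, deps in dependencies.items():
        unknown = set(deps) - known
        if unknown:
            raise ValueError(
                f"layer {name!r} depends on unknown layers {sorted(unknown)!r}"
            )
```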

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make dask names change when chunking Variables by different amounts. 530657789
557615479 https://github.com/pydata/xarray/issues/3563#issuecomment-557615479 https://api.github.com/repos/pydata/xarray/issues/3563 MDEyOklzc3VlQ29tbWVudDU1NzYxNTQ3OQ== mrocklin 306380 2019-11-22T17:12:07Z 2019-11-22T17:12:07Z MEMBER

You're probably already aware, but https://examples.dask.org and https://github.com/dask/dask-examples might be a nice model to look at.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  environment file for binderized examples 527296094
546053879 https://github.com/pydata/xarray/pull/3425#issuecomment-546053879 https://api.github.com/repos/pydata/xarray/issues/3425 MDEyOklzc3VlQ29tbWVudDU0NjA1Mzg3OQ== mrocklin 306380 2019-10-24T18:52:08Z 2019-10-24T18:52:08Z MEMBER

Thanks @jsignell (and all). I'm really jazzed about this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Html repr 510294810
540843642 https://github.com/pydata/xarray/pull/3276#issuecomment-540843642 https://api.github.com/repos/pydata/xarray/issues/3276 MDEyOklzc3VlQ29tbWVudDU0MDg0MzY0Mg== mrocklin 306380 2019-10-10T23:49:12Z 2019-10-10T23:49:12Z MEMBER

Woo!

On Thu, Oct 10, 2019 at 4:44 PM crusaderky notifications@github.com wrote:

Merged #3276 https://github.com/pydata/xarray/pull/3276 into master.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  map_blocks 488243328
527187603 https://github.com/pydata/xarray/pull/3258#issuecomment-527187603 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNzE4NzYwMw== mrocklin 306380 2019-09-02T15:37:18Z 2019-09-02T15:37:18Z MEMBER

I'm glad to see progress here. FWIW, I think that many people would be quite happy with a version that just worked for DataArrays, in case that's faster to get in than the full solution with DataSets.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
526756738 https://github.com/pydata/xarray/pull/3258#issuecomment-526756738 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNjc1NjczOA== mrocklin 306380 2019-08-30T21:31:49Z 2019-08-30T21:32:02Z MEMBER

Then you can construct a tuple as a task: `(1, 2, 3)` -> `(tuple, [1, 2, 3])`
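
A tiny runnable illustration of the tuple-as-task trick (the graph and key are made up):

```python
import dask

# Dask treats a tuple whose first element is callable as a task, so
# (tuple, [1, 2, 3]) builds the tuple (1, 2, 3) at execution time.
dsk = {"t": (tuple, [1, 2, 3])}
assert dask.get(dsk, "t") == (1, 2, 3)
```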

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
525966384 https://github.com/pydata/xarray/pull/3258#issuecomment-525966384 https://api.github.com/repos/pydata/xarray/issues/3258 MDEyOklzc3VlQ29tbWVudDUyNTk2NjM4NA== mrocklin 306380 2019-08-28T23:54:48Z 2019-08-28T23:54:48Z MEMBER

Dask doesn't traverse through tuples to find possible keys, so the keys here are hidden from view:

```python
{'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
```

I recommend replacing the wrapping tuples with lists:

```diff
- {'a': (('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)),
+ {'a': [('x', 'y'), ('xarray-a-f178df193efafa67203f3862b3f9f0f4', 0, 0)],
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  [WIP] Add map_blocks. 484752930
518355147 https://github.com/pydata/xarray/pull/3117#issuecomment-518355147 https://api.github.com/repos/pydata/xarray/issues/3117 MDEyOklzc3VlQ29tbWVudDUxODM1NTE0Nw== mrocklin 306380 2019-08-05T18:53:39Z 2019-08-05T18:53:39Z MEMBER

Woot! Thanks @nvictus !

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for __array_function__ implementers (sparse arrays) [WIP] 467771005
517311370 https://github.com/pydata/xarray/pull/3117#issuecomment-517311370 https://api.github.com/repos/pydata/xarray/issues/3117 MDEyOklzc3VlQ29tbWVudDUxNzMxMTM3MA== mrocklin 306380 2019-08-01T14:27:13Z 2019-08-01T14:27:13Z MEMBER

Checking in here. This was a fun project during SciPy Sprints that both showed a lot of potential and generated a lot of excitement. But of course as we all returned home other things came up and this has lingered for a while.

How can we best preserve this work? Two specific questions:

  1. @nvictus can you summarize, from your perspective, what still needs to be done here? If you aren't likely to have time to finish this up (which would not be surprising) where should someone else start if they wanted to push this forward?
  2. Xarray devs, are there parts of this PR that could go in quickly, even without the full implementation, just so that this doesn't grow stale?
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for __array_function__ implementers (sparse arrays) [WIP] 467771005
514687031 https://github.com/pydata/xarray/pull/2255#issuecomment-514687031 https://api.github.com/repos/pydata/xarray/issues/2255 MDEyOklzc3VlQ29tbWVudDUxNDY4NzAzMQ== mrocklin 306380 2019-07-24T15:43:04Z 2019-07-24T15:43:04Z MEMBER

I'm glad to hear it! I'm curious, are there features in rioxarray that could be pushed upstream?

On Wed, Jul 24, 2019 at 8:39 AM Alan D. Snow notifications@github.com wrote:

I've abandoned this PR. If anyone has time to pick it up, that would be welcome.

I appreciate you starting this! Based on this PR, I added the feature into rioxarray here: corteva/rioxarray#31 https://github.com/corteva/rioxarray/pull/31 (released in version 0.0.9).


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add automatic chunking to open_rasterio 336371511
514682988 https://github.com/pydata/xarray/pull/2255#issuecomment-514682988 https://api.github.com/repos/pydata/xarray/issues/2255 MDEyOklzc3VlQ29tbWVudDUxNDY4Mjk4OA== mrocklin 306380 2019-07-24T15:33:10Z 2019-07-24T15:33:10Z MEMBER

I've abandoned this PR. If anyone has time to pick it up, that would be welcome. I think that it would have positive impact.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add automatic chunking to open_rasterio 336371511
513504716 https://github.com/pydata/xarray/pull/1820#issuecomment-513504716 https://api.github.com/repos/pydata/xarray/issues/1820 MDEyOklzc3VlQ29tbWVudDUxMzUwNDcxNg== mrocklin 306380 2019-07-20T22:48:30Z 2019-07-20T22:48:30Z MEMBER

I'll say that I'm looking forward to this getting in, mostly so that I can raise an issue about adding Dask's chunked array images :)

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  WIP: html repr 287844110
513504690 https://github.com/pydata/xarray/pull/1820#issuecomment-513504690 https://api.github.com/repos/pydata/xarray/issues/1820 MDEyOklzc3VlQ29tbWVudDUxMzUwNDY5MA== mrocklin 306380 2019-07-20T22:47:57Z 2019-07-20T22:47:57Z MEMBER

It's too bad that CSS isn't processed with untrusted inputs. How do Iris and Dask deal with this limitation?

Yeah, we just use raw HTML

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: html repr 287844110
511209094 https://github.com/pydata/xarray/issues/1375#issuecomment-511209094 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMTIwOTA5NA== mrocklin 306380 2019-07-14T14:50:45Z 2019-07-14T14:50:45Z MEMBER

@nvictus has been working on this at #3117

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
510947988 https://github.com/pydata/xarray/issues/1938#issuecomment-510947988 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDUxMDk0Nzk4OA== mrocklin 306380 2019-07-12T16:23:08Z 2019-07-12T16:23:08Z MEMBER

@jacobtomlinson got things sorta-working with NEP-18 and CuPy in an afternoon in Iris (with a strong emphasis on "kinda").

On the CuPy side you're fine. If you're on NumPy 1.16 you'll need to enable the __array_function__ interface with the following environment variable:

```bash
export NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1
```

If you're using Numpy 1.17 then this is on by default.

I think that most of the work here is on the Xarray side. We'll need to remove things like explicit type checks.
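
As an illustration of removing explicit type checks, a hedged sketch of a duck-typed predicate (the helper is hypothetical, not xarray's actual code):

```python
import numpy as np

def is_duck_array(x):
    # Accept numpy arrays or anything implementing NEP-18
    # (__array_function__), e.g. a CuPy ndarray.
    return isinstance(x, np.ndarray) or (
        hasattr(x, "__array_function__")
        and hasattr(x, "shape")
        and hasattr(x, "dtype")
    )
```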

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
510943157 https://github.com/pydata/xarray/issues/1375#issuecomment-510943157 https://api.github.com/repos/pydata/xarray/issues/1375 MDEyOklzc3VlQ29tbWVudDUxMDk0MzE1Nw== mrocklin 306380 2019-07-12T16:07:42Z 2019-07-12T16:07:42Z MEMBER

@rgommers might be able to recommend someone

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sparse arrays 221858543
507046745 https://github.com/pydata/xarray/issues/1627#issuecomment-507046745 https://api.github.com/repos/pydata/xarray/issues/1627 MDEyOklzc3VlQ29tbWVudDUwNzA0Njc0NQ== mrocklin 306380 2019-06-30T15:49:08Z 2019-06-30T15:49:08Z MEMBER

Thought I'd bump this (hopefully no one minds). I think that this is great!

{
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  html repr of xarray object (for the notebook) 264747372
504758362 https://github.com/pydata/xarray/pull/3027#issuecomment-504758362 https://api.github.com/repos/pydata/xarray/issues/3027 MDEyOklzc3VlQ29tbWVudDUwNDc1ODM2Mg== mrocklin 306380 2019-06-23T14:39:46Z 2019-06-23T14:39:46Z MEMBER

Does the green check mark here mean that we're all good @shoyer ?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ensure explicitly indexed arrays are preserved 456963929
502749637 https://github.com/pydata/xarray/pull/3027#issuecomment-502749637 https://api.github.com/repos/pydata/xarray/issues/3027 MDEyOklzc3VlQ29tbWVudDUwMjc0OTYzNw== mrocklin 306380 2019-06-17T16:11:44Z 2019-06-17T16:11:44Z MEMBER

I think that relaxing the astype constrain seems quite reasonable. I'll clean this up on the Dask side.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Ensure explicitly indexed arrays are preserved 456963929
502550448 https://github.com/pydata/xarray/issues/3009#issuecomment-502550448 https://api.github.com/repos/pydata/xarray/issues/3009 MDEyOklzc3VlQ29tbWVudDUwMjU1MDQ0OA== mrocklin 306380 2019-06-17T06:24:12Z 2019-06-17T06:24:12Z MEMBER

OK, reproduced. I'll take a look later today. Thanks for pointing me to that @max-sixty .

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray test suite failing with dask-master 454168102
502432894 https://github.com/pydata/xarray/issues/3009#issuecomment-502432894 https://api.github.com/repos/pydata/xarray/issues/3009 MDEyOklzc3VlQ29tbWVudDUwMjQzMjg5NA== mrocklin 306380 2019-06-16T08:42:36Z 2019-06-16T08:42:36Z MEMBER

I believe that this is now resolved. Please let me know

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray test suite failing with dask-master 454168102
502290814 https://github.com/pydata/xarray/issues/3022#issuecomment-502290814 https://api.github.com/repos/pydata/xarray/issues/3022 MDEyOklzc3VlQ29tbWVudDUwMjI5MDgxNA== mrocklin 306380 2019-06-14T21:48:12Z 2019-06-14T21:48:12Z MEMBER

https://github.com/pydata/xarray/blob/7e4bf8623891c4e564bbaede706e1d69c614b74b/xarray/tests/test_duck_array_ops.py#L291-L297

```
(Pdb) pp times
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00, 2000-01-03 00:00:00,
             2000-01-04 00:00:00], dtype='object')
(Pdb) pp da
<xarray.DataArray (time: 4)>
dask.array<shape=(4,), dtype=object, chunksize=(4,)>
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 ... 2000-01-04 00:00:00
(Pdb) pp da.data
dask.array<xarray-<this-array>, shape=(4,), dtype=object, chunksize=(4,)>
(Pdb) pp da.data._meta
PandasIndexAdapter(array=CFTimeIndex([], dtype='object'), dtype=dtype('O'))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  LazilyOuterIndexedArray doesn't support slicing with slice objects 456239422
501707221 https://github.com/pydata/xarray/issues/3009#issuecomment-501707221 https://api.github.com/repos/pydata/xarray/issues/3009 MDEyOklzc3VlQ29tbWVudDUwMTcwNzIyMQ== mrocklin 306380 2019-06-13T13:40:10Z 2019-06-13T13:40:10Z MEMBER

cc @pentschev

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray test suite failing with dask-master 454168102
484178174 https://github.com/pydata/xarray/issues/2692#issuecomment-484178174 https://api.github.com/repos/pydata/xarray/issues/2692 MDEyOklzc3VlQ29tbWVudDQ4NDE3ODE3NA== mrocklin 306380 2019-04-17T17:06:37Z 2019-04-17T17:06:37Z MEMBER

There is usually a BoF at the end of the conference around planning for the next conference. I suggest that a few of us show up and see if we can get engaged in the process for next year.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray tutorial at SciPy 2019? 400948664
480862757 https://github.com/pydata/xarray/issues/2873#issuecomment-480862757 https://api.github.com/repos/pydata/xarray/issues/2873 MDEyOklzc3VlQ29tbWVudDQ4MDg2Mjc1Nw== mrocklin 306380 2019-04-08T14:45:50Z 2019-04-08T14:45:50Z MEMBER

I'm also unable to reproduce this on my local MacBook Pro, though I haven't tried with the same versions as you have here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask distributed tests fail locally 430188626
480835950 https://github.com/pydata/xarray/issues/2873#issuecomment-480835950 https://api.github.com/repos/pydata/xarray/issues/2873 MDEyOklzc3VlQ29tbWVudDQ4MDgzNTk1MA== mrocklin 306380 2019-04-08T13:40:04Z 2019-04-08T13:40:04Z MEMBER

That does not look familiar to me, no. Two questions:

  1. If you extend that timeout do things look better? If so, do you notice that starting or stopping Python processes on your machine take a long time?
  2. Is this intermittent, or consistent?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dask distributed tests fail locally 430188626
478290795 https://github.com/pydata/xarray/issues/2692#issuecomment-478290795 https://api.github.com/repos/pydata/xarray/issues/2692 MDEyOklzc3VlQ29tbWVudDQ3ODI5MDc5NQ== mrocklin 306380 2019-03-30T21:28:28Z 2019-03-30T21:28:28Z MEMBER

Looking at the tutorial schedule it looks like it was not accepted, but that there is a TBA slot. Any information here @jhamman ? Did you all receive a rejection response?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray tutorial at SciPy 2019? 400948664
472141327 https://github.com/pydata/xarray/issues/2807#issuecomment-472141327 https://api.github.com/repos/pydata/xarray/issues/2807 MDEyOklzc3VlQ29tbWVudDQ3MjE0MTMyNw== mrocklin 306380 2019-03-12T19:09:58Z 2019-03-12T19:09:58Z MEMBER

The challenge is that with dask's lazy evaluation, we don't know the structure of the returned objects until after evaluating the wrapped functions. So we can't rebuild xarray objects unless we require redundantly specifying all the coordinates and attributes from the return values.

Typically in Dask we run the user defined function on an empty version of the data and hope that it provides an appropriately shaped output. If it fails during this process, we ask the user to provide sufficient information for us to populate metadata. Maybe something similar would work here? Xarray would construct a dummy Xarray chunk, apply the user defined function onto that chunk, and then extrapolate metadata out from there somehow.

I'm likely glossing over several important details, but hopefully the general gist of what I'm trying to convey above is somewhat sensible, even if not doable.
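
A minimal sketch of the dummy-evaluation idea described above, assuming a NumPy-returning user function (all names here are hypothetical):

```python
import numpy as np

def infer_output_meta(func, dtype, ndim):
    # Probe the user function with an empty array of the right dtype/ndim
    # and read metadata off the result; fall back to explicit hints if the
    # function cannot handle empty input.
    probe = np.empty((0,) * ndim, dtype=dtype)
    try:
        out = func(probe)
    except Exception:
        return None  # caller must then supply output dtype/dims explicitly
    return out.dtype, out.ndim
```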

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  can the callables of apply_ufunc + dask get a typed/labeled array 420139027
465770077 https://github.com/pydata/xarray/pull/2782#issuecomment-465770077 https://api.github.com/repos/pydata/xarray/issues/2782 MDEyOklzc3VlQ29tbWVudDQ2NTc3MDA3Nw== mrocklin 306380 2019-02-20T21:54:15Z 2019-02-20T21:54:15Z MEMBER

I'm glad to see this. I'll also be curious to see what the performance will look like.

cc @llllllllll

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  enable loading remote hdf5 files 412645481
449531351 https://github.com/pydata/xarray/pull/2589#issuecomment-449531351 https://api.github.com/repos/pydata/xarray/issues/2589 MDEyOklzc3VlQ29tbWVudDQ0OTUzMTM1MQ== mrocklin 306380 2018-12-22T00:43:16Z 2018-12-22T00:43:16Z MEMBER

```
mrocklin@carbon:~$ conda search rasterio=1
Loading channels: done
Name                      Version           Build    Channel
rasterio                   1.0.13  py27hc38cc03_0  pkgs/main
rasterio                   1.0.13  py36hc38cc03_0  pkgs/main
rasterio                   1.0.13  py37hc38cc03_0  pkgs/main
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added some logic to deal with rasterio objects in addition to filepaths 387123860
449530939 https://github.com/pydata/xarray/pull/2589#issuecomment-449530939 https://api.github.com/repos/pydata/xarray/issues/2589 MDEyOklzc3VlQ29tbWVudDQ0OTUzMDkzOQ== mrocklin 306380 2018-12-22T00:38:34Z 2018-12-22T00:38:34Z MEMBER

For whatever reason the conda defaults channel hasn't been updated since 0.36 (Jun 14, 2016!).

It looks like @jjhelmus resolved this upstream. It seems like https://github.com/ContinuumIO/anaconda-issues is a good issue tracker to know :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added some logic to deal with rasterio objects in addition to filepaths 387123860
448747075 https://github.com/pydata/xarray/pull/2589#issuecomment-448747075 https://api.github.com/repos/pydata/xarray/issues/2589 MDEyOklzc3VlQ29tbWVudDQ0ODc0NzA3NQ== mrocklin 306380 2018-12-19T21:17:52Z 2018-12-19T21:17:52Z MEMBER

https://github.com/ContinuumIO/anaconda-issues/issues/10443

On Wed, Dec 19, 2018 at 4:14 PM Jonathan J. Helmus notifications@github.com wrote:

@jjhelmus https://github.com/jjhelmus is there a good way to report things like this other than pinging you directly?

Opening and issue in the anaconda-issues https://github.com/ContinuumIO/anaconda-issues repository is the best option at this time for requesting a package update.

I'm looking at updating the rasterio package in defaults this week. Something should be available by the end of the week.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added some logic to deal with rasterio objects in addition to filepaths 387123860
448699184 https://github.com/pydata/xarray/pull/2589#issuecomment-448699184 https://api.github.com/repos/pydata/xarray/issues/2589 MDEyOklzc3VlQ29tbWVudDQ0ODY5OTE4NA== mrocklin 306380 2018-12-19T18:34:47Z 2018-12-19T18:34:47Z MEMBER

For whatever reason the conda defaults channel hasn't been updated since 0.36 (Jun 14, 2016!).

@jjhelmus is there a good way to report things like this other than pinging you directly? (which I'm more than happy to continue doing :))

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  added some logic to deal with rasterio objects in addition to filepaths 387123860
440064660 https://github.com/pydata/xarray/issues/1815#issuecomment-440064660 https://api.github.com/repos/pydata/xarray/issues/1815 MDEyOklzc3VlQ29tbWVudDQ0MDA2NDY2MA== mrocklin 306380 2018-11-19T22:27:31Z 2018-11-19T22:27:31Z MEMBER

FYI @magonser

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc(dask='parallelized') with multiple outputs 287223508
432016733 https://github.com/pydata/xarray/pull/2500#issuecomment-432016733 https://api.github.com/repos/pydata/xarray/issues/2500 MDEyOklzc3VlQ29tbWVudDQzMjAxNjczMw== mrocklin 306380 2018-10-22T22:42:11Z 2018-10-22T22:42:11Z MEMBER

I'm not sure that I understand the failure here. Can someone verify that this is related to these changes?

```
=================================== FAILURES ===================================
____________________________ TestCfGrib.test_read _____________________________

self = <xarray.tests.test_backends.TestCfGrib object at 0x7fd47fc30b00>

    def test_read(self):
        expected = {'number': 2, 'time': 3, 'air_pressure': 2, 'latitude': 3,
                    'longitude': 4}
        with open_example_dataset('example.grib', engine='cfgrib') as ds:
>           assert ds.dims == expected
E           AssertionError: assert Frozen(Sorted...ngitude': 4})) == {'air_pressure...mber': 2, ...}
E             Full diff:
E             - Frozen(SortedKeysDict({'number': 2, 'time': 3, 'isobaricInhPa': 2, 'latitude': 3, 'longitude': 4}))
E             + {'air_pressure': 2, 'latitude': 3, 'longitude': 4, 'number': 2, 'time': 3}

xarray/tests/test_backends.py:2473: AssertionError

______________________ TestCfGrib.test_read_filter_by_keys _____________________

self = <xarray.tests.test_backends.TestCfGrib object at 0x7fd4904b12b0>

    def test_read_filter_by_keys(self):
        kwargs = {'filter_by_keys': {'shortName': 't'}}
        expected = {'number': 2, 'time': 3, 'air_pressure': 2, 'latitude': 3,
                    'longitude': 4}
        with open_example_dataset('example.grib', engine='cfgrib',
                                  backend_kwargs=kwargs) as ds:
>           assert ds.dims == expected
E           AssertionError: assert Frozen(Sorted...ngitude': 4})) == {'air_pressure...mber': 2, ...}
E             Full diff:
E             - Frozen(SortedKeysDict({'number': 2, 'time': 3, 'isobaricInhPa': 2, 'latitude': 3, 'longitude': 4}))
E             + {'air_pressure': 2, 'latitude': 3, 'longitude': 4, 'number': 2, 'time': 3}

xarray/tests/test_backends.py:2483: AssertionError
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid use of deprecated get= parameter in tests 372640063
431934087 https://github.com/pydata/xarray/pull/2500#issuecomment-431934087 https://api.github.com/repos/pydata/xarray/issues/2500 MDEyOklzc3VlQ29tbWVudDQzMTkzNDA4Nw== mrocklin 306380 2018-10-22T18:49:11Z 2018-10-22T18:49:11Z MEMBER

Yes, github is still having issues. https://status.github.com/messages

We are working through the backlogs of webhook deliveries and Pages builds. We continue to monitor as the site recovers.

On Mon, Oct 22, 2018 at 2:48 PM Maximilian Roos notifications@github.com wrote:

Thanks @mrocklin https://github.com/mrocklin

Anyone have any idea why Travis isn't running? Is it the GH issues from this morning? Looks like the latest run was a couple of days ago: https://travis-ci.org/pydata/xarray/pull_requests


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid use of deprecated get= parameter in tests 372640063
429309135 https://github.com/pydata/xarray/issues/2480#issuecomment-429309135 https://api.github.com/repos/pydata/xarray/issues/2480 MDEyOklzc3VlQ29tbWVudDQyOTMwOTEzNQ== mrocklin 306380 2018-10-12T12:29:47Z 2018-10-12T12:29:47Z MEMBER

This should be fixed with https://github.com/dask/dask/pull/4081

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  test_apply_dask_new_output_dimension is broken on master with dask-dev 369310993
429156168 https://github.com/pydata/xarray/issues/2480#issuecomment-429156168 https://api.github.com/repos/pydata/xarray/issues/2480 MDEyOklzc3VlQ29tbWVudDQyOTE1NjE2OA== mrocklin 306380 2018-10-11T23:34:31Z 2018-10-11T23:34:31Z MEMBER

No need to bother with the reproducible example.

As a warning, there might be some increased churn like this if we move forward with some of the proposed dask array changes.

On Thu, Oct 11, 2018, 7:32 PM Matthew Rocklin mrocklin@gmail.com wrote:

Yeah, I noticed this too. I have a fix already in a PR

On Thu, Oct 11, 2018, 5:24 PM Stephan Hoyer notifications@github.com wrote:

Example build failure: https://travis-ci.org/pydata/xarray/jobs/439949937

```
=================================== FAILURES ===================================
____________________ test_apply_dask_new_output_dimension _____________________

    @requires_dask
    def test_apply_dask_new_output_dimension():
        import dask.array as da

        array = da.ones((2, 2), chunks=(1, 1))
        data_array = xr.DataArray(array, dims=('x', 'y'))

        def stack_negative(obj):
            def func(x):
                return np.stack([x, -x], axis=-1)
            return apply_ufunc(func, obj, output_core_dims=[['sign']],
                               dask='parallelized', output_dtypes=[obj.dtype],
                               output_sizes={'sign': 2})

        expected = stack_negative(data_array.compute())

        actual = stack_negative(data_array)
        assert actual.dims == ('x', 'y', 'sign')
        assert actual.shape == (2, 2, 2)
        assert isinstance(actual.data, da.Array)
>       assert_identical(expected, actual)

xarray/tests/test_computation.py:737:

xarray/tests/test_computation.py:24: in assert_identical
    assert a.identical(b), msg
xarray/core/dataarray.py:1923: in identical
    self._all_compat(other, 'identical'))
xarray/core/dataarray.py:1875: in _all_compat
    compat(self, other))
xarray/core/dataarray.py:1872: in compat
    return getattr(x.variable, compat_str)(y.variable)
xarray/core/variable.py:1461: in identical
    self.equals(other))
xarray/core/variable.py:1439: in equals
    equiv(self.data, other.data)))
xarray/core/duck_array_ops.py:144: in array_equiv
    arr1, arr2 = as_like_arrays(arr1, arr2)
xarray/core/duck_array_ops.py:128: in as_like_arrays
    return tuple(np.asarray(d) for d in data)
xarray/core/duck_array_ops.py:128: in <genexpr>
    return tuple(np.asarray(d) for d in data)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/numpy/core/numeric.py:501: in asarray
    return array(a, dtype, copy=False, order=order)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/core.py:1118: in __array__
    x = self.compute()
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:156: in compute
    (result,) = compute(self, traverse=False, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:390: in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in collections_to_dsk
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in <listcomp>
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/optimization.py:41: in optimize
    dsk = ensure_dict(dsk)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/utils.py:830: in ensure_dict
    result.update(dd)
../../../miniconda/envs/test_env/lib/python3.6/_collections_abc.py:720: in __iter__
    yield from self._mapping
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:168: in __iter__
    return iter(self._dict)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:160: in _dict
    concatenate=self.concatenate
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: in top
    keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))

.0 = <tuple_iterator object at 0x7f606ba84fd0>

    keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))
E   KeyError: '.0'

../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: KeyError
```

My guess is that this is somehow related to @mrocklin https://github.com/mrocklin's recent refactor of dask.array.atop: dask/dask#3998 https://github.com/dask/dask/pull/3998

If the cause isn't obvious, I'll try to come up with a simple dask only example that reproduces it.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  test_apply_dask_new_output_dimension is broken on master with dask-dev 369310993
429155894 https://github.com/pydata/xarray/issues/2480#issuecomment-429155894 https://api.github.com/repos/pydata/xarray/issues/2480 MDEyOklzc3VlQ29tbWVudDQyOTE1NTg5NA== mrocklin 306380 2018-10-11T23:32:59Z 2018-10-11T23:32:59Z MEMBER

Yeah, I noticed this too. I have a fix already in a PR

On Thu, Oct 11, 2018, 5:24 PM Stephan Hoyer notifications@github.com wrote:

Example build failure: https://travis-ci.org/pydata/xarray/jobs/439949937

```
=================================== FAILURES ===================================
___________________ test_apply_dask_new_output_dimension ___________________

@requires_dask
def test_apply_dask_new_output_dimension():
    import dask.array as da

    array = da.ones((2, 2), chunks=(1, 1))
    data_array = xr.DataArray(array, dims=('x', 'y'))

    def stack_negative(obj):
        def func(x):
            return np.stack([x, -x], axis=-1)
        return apply_ufunc(func, obj, output_core_dims=[['sign']],
                           dask='parallelized', output_dtypes=[obj.dtype],
                           output_sizes={'sign': 2})

    expected = stack_negative(data_array.compute())

    actual = stack_negative(data_array)
    assert actual.dims == ('x', 'y', 'sign')
    assert actual.shape == (2, 2, 2)
    assert isinstance(actual.data, da.Array)
>   assert_identical(expected, actual)

xarray/tests/test_computation.py:737:
xarray/tests/test_computation.py:24: in assert_identical
    assert a.identical(b), msg
xarray/core/dataarray.py:1923: in identical
    self._all_compat(other, 'identical'))
xarray/core/dataarray.py:1875: in _all_compat
    compat(self, other))
xarray/core/dataarray.py:1872: in compat
    return getattr(x.variable, compat_str)(y.variable)
xarray/core/variable.py:1461: in identical
    self.equals(other))
xarray/core/variable.py:1439: in equals
    equiv(self.data, other.data)))
xarray/core/duck_array_ops.py:144: in array_equiv
    arr1, arr2 = as_like_arrays(arr1, arr2)
xarray/core/duck_array_ops.py:128: in as_like_arrays
    return tuple(np.asarray(d) for d in data)
xarray/core/duck_array_ops.py:128: in <genexpr>
    return tuple(np.asarray(d) for d in data)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/numpy/core/numeric.py:501: in asarray
    return array(a, dtype, copy=False, order=order)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/core.py:1118: in __array__
    x = self.compute()
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:156: in compute
    (result,) = compute(self, traverse=False, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:390: in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in collections_to_dsk
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/base.py:194: in <listcomp>
    for opt, (dsk, keys) in groups.items()]))
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/optimization.py:41: in optimize
    dsk = ensure_dict(dsk)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/utils.py:830: in ensure_dict
    result.update(dd)
../../../miniconda/envs/test_env/lib/python3.6/_collections_abc.py:720: in __iter__
    yield from self._mapping
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:168: in __iter__
    return iter(self._dict)
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:160: in _dict
    concatenate=self.concatenate
../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: in top
    keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))

.0 = <tuple_iterator object at 0x7f606ba84fd0>

>   keytups = list(itertools.product(*[range(dims[i]) for i in out_indices]))
E   KeyError: '.0'

../../../miniconda/envs/test_env/lib/python3.6/site-packages/dask/array/top.py:305: KeyError
```

My guess is that this is somehow related to @mrocklin's recent refactor of dask.array.atop: dask/dask#3998 (https://github.com/dask/dask/pull/3998)

If the cause isn't obvious, I'll try to come up with a simple dask only example that reproduces it.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  test_apply_dask_new_output_dimension is broken on master with dask-dev 369310993
417084035 https://github.com/pydata/xarray/issues/2390#issuecomment-417084035 https://api.github.com/repos/pydata/xarray/issues/2390 MDEyOklzc3VlQ29tbWVudDQxNzA4NDAzNQ== mrocklin 306380 2018-08-29T19:54:49Z 2018-08-29T19:54:49Z MEMBER

Sorry, I meant plot.imshow()

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Why are there two compute calls for plot? 355308699
417076999 https://github.com/pydata/xarray/issues/2389#issuecomment-417076999 https://api.github.com/repos/pydata/xarray/issues/2389 MDEyOklzc3VlQ29tbWVudDQxNzA3Njk5OQ== mrocklin 306380 2018-08-29T19:32:17Z 2018-08-29T19:32:17Z MEMBER

I wouldn't expect this to sway things too much, but yes, there is a chance that that would happen.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812
417072024 https://github.com/pydata/xarray/issues/2389#issuecomment-417072024 https://api.github.com/repos/pydata/xarray/issues/2389 MDEyOklzc3VlQ29tbWVudDQxNzA3MjAyNA== mrocklin 306380 2018-08-29T19:15:10Z 2018-08-29T19:15:10Z MEMBER

It would be nice if dask had a way to consolidate the serialization of these objects, rather than separately serializing them in each task.

You can make it a separate task (often done by wrapping with dask.delayed) and then use that key within other objects. This does create a data dependency, though, which can make the graph somewhat more complex.

In normal use of Pickle these things are cached and reused. Unfortunately we can't do this because we're sending the tasks to different machines, each of which will need to deserialize independently.
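A minimal sketch of the wrapping trick described above, assuming a hypothetical `Writer` class standing in for the expensive-to-pickle object; `dask.delayed` gives it its own task/key, so it is serialized once and every downstream task refers to it by that key:

```python
import dask
import dask.array as da

class Writer:
    """Stand-in for an object that is expensive to pickle."""
    def write(self, block):
        return block.sum()

# one task/key for the object: pickled once, then reused by reference
writer = dask.delayed(Writer())

x = da.ones((100, 100), chunks=(10, 10))
tasks = [dask.delayed(lambda w, b: w.write(b))(writer, block)
         for block in x.to_delayed().ravel()]
results = dask.compute(*tasks)
```

The trade-off is exactly the data dependency mentioned above: every block task now has an edge to the shared `writer` key.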

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Large pickle overhead in ds.to_netcdf() involving dask.delayed functions 355264812
406572020 https://github.com/pydata/xarray/issues/2298#issuecomment-406572020 https://api.github.com/repos/pydata/xarray/issues/2298 MDEyOklzc3VlQ29tbWVudDQwNjU3MjAyMA== mrocklin 306380 2018-07-20T11:20:59Z 2018-07-20T11:20:59Z MEMBER

Two thoughts:

  1. We can push some of this into Dask with https://github.com/dask/dask/issues/2538
  2. The full lazy ndarray solution would be a good application of the __array_function__ protocol (see the toy sketch below)
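As a toy illustration of point 2 (a sketch only, not xarray's design), NEP 18's `__array_function__` lets a wrapper record NumPy calls into a lazy expression graph instead of executing them:

```python
import numpy as np

class Lazy:
    """Toy lazy array: records NumPy calls instead of executing them."""
    def __init__(self, func, args=(), kwargs=None):
        self.func, self.args, self.kwargs = func, args, kwargs or {}

    def __array_function__(self, func, types, args, kwargs):
        return Lazy(func, args, kwargs)  # defer instead of computing

    def compute(self):
        args = [a.compute() if isinstance(a, Lazy) else a for a in self.args]
        return self.func(*args, **self.kwargs)

x = Lazy(np.ones, ((3, 3),))
y = np.mean(x)       # dispatches through __array_function__, stays lazy
print(y.compute())   # 1.0
```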
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Making xarray math lazy 342180429
401028473 https://github.com/pydata/xarray/pull/2255#issuecomment-401028473 https://api.github.com/repos/pydata/xarray/issues/2255 MDEyOklzc3VlQ29tbWVudDQwMTAyODQ3Mw== mrocklin 306380 2018-06-28T13:08:58Z 2018-06-29T14:00:19Z MEMBER

```python
import os
if not os.path.exists('myfile.tif'):
    import requests
    response = requests.get('https://oin-hotosm.s3.amazonaws.com/5abae68e65bd8f00110f3e42/0/5abae68e65bd8f00110f3e43.tif')
    with open('myfile.tif', 'wb') as f:
        f.write(response.content)

import dask
dask.config.set({'array.chunk-size': '1MiB'})

import xarray as xr
ds = xr.open_rasterio('myfile.tif', chunks=True)  # this only reads metadata to start

>>> ds.chunks
((1, 1, 1),
 (1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 136),
 (1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 995))
```

Also depends on https://github.com/dask/dask/pull/3679 . Without that PR it will use values that are similar, but don't precisely align with 1024.

Oh, I should point out that the image has tiles of size (512, 512)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add automatic chunking to open_rasterio 336371511
398838600 https://github.com/pydata/xarray/issues/2237#issuecomment-398838600 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODgzODYwMA== mrocklin 306380 2018-06-20T17:48:49Z 2018-06-20T17:48:49Z MEMBER

I've implemented something here: https://github.com/dask/dask/pull/3648

Playing with it would be welcome.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398586226 https://github.com/pydata/xarray/issues/2237#issuecomment-398586226 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU4NjIyNg== mrocklin 306380 2018-06-20T00:26:39Z 2018-06-20T00:26:39Z MEMBER

Thanks. This example helps.

> As you can see, if you concatenate together the first set of indices and index by the second set of indices, it would arrange them into sequential integers.

I'm not sure I understand this.

The situation on the whole does seem sensible to me though. This starts to look a little bit like a proper shuffle situation (using dataframe terminology). Each of your 365 output partitions would presumably touch 1/12th of your input partitions, leading to a quadratic number of tasks. If after doing something you then wanted to rearrange your data back then presumably that would require an equivalent number of extra tasks.

Am I understanding the situation correctly?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398582100 https://github.com/pydata/xarray/issues/2237#issuecomment-398582100 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU4MjEwMA== mrocklin 306380 2018-06-19T23:59:58Z 2018-06-19T23:59:58Z MEMBER

So if you're willing to humor me for a moment with dask.array examples, if you have an array that's currently partitioned by month:

x = da.ones((1000, ...), chunks=(30, ...))  # approximately

And you do something by time.dayofyear, what do you end up doing to the array in dask array operations? Sorry to be a bit slow here. I'm not as familiar with how XArray translates its groupby operations to dask.array operations under the hood.
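A rough dask-level illustration of the pattern under discussion (a sketch only, not xarray's exact code path): gathering each group together amounts to indexing with an argsort of the group labels, which is out of order relative to the chunks:

```python
import numpy as np
import dask.array as da

x = da.ones((360,), chunks=(30,))             # roughly "partitioned by month"
dayofyear = np.arange(360) % 30               # toy group labels
order = np.argsort(dayofyear, kind="stable")  # gather each group together
y = x[order]                                  # out-of-order fancy index
# dask of this era collapsed the indexed axis into a single chunk here,
# which is exactly the chunk-loss behavior this issue describes
print(y.chunks)
```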

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398581508 https://github.com/pydata/xarray/issues/2237#issuecomment-398581508 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU4MTUwOA== mrocklin 306380 2018-06-19T23:56:22Z 2018-06-19T23:56:22Z MEMBER

So my question was "if you're grouping data by month, and it's already partitioned by month, then why are the indices out of order?" However, it may be that you've answered this in your most recent comment; I'm not sure. It may also be that I'm not understanding the situation.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398577207 https://github.com/pydata/xarray/issues/2237#issuecomment-398577207 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU3NzIwNw== mrocklin 306380 2018-06-19T23:29:37Z 2018-06-19T23:29:37Z MEMBER

> That said, it's still probably more graceful to fail by creating too many small tasks rather than one giant task.

Maybe. We'll blow out the scheduler with too many tasks. With one large task we'll probably just start losing workers from memory errors.

In your example, what is the chunking of the indexed array likely to look like? How do contiguous regions of the index interact with the chunk structure of the indexed array?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398575620 https://github.com/pydata/xarray/issues/2237#issuecomment-398575620 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU3NTYyMA== mrocklin 306380 2018-06-19T23:20:23Z 2018-06-19T23:20:23Z MEMBER

It's also probably worth thinking about the kind of operations you're trying to do, and how streamable they are. For example, if you were to take a dataset that was partitioned chronologically by month and then do some sort of day-of-month grouping, then that would require the full dataset to be in memory at once.

If you're doing something like grouping on every month (keeping months of different years separate) then presumably your index is already sorted, and so you should be fine with the current behavior.

It might be useful to take a look at how the various XArray cases you care about convert to dask array slicing operations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398573000 https://github.com/pydata/xarray/issues/2237#issuecomment-398573000 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODU3MzAwMA== mrocklin 306380 2018-06-19T23:03:53Z 2018-06-19T23:03:53Z MEMBER

OK, so lowering down to a dask array conversation, let's look at a couple of examples. First, let's look at the behavior of a sorted index:

```python
>>> import dask.array as da
>>> x = da.ones((20, 20), chunks=(4, 5))
>>> x.chunks
((4, 4, 4, 4, 4), (5, 5, 5, 5))
```

If we index that array with a sorted index, we are able to efficiently preserve chunking:

```python
>>> import numpy as np

>>> x[np.arange(20), :].chunks
((4, 4, 4, 4, 4), (5, 5, 5, 5))

>>> x[np.arange(20) // 2, :].chunks
((8, 8, 4), (5, 5, 5, 5))
```

However if the index isn't sorted then everything goes into one big chunk:

```python
>>> x[np.arange(20) % 3, :].chunks
((20,), (5, 5, 5, 5))
```

We could imagine a few alternatives here:

  1. Make a chunk for every element in the index
  2. Make a chunk for every contiguous run in the index. So here we would have chunk dimensions of size 3 matching the 0, 1, 2, 0, 1, 2, 0, 1, 2 pattern of our index.

I don't really have a strong intuition for how the xarray operations transform into dask array operations (my brain is a bit tired right now, so thinking is hard) but my guess is that they would benefit from the second case. (A pure dask.array example would be welcome).

Now we have to consider how enacting a policy like "put contiguous index regions into the same chunk" might go wrong, and how we might defend against it generally.

```python
x = da.ones(10000, chunks=(100,))  # 100 chunks of size 100
index = np.array([0, 100, 200, 300, ..., 1, 101, 201, 301, ..., 2, 102, 202, 302, ...])
x[index]
```

In the example above we have a hundred input chunks and a hundred contiguous regions in our index. Seems good. However each output chunk touches each input chunk, so this will likely create 10,000 tasks, which we should probably consider a fail case here.

So we learn that we need to look pretty carefully at how the values within the index interact with the chunk structure in order to know if we can do this well. This isn't an insurmountable problem, but isn't trivial either.

In principle we're looking for a function that takes in two inputs:

  1. The chunks of a single dimension like x.chunks[i] or (4, 4, 4, 4, 4) from our first example
  2. An index like np.arange(20) % 3 from our first example

And outputs a bunch of smaller indexes to pass on to various chunks. However, it hopefully does this in a way that is efficient, and fails early if it's going to emit a bunch of very small slices.
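To make that concrete, here is a minimal sketch of such a function (hypothetical, not dask API): it splits the index into contiguous runs that each stay inside one input chunk, and fails early if the index is too fragmented:

```python
import numpy as np

def split_index_by_chunks(chunks, index, max_pieces=10_000):
    """Return (input chunk number, chunk-local index) pairs, or fail early."""
    bounds = np.concatenate([[0], np.cumsum(chunks)])
    owner = np.searchsorted(bounds, index, side="right") - 1  # chunk per element
    breaks = np.flatnonzero(np.diff(owner) != 0) + 1          # run boundaries
    runs = np.split(np.arange(len(index)), breaks)
    if len(runs) > max_pieces:
        raise ValueError("index too fragmented for this chunk structure")
    return [(owner[r[0]], index[r] - bounds[owner[r[0]]]) for r in runs]

# a nearly-sorted index stays in a few large runs and passes cheaply
split_index_by_chunks((4, 4, 4, 4, 4), np.arange(20) // 2)

# the fail case above: every element lands in a different chunk, giving
# 10,000 runs of length one, so we refuse to build them
chunks = (100,) * 100
index = np.concatenate([np.arange(i, 10_000, 100) for i in range(100)])
split_index_by_chunks(chunks, index, max_pieces=1_000)  # raises ValueError
```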

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
398500088 https://github.com/pydata/xarray/issues/2238#issuecomment-398500088 https://api.github.com/repos/pydata/xarray/issues/2238 MDEyOklzc3VlQ29tbWVudDM5ODUwMDA4OA== mrocklin 306380 2018-06-19T18:31:04Z 2018-06-19T18:31:04Z MEMBER

We can add this back in. I anticipate having to do a bugfix release within a week or two. Long term you probably want to do the following:

```python
from dask.utils import get_scheduler
actual_get = get_scheduler(get=get, collections=[collection])
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Failing test with dask_distributed 333480301
398218407 https://github.com/pydata/xarray/issues/2237#issuecomment-398218407 https://api.github.com/repos/pydata/xarray/issues/2237 MDEyOklzc3VlQ29tbWVudDM5ODIxODQwNw== mrocklin 306380 2018-06-18T22:43:25Z 2018-06-18T22:43:25Z MEMBER

I think that it would be useful to consider many possible cases of how people might want to chunk dask arrays with out-of-order indices, and the desired chunking outputs. XArray users like those here can provide some of those use cases. We'll have to gather others from other communities. Maybe once we have enough use cases gathered then rules for what correct behavior should be will emerge?

On Mon, Jun 18, 2018 at 5:16 PM Stephan Hoyer notifications@github.com wrote:

I vaguely recall discussing chunks that result from indexing somewhere in the dask issue tracker (when we added the special case for a monotonic increasing indexer to preserve chunks), but I can't find it now.

I think the challenge is that it isn't obvious what the right chunksizes should be. Chunks that are too small also have negative performance implications. Maybe the automatic chunking logic that @mrocklin has been looking into recently would be relevant here.


{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  why time grouping doesn't preserve chunks 333312849
397615169 https://github.com/pydata/xarray/issues/2234#issuecomment-397615169 https://api.github.com/repos/pydata/xarray/issues/2234 MDEyOklzc3VlQ29tbWVudDM5NzYxNTE2OQ== mrocklin 306380 2018-06-15T13:10:56Z 2018-06-15T13:10:56Z MEMBER

Replicated on pangeo.pydata.org. I created a local cluster on pangeo and found that things worked fine, suggesting that it was due to a version mismatch between the client and workers.

I then ran client.get_versions(check=True) and found that many things were very far out of sync, which made me curious to see if we were using the right image. Looking at my worker_template.yaml file and at the cluster.pod_template everything looks fine. I think that the next step is to verify the contents of the worker image. I'm headed out the door at the moment though. I can try to take another look at this in a bit. Alternatively I would welcome others carrying this on.

On Fri, Jun 15, 2018 at 8:59 AM Ryan Abernathey notifications@github.com wrote:

Update, this does work on my laptop with the following versions installed

```
INSTALLED VERSIONS
------------------
commit: 66be9c5db7d86ea385c3a4cd4295bfce67e3f25b
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.7+2.g66be9c5d
pandas: 0.20.3
numpy: 1.13.1
scipy: 0.19.1
netCDF4: 1.3.1
h5netcdf: 0.4.1
h5py: 2.7.1
Nio: None
zarr: 2.2.0
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.2
distributed: 1.21.6
matplotlib: 2.1.0
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: 3.5.0
IPython: 6.1.0
sphinx: 1.6.5
```


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fillna error with distributed 332762756
385501221 https://github.com/pydata/xarray/issues/2042#issuecomment-385501221 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDM4NTUwMTIyMQ== mrocklin 306380 2018-04-30T19:20:04Z 2018-04-30T19:20:04Z MEMBER

> gdal can read/write windows:

I'm aware. See this doc listed above for rasterio: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing

Background here is that rasterio more-or-less wraps around GDAL, but with interfaces that are somewhat more idiomatic to this community.

> I wonder how you got that to work other than setting up a slave read process that handles all reads.

We've run into these issues before as well. Typically we handle them with locks of various types.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
385488636 https://github.com/pydata/xarray/issues/2042#issuecomment-385488636 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDM4NTQ4ODYzNg== mrocklin 306380 2018-04-30T18:34:21Z 2018-04-30T18:34:21Z MEMBER

My first attempt would be to use this API: https://rasterio.readthedocs.io/en/latest/topics/windowed-rw.html#writing
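A minimal sketch of what that first attempt could look like (hypothetical helper; assumes a 2-D, single-band, dask-backed array and a rasterio `profile` dict prepared elsewhere):

```python
import numpy as np
import rasterio
from rasterio.windows import Window

def write_tiff_windowed(data, path, profile):
    """Write a 2-D dask array block-by-block via rasterio's windowed writes."""
    ys = np.concatenate([[0], np.cumsum(data.chunks[0])])  # row offsets
    xs = np.concatenate([[0], np.cumsum(data.chunks[1])])  # column offsets
    with rasterio.open(path, "w", **profile) as dst:
        for i in range(data.numblocks[0]):
            for j in range(data.numblocks[1]):
                block = np.asarray(data.blocks[i, j])  # compute one chunk
                window = Window(xs[j], ys[i], block.shape[1], block.shape[0])
                dst.write(block, 1, window=window)
```

A serial loop like this sidesteps the GDAL thread-safety issues mentioned above; parallel writes would additionally need a lock.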

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
385488169 https://github.com/pydata/xarray/issues/2042#issuecomment-385488169 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDM4NTQ4ODE2OQ== mrocklin 306380 2018-04-30T18:32:44Z 2018-04-30T18:32:44Z MEMBER

> My impression is that there will be some (significant) development challenges

If you're able to expand on this that would be welcome.

> that perhaps only supports a few rasterio file formats

My hope would be that rasterio/GDAL would handle the many-file-format issue for us if they support writing in chunks. I also lack experience here though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
385452187 https://github.com/pydata/xarray/issues/2093#issuecomment-385452187 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NTQ1MjE4Nw== mrocklin 306380 2018-04-30T16:27:37Z 2018-04-30T16:27:37Z MEMBER

My guess is that geotiff chunks will be much smaller than is ideal for dask.array. We might want to expand those chunk sizes by some multiple.
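For illustration, a small sketch of that idea (names are illustrative, not a settled API): read the native block size and scale it up before handing chunks to open_rasterio:

```python
import rasterio
import xarray as xr

with rasterio.open("myfile.tif") as src:
    block_h, block_w = src.block_shapes[0]  # native tile size, e.g. (512, 512)

multiple = 4  # expand native tiles so each dask chunk is a few MB
ds = xr.open_rasterio(
    "myfile.tif",
    chunks={"band": 1, "y": block_h * multiple, "x": block_w * multiple},
)
```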

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
385451826 https://github.com/pydata/xarray/issues/2042#issuecomment-385451826 https://api.github.com/repos/pydata/xarray/issues/2042 MDEyOklzc3VlQ29tbWVudDM4NTQ1MTgyNg== mrocklin 306380 2018-04-30T16:26:13Z 2018-04-30T16:26:13Z MEMBER

When writing https://github.com/pydata/xarray/issues/2093 I came across this issue and thought I'd weigh in.

The GIS community seems like a fairly close neighbor to XArray's current community. Some API compatibility here might be a good way to expand the community. I definitely agree that GeoTiff does not implement the full XArray model, but it might be useful to support the subset of datasets that do, just so that round-trip operations can occur. For example, it might be nice if the following worked:

```python
dset = xr.open_rasterio(...)

# do modest modifications to dset

dset.to_rasterio(...)
```

My hope would be that the rasterio/GDAL data model would be consistent enough so that we could detect and err early if the dataset was not well-formed.

{
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Anyone working on a to_tiff? Alternatively, how do you write an xarray to a geotiff?  312203596
383817119 https://github.com/pydata/xarray/issues/2074#issuecomment-383817119 https://api.github.com/repos/pydata/xarray/issues/2074 MDEyOklzc3VlQ29tbWVudDM4MzgxNzExOQ== mrocklin 306380 2018-04-24T06:22:39Z 2018-04-24T06:22:39Z MEMBER

When doing benchmarks with things that might call BLAS operations in multiple threads I recommend setting the OMP_NUM_THREADS environment variable to 1. This will avoid oversubscription.
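For example (the variable must be set before NumPy, and therefore BLAS, is loaded):

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # must happen before numpy/BLAS is imported

import numpy as np  # BLAS now runs single-threaded, so timings are not
                    # confounded by oversubscription
```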

On Mon, Apr 23, 2018 at 7:32 PM, Keisuke Fujii notifications@github.com wrote:

@crusaderky, thanks for the detailed benchmarking. Further note:

  • xr.dot uses tensordot if possible, as when I implemented it, dask did not have einsum. In the other cases, we use dask.atop with np.einsum.

In your example, bench(100, False, ['t'], '...i,...i') uses dask.tensordot, bench(100, True, ['t'], '...i,...i') uses np.einsum.

bench(100, True, [], ...i,...i->...i) also uses np.einsum. But I have no idea yet why dot(a, b, dims=[]) is faster than a * b.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.dot() dask problems 316618290
383651390 https://github.com/pydata/xarray/issues/2074#issuecomment-383651390 https://api.github.com/repos/pydata/xarray/issues/2074 MDEyOklzc3VlQ29tbWVudDM4MzY1MTM5MA== mrocklin 306380 2018-04-23T17:12:04Z 2018-04-23T17:12:04Z MEMBER

See also https://github.com/dask/dask/issues/2225

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.dot() dask problems 316618290
383109977 https://github.com/pydata/xarray/issues/1938#issuecomment-383109977 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM4MzEwOTk3Nw== mrocklin 306380 2018-04-20T14:15:38Z 2018-04-20T14:15:38Z MEMBER

Thanks for taking the initiative here @hameerabbasi ! It's good to see something up already.

Here is a link to the discussion that I think @hameerabbasi is referring to: http://numpy-discussion.10968.n7.nabble.com/new-NEP-np-AbstractArray-and-np-asabstractarray-tt45282.html#none

I haven't read through that entirely yet. Was arrayish decided on by the community, or was the term still up for discussion?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
383104966 https://github.com/pydata/xarray/issues/1938#issuecomment-383104966 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM4MzEwNDk2Ng== mrocklin 306380 2018-04-20T13:59:23Z 2018-04-20T13:59:23Z MEMBER

Happy with arrayish too


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
383104907 https://github.com/pydata/xarray/issues/1938#issuecomment-383104907 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM4MzEwNDkwNw== mrocklin 306380 2018-04-20T13:59:09Z 2018-04-20T13:59:09Z MEMBER

What name should we go with? I have a slight preference for duckarray over arrayish but happy with whatever the group decides.

On Fri, Apr 20, 2018 at 1:51 AM, Hameer Abbasi notifications@github.com wrote:

I've created one, as per your e-mail: https://github.com/hameerabbasi/arrayish


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
382901777 https://github.com/pydata/xarray/issues/1938#issuecomment-382901777 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM4MjkwMTc3Nw== mrocklin 306380 2018-04-19T22:36:48Z 2018-04-19T22:36:48Z MEMBER

Doing this externally sounds sensible to me. Thoughts on a good name? duck_array seems to be free on PyPI

On Thu, Apr 19, 2018 at 4:23 PM, Stephan Hoyer notifications@github.com wrote:

This library would have hard dependencies only on numpy and multipledispatch, and would expose a multipledispatch namespace so extending it doesn't have to happen in the library itself.

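A sketch of what such a namespace might look like (hypothetical package; assuming only numpy and multipledispatch as dependencies):

```python
from multipledispatch import Dispatcher
import numpy as np

# each duck-array operation is a Dispatcher; downstream libraries
# (dask, sparse, ...) register implementations for their own types
transpose = Dispatcher("transpose")

@transpose.register(np.ndarray)
def _(x):
    return np.transpose(x)

# e.g. a sparse library would add:
#   @transpose.register(sparse.COO)
#   def _(x): ...

print(transpose(np.ones((2, 3))).shape)  # (3, 2)
```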

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
382709490 https://github.com/pydata/xarray/issues/1938#issuecomment-382709490 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM4MjcwOTQ5MA== mrocklin 306380 2018-04-19T12:05:22Z 2018-04-19T12:05:22Z MEMBER

In https://github.com/pydata/sparse/issues/1#issuecomment-370248174 @shoyer mentions that some work could likely progress in XArray before deciding on the VarArgs in multipledispatch. If XArray maintainers have time it might be valuable to lay out how that would look so that other devs can try it out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
371813468 https://github.com/pydata/xarray/issues/1895#issuecomment-371813468 https://api.github.com/repos/pydata/xarray/issues/1895 MDEyOklzc3VlQ29tbWVudDM3MTgxMzQ2OA== mrocklin 306380 2018-03-09T13:35:38Z 2018-03-09T13:35:38Z MEMBER

If things are operational then we're fine. It may be that a lot of this cost was due to other serialization things in gcsfs, zarr, or elsewhere.

On Fri, Mar 9, 2018 at 12:33 AM, Joe Hamman notifications@github.com wrote:

Where did we land here? Is there an action item that came from this discussion?

In my view, the benefit of having consistent getitem behavior for all of our backends is worth working through potential hiccups in the way dask interacts with xarray.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid Adapters in task graphs? 295270362
371561783 https://github.com/pydata/xarray/issues/1974#issuecomment-371561783 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTU2MTc4Mw== mrocklin 306380 2018-03-08T17:32:08Z 2018-03-08T17:32:08Z MEMBER

Seeing a good thing twice never hurts. The audience is likely not entirely the same. It's also probably the motivation for their interest. It might be useful as an introduction.

On Thu, Mar 8, 2018 at 12:30 PM, Alistair Miles notifications@github.com wrote:

Actually just realising @rabernat and @mrocklin you guys already demoed all of this to ESIP back in January (really nice talk btw). So maybe I don't need to repeat.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371548028 https://github.com/pydata/xarray/issues/1974#issuecomment-371548028 https://api.github.com/repos/pydata/xarray/issues/1974 MDEyOklzc3VlQ29tbWVudDM3MTU0ODAyOA== mrocklin 306380 2018-03-08T16:49:38Z 2018-03-08T16:49:38Z MEMBER

Recorded video if you want: https://youtu.be/rSOJKbfNBNk

On Thu, Mar 8, 2018 at 11:38 AM, Alistair Miles notifications@github.com wrote:

Ha, Murphy's law. Shame because the combination of jupyterlab interface, launching a kubernetes cluster, and being able to click through to the Dask dashboard looks futuristic cool :-) I was really looking forward to seeing all my jobs spinning through the Dask dashboard as they work. I actually have a pretty packed talk already so don't absolutely need to include this, but if it does come back in time I'll slot it in. Talk starts 8pm GMT so still a few hours yet...


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray/zarr cloud demo 303270676
371462262 https://github.com/pydata/xarray/issues/1971#issuecomment-371462262 https://api.github.com/repos/pydata/xarray/issues/1971 MDEyOklzc3VlQ29tbWVudDM3MTQ2MjI2Mg== mrocklin 306380 2018-03-08T11:35:25Z 2018-03-08T11:35:25Z MEMBER

FWIW most of the logic within the dask collections (array, dataframe, delayed) is only tested with dask.local.get_sync. This also makes the test suite much faster.

Obviously though for things like writing to disk it's useful to check different schedulers.
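For example, a sketch of how a test suite might split the difference (hypothetical test; uses the modern dask.config API):

```python
import dask
import dask.array as da
import pytest

# most logic tests run only on the synchronous scheduler; a small I/O
# suite is parametrized across schedulers, where differences surface
@pytest.mark.parametrize("scheduler", ["sync", "threads", "processes"])
def test_reduction_roundtrip(scheduler):
    x = da.ones((10, 10), chunks=(5, 5))
    with dask.config.set(scheduler=scheduler):
        assert x.sum().compute() == 100
```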

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should we be testing against multiple dask schedulers? 302930480
368575548 https://github.com/pydata/xarray/issues/1873#issuecomment-368575548 https://api.github.com/repos/pydata/xarray/issues/1873 MDEyOklzc3VlQ29tbWVudDM2ODU3NTU0OA== mrocklin 306380 2018-02-26T17:11:32Z 2018-02-26T17:11:32Z MEMBER

From Anaconda's David Mason (I don't know his github handle):

Readthedocs has confirmed that they have a bug with the way they are handling redirections on their community site. They opened their own bug ticket to work on resolving the issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Documentation is inaccessible via HTTPS 293272998
368569053 https://github.com/pydata/xarray/issues/1873#issuecomment-368569053 https://api.github.com/repos/pydata/xarray/issues/1873 MDEyOklzc3VlQ29tbWVudDM2ODU2OTA1Mw== mrocklin 306380 2018-02-26T16:52:21Z 2018-02-26T16:52:21Z MEMBER

Not really, no. I tend to push these upstream to either Anaconda's IT or NumFOCUS. cc @aterrel

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Documentation is inaccessible via HTTPS 293272998
368267730 https://github.com/pydata/xarray/issues/1938#issuecomment-368267730 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM2ODI2NzczMA== mrocklin 306380 2018-02-24T23:11:28Z 2018-02-24T23:11:28Z MEMBER

cc @jcrist , who has historically been interested in how we solve this problem within dask.array

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
368159542 https://github.com/pydata/xarray/issues/1938#issuecomment-368159542 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM2ODE1OTU0Mg== mrocklin 306380 2018-02-23T22:41:54Z 2018-02-23T22:41:54Z MEMBER

I would want to see how magical it was. @llllllllll 's calibration of "mild metaprogramming" may differ slightly from my own :)

Eventually if multipledispatch becomes a dependency of xarray then we should consider changing the decision-making process away from being just me though. Relatedly, SymPy also just adopted it (by vendoring) as a dependency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
368068500 https://github.com/pydata/xarray/issues/1938#issuecomment-368068500 https://api.github.com/repos/pydata/xarray/issues/1938 MDEyOklzc3VlQ29tbWVudDM2ODA2ODUwMA== mrocklin 306380 2018-02-23T16:54:37Z 2018-02-23T16:54:37Z MEMBER

Import times on multipledispatch have improved thanks to work by @llllllllll. They could probably be further improved if people wanted to invest modest intellectual effort here. Costs scale with the number of type signatures on each operation. In Blaze this was very high, well into the hundreds; in our case it would be, I think, more modest, around 2-10. (Also, as a historical note, multipledispatch predates my involvement in Blaze.)

When possible it would be useful to upstream these concerns to NumPy, even if we have to move faster than NumPy is able to support.
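To make the signature-count point concrete, a small illustration with multipledispatch (illustrative types only):

```python
from multipledispatch import dispatch
import numpy as np

# one implementation per array-type pair; a real setup might add dask and
# sparse signatures, but still only a handful per operation
@dispatch(np.ndarray, np.ndarray)
def tensordot(a, b):
    return np.tensordot(a, b)

@dispatch(object, object)
def tensordot(a, b):
    raise TypeError("no tensordot implementation for %s, %s" % (type(a), type(b)))
```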

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hooks for XArray operations 299668148
367802779 https://github.com/pydata/xarray/issues/1935#issuecomment-367802779 https://api.github.com/repos/pydata/xarray/issues/1935 MDEyOklzc3VlQ29tbWVudDM2NzgwMjc3OQ== mrocklin 306380 2018-02-22T19:58:55Z 2018-02-22T19:58:55Z MEMBER

+1 on reporting upstream if convenient

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Not compatible with PyPy and dask.array. 299346082

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);