
issue_comments


39 rows where author_association = "NONE" and user = 3019665 sorted by updated_at descending


issue 23

  • Implementing map_blocks and map_overlap 4
  • recent versions of sparse and dask seem to be incompatible with our tests 4
  • Add compute=False keywords to `to_foo` functions 3
  • Use pytorch as backend for xarrays 3
  • Switch py2.7 CI build to use conda-forge 2
  • WIP: Zarr backend 2
  • xarray.dot() dask problems 2
  • Opening from zarr.ZipStore fails to read (store???) unicode characters 2
  • Duck array compatibility meeting 2
  • docs on specifying chunks in to_zarr encoding arg 2
  • dask compute on reduction failes with ValueError 1
  • slow performance when storing datasets in gcsfs-backed zarr stores 1
  • fix distributed writes 1
  • implement Gradient 1
  • Zarr loading from ZipStore gives error on default arguments 1
  • map_blocks 1
  • Allow nested dictionaries in the Zarr backend (#3517) 1
  • Feature Request: Hierarchical storage and processing in xarray 1
  • Expose xarray's h5py serialization capabilites as public API? 1
  • Awkward array backend? 1
  • Update `sparse` `test_chunk` xfail 1
  • Support NumPy array API (experimental) 1
  • Do we need to update AbstractArray for duck arrays? 1

user 1

  • jakirkham · 39 ✖

author_association 1

  • NONE · 39 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1252772587 https://github.com/pydata/xarray/issues/4285#issuecomment-1252772587 https://api.github.com/repos/pydata/xarray/issues/4285 IC_kwDOAMm_X85Kq8rr jakirkham 3019665 2022-09-20T18:48:47Z 2022-09-20T18:48:47Z NONE

cc @ivirshup @joshmoore (who may be interested in this as well)

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Awkward array backend? 667864088
1232159535 https://github.com/pydata/xarray/issues/4242#issuecomment-1232159535 https://api.github.com/repos/pydata/xarray/issues/4242 IC_kwDOAMm_X85JcUMv jakirkham 3019665 2022-08-30T20:56:42Z 2022-08-30T20:56:42Z NONE

FWIW this sounds similar to what h5pickle does. Maybe it is worth improving that package with whatever logic Xarray has?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Expose xarray's h5py serialization capabilites as public API? 663148659
1198743015 https://github.com/pydata/xarray/issues/4118#issuecomment-1198743015 https://api.github.com/repos/pydata/xarray/issues/4118 IC_kwDOAMm_X85Hc13n jakirkham 3019665 2022-07-29T00:14:46Z 2022-07-29T00:14:46Z NONE

Wanted to note issue ( https://github.com/carbonplan/ndpyramid/issues/10 ) here, which may be of interest to people here.

Also we are thinking about a Dask blogpost in this space if people have thoughts on what we should include and/or are interested in being involved. Details in issue ( https://github.com/dask/dask-blog/issues/141 ).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature Request: Hierarchical storage and processing in xarray 628719058
1198655444 https://github.com/pydata/xarray/issues/6845#issuecomment-1198655444 https://api.github.com/repos/pydata/xarray/issues/6845 IC_kwDOAMm_X85HcgfU jakirkham 3019665 2022-07-28T21:33:03Z 2022-07-28T21:33:03Z NONE

Probably out of my depth here (so please forgive me), but one thing that might be worth looking at is Array API support, which CuPy 10+ supports and Dask is working on support for ( https://github.com/dask/dask/pull/8750 ). Believe XArray is taking some initial steps in this direction recently ( https://github.com/pydata/xarray/pull/6804 ), but could easily be misunderstanding the scope/intended usage of the changes there.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Do we need to update AbstractArray for duck arrays? 1321228754
1191765132 https://github.com/pydata/xarray/pull/6804#issuecomment-1191765132 https://api.github.com/repos/pydata/xarray/issues/6804 IC_kwDOAMm_X85HCOSM jakirkham 3019665 2022-07-21T17:43:20Z 2022-07-21T17:43:20Z NONE

cc @rgommers (for awareness)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support NumPy array API (experimental) 1307709199
1190589331 https://github.com/pydata/xarray/issues/3232#issuecomment-1190589331 https://api.github.com/repos/pydata/xarray/issues/3232 IC_kwDOAMm_X85G9vOT jakirkham 3019665 2022-07-20T18:01:56Z 2022-07-20T18:01:56Z NONE

While it is true that using PyTorch Tensors directly would require the Array API to be implemented in PyTorch, one could use them indirectly by converting them zero-copy to CuPy arrays, which do have Array API support

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
1122811102 https://github.com/pydata/xarray/pull/6542#issuecomment-1122811102 https://api.github.com/repos/pydata/xarray/issues/6542 IC_kwDOAMm_X85C7Lze jakirkham 3019665 2022-05-10T20:06:06Z 2022-05-10T20:06:06Z NONE

@jakirkham were you thinking a reference to the dask docs for more info on optimal chunk sizing and aligning with storage?

It could make sense to refer to those docs, or if similar ideas come up here, it may be worth mentioning them in this change

or are you suggesting the proposed docs change is too complex?

Not at all.

I was trying to address the lack of documentation on specifying chunks within a zarr array for non-dask arrays/coordinates, but also covering the weedsy (but common) case of datasets with a mix of dask & in-memory arrays/coords like in my example. I have been frustrated by zarr stores I've written with a couple dozen array chunks and thousands of coordinate chunks for this reason, but it's definitely a gnarly topic to cover concisely :P

If there's anything you need help with or would like to discuss, please don't hesitate to raise a Zarr issue. We also enabled GH discussions over there so if that fits better feel free to use that 🙂

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  docs on specifying chunks in to_zarr encoding arg 1221393104
1121430268 https://github.com/pydata/xarray/pull/6542#issuecomment-1121430268 https://api.github.com/repos/pydata/xarray/issues/6542 IC_kwDOAMm_X85C16r8 jakirkham 3019665 2022-05-09T18:23:03Z 2022-05-09T18:23:03Z NONE

FWIW there's a similar doc page about chunk size in Dask that may be worth borrowing from

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
  docs on specifying chunks in to_zarr encoding arg 1221393104
1100938882 https://github.com/pydata/xarray/issues/3147#issuecomment-1100938882 https://api.github.com/repos/pydata/xarray/issues/3147 IC_kwDOAMm_X85Bnv6C jakirkham 3019665 2022-04-17T19:44:03Z 2022-04-17T19:44:03Z NONE

Would be good to keep this open

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
941268682 https://github.com/pydata/xarray/issues/5648#issuecomment-941268682 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X844Gp7K jakirkham 3019665 2021-10-12T18:26:17Z 2021-10-12T18:26:17Z NONE

If you haven't already, would be good if those running into issues here could look over the Array API. This is still something that is being worked on, but the goal is to standardize Array APIs. If there are things missing from that, it would be good to hear about them in a new issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
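The dispatch mechanism the Array API standard defines can be sketched in plain Python. `FakeArray` and `array_namespace` here are hypothetical illustrations, but `__array_namespace__` is the actual protocol method the standard specifies for arrays to report which namespace they belong to:

```python
import math


class FakeArray:
    """Hypothetical stand-in for an Array API-compliant array object."""

    def __array_namespace__(self, api_version=None):
        # A real array library returns its own namespace module here;
        # math is just a placeholder for illustration.
        return math


def array_namespace(*xs):
    """Resolve the single namespace shared by all inputs, as consumers
    of the Array API typically do before dispatching operations."""
    namespaces = {x.__array_namespace__() for x in xs}
    if len(namespaces) != 1:
        raise TypeError("arrays come from different namespaces")
    return namespaces.pop()


xp = array_namespace(FakeArray(), FakeArray())
assert xp is math
```

Code written against the resolved namespace then works unchanged across any library implementing the standard, which is the portability the comment above is pointing people toward.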
924111284 https://github.com/pydata/xarray/issues/5648#issuecomment-924111284 https://api.github.com/repos/pydata/xarray/issues/5648 IC_kwDOAMm_X843FNG0 jakirkham 3019665 2021-09-21T15:40:28Z 2021-09-21T15:40:28Z NONE

Maybe too soon to ask, but do we have a link to the video call?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Duck array compatibility meeting 956103236
908626589 https://github.com/pydata/xarray/pull/5751#issuecomment-908626589 https://api.github.com/repos/pydata/xarray/issues/5751 IC_kwDOAMm_X842KIqd jakirkham 3019665 2021-08-30T19:28:37Z 2021-08-30T19:28:37Z NONE

@Illviljan please let us know if there's anything else needed here 🙂

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Update `sparse` `test_chunk` xfail 983032639
908582707 https://github.com/pydata/xarray/issues/5654#issuecomment-908582707 https://api.github.com/repos/pydata/xarray/issues/5654 IC_kwDOAMm_X842J98z jakirkham 3019665 2021-08-30T18:28:27Z 2021-08-30T18:28:27Z NONE

Thanks Hameer! 😄

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  recent versions of sparse and dask seem to be incompatible with our tests 957131705
906684668 https://github.com/pydata/xarray/issues/5654#issuecomment-906684668 https://api.github.com/repos/pydata/xarray/issues/5654 IC_kwDOAMm_X842Cuj8 jakirkham 3019665 2021-08-26T19:30:53Z 2021-08-26T19:30:53Z NONE

cc @pentschev (just so you are aware)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  recent versions of sparse and dask seem to be incompatible with our tests 957131705
903169462 https://github.com/pydata/xarray/issues/5654#issuecomment-903169462 https://api.github.com/repos/pydata/xarray/issues/5654 IC_kwDOAMm_X8411UW2 jakirkham 3019665 2021-08-21T19:59:40Z 2021-08-21T19:59:40Z NONE

Yeah, was just mentioning that since we had an older version of sparse pulled in while developing that PR at one point and it caused issues. Sounds like that is not the case here

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  recent versions of sparse and dask seem to be incompatible with our tests 957131705
903015237 https://github.com/pydata/xarray/issues/5654#issuecomment-903015237 https://api.github.com/repos/pydata/xarray/issues/5654 IC_kwDOAMm_X8410utF jakirkham 3019665 2021-08-21T00:05:35Z 2021-08-21T00:09:08Z NONE

Would double check that CI is pulling the latest sparse

xref: https://github.com/dask/dask/pull/7939#issuecomment-887122942

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  recent versions of sparse and dask seem to be incompatible with our tests 957131705
668263428 https://github.com/pydata/xarray/issues/3147#issuecomment-668263428 https://api.github.com/repos/pydata/xarray/issues/3147 MDEyOklzc3VlQ29tbWVudDY2ODI2MzQyOA== jakirkham 3019665 2020-08-03T22:02:22Z 2020-08-03T22:02:22Z NONE

Yeah +1 for using pad instead. Had tried to get rid of map_overlap's padding and use da.pad in Dask as well ( https://github.com/dask/dask/pull/5052 ), but haven't had time to get back to that.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
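As a small illustration of what padding gives you here, sketched with NumPy's `np.pad` (which `da.pad` mirrors):

```python
import numpy as np

x = np.arange(4)

# Reflect-pad one element on each side: the kind of halo an overlap
# region provides, without map_overlap's own padding logic.
padded = np.pad(x, 1, mode="reflect")
assert padded.tolist() == [1, 0, 1, 2, 3, 2]
```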
606354369 https://github.com/pydata/xarray/issues/3232#issuecomment-606354369 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDYwNjM1NDM2OQ== jakirkham 3019665 2020-03-31T02:07:47Z 2020-03-31T02:07:47Z NONE

Well here's a blogpost on using Dask + CuPy. Maybe start there and build up to using Xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
606262540 https://github.com/pydata/xarray/issues/3232#issuecomment-606262540 https://api.github.com/repos/pydata/xarray/issues/3232 MDEyOklzc3VlQ29tbWVudDYwNjI2MjU0MA== jakirkham 3019665 2020-03-30T21:31:18Z 2020-03-30T21:31:18Z NONE

Yeah Jacob and I played with this a few months back. There were some issues, but my recollection is pretty hazy. If someone gives this another try, it would be interesting to hear how things go.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use pytorch as backend for xarrays 482543307
604207177 https://github.com/pydata/xarray/issues/3815#issuecomment-604207177 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDIwNzE3Nw== jakirkham 3019665 2020-03-26T03:26:02Z 2020-03-26T03:26:02Z NONE

Sure an upstream issue would be welcome. Thanks for unpacking that further Mark 😀

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
604022035 https://github.com/pydata/xarray/issues/3815#issuecomment-604022035 https://api.github.com/repos/pydata/xarray/issues/3815 MDEyOklzc3VlQ29tbWVudDYwNDAyMjAzNQ== jakirkham 3019665 2020-03-25T18:51:51Z 2020-03-25T18:51:51Z NONE

Sorry I don't know. Maybe @rabernat can advise? 🙂

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Opening from zarr.ZipStore fails to read (store???) unicode characters 573577844
554045517 https://github.com/pydata/xarray/pull/3526#issuecomment-554045517 https://api.github.com/repos/pydata/xarray/issues/3526 MDEyOklzc3VlQ29tbWVudDU1NDA0NTUxNw== jakirkham 3019665 2019-11-14T19:37:13Z 2019-11-14T19:37:13Z NONE

Yeah this probably works as these are just JSON files. That said, IDK that we are making any attempt to ensure this works. IOW I don't think this is tested or in the spec.

Additionally IDK that we do the same decoding on nested dictionaries as would be done on a flat dictionary. Meaning non-JSON values like datetime64/timedelta64 might not be handled correctly in this case.

Could be wrong about these things. Those are just my immediate thoughts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow nested dictionaries in the Zarr backend (#3517) 522519084
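The non-JSON-value concern in the comment above can be seen directly. A minimal sketch with the stdlib `json` module and NumPy:

```python
import json

import numpy as np

# Plain nested dictionaries round-trip through JSON fine...
assert json.loads(json.dumps({"a": {"b": 1}})) == {"a": {"b": 1}}

# ...but NumPy datetime64 values are not JSON-serializable without a
# custom encoding pass, which is the worry for nested attribute values.
try:
    json.dumps({"t": np.datetime64("2019-11-14")})
except TypeError:
    serializable = False
else:
    serializable = True
assert not serializable
```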
540846421 https://github.com/pydata/xarray/pull/3276#issuecomment-540846421 https://api.github.com/repos/pydata/xarray/issues/3276 MDEyOklzc3VlQ29tbWVudDU0MDg0NjQyMQ== jakirkham 3019665 2019-10-11T00:03:25Z 2019-10-11T00:03:25Z NONE

Congratulations! 🎉

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  map_blocks 488243328
513044413 https://github.com/pydata/xarray/issues/3147#issuecomment-513044413 https://api.github.com/repos/pydata/xarray/issues/3147 MDEyOklzc3VlQ29tbWVudDUxMzA0NDQxMw== jakirkham 3019665 2019-07-19T00:33:55Z 2019-07-19T00:42:03Z NONE

Another approach for the split_by_chunks implementation would be...

```python
def split_by_chunks(a):
    for sl in da.core.slices_from_chunks(a.chunks):
        yield (sl, a[sl])
```

While a little bit more cumbersome to write, this could be implemented with .blocks and may be a bit more performant.

```python
def split_by_chunks(a):
    for i, sl in zip(np.ndindex(a.numblocks), da.core.slices_from_chunks(a.chunks)):
        yield (sl, a.blocks[i])
```

If the slices are not strictly needed, this could be simplified a bit more.

```python
def split_by_chunks(a):
    for i in np.ndindex(a.numblocks):
        yield a.blocks[i]
```

Admittedly slices_from_chunks is an internal utility function. Though it is unlikely to change. We could consider exposing it as part of the API if that is useful.

We could consider other things like making .blocks iterable, which could make this more friendly as well. Raised issue ( https://github.com/dask/dask/issues/5117 ) on this point.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
513029753 https://github.com/pydata/xarray/issues/3147#issuecomment-513029753 https://api.github.com/repos/pydata/xarray/issues/3147 MDEyOklzc3VlQ29tbWVudDUxMzAyOTc1Mw== jakirkham 3019665 2019-07-18T23:22:11Z 2019-07-18T23:22:11Z NONE

That sounds somewhat similar to .blocks accessor in Dask Array. ( https://github.com/dask/dask/pull/3689 ) Maybe we should align on that as well?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
453807790 https://github.com/pydata/xarray/issues/2586#issuecomment-453807790 https://api.github.com/repos/pydata/xarray/issues/2586 MDEyOklzc3VlQ29tbWVudDQ1MzgwNzc5MA== jakirkham 3019665 2019-01-13T07:11:23Z 2019-01-13T07:11:23Z NONE

I'm not really familiar with XArray's internals, but issue ( https://github.com/pydata/xarray/issues/2660 ) looks relevant.

What happens if you do the following?

```python
ds.to_zarr(zarr.group(zarr.ZipStore("test.zarr")))
print(xr.open_zarr(zarr.group(zarr.ZipStore("test.zarr"))))
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr loading from ZipStore gives error on default arguments 386515973
422893434 https://github.com/pydata/xarray/pull/2398#issuecomment-422893434 https://api.github.com/repos/pydata/xarray/issues/2398 MDEyOklzc3VlQ29tbWVudDQyMjg5MzQzNA== jakirkham 3019665 2018-09-19T17:38:42Z 2018-09-19T17:38:42Z NONE

@shoyer, have you seen da.pad?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement Gradient 356698348
396828029 https://github.com/pydata/xarray/issues/1770#issuecomment-396828029 https://api.github.com/repos/pydata/xarray/issues/1770 MDEyOklzc3VlQ29tbWVudDM5NjgyODAyOQ== jakirkham 3019665 2018-06-13T06:27:36Z 2018-06-13T06:27:36Z NONE

Is this still an issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  slow performance when storing datasets in gcsfs-backed zarr stores 280626621
383723159 https://github.com/pydata/xarray/issues/2074#issuecomment-383723159 https://api.github.com/repos/pydata/xarray/issues/2074 MDEyOklzc3VlQ29tbWVudDM4MzcyMzE1OQ== jakirkham 3019665 2018-04-23T21:06:42Z 2018-04-23T21:06:42Z NONE

from what I understand da.dot implements... a limited special case of da.einsum?

Basically dot is an inner product. Certainly inner products can be formulated using Einstein notation (i.e. calling with einsum).

The question is whether the performance keeps up with that formulation. Currently it sounds like chunking causes some problems right now IIUC. However things like dot and tensordot dispatch through optimized BLAS routines. In theory einsum should do the same ( https://github.com/numpy/numpy/pull/9425 ), but the experimental data still shows a few warts. For example, matmul is implemented with einsum, but is slower than dot. ( https://github.com/numpy/numpy/issues/7569 ) ( https://github.com/numpy/numpy/issues/8957 ) Pure einsum implementations seem to perform similarly.

I ran a few more benchmarks...

What are the arrays used as input for this case?

...apparently xarray.dot on a dask backend is situationally faster than all other implementations when you are not reducing on any dimensions...

Having a little trouble following this. dot reduces one dimension from each input, except when one of the inputs is 0-D (i.e. a scalar), in which case it just multiplies the scalar through the array. Is that what you are referring to?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.dot() dask problems 316618290
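The dot-as-einsum equivalence discussed in the comment above is easy to check with NumPy; this is a correctness sketch only, since the performance differences are the real question:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((32, 16))
b = rng.random((16, 8))

# dot is the inner-product special case of the einsum contraction:
# contract the shared index j, keep i and k.
assert np.allclose(np.dot(a, b), np.einsum("ij,jk->ik", a, b))
```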
383637379 https://github.com/pydata/xarray/issues/2074#issuecomment-383637379 https://api.github.com/repos/pydata/xarray/issues/2074 MDEyOklzc3VlQ29tbWVudDM4MzYzNzM3OQ== jakirkham 3019665 2018-04-23T16:26:51Z 2018-04-23T16:26:51Z NONE

Might be worth revisiting how da.dot is implemented as well. That would be the least amount of rewriting for you and would generally be nice for Dask users. If you have not already, @crusaderky, it would be nice to raise an issue over at Dask with a straight Dask benchmark comparing Dask Array's dot and einsum.

cc @mrocklin

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.dot() dask problems 316618290
367164232 https://github.com/pydata/xarray/issues/1784#issuecomment-367164232 https://api.github.com/repos/pydata/xarray/issues/1784 MDEyOklzc3VlQ29tbWVudDM2NzE2NDIzMg== jakirkham 3019665 2018-02-20T23:58:47Z 2018-02-20T23:58:47Z NONE

What is store in this case? Sorry not very familiar with how xarray does things.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add compute=False keywords to `to_foo` functions 282178751
364812486 https://github.com/pydata/xarray/pull/1528#issuecomment-364812486 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM2NDgxMjQ4Ng== jakirkham 3019665 2018-02-12T01:51:40Z 2018-02-12T01:51:40Z NONE

So Zarr supports storing structured arrays. Maybe that’s what you are looking for, @martindurant? Would suggest using the latest 2.2.0 RC though as it fixed a few issues in this regard (particularly with NumPy 1.14).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
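For reference, the kind of structured (record) array meant in the comment above, sketched with NumPy; Zarr can store arrays with such compound dtypes natively:

```python
import numpy as np

# A structured dtype: each element packs a float and a short string.
dt = np.dtype([("x", "f8"), ("label", "U8")])
records = np.array([(1.0, "a"), (2.5, "b")], dtype=dt)

# Fields are accessed by name, like columns.
assert records["x"].tolist() == [1.0, 2.5]
assert records["label"].tolist() == ["a", "b"]
```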
360590825 https://github.com/pydata/xarray/pull/1793#issuecomment-360590825 https://api.github.com/repos/pydata/xarray/issues/1793 MDEyOklzc3VlQ29tbWVudDM2MDU5MDgyNQ== jakirkham 3019665 2018-01-25T20:29:58Z 2018-01-25T20:29:58Z NONE

Yep, using dask.array.store regularly with the distributed scheduler both on our cluster and in a local Docker image for testing. Am using Zarr Arrays as the targets for store to write to. Basically rechunk the data to match the chunking selected for the Zarr Array and then write out in parallel lock-free.

Our cluster uses NFS for things like one's home directory. So these are accessible across nodes. Also there are other types of storage available that are a bit faster and still remain accessible across nodes. So these work pretty well.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix distributed writes 283388962
352036122 https://github.com/pydata/xarray/issues/1784#issuecomment-352036122 https://api.github.com/repos/pydata/xarray/issues/1784 MDEyOklzc3VlQ29tbWVudDM1MjAzNjEyMg== jakirkham 3019665 2017-12-15T15:38:14Z 2017-12-15T15:38:14Z NONE

In case anyone is curious, PR ( https://github.com/dask/dask/pull/2980 ) contains this work. Feedback welcome.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add compute=False keywords to `to_foo` functions 282178751
351837521 https://github.com/pydata/xarray/issues/1784#issuecomment-351837521 https://api.github.com/repos/pydata/xarray/issues/1784 MDEyOklzc3VlQ29tbWVudDM1MTgzNzUyMQ== jakirkham 3019665 2017-12-14T21:13:30Z 2017-12-14T21:13:30Z NONE

Just to give a brief synopsis of what we are working on in Dask, in case it is valuable for this or other contexts, I have given an overview of the relevant work below.

With Matthew's help am trying to add a keep argument to da.store. By default keep=False, which is the current behavior of da.store. If keep=True however, it returns Dask Arrays that can lazily load data written by da.store. Thus allowing the stored result to be linked to later computations before it is fully written. The compute argument of da.store affects whether to submit the storage tasks immediately (adding Futures into the resultant Dask Array) or whether to hold off until a later computation step triggers it.

This sort of functionality could be useful for a variety of situations including the one Matthew has described above. Also this could be useful for viewing partially computed results before they are totally done. Another use case could be more rapid batching of computations with many intermediate values. There is also an opportunity to re-explore caching in this context; thus, revisiting an area that many people have previously shown interest in.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add compute=False keywords to `to_foo` functions 282178751
350504017 https://github.com/pydata/xarray/pull/1528#issuecomment-350504017 https://api.github.com/repos/pydata/xarray/issues/1528 MDEyOklzc3VlQ29tbWVudDM1MDUwNDAxNw== jakirkham 3019665 2017-12-09T20:38:58Z 2017-12-09T20:38:58Z NONE

Just to confirm, if writes are aligned with chunk boundaries in the destination array then no locking is required.

As a minor point to complement what Matthew and Alistair have already said, one can pretty easily rechunk beforehand so that the chunks will have a nice 1-to-1 non-overlapping mapping on disk. Not sure whether this strategy is good enough to make default. However have had no issues doing this myself. Also would expect it is better than holding one lock over the whole Zarr Array. Though there may be some strange edge cases that I have not encountered.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Zarr backend 253136694
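The alignment condition behind the lock-free strategy in the comment above can be sketched in plain Python. `aligned` is a hypothetical helper for illustration, not a Dask or Zarr API:

```python
def aligned(src_chunks, dst_chunk_len):
    """Return True if every interior source-chunk boundary lands on a
    destination-chunk boundary, so each write touches whole on-disk
    chunks only and needs no locking."""
    edge = 0
    for c in src_chunks[:-1]:  # interior boundaries only
        edge += c
        if edge % dst_chunk_len != 0:
            return False
    return True


# Chunks of 4 map cleanly onto on-disk chunks of 2; chunks of 3 straddle them.
assert aligned((4, 4, 4), 2)
assert not aligned((3, 3, 3, 3), 2)
```

Rechunking the Dask array so this condition holds along every dimension is exactly the "1-to-1 non-overlapping mapping" described above.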
349772394 https://github.com/pydata/xarray/issues/1759#issuecomment-349772394 https://api.github.com/repos/pydata/xarray/issues/1759 MDEyOklzc3VlQ29tbWVudDM0OTc3MjM5NA== jakirkham 3019665 2017-12-06T20:57:51Z 2017-12-06T20:57:51Z NONE

Given the recent turn in discussion here, might be worthwhile to share some thoughts on issue ( https://github.com/dask/dask/issues/2694 ).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dask compute on reduction failes with ValueError 279161550
221893997 https://github.com/pydata/xarray/pull/860#issuecomment-221893997 https://api.github.com/repos/pydata/xarray/issues/860 MDEyOklzc3VlQ29tbWVudDIyMTg5Mzk5Nw== jakirkham 3019665 2016-05-26T14:51:14Z 2016-05-26T14:51:14Z NONE

Also, FYI we have python-coveralls currently. Though we don't have coveralls yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Switch py2.7 CI build to use conda-forge 156793282
221758918 https://github.com/pydata/xarray/pull/860#issuecomment-221758918 https://api.github.com/repos/pydata/xarray/issues/860 MDEyOklzc3VlQ29tbWVudDIyMTc1ODkxOA== jakirkham 3019665 2016-05-26T02:04:59Z 2016-05-26T02:04:59Z NONE

It looks like there is nothing actionable for me here, but please let me know if that changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Switch py2.7 CI build to use conda-forge 156793282


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 18.903ms · About: xarray-datasette