issue_comments


52 rows where author_association = "MEMBER" and user = 1312546 sorted by updated_at descending

issue 29

  • open_mfdataset usage and limitations. 6
  • Feature/benchmark 3
  • Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data 3
  • Test failures with pandas master 3
  • dask.optimize on xarray objects 3
  • Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 3
  • Fix optimize for chunked DataArray 3
  • da.plot.pcolormesh fails when there is a datetime coordinate 2
  • Implementing map_blocks and map_overlap 2
  • Make dask names change when chunking Variables by different amounts. 2
  • upstream-dev failure when installing pandas 2
  • Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 2
  • Slow performance of `DataArray.unstack()` from checking `variable.data` 2
  • Supporting out-of-core computation/indexing for very large indexes 1
  • Data variables empty with to_zarr / from_zarr on s3 if 's3://' in root s3fs string 1
  • Fix map_blocks HLG layering 1
  • Add entrypoint for plotting backends 1
  • more upstream-dev cftime failures 1
  • Add template xarray object kwarg to map_blocks 1
  • Unexpected chunking behavior when using `xr.align` with `join='outer'` 1
  • fix the RTD timeouts 1
  • fix matplotlib errors for single level discrete colormaps 1
  • Fix map_blocks examples 1
  • Threading Lock issue with to_netcdf and Dask arrays 1
  • Comprehensive benchmarking suite 1
  • ⚠️ Nightly upstream-dev CI failed ⚠️ 1
  • ENH: Compute hash of xarray objects 1
  • Implement __sizeof__ on objects? 1
  • Avoid accessing slow .data in unstack 1

user 1

  • TomAugspurger · 52

author_association 1

  • MEMBER · 52
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
964099251 https://github.com/pydata/xarray/issues/4648#issuecomment-964099251 https://api.github.com/repos/pydata/xarray/issues/4648 IC_kwDOAMm_X845dvyz TomAugspurger 1312546 2021-11-09T12:17:32Z 2021-11-09T12:17:32Z MEMBER

"In charge of" is overstating it a bit. It's been segfaulting when building pandas and I haven't had a chance to debug it.

If / when I get around to fixing it I'll try adding xarray, but it might be a bit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Comprehensive benchmarking suite 756425955
953858365 https://github.com/pydata/xarray/pull/5906#issuecomment-953858365 https://api.github.com/repos/pydata/xarray/issues/5906 IC_kwDOAMm_X8442rk9 TomAugspurger 1312546 2021-10-28T13:43:04Z 2021-10-28T13:43:04Z MEMBER

There are two changes here:

  1. Only check the .data of non-index variables, done at https://github.com/pydata/xarray/pull/5906/files#diff-763e3002fd954d544b05858d8d138b828b66b6a2a0ae3cd58d2040a652f14638R4161-R4163
  2. The check for whether or not a full index was needed was done inside a `for dim in dims` loop, but the condition doesn't actually depend on `dim`, so I lifted that check out of the loop (it doesn't matter much, since the relevant values are cached).

cc @dcherian

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoid accessing slow .data in unstack 1038531231
953379569 https://github.com/pydata/xarray/issues/5902#issuecomment-953379569 https://api.github.com/repos/pydata/xarray/issues/5902 IC_kwDOAMm_X84402rx TomAugspurger 1312546 2021-10-27T23:19:49Z 2021-10-27T23:19:49Z MEMBER

Thanks @dcherian, that seems to fix this performance problem. I'll see if the tests pass and will submit a PR.

I came across #5582 while searching, thanks :)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of `DataArray.unstack()` from checking `variable.data` 1037894157
953344052 https://github.com/pydata/xarray/issues/5902#issuecomment-953344052 https://api.github.com/repos/pydata/xarray/issues/5902 IC_kwDOAMm_X8440uA0 TomAugspurger 1312546 2021-10-27T22:02:58Z 2021-10-27T22:03:35Z MEMBER

Oh, hmm... I'm noticing now that IndexVariable (currently) eagerly loads data into memory, so that check will always be false for the problematic IndexVariable variable.

So perhaps a slight adjustment to is_duck_dask_array to handle xarray.Variable?

```diff
diff --git a/xarray/core/dataset.py b/xarray/core/dataset.py
index 550c3587..16637574 100644
--- a/xarray/core/dataset.py
+++ b/xarray/core/dataset.py
@@ -4159,14 +4159,14 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
             # Dask arrays don't support assignment by index, which the fast unstack
             # function requires.
             # https://github.com/pydata/xarray/pull/4746#issuecomment-753282125
-            any(is_duck_dask_array(v.data) for v in self.variables.values())
+            any(is_duck_dask_array(v) for v in self.variables.values())
             # Sparse doesn't currently support (though we could special-case
             # it)
             # https://github.com/pydata/sparse/issues/422
-            or any(
-                isinstance(v.data, sparse_array_type)
-                for v in self.variables.values()
-            )
+            # or any(
+            #     isinstance(v.data, sparse_array_type)
+            #     for v in self.variables.values()
+            # )
             or sparse
             # Until https://github.com/pydata/xarray/pull/4751 is resolved,
             # we check explicitly whether it's a numpy array. Once that is
@@ -4177,9 +4177,9 @@ class Dataset(DataWithCoords, DatasetArithmetic, Mapping):
             # # or any(
             # #     isinstance(v.data, pint_array_type) for v in self.variables.values()
             # # )
-            or any(
-                not isinstance(v.data, np.ndarray) for v in self.variables.values()
-            )
+            # or any(
+            #     not isinstance(v.data, np.ndarray) for v in self.variables.values()
+            # )
         ):
             result = result._unstack_full_reindex(dim, fill_value, sparse)
         else:
diff --git a/xarray/core/pycompat.py b/xarray/core/pycompat.py
index d1649235..e9669105 100644
--- a/xarray/core/pycompat.py
+++ b/xarray/core/pycompat.py
@@ -44,6 +44,12 @@ class DuckArrayModule:
 
 def is_duck_dask_array(x):
+    from xarray.core.variable import IndexVariable, Variable
+    if isinstance(x, IndexVariable):
+        return False
+    elif isinstance(x, Variable):
+        x = x.data
+
     if DuckArrayModule("dask").available:
         from dask.base import is_dask_collection
```

That's completely ignoring the accesses to v.data for the sparse and pint checks, which don't look quite as easy to solve.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of `DataArray.unstack()` from checking `variable.data` 1037894157
932811398 https://github.com/pydata/xarray/issues/5764#issuecomment-932811398 https://api.github.com/repos/pydata/xarray/issues/5764 IC_kwDOAMm_X843mZKG TomAugspurger 1312546 2021-10-02T19:48:05Z 2021-10-02T19:48:05Z MEMBER

Mmm, for better or worse, Dask relies on sizeof to estimate the memory usage of objects at runtime. We could move that over to some new duck-typed interface, like using .nbytes if it's around, but not all objects will want to expose an nbytes attribute in their API.

IMO, the best path is for objects to implement __sizeof__, unless there's some downside I'm missing.
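As a concrete sketch of that path (the WrappedArray class below is made up for illustration): an object's __sizeof__ is what sys.getsizeof reports, and dask.sizeof falls back to sys.getsizeof for types it doesn't know about.

```python
import sys

import numpy as np


class WrappedArray:
    """Hypothetical container holding a numpy array (illustration only)."""

    def __init__(self, data: np.ndarray):
        self.data = data

    def __sizeof__(self) -> int:
        # sys.getsizeof() calls __sizeof__(), and dask.sizeof's default
        # fallback is sys.getsizeof(), so this is picked up automatically.
        return object.__sizeof__(self) + self.data.nbytes


arr = WrappedArray(np.zeros(1_000_000))
print(sys.getsizeof(arr))  # roughly 8 MB plus the object header
```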

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement __sizeof__ on objects? 988158051
852667695 https://github.com/pydata/xarray/issues/5426#issuecomment-852667695 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2NzY5NQ== TomAugspurger 1312546 2021-06-02T02:37:18Z 2021-06-02T02:37:18Z MEMBER

Do you run into poor load balancing as well when using Zarr with Xarray?

The only thing that comes to mind is everything being assigned to one worker when the task graph has a single node at its base. But then work stealing kicks in and things level out (that was a while ago, though).

I haven't noticed any kind of systemic load balancing problem, but I can take a look at that notebook later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
852666211 https://github.com/pydata/xarray/issues/5426#issuecomment-852666211 https://api.github.com/repos/pydata/xarray/issues/5426 MDEyOklzc3VlQ29tbWVudDg1MjY2NjIxMQ== TomAugspurger 1312546 2021-06-02T02:33:28Z 2021-06-02T02:33:28Z MEMBER

https://github.com/dask/dask/pull/6203 and https://github.com/dask/dask/pull/6773/ are possibly the relevant PRs. I actually don't know whether that could have an effect here. I don't know (and a brief search couldn't confirm) whether or not xarray uses dask.array.from_zarr.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter 908971901
767797103 https://github.com/pydata/xarray/issues/1094#issuecomment-767797103 https://api.github.com/repos/pydata/xarray/issues/1094 MDEyOklzc3VlQ29tbWVudDc2Nzc5NzEwMw== TomAugspurger 1312546 2021-01-26T20:09:11Z 2021-01-26T20:09:11Z MEMBER

Should this and https://github.com/pydata/xarray/issues/1650 be consolidated into a single issue? I think that they're duplicates of each other.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Supporting out-of-core computation/indexing for very large indexes 187873247
752156934 https://github.com/pydata/xarray/issues/4738#issuecomment-752156934 https://api.github.com/repos/pydata/xarray/issues/4738 MDEyOklzc3VlQ29tbWVudDc1MjE1NjkzNA== TomAugspurger 1312546 2020-12-29T16:53:16Z 2020-12-29T16:53:16Z MEMBER

IIUC, something like https://github.com/dask/dask/blob/4a7a2438219c4ee493434042e50f4cdb67b6ec9f/dask/base.py#L778 is what you're looking for. Further down we register tokenizers for various types like pandas' DataFrames and ndarrays.
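For illustration, a minimal sketch of that registration mechanism: dask keeps a dispatch table, dask.base.normalize_token, and tokenize() hashes whatever the registered normalizer returns. Registering xarray.DataArray here is only an example, not the change proposed in this issue.

```python
import xarray as xr
from dask.base import normalize_token, tokenize


@normalize_token.register(xr.DataArray)
def _normalize_dataarray(da):
    # Reduce the object to plain, hashable pieces; dask hashes the result.
    return type(da).__name__, da.name, da.dims, normalize_token(da.values)


da = xr.DataArray([1, 2, 3], dims="x", name="a")
print(tokenize(da))  # deterministic token derived from the normalizer above
```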

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ENH: Compute hash of xarray objects 775502974
749205535 https://github.com/pydata/xarray/issues/4717#issuecomment-749205535 https://api.github.com/repos/pydata/xarray/issues/4717 MDEyOklzc3VlQ29tbWVudDc0OTIwNTUzNQ== TomAugspurger 1312546 2020-12-21T21:29:56Z 2020-12-21T21:29:56Z MEMBER

I'm not sure offhand. Maybe best to post an issue on the pandas tracker.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  ⚠️ Nightly upstream-dev CI failed ⚠️ 771484861
712066302 https://github.com/pydata/xarray/issues/4428#issuecomment-712066302 https://api.github.com/repos/pydata/xarray/issues/4428 MDEyOklzc3VlQ29tbWVudDcxMjA2NjMwMg== TomAugspurger 1312546 2020-10-19T11:08:13Z 2020-10-19T11:43:46Z MEMBER

Sorry, my comment in https://github.com/pydata/xarray/issues/4428#issuecomment-711034128 was incorrect in a couple of ways:

  1. We still do the splitting, even when slicing with an out-of-order indexer. I'm checking whether that's appropriate.
  2. I'm also looking into a logic bug when computing the number of chunks. I don't think we properly handle non-uniform chunking on the other axes.
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 702646191
711034128 https://github.com/pydata/xarray/issues/4428#issuecomment-711034128 https://api.github.com/repos/pydata/xarray/issues/4428 MDEyOklzc3VlQ29tbWVudDcxMTAzNDEyOA== TomAugspurger 1312546 2020-10-17T15:54:48Z 2020-10-17T15:54:48Z MEMBER

I assume that the indices [np.argsort(da.x.data)] are not going to be monotonically increasing. That induces a different slicing pattern. The docs at https://docs.dask.org/en/latest/array-slicing.html#efficiency describe the case where the indices are sorted, but don't discuss the non-sorted case (yet).
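A tiny, assumed setup showing the difference in question: dask slices a chunked array along its existing chunk boundaries for a monotonically increasing indexer, while the shuffled indexer that an argsort of unsorted labels produces can lead to a different chunking (details vary by dask version).

```python
import dask.array as da
import numpy as np

x = da.ones((10,), chunks=5)

sorted_idx = np.array([0, 1, 6, 7])    # monotonically increasing
shuffled_idx = np.array([7, 0, 6, 1])  # out of order

print(x[sorted_idx].chunks)    # follows the existing chunk boundaries
print(x[shuffled_idx].chunks)  # can come out differently
```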

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 702646191
709539887 https://github.com/pydata/xarray/issues/4428#issuecomment-709539887 https://api.github.com/repos/pydata/xarray/issues/4428 MDEyOklzc3VlQ29tbWVudDcwOTUzOTg4Nw== TomAugspurger 1312546 2020-10-15T19:20:53Z 2020-10-15T19:20:53Z MEMBER

Closing the loop here: with https://github.com/dask/dask/pull/6665, the behavior of dask==2.25.0 should be restored (possibly with a warning about creating large chunks).

So this can probably be closed, though there may be parts of xarray that should be updated to avoid creating large chunks, or we could rely on the user to do that through the dask config system.
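For reference, a sketch of the config-based opt-out mentioned above, using dask's documented array.slicing.split_large_chunks setting; the Dataset built here is just a stand-in.

```python
import dask
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"v": ("x", np.random.rand(100))},
    coords={"x": np.arange(100)[::-1]},
).chunk({"x": 10})

# Opt out of the large-chunk splitting (and its warning) for this block only.
with dask.config.set(**{"array.slicing.split_large_chunks": False}):
    result = ds.sortby("x")
```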

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Behaviour change in xarray.Dataset.sortby/sel between dask==2.25.0 and dask==2.26.0 702646191
694817581 https://github.com/pydata/xarray/pull/4432#issuecomment-694817581 https://api.github.com/repos/pydata/xarray/issues/4432 MDEyOklzc3VlQ29tbWVudDY5NDgxNzU4MQ== TomAugspurger 1312546 2020-09-18T11:36:49Z 2020-09-18T11:36:49Z MEMBER

I'm not sure, but I don't think so. It's strange that it didn't fail on the pull request.

On Thu, Sep 17, 2020 at 8:51 PM Maximilian Roos notifications@github.com wrote:

Might be best to proceed with #4434 https://github.com/pydata/xarray/pull/4434 for now. I'll need to give this a bit of thought.

OK, as you wish, I'll merge if that passes.

But your change did pass before the merge. Could it be a conflict (in functionality, not git) with recent changes on master?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix optimize for chunked DataArray 703881154
694594817 https://github.com/pydata/xarray/pull/4432#issuecomment-694594817 https://api.github.com/repos/pydata/xarray/issues/4432 MDEyOklzc3VlQ29tbWVudDY5NDU5NDgxNw== TomAugspurger 1312546 2020-09-18T01:27:30Z 2020-09-18T01:27:30Z MEMBER

Might be best to proceed with https://github.com/pydata/xarray/pull/4434 for now. I'll need to give this a bit of thought.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix optimize for chunked DataArray 703881154
694593225 https://github.com/pydata/xarray/pull/4432#issuecomment-694593225 https://api.github.com/repos/pydata/xarray/issues/4432 MDEyOklzc3VlQ29tbWVudDY5NDU5MzIyNQ== TomAugspurger 1312546 2020-09-18T01:22:43Z 2020-09-18T01:22:43Z MEMBER

Huh, I'm able to reproduce locally. Looking into it now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix optimize for chunked DataArray 703881154
691083939 https://github.com/pydata/xarray/issues/4406#issuecomment-691083939 https://api.github.com/repos/pydata/xarray/issues/4406 MDEyOklzc3VlQ29tbWVudDY5MTA4MzkzOQ== TomAugspurger 1312546 2020-09-11T13:07:00Z 2020-09-11T13:07:00Z MEMBER

@TomAugspurger do you know off-hand if there have been any recent changes in Dask's scheduler that could have caused this?

This is just using Dask's threaded scheduler, right? I don't recall any changes there recently.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Threading Lock issue with to_netcdf and Dask arrays 694112301
690378323 https://github.com/pydata/xarray/issues/3698#issuecomment-690378323 https://api.github.com/repos/pydata/xarray/issues/3698 MDEyOklzc3VlQ29tbWVudDY5MDM3ODMyMw== TomAugspurger 1312546 2020-09-10T15:42:54Z 2020-09-10T15:42:54Z MEMBER

Thanks for confirming. I'll take another look at this today then.

On Thu, Sep 10, 2020 at 10:30 AM Deepak Cherian notifications@github.com wrote:

Reopened #3698 https://github.com/pydata/xarray/issues/3698.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dask.optimize on xarray objects 550355524
689808725 https://github.com/pydata/xarray/issues/3698#issuecomment-689808725 https://api.github.com/repos/pydata/xarray/issues/3698 MDEyOklzc3VlQ29tbWVudDY4OTgwODcyNQ== TomAugspurger 1312546 2020-09-09T20:38:39Z 2020-09-09T20:38:39Z MEMBER

FYI, @dcherian your recent PR to dask fixed this example. Playing around with chunk sizes, it seems to have fixed it even when the chunk size exceeds dask.config['array']['chunk-size'].

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dask.optimize on xarray objects 550355524
668256401 https://github.com/pydata/xarray/issues/3147#issuecomment-668256401 https://api.github.com/repos/pydata/xarray/issues/3147 MDEyOklzc3VlQ29tbWVudDY2ODI1NjQwMQ== TomAugspurger 1312546 2020-08-03T21:42:42Z 2020-08-03T21:42:42Z MEMBER

Thanks for that link. I hope that map_overlap could use pad internally for the external boundaries.

On Mon, Aug 3, 2020 at 3:22 PM Deepak Cherian notifications@github.com wrote:

This issue about coordinate labels for boundaries exists with pad too:

3868 https://github.com/pydata/xarray/issues/3868

Can map_overlap just use DataArray.pad and we can fix things there?

Or perhaps we can expect users to add a call to pad before map_overlap?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
668242904 https://github.com/pydata/xarray/pull/4305#issuecomment-668242904 https://api.github.com/repos/pydata/xarray/issues/4305 MDEyOklzc3VlQ29tbWVudDY2ODI0MjkwNA== TomAugspurger 1312546 2020-08-03T21:08:38Z 2020-08-03T21:08:38Z MEMBER

The doc failure looks unrelated:

```
Exception in /home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/doc/plotting.rst at block ending on line None
Specify :okexcept: as an option in the ipython:: block to suppress this message

KeyError                                  Traceback (most recent call last)
<ipython-input-75-c7d6afd7f8c5> in <module>
----> 1 g_simple = t.plot(x="lon", y="lat", col="time", col_wrap=3)

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/plot/plot.py in __call__(self, **kwargs)
    444
    445     def __call__(self, **kwargs):
--> 446         return plot(self._da, **kwargs)
    447
    448     # we can't use functools.wraps here since that also modifies the name / qualname

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/plot/plot.py in plot(darray, row, col, col_wrap, ax, hue, rtol, subplot_kws, **kwargs)
    198         kwargs["ax"] = ax
    199
--> 200     return plotfunc(darray, **kwargs)
    201
    202

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/plot/plot.py in newplotfunc(darray, x, y, figsize, size, aspect, ax, row, col, col_wrap, xincrease, yincrease, add_colorbar, add_labels, vmin, vmax, cmap, center, robust, extend, levels, infer_intervals, colors, subplot_kws, cbar_ax, cbar_kwargs, xscale, yscale, xticks, yticks, xlim, ylim, norm, **kwargs)
    636             # Need the decorated plotting function
    637             allargs["plotfunc"] = globals()[plotfunc.__name__]
--> 638             return _easy_facetgrid(darray, kind="dataarray", **allargs)
    639
    640     plt = import_matplotlib_pyplot()

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/plot/facetgrid.py in _easy_facetgrid(data, plotfunc, kind, x, y, row, col, col_wrap, sharex, sharey, aspect, size, subplot_kws, ax, figsize, **kwargs)
    642
    643     if kind == "dataarray":
--> 644         return g.map_dataarray(plotfunc, x, y, **kwargs)
    645
    646     if kind == "dataset":

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/plot/facetgrid.py in map_dataarray(self, func, x, y, **kwargs)
    263         # Get x, y labels for the first subplot
    264         x, y = _infer_xy_labels(
--> 265             darray=self.data.loc[self.name_dicts.flat[0]],
    266             x=x,
    267             y=y,

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/dataarray.py in __getitem__(self, key)
    196             labels = indexing.expanded_indexer(key, self.data_array.ndim)
    197             key = dict(zip(self.data_array.dims, labels))
--> 198         return self.data_array.sel(**key)
    199
    200     def __setitem__(self, key, value) -> None:

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/dataarray.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1147
   1148         """
-> 1149         ds = self._to_temp_dataset().sel(
   1150             indexers=indexers,
   1151             drop=drop,

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2099         """
   2100         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2101         pos_indexers, new_indexes = remap_label_indexers(
   2102             self, indexers=indexers, method=method, tolerance=tolerance
   2103         )

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
    394     }
    395
--> 396     pos_indexers, new_indexes = indexing.remap_label_indexers(
    397         obj, v_indexers, method=method, tolerance=tolerance
    398     )

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    268         coords_dtype = data_obj.coords[dim].dtype
    269         label = maybe_cast_to_coords_dtype(label, coords_dtype)
--> 270         idxr, new_idx = convert_label_indexer(index, label, dim, method, tolerance)
    271         pos_indexers[dim] = idxr
    272         if new_idx is not None:

~/checkouts/readthedocs.org/user_builds/xray/checkouts/4305/xarray/core/indexing.py in convert_label_indexer(index, label, index_name, method, tolerance)
    187             indexer = index.get_loc(label.item())
    188         else:
--> 189             indexer = index.get_loc(
    190                 label.item(), method=method, tolerance=tolerance
    191             )

~/checkouts/readthedocs.org/user_builds/xray/conda/4305/lib/python3.8/site-packages/pandas/core/indexes/datetimes.py in get_loc(self, key, method, tolerance)
    620         else:
    621             # unrecognized type
--> 622             raise KeyError(key)
    623
    624     try:

KeyError: 1356998400000000000  <<<-------------------------------------------------------------------------
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix map_blocks examples 672281867
668209121 https://github.com/pydata/xarray/issues/3147#issuecomment-668209121 https://api.github.com/repos/pydata/xarray/issues/3147 MDEyOklzc3VlQ29tbWVudDY2ODIwOTEyMQ== TomAugspurger 1312546 2020-08-03T19:47:47Z 2020-08-03T19:47:57Z MEMBER

I'm thinking through a map_overlap API right now. In dask, map_overlap requires a few extra arguments:

depth: int, tuple, dict or list
    The number of elements that each block should share with its neighbors.
    If a tuple or dict then this can be different per axis. If a list then
    each element of that list must be an int, tuple or dict defining depth
    for the corresponding array in `args`. Asymmetric depths may be
    specified using a dict value of (-/+) tuples. Note that asymmetric
    depths are currently only supported when ``boundary`` is 'none'. The
    default value is 0.
boundary: str, tuple, dict or list
    How to handle the boundaries. Values include 'reflect', 'periodic',
    'nearest', 'none', or any constant value like 0 or np.nan. If a list
    then each element must be a str, tuple or dict defining the boundary
    for the corresponding array in `args`. The default value is 'reflect'.

In dask.array those must be dicts whose keys are the axis number. For xarray we would want to allow the dimension names there.

I'm not sure how to handle the DataArray labels for the boundary chunks (dask docs at https://docs.dask.org/en/latest/array-overlap.html#boundaries). For reflect / periodic I think things are OK, we perhaps just use the label associated with that value. I'm not sure what to do for constants.
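As a rough sketch of the name-to-axis translation described above: this is not an existing xarray API; the helper, its signature, and the coordinate handling it omits are all assumptions, and it relies on a recent dask where map_overlap takes the function first.

```python
import dask.array as da
import numpy as np
import xarray as xr


def map_overlap_sketch(func, obj: xr.DataArray, depth: dict, boundary: dict):
    # Translate dimension names into the axis-number dicts dask expects.
    axis_depth = {obj.get_axis_num(dim): d for dim, d in depth.items()}
    axis_boundary = {obj.get_axis_num(dim): b for dim, b in boundary.items()}
    data = da.map_overlap(func, obj.data, depth=axis_depth, boundary=axis_boundary)
    # Coordinate / label handling for the boundary regions is the open question.
    return xr.DataArray(data, dims=obj.dims)


arr = xr.DataArray(np.arange(20.0), dims="x").chunk({"x": 5})
out = map_overlap_sketch(lambda block: block, arr, depth={"x": 1}, boundary={"x": "reflect"})
```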

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implementing map_blocks and map_overlap 470024896
663584770 https://github.com/pydata/xarray/pull/4256#issuecomment-663584770 https://api.github.com/repos/pydata/xarray/issues/4256 MDEyOklzc3VlQ29tbWVudDY2MzU4NDc3MA== TomAugspurger 1312546 2020-07-24T15:06:03Z 2020-07-24T15:06:03Z MEMBER

Yep. I believe that @ogrisel can add you to the organization on anaconda.org so that you can create a key to upload packages.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix matplotlib errors for single level discrete colormaps 664363493
663082208 https://github.com/pydata/xarray/pull/4254#issuecomment-663082208 https://api.github.com/repos/pydata/xarray/issues/4254 MDEyOklzc3VlQ29tbWVudDY2MzA4MjIwOA== TomAugspurger 1312546 2020-07-23T15:45:57Z 2020-07-23T15:45:57Z MEMBER

FYI https://github.com/pandas-dev/pandas/pull/35393 is the PR to follow. It'll be included in pandas 1.1.0, which should be out in a week or so.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  fix the RTD timeouts 663977922
641332231 https://github.com/pydata/xarray/issues/4133#issuecomment-641332231 https://api.github.com/repos/pydata/xarray/issues/4133 MDEyOklzc3VlQ29tbWVudDY0MTMzMjIzMQ== TomAugspurger 1312546 2020-06-09T14:24:59Z 2020-06-09T14:31:26Z MEMBER

Ah, the (numpy) build failure is because pandas doesn't have a py38 entry in our pyproject.toml. Fixing that now.

edit: https://github.com/pandas-dev/pandas/pull/34667. But you'll still want to update your CI at https://github.com/pydata/xarray/blob/2a288f6ed4286910fcf3ab9895e1e9cbd44d30b4/ci/azure/install.yml#L16 and https://github.com/pydata/xarray/blob/2a288f6ed4286910fcf3ab9895e1e9cbd44d30b4/ci/azure/install.yml#L23 to pull from the new locations.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  upstream-dev failure when installing pandas 634979933
641330288 https://github.com/pydata/xarray/issues/4133#issuecomment-641330288 https://api.github.com/repos/pydata/xarray/issues/4133 MDEyOklzc3VlQ29tbWVudDY0MTMzMDI4OA== TomAugspurger 1312546 2020-06-09T14:22:02Z 2020-06-09T14:22:02Z MEMBER

@keewis not sure about the build issue, but we (along with many other projects) recently moved our wheels to upload to https://anaconda.org/scipy-wheels-nightly/. https://anaconda.org/scipy-wheels-nightly/pandas/ does have py38 wheels.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  upstream-dev failure when installing pandas 634979933
636808986 https://github.com/pydata/xarray/issues/4112#issuecomment-636808986 https://api.github.com/repos/pydata/xarray/issues/4112 MDEyOklzc3VlQ29tbWVudDYzNjgwODk4Ng== TomAugspurger 1312546 2020-06-01T11:44:23Z 2020-06-01T11:44:23Z MEMBER

Rechunking the indexer array is how I would be explicit about the desired chunk size. Opened https://github.com/dask/dask/issues/6270 to discuss this on the dask side.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Unexpected chunking behavior when using `xr.align` with `join='outer'` 627600168
622128514 https://github.com/pydata/xarray/pull/3816#issuecomment-622128514 https://api.github.com/repos/pydata/xarray/issues/3816 MDEyOklzc3VlQ29tbWVudDYyMjEyODUxNA== TomAugspurger 1312546 2020-04-30T21:38:21Z 2020-04-30T21:38:21Z MEMBER

Makes sense. template seems fine.

On Thu, Apr 30, 2020 at 3:35 PM Deepak Cherian notifications@github.com wrote:

Thanks for the review @TomAugspurger https://github.com/TomAugspurger

Question on the name template. I think in dask.dataframe and dask.array we might call this meta. Is that keyword already used elsewhere in xarray? template is also a fine name though.

I added the meta kwarg to apply_ufunc so that users could pass that down to dask i.e. that meta = dask's meta = np.ndarray or something like that. So I'd like to avoid reusing meta here where it would exclusively be an xarray object ≠ dask's meta

BUT it seems to me like there's a better name than template. Any ideas?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add template xarray object kwarg to map_blocks 573768194
592101136 https://github.com/pydata/xarray/issues/3698#issuecomment-592101136 https://api.github.com/repos/pydata/xarray/issues/3698 MDEyOklzc3VlQ29tbWVudDU5MjEwMTEzNg== TomAugspurger 1312546 2020-02-27T18:13:28Z 2020-02-27T18:13:28Z MEMBER

It looks like xarray is getting a bad task graph after the optimize.

```python
In [1]: import xarray as xr
   ...: import dask

In [2]: import dask

In [3]: a = dask.array.ones((10,5), chunks=(1,3))
   ...: a = dask.optimize(a)[0]

In [4]: da = xr.DataArray(a.compute()).chunk({"dim_0": 5})
   ...: da = dask.optimize(da)[0]

In [5]: dict(da.__dask_graph__())
Out[5]:
{('xarray-<this-array>-e2865aa10d476e027154771611541f99', 1, 0): (<function _operator.getitem(a, b, /)>,
  'xarray-<this-array>-e2865aa10d476e027154771611541f99',
  (slice(5, 10, None), slice(0, 5, None))),
 ('xarray-<this-array>-e2865aa10d476e027154771611541f99', 0, 0): (<function _operator.getitem(a, b, /)>,
  'xarray-<this-array>-e2865aa10d476e027154771611541f99',
  (slice(0, 5, None), slice(0, 5, None)))}
```

Notice that there are references to xarray-<this-array>-e2865aa10d476e027154771611541f99 (just the string, not a tuple representing a chunk), but that key isn't in the graph.

If we manually insert that key, you'll see that things work:

```python
In [9]: dsk['xarray-<this-array>-e2865aa10d476e027154771611541f99'] = da._to_temp_dataset()[xr.core.dataarray._THIS_ARRAY]

In [11]: dask.get(dsk, keys=[('xarray-<this-array>-e2865aa10d476e027154771611541f99', 1, 0)])
Out[11]:
(<xarray.DataArray <this-array> (dim_0: 5, dim_1: 5)>
 dask.array<getitem, shape=(5, 5), dtype=float64, chunksize=(5, 5), chunktype=numpy.ndarray>
 Dimensions without coordinates: dim_0, dim_1,)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dask.optimize on xarray objects 550355524
582972083 https://github.com/pydata/xarray/issues/3751#issuecomment-582972083 https://api.github.com/repos/pydata/xarray/issues/3751 MDEyOklzc3VlQ29tbWVudDU4Mjk3MjA4Mw== TomAugspurger 1312546 2020-02-06T15:55:30Z 2020-02-06T15:55:30Z MEMBER

FWIW, I think @jbrockmendel is still progressing on an "extension index" interface where you could have a custom dtype / Index subclass that would be properly supported. Long-term, that's the best solution.

Short-term, I'm less sure what's best.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  more upstream-dev cftime failures 559873728
580462361 https://github.com/pydata/xarray/pull/3640#issuecomment-580462361 https://api.github.com/repos/pydata/xarray/issues/3640 MDEyOklzc3VlQ29tbWVudDU4MDQ2MjM2MQ== TomAugspurger 1312546 2020-01-30T21:13:09Z 2020-01-30T21:13:09Z MEMBER

Is my interpretation correct?

Yep, that's the basic idea. Every call to DataFrame.plot.<kind> begins with a check for the active backend. Based on the configured value, we look up the correct backend, make the call, and return the result.
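For illustration, a stripped-down sketch of that dispatch, loosely modeled on pandas' plotting-backend machinery; the entry-point group name and helper functions are made up, and the entry_points(group=...) call needs Python 3.10+.

```python
from importlib.metadata import entry_points

ACTIVE_BACKEND = "matplotlib"  # would normally come from a config option


def load_plot_backend(name=None):
    name = name or ACTIVE_BACKEND
    # Third-party backends would register under a dedicated entry-point group.
    for ep in entry_points(group="xarray.plotting_backends"):
        if ep.name == name:
            return ep.load()
    raise ValueError(f"no plotting backend registered as {name!r}")


def plot(darray, **kwargs):
    backend = load_plot_backend()          # check the active backend first
    return backend.plot(darray, **kwargs)  # delegate the call, return the result
```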

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add entrypoint for plotting backends 539394615
579517151 https://github.com/pydata/xarray/issues/3673#issuecomment-579517151 https://api.github.com/repos/pydata/xarray/issues/3673 MDEyOklzc3VlQ29tbWVudDU3OTUxNzE1MQ== TomAugspurger 1312546 2020-01-28T23:12:47Z 2020-01-28T23:12:47Z MEMBER

FYI, we had some failures in our nightly wheel builds so they weren't updated in a while. https://github.com/MacPython/pandas-wheels/pull/70 fixed that, so you'll hopefully get a new wheel tonight.

On Tue, Jan 28, 2020 at 5:09 PM Deepak Cherian notifications@github.com wrote:

should be closed by pandas-dev/pandas#31136 https://github.com/pandas-dev/pandas/pull/31136 . I think the tests will turn green once the wheels update


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failures with pandas master 547012915
575688251 https://github.com/pydata/xarray/issues/3673#issuecomment-575688251 https://api.github.com/repos/pydata/xarray/issues/3673 MDEyOklzc3VlQ29tbWVudDU3NTY4ODI1MQ== TomAugspurger 1312546 2020-01-17T16:06:23Z 2020-01-17T16:06:23Z MEMBER

Opened https://github.com/pandas-dev/pandas/issues/31109.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failures with pandas master 547012915
574256856 https://github.com/pydata/xarray/issues/3673#issuecomment-574256856 https://api.github.com/repos/pydata/xarray/issues/3673 MDEyOklzc3VlQ29tbWVudDU3NDI1Njg1Ng== TomAugspurger 1312546 2020-01-14T16:25:50Z 2020-01-14T16:25:50Z MEMBER

@jbrockmendel likely knows more about the index arithmetic issue.

```python
In [22]: import xarray as xr

In [23]: import pandas as pd

In [24]: idx = pd.timedelta_range("1D", periods=5, freq="D")

In [25]: a = xr.cftime_range("2000", periods=5)

In [26]: idx + a
/Users/taugspurger/sandbox/pandas/pandas/core/arrays/datetimelike.py:1204: PerformanceWarning: Adding/subtracting array of DateOffsets to TimedeltaArray not vectorized
  PerformanceWarning,
Out[26]:
Index([2000-01-02 00:00:00, 2000-01-04 00:00:00, 2000-01-06 00:00:00,
       2000-01-08 00:00:00, 2000-01-10 00:00:00],
      dtype='object')

In [27]: a + idx
Out[27]:
CFTimeIndex([2000-01-02 00:00:00, 2000-01-04 00:00:00, 2000-01-06 00:00:00,
             2000-01-08 00:00:00, 2000-01-10 00:00:00],
            dtype='object')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Test failures with pandas master 547012915
569820784 https://github.com/pydata/xarray/issues/2666#issuecomment-569820784 https://api.github.com/repos/pydata/xarray/issues/2666 MDEyOklzc3VlQ29tbWVudDU2OTgyMDc4NA== TomAugspurger 1312546 2019-12-30T22:58:23Z 2019-12-30T22:58:23Z MEMBER

I think this is basically the same change.

Ah, I was mistaken. I was thinking we needed to plumb a dtype argument all the way through there, but I don't think that's necessary. I may be able to submit a PR with a dtypes argument for from_dataframe tomorrow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data 398107776
569810375 https://github.com/pydata/xarray/issues/2666#issuecomment-569810375 https://api.github.com/repos/pydata/xarray/issues/2666 MDEyOklzc3VlQ29tbWVudDU2OTgxMDM3NQ== TomAugspurger 1312546 2019-12-30T22:07:30Z 2019-12-30T22:07:30Z MEMBER

And there are a couple of places that need updating, even with a dtypes argument to let the user specify things. We also hit this via Dataset.__setitem__:

```pytb
~/sandbox/xarray/xarray/core/dataset.py in __setitem__(self, key, value)
   1268             )
   1269
-> 1270         self.update({key: value})
   1271
   1272     def __delitem__(self, key: Hashable) -> None:

~/sandbox/xarray/xarray/core/dataset.py in update(self, other, inplace)
   3521         """
   3522         _check_inplace(inplace)
-> 3523         merge_result = dataset_update_method(self, other)
   3524         return self._replace(inplace=True, **merge_result._asdict())
   3525

~/sandbox/xarray/xarray/core/merge.py in dataset_update_method(dataset, other)
    862                 other[key] = value.drop_vars(coord_names)
    863
--> 864     return merge_core([dataset, other], priority_arg=1, indexes=dataset.indexes)

~/sandbox/xarray/xarray/core/merge.py in merge_core(objects, compat, join, priority_arg, explicit_coords, indexes, fill_value)
    550         coerced, join=join, copy=False, indexes=indexes, fill_value=fill_value
    551     )
--> 552     collected = collect_variables_and_indexes(aligned)
    553
    554     prioritized = _get_priority_vars_and_indexes(aligned, priority_arg, compat=compat)

~/sandbox/xarray/xarray/core/merge.py in collect_variables_and_indexes(list_of_mappings)
    275                 append_all(coords, indexes)
    276
--> 277             variable = as_variable(variable, name=name)
    278             if variable.dims == (name,):
    279                 variable = variable.to_index_variable()

~/sandbox/xarray/xarray/core/variable.py in as_variable(obj, name)
    105     elif isinstance(obj, tuple):
    106         try:
--> 107             obj = Variable(*obj)
    108         except (TypeError, ValueError) as error:
    109             # use .format() instead of % because it handles tuples consistently

~/sandbox/xarray/xarray/core/variable.py in __init__(self, dims, data, attrs, encoding, fastpath)
    306             unrecognized encoding items.
    307         """
--> 308         self._data = as_compatible_data(data, fastpath=fastpath)
    309         self._dims = self._parse_dimensions(dims)
    310         self._attrs = None

~/sandbox/xarray/xarray/core/variable.py in as_compatible_data(data, fastpath)
    229     if isinstance(data, np.ndarray):
    230         if data.dtype.kind == "O":
--> 231             data = _possibly_convert_objects(data)
    232         elif data.dtype.kind == "M":
    233             data = np.asarray(data, "datetime64[ns]")

~/sandbox/xarray/xarray/core/variable.py in _possibly_convert_objects(values)
    165     datetime64 and timedelta64, according to the pandas convention.
    166     """
--> 167     return np.asarray(pd.Series(values.ravel())).reshape(values.shape)
    168
    169

~/sandbox/numpy/numpy/core/_asarray.py in asarray(a, dtype, order)
     83
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86
     87

~/sandbox/pandas/pandas/core/series.py in __array__(self, dtype)
    730             "To keep the old behavior, pass 'dtype=\"datetime64[ns]\"'."
    731         )
--> 732         warnings.warn(msg, FutureWarning, stacklevel=3)
    733         dtype = "M8[ns]"
    734         return np.asarray(self.array, dtype)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data 398107776
569805431 https://github.com/pydata/xarray/issues/2666#issuecomment-569805431 https://api.github.com/repos/pydata/xarray/issues/2666 MDEyOklzc3VlQ29tbWVudDU2OTgwNTQzMQ== TomAugspurger 1312546 2019-12-30T21:45:41Z 2019-12-30T21:48:39Z MEMBER

Just FYI, we're potentially enforcing this deprecation in https://github.com/pandas-dev/pandas/pull/30563 (which would be included in a pandas release in a week or two). Is that likely to cause problems for xarray users?

It's not clear to me what the desired behavior is (https://github.com/pydata/xarray/issues/3291 seems to want to preserve the tz, though it isn't clear they are willing to be forced into an object dtype array for it).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset.from_dataframe will produce a FutureWarning for DatetimeTZ data 398107776
562310739 https://github.com/pydata/xarray/pull/3598#issuecomment-562310739 https://api.github.com/repos/pydata/xarray/issues/3598 MDEyOklzc3VlQ29tbWVudDU2MjMxMDczOQ== TomAugspurger 1312546 2019-12-05T20:47:02Z 2019-12-05T20:47:02Z MEMBER

Hopefully the new comments make sense. I'm struggling a bit to explain things since I don't fully understand them myself :)

So it was a graph construction issue.

I think so. Dask doesn't actually validate the arguments passed to HighLevelGraph, but I believe we assume that all the values in dependencies are themselves keys of layers. We didn't have that before, with things like

(Pdb) pp collections[0].dask.dependencies
{'all-84bc51ac43a9275b3662b0089710eab9': {'or_-64f95b81b2f8001b4c61f2023ac4c223'},
 ...
 'eq-abac622d95ce5055d3e7b7dea944ec37': {'lambda-e79de3edfa267f41111057d26471bce3-x',
                                         'ones-c4a83f4b990021618d55e0fa61a351d6'},
 ...
}

The 'lambda-e79de3edfa267f41111057d26471bce3-x' wasn't a layer of the graph. It was previously nested under the single new layer we were creating (gname, or lambda-e79de3edfa267f41111057d26471bce3 in this case).
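A quick way to check the invariant being discussed, on a small assumed example (any dask collection's __dask_graph__() returns a HighLevelGraph with layers and dependencies):

```python
import dask.array as da

hlg = (da.ones((10, 10), chunks=5) + 1).__dask_graph__()

# Every layer name referenced in `dependencies` should itself be a layer.
referenced = set().union(*hlg.dependencies.values())
assert referenced <= set(hlg.layers), "dependencies reference unknown layers"
print(sorted(hlg.layers))
```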

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix map_blocks HLG layering 533555794
561794415 https://github.com/pydata/xarray/pull/3584#issuecomment-561794415 https://api.github.com/repos/pydata/xarray/issues/3584 MDEyOklzc3VlQ29tbWVudDU2MTc5NDQxNQ== TomAugspurger 1312546 2019-12-04T19:09:34Z 2019-12-04T19:09:34Z MEMBER

@mrocklin if you get a chance, can you confirm that the values in HighLevelGraph.dependencies should be a subset of the keys of layers?

So in the following, the lambda-<...>-x is problematic, because it's not a key in layers?

```python
(Pdb) pp list(self.layers)
['eq-e98e52fb2b8e27b4b5158d399330c72d',
 'lambda-0f1d0bc5e7df462d7125839aed006e04',
 'ones-c4a83f4b990021618d55e0fa61a351d6']
(Pdb) pp self.dependencies
{'eq-e98e52fb2b8e27b4b5158d399330c72d': {'lambda-0f1d0bc5e7df462d7125839aed006e04-x',
                                         'ones-c4a83f4b990021618d55e0fa61a351d6'},
 'lambda-0f1d0bc5e7df462d7125839aed006e04': {'ones-c4a83f4b990021618d55e0fa61a351d6'},
 'ones-c4a83f4b990021618d55e0fa61a351d6': set()}
```

That's coming from the name of the DataArray / the dask array in DataArray.data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make dask names change when chunking Variables by different amounts. 530657789
561773837 https://github.com/pydata/xarray/pull/3584#issuecomment-561773837 https://api.github.com/repos/pydata/xarray/issues/3584 MDEyOklzc3VlQ29tbWVudDU2MTc3MzgzNw== TomAugspurger 1312546 2019-12-04T18:17:56Z 2019-12-04T18:17:56Z MEMBER

So this is enough to fix this in Dask

```diff
diff --git a/dask/blockwise.py b/dask/blockwise.py
index 52a36c246..84e0ecc08 100644
--- a/dask/blockwise.py
+++ b/dask/blockwise.py
@@ -818,7 +818,7 @@ def fuse_roots(graph: HighLevelGraph, keys: list):
         if (
             isinstance(layer, Blockwise)
             and len(deps) > 1
-            and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
+            and not any(dependencies.get(dep, {}) for dep in deps)  # no need to fuse if 0 or 1
             and all(len(dependents[dep]) == 1 for dep in deps)
         ):
             new = toolz.merge(layer, *[layers[dep] for dep in deps])
```

I'm trying to understand why we're getting this KeyError though. I want to make sure that we have a valid HighLevelGraph before making that change.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make dask names change when chunking Variables by different amounts. 530657789
510217080 https://github.com/pydata/xarray/issues/2501#issuecomment-510217080 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUxMDIxNzA4MA== TomAugspurger 1312546 2019-07-10T20:30:41Z 2019-07-10T20:30:41Z MEMBER

Yep, that’s my suspicion as well. I’m still plugging away at it. Currently the pausing logic isn’t quite working well.

On Jul 10, 2019, at 12:10, Ryan Abernathey notifications@github.com wrote:

I believe that the memory issue is basically the same as dask/distributed#2602.

The graphs look like: read --> rechunk --> write.

Reading and rechunking increase memory consumption. Writing relieves it. In Rich's case, the workers just load too much data before they write it. Eventually they run out of memory.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
510167911 https://github.com/pydata/xarray/issues/2501#issuecomment-510167911 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUxMDE2NzkxMQ== TomAugspurger 1312546 2019-07-10T18:05:07Z 2019-07-10T18:05:07Z MEMBER

Great, thanks. I’ll look into the memory issue when writing. We may already have an issue for it.

On Jul 10, 2019, at 10:59, Rich Signell notifications@github.com wrote:

@TomAugspurger , I sat down here at Scipy with @rabernat and he instantly realized that we needed to drop the feature_id coordinate to prevent open_mfdataset from trying to harmonize that coordinate from all the chunks.

So if I use this code, the open_mfdataset command finishes:

def drop_coords(ds):
    ds = ds.drop(['reference_time','feature_id'])
    return ds.reset_coords(drop=True)

and I can then add back in the dropped coordinate values at the end:

dsets = [xr.open_dataset(f) for f in files[:3]]
ds.coords['feature_id'] = dsets[0].coords['feature_id']

I'm now running into memory issues when I write the zarr data -- but I should raise that as a new issue, right?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509346055 https://github.com/pydata/xarray/issues/2501#issuecomment-509346055 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTM0NjA1NQ== TomAugspurger 1312546 2019-07-08T18:46:58Z 2019-07-08T18:46:58Z MEMBER

@rsignell-usgs very helpful, thanks. I'd noticed that there was a pause after the open_dataset tasks finish, indicating that either the scheduler or (more likely) the client was doing work rather than the cluster. Most likely @rabernat's guess

In open_mfdataset, all of the dimensions and coordinates of the individual files have to be checked and verified to be compatible. That is often the source of slow performance with open_mfdataset.

is correct. Verifying all that now, and looking into whether / how that can be done on the workers.
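For reference, a hedged example of the usual mitigations (the file glob and dimension name are placeholders): open each file in a dask task on the cluster and keep the cross-file coordinate compatibility checking to a minimum.

```python
import xarray as xr

ds = xr.open_mfdataset(
    "data/*.nc",
    parallel=True,       # open/preprocess each file in a dask task
    combine="nested",
    concat_dim="time",
    coords="minimal",
    compat="override",   # skip expensive per-variable equality checks
)
```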

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
509307081 https://github.com/pydata/xarray/issues/2501#issuecomment-509307081 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwOTMwNzA4MQ== TomAugspurger 1312546 2019-07-08T16:57:15Z 2019-07-08T16:57:15Z MEMBER

I'm looking into it today. Can you clarify

The memory use kept growing until the process died.

by "process" do you mean a dask worker process, or just the main python process executing the ds = xr.open_mfdataset(...) code?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
506497180 https://github.com/pydata/xarray/issues/2501#issuecomment-506497180 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwNjQ5NzE4MA== TomAugspurger 1312546 2019-06-27T20:24:26Z 2019-06-27T20:24:26Z MEMBER

The datasets in our cloud datastore are designed explicitly to avoid this problem!

Good to know!

FYI, https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 was user error (I can access it, but need to specify the us-east-1 region). Taking a look now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
506486503 https://github.com/pydata/xarray/issues/2927#issuecomment-506486503 https://api.github.com/repos/pydata/xarray/issues/2927 MDEyOklzc3VlQ29tbWVudDUwNjQ4NjUwMw== TomAugspurger 1312546 2019-06-27T19:51:58Z 2019-06-27T19:51:58Z MEMBER

Spoke with @martindurant about this today. The mapping should probably strip the protocol from the root provided by the user. Tracking in https://github.com/intake/filesystem_spec/issues/56 (this issue can probably be closed).
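A common pattern after that fix is to let fsspec build the mapping and deal with the protocol itself; the bucket and path here are made up.

```python
import fsspec

store = fsspec.get_mapper("s3://my-bucket/my-dataset.zarr")
# then: ds.to_zarr(store, mode="w")  or  xr.open_zarr(store)
```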

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Data variables empty with to_zarr / from_zarr on s3 if 's3://' in root s3fs string 438166604
506478508 https://github.com/pydata/xarray/issues/2501#issuecomment-506478508 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUwNjQ3ODUwOA== TomAugspurger 1312546 2019-06-27T19:25:05Z 2019-06-27T19:25:05Z MEMBER

Thanks, will take a look this afternoon. Are there any datasets on https://pangeo-data.github.io/pangeo-datastore/ that would exhibit this poor behavior? I may not have access to the bucket (or I'm misusing rclone)

2019/06/27 14:23:50 NOTICE: Config file "/Users/taugspurger/.config/rclone/rclone.conf" not found - using defaults
2019/06/27 14:23:50 Failed to create file system for "aws-east:nwm-archive/2009": didn't find section in config file

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
339525582 https://github.com/pydata/xarray/issues/1661#issuecomment-339525582 https://api.github.com/repos/pydata/xarray/issues/1661 MDEyOklzc3VlQ29tbWVudDMzOTUyNTU4Mg== TomAugspurger 1312546 2017-10-26T01:49:12Z 2017-10-26T01:49:12Z MEMBER

Yep, that was the change.

The fix is to explicitly register the converters before plotting:

```python
from pandas.tseries import converter
converter.register()
```
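In later pandas versions the same thing is exposed through a public helper, so the equivalent call there would be:

```python
from pandas.plotting import register_matplotlib_converters

register_matplotlib_converters()
```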

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  da.plot.pcolormesh fails when there is a datetime coordinate 268487752
339510522 https://github.com/pydata/xarray/issues/1661#issuecomment-339510522 https://api.github.com/repos/pydata/xarray/issues/1661 MDEyOklzc3VlQ29tbWVudDMzOTUxMDUyMg== TomAugspurger 1312546 2017-10-26T00:05:57Z 2017-10-26T00:05:57Z MEMBER

Pandas used to register a matplotlib converter for datetimes on import. I’ll take a closer look in a bit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  da.plot.pcolormesh fails when there is a datetime coordinate 268487752
318451800 https://github.com/pydata/xarray/pull/1457#issuecomment-318451800 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODQ1MTgwMA== TomAugspurger 1312546 2017-07-27T18:45:36Z 2017-07-27T18:45:36Z MEMBER

Yep, thanks again for setting that up.

On Thu, Jul 27, 2017 at 11:39 AM, Wes McKinney notifications@github.com wrote:

cool, are these numbers coming off the pandabox?


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
318376827 https://github.com/pydata/xarray/pull/1457#issuecomment-318376827 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxODM3NjgyNw== TomAugspurger 1312546 2017-07-27T14:21:30Z 2017-07-27T14:21:30Z MEMBER

These are now being run and published to https://tomaugspurger.github.io/asv-collection/xarray/

I plan to find a more permanent home to publish the results rather than my personal GitHub Pages site, but that may take a while before I can get to it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
315402471 https://github.com/pydata/xarray/pull/1457#issuecomment-315402471 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMxNTQwMjQ3MQ== TomAugspurger 1312546 2017-07-14T16:21:29Z 2017-07-14T16:21:29Z MEMBER

About hardware, we should be able to run these on the machine running the pandas benchmarks. Once it's merged I should be able to add it easily to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml and the benchmarks will be run and published (to https://tomaugspurger.github.io/asv-collection/ right now; not the permanent home)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);