
issue_comments


92 rows where user = 12229877 sorted by updated_at descending


issue >30

  • Support RGB[A] arrays in plot.imshow() 9
  • Starter property-based test suite 7
  • Improved default behavior when concatenating DataArrays 7
  • If a NetCDF file is chunked on disk, open it with compatible dask chunks 5
  • Normalisation for RGB imshow 5
  • Hypothesis strategies in xarray.testing.strategies 5
  • Include units (if set) in plot labels 4
  • Shorter repr for attributes 3
  • Remove flake8 from travis 3
  • Fix RGB imshow with X or Y dim of size one 3
  • Remove future statements 3
  • Feature/benchmark 2
  • DataArray.plot raises exception if contents are all NaN 2
  • v0.10.1 Release 2
  • Read small integers as float32, not float64 2
  • Tremendous slowdown when using dask integration 2
  • Automatic duck array testing - reductions 2
  • Functions for converting DataArrays to and from iris.Cubes 1
  • Add RasterIO backend 1
  • Truncate long lines in repr of Dataset.attrs 1
  • Using uint64 for Dataset indexing gives ValueError 1
  • Need small updates of docs 1
  • When reporting errors, note what value was invalid and why 1
  • Where functionality in xarray including else case (dask compability) 1
  • Plot nans 1
  • Broken link to github on ReadTheDocs 1
  • Make `flake8 xarray` pass 1
  • Drop support for Python 3.4 1
  • Add a suite of property-based tests with Hypothesis 1
  • Should imshow() recognise 0-255 images? 1
  • …

user 1

  • Zac-HD · 92

author_association 1

  • CONTRIBUTOR 92
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1251922767 https://github.com/pydata/xarray/pull/6908#issuecomment-1251922767 https://api.github.com/repos/pydata/xarray/issues/6908 IC_kwDOAMm_X85KntNP Zac-HD 12229877 2022-09-20T06:59:10Z 2022-09-20T06:59:10Z CONTRIBUTOR

@Zac-HD if I could request one more review please! The two remaining problems for me are:

Absolutely! Some quick comments this evening; I would also like to do a full review again before merge but that might be next week or weekend - I'm out for a conference from early Thursday.

  1. How should we alter the API of datasets to make it easier to construct Dataset objects containing duck-typed arrays? (see Hypothesis strategies in xarray.testing.strategies #6908 (comment))

Replied in the thread above.

  2. Why does the example generation performance seem to have gotten worse? 😕 I added what I thought were small refactors (e.g. _sizes_from_dim_names) which may somehow be related, but I'm not sure.

I'd be pretty surprised if that was related; st.fixed_dictionaries() is internally basically just that zip() trick anyway. I'd guess that this is mostly "as you implement the last few complicated data-gen options, they start taking nonzero time", but I'm not confident in that without reading closely and probably measuring performance.
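The equivalence mentioned above can be sketched as follows (names and bounds here are made up for illustration): drawing named sizes with st.fixed_dictionaries() versus drawing a tuple of sizes and zipping it onto fixed keys.

```python
from hypothesis import given, strategies as st

# Illustrative sketch: st.fixed_dictionaries() is internally little more
# than drawing each value strategy and pairing the results back onto the
# fixed keys, so swapping between these two forms should cost very little.
names = ("x", "y")
via_fixed = st.fixed_dictionaries({n: st.integers(1, 5) for n in names})
via_zip = st.tuples(*(st.integers(1, 5) for _ in names)).map(
    lambda sizes: dict(zip(names, sizes))
)

@given(via_fixed, via_zip)
def test_equivalent_shapes(a, b):
    # Both forms produce dicts with the same keys and the same value bounds.
    assert set(a) == set(b) == set(names)
    assert all(1 <= v <= 5 for v in (*a.values(), *b.values()))
```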

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis strategies in xarray.testing.strategies 1336119080
1238674803 https://github.com/pydata/xarray/pull/6908#issuecomment-1238674803 https://api.github.com/repos/pydata/xarray/issues/6908 IC_kwDOAMm_X85J1K1z Zac-HD 12229877 2022-09-06T21:37:07Z 2022-09-06T21:37:07Z CONTRIBUTOR

(generally staying unsubbed; so please ping me whenever you've got questions or would like another review!)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis strategies in xarray.testing.strategies 1336119080
1218435300 https://github.com/pydata/xarray/pull/6908#issuecomment-1218435300 https://api.github.com/repos/pydata/xarray/issues/6908 IC_kwDOAMm_X85In9jk Zac-HD 12229877 2022-08-17T19:59:55Z 2022-08-17T19:59:55Z CONTRIBUTOR

(unsubbing for noise, please @-me when you'd like another review!)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis strategies in xarray.testing.strategies 1336119080
1217524909 https://github.com/pydata/xarray/pull/6908#issuecomment-1217524909 https://api.github.com/repos/pydata/xarray/issues/6908 IC_kwDOAMm_X85IkfSt Zac-HD 12229877 2022-08-17T06:39:42Z 2022-08-17T06:39:42Z CONTRIBUTOR

OK, reviewed ✅

Overall it looks pretty good, but there are a couple of places where the API you've got is pushing you into some really nasty performance traps where you have to use rejection sampling to keep things consistent. We have some tricks to help, but it's fundamentally exponential scaling - which means that it works right up until it doesn't work at all, and most of your users are likely to hit that regime 😕 Definitely possible to fix that, but it's more like a redesign than patching a typo.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis strategies in xarray.testing.strategies 1336119080
1216920806 https://github.com/pydata/xarray/pull/6908#issuecomment-1216920806 https://api.github.com/repos/pydata/xarray/issues/6908 IC_kwDOAMm_X85IiLzm Zac-HD 12229877 2022-08-16T17:12:40Z 2022-08-16T17:12:40Z CONTRIBUTOR

@Zac-HD would you be able to give your hypothesis-expert opinions on the general design of these strategies? Especially the whole "strategies that accept multiple other strategies" thing.

I'll aim for a proper review tonight! Quick remarks: strategies accepting strategies is fine, though our API style guide suggests accepting strategies xor values (this is violated in a few places in our Numpy extra, but it's still a good base principle). The guides might have other useful advice, I'd recommend skimming them since you're planning to ship Hypothesis-extending code to quite a few people!
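A minimal sketch of the "strategies xor values" guideline (the function and parameter names below are hypothetical, not Xarray's API): accept only a strategy, and let callers lift a fixed value with st.just() rather than branching on type internally.

```python
from hypothesis import strategies as st

# Hypothetical example: the parameter takes a strategy only; callers who
# want one specific value wrap it in st.just() themselves.
def dataarray_dims(dims: st.SearchStrategy) -> st.SearchStrategy:
    return dims.map(tuple)  # stand-in for the real construction logic

fixed = dataarray_dims(st.just(["x", "y"]))  # a specific value, lifted
varied = dataarray_dims(st.lists(st.sampled_from("xy"), unique=True))
```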

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis strategies in xarray.testing.strategies 1336119080
828924833 https://github.com/pydata/xarray/pull/4972#issuecomment-828924833 https://api.github.com/repos/pydata/xarray/issues/4972 MDEyOklzc3VlQ29tbWVudDgyODkyNDgzMw== Zac-HD 12229877 2021-04-29T04:03:39Z 2021-04-29T04:03:39Z CONTRIBUTOR

is there a way to separate sizes into dims and shape without using draw? I tried dims, shape = sizes.map(lambda s: tuple(zip(*s))), but since the map strategy is not iterable the final unpacking fails.

You've just got to use draw for this.
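A minimal sketch of the point above, with made-up names and bounds: inside @st.composite, draw() hands you a concrete dict you can unpack, whereas .map() only ever sees the strategy object.

```python
from hypothesis import strategies as st

# Hypothetical sizes strategy: dimension name -> dimension length.
sizes = st.dictionaries(st.sampled_from("xyz"), st.integers(1, 5), min_size=1)

@st.composite
def dims_and_shape(draw):
    drawn = draw(sizes)                # a real dict, not a strategy
    dims, shape = zip(*drawn.items())  # ordinary unpacking now works
    return dims, shape
```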

I'd prefer the verbose repr of st.builds() when debugging strategies, but if I know the strategy is correct and understand what it generates I'd probably prefer the smaller repr of @st.composite.

That's a good way of thinking about it :slightly_smiling_face:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic duck array testing - reductions 818059250
808736180 https://github.com/pydata/xarray/pull/4972#issuecomment-808736180 https://api.github.com/repos/pydata/xarray/issues/4972 MDEyOklzc3VlQ29tbWVudDgwODczNjE4MA== Zac-HD 12229877 2021-03-27T13:51:54Z 2021-03-27T13:51:54Z CONTRIBUTOR

Looking at https://github.com/keewis/xarray/compare/duckarray-tests...duckarray-tests-hypothesis, for high-level feedback:

  • Overall it looks pretty good; though ping me again if/when it's a PR and I'll do line-level feedback on idiom issues
  • A more general test would generate the shapes, and the axes to reduce over - reducing a 1D array over the first dimension is going to miss things
  • You use @st.composite when the .map() method and a lambda would suffice (though the perf gain is small enough that this is mostly a readability issue)
  • I don't see the point of Label, and we advise against mixing "a strategy or a value" in one parameter. We break this rule a few times for backwards-compatibility in our Numpy support, but wouldn't write such an API these days.
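The .map()-versus-@st.composite point above can be shown with a small hypothetical example: a single draw followed by post-processing doesn't need the machinery of @st.composite.

```python
from hypothesis import strategies as st

# One draw plus a transform: .map() with a lambda is all that's needed.
square_shapes = st.integers(1, 4).map(lambda n: (n, n))

@st.composite
def square_shapes_composite(draw):  # equivalent, but more machinery
    n = draw(st.integers(1, 4))
    return (n, n)
```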

And I'm always delighted to see people using Hypothesis to test libraries that I use and love 🥰🤩

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Automatic duck array testing - reductions 818059250
633336577 https://github.com/pydata/xarray/pull/4089#issuecomment-633336577 https://api.github.com/repos/pydata/xarray/issues/4089 MDEyOklzc3VlQ29tbWVudDYzMzMzNjU3Nw== Zac-HD 12229877 2020-05-25T01:39:49Z 2020-05-25T08:29:16Z CONTRIBUTOR

Also, I don't think there is a hypothesis.extra.xarray module, yet. Any comments on that, @Zac-HD?

Not yet, though I'm interested in collaborations to write one!

Our existing Numpy + Pandas integrations, along with the st.builds() and @st.composite strategies, make it reasonably straightforward-if-tedious to compose the objects you want in exquisite detail... the real challenge is to design a clean high-level API which lets you control only the parts you care about. Plausibly we have to let go of that last part to make this work, and just point people at the low-level tools if high-level isn't working.

This third-party module is unidiomatic in that it doesn't have the API design above, but I believe it works. rdturnermtl has a history of great features; we eventually got idiomatic high-performance gufunc shapes upstream and I'm confident we'll get Xarray support eventually too... and sooner if there are people who can help design it :smile:

Just @-mention me again if this comes up, or I won't be notified.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.cov() and xr.corr() 623751213
616967346 https://github.com/pydata/xarray/issues/1846#issuecomment-616967346 https://api.github.com/repos/pydata/xarray/issues/1846 MDEyOklzc3VlQ29tbWVudDYxNjk2NzM0Ng== Zac-HD 12229877 2020-04-21T05:51:10Z 2020-04-21T05:51:10Z CONTRIBUTOR

@rdturnermtl wrote a Hypothesis extension for Xarray, which is at least a nice demo of what's possible.

If Xarray contributors want to keep working on this (#3283 seems stalled?) I'd be happy to help, since I'm both invested in Xarray working and would like to ship a hypothesis.extra.xarray some day :slightly_smiling_face:

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add a suite of property-based tests with Hypothesis 290244473
547646808 https://github.com/pydata/xarray/pull/3285#issuecomment-547646808 https://api.github.com/repos/pydata/xarray/issues/3285 MDEyOklzc3VlQ29tbWVudDU0NzY0NjgwOA== Zac-HD 12229877 2019-10-29T21:54:35Z 2019-10-29T21:56:21Z CONTRIBUTOR

OK, looks like the test failure now is real. Let me know if you want me to comment out the relevant line so the tests pass.

If we want to merge a subset of the tests then that's fine. Ofc even better if we can use these tests to find & fix the errors

In my experience it's better to open an issue, add an xfail decorator to the test, and merge the tests PR. Otherwise the initial PR can take a very long time and no other property-based tests get added.

In this case I'd duplicate the test, so there's one which does not allow empty dataframes and one (xfailing) which does.

It's also likely that the person who found the bug is not the best person to fix it, and requiring that they do so in order to merge a useful test just disincentivises testing!
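The pattern described above can be sketched like this (all names are hypothetical, and a list stands in for the empty-dataframe case):

```python
import pytest
from hypothesis import given, strategies as st

# Stand-in for a real serialise/deserialise pair under test.
def roundtrip(values):
    return list(tuple(values))

# The mergeable test: excludes the known-bad input.
@given(st.lists(st.integers(), min_size=1))
def test_roundtrip_nonempty(values):
    assert roundtrip(values) == values

# The xfailing duplicate: allows the bad input, tracked by an open issue.
@pytest.mark.xfail(reason="empty input fails to roundtrip; see tracking issue")
@given(st.lists(st.integers()))
def test_roundtrip_maybe_empty(values):
    assert roundtrip(values) == values
```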

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Hypothesis tests for roundtrip to & from pandas 490316894
541046819 https://github.com/pydata/xarray/pull/3283#issuecomment-541046819 https://api.github.com/repos/pydata/xarray/issues/3283 MDEyOklzc3VlQ29tbWVudDU0MTA0NjgxOQ== Zac-HD 12229877 2019-10-11T12:38:06Z 2019-10-11T12:38:06Z CONTRIBUTOR

The datetime64 & timedelta64 cases don't roundtrip completely, so there are test failures. I've added these in a separate commit at the end, so it's easy to cut that back out if you prefer.

The basic problem seems to be that Hypothesis is generating timedelta64[Y], but Xarray is dealing with timedelta64[ns]:

```
_________________________ test_netcdf_roundtrip _________________________
tmp_path = PosixPath('/tmp/pytest-of-vsts/pytest-0/test_netcdf_roundtrip0')
data = data(...), arr = array([293], dtype='timedelta64[Y]')
...
        with xr.open_dataset(tmp_path / "test.nc") as roundtripped:
>           xr.testing.assert_identical(original, roundtripped)
E       AssertionError: Left and right Dataset objects are not identical
E
E       Differing data variables:
E       L   data     (0) timedelta64[ns] -106488 days +01:41:02.290448
E       R   data     (0) timedelta64[ns] -106488 days +01:41:02.290449

properties/test_netcdf_roundtrip.py:51: AssertionError
---------------------------------- Hypothesis ----------------------------------
Falsifying example: test_netcdf_roundtrip(
    tmp_path=PosixPath('/tmp/pytest-of-vsts/pytest-0/test_netcdf_roundtrip0'),
    data=data(...),
    arr=array([293], dtype='timedelta64[Y]'),
)
```

So either that's a pretty serious bug, or you should specify the max_period of the timedelta dtype strategy.
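The second option could look something like this (a hedged sketch; the kwargs follow hypothesis.extra.numpy.timedelta64_dtypes): bound the generated unit so only the nanosecond resolution that Xarray serialises to is produced.

```python
import numpy as np
from hypothesis.extra import numpy as npst

# Restrict both ends of the period range to "ns", so the strategy can only
# generate timedelta64[ns] dtypes rather than coarse units like [Y].
ns_only = npst.timedelta64_dtypes(min_period="ns", max_period="ns")
```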

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add hypothesis test for netCDF4 roundtrip 490228661
522436770 https://github.com/pydata/xarray/issues/1566#issuecomment-522436770 https://api.github.com/repos/pydata/xarray/issues/1566 MDEyOklzc3VlQ29tbWVudDUyMjQzNjc3MA== Zac-HD 12229877 2019-08-19T06:50:15Z 2019-08-19T06:50:15Z CONTRIBUTOR

It's at least out of date, and doesn't appear to be helpful, so I'm happy to close this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  When reporting errors, note what value was invalid and why 256557897
519760729 https://github.com/pydata/xarray/pull/3194#issuecomment-519760729 https://api.github.com/repos/pydata/xarray/issues/3194 MDEyOklzc3VlQ29tbWVudDUxOTc2MDcyOQ== Zac-HD 12229877 2019-08-09T03:10:37Z 2019-08-09T03:10:37Z CONTRIBUTOR

My config uses tox and bash instead of pre-commit. Doesn't work well for Windows, but it's a tiny Python script to glob for .py files and subprocess.run the pyupgrade command. Balancing powerful CI against new-contributor-friendliness is hard!

```ini
[testenv:check]
deps = -r requirements.txt
whitelist_externals = bash
commands =
    autoflake --recursive --in-place --remove-all-unused-imports --remove-duplicate-keys --remove-unused-variables .
    bash -c "pyupgrade --py36-plus **.py"
    isort --recursive --apply .
    black .
    flake8
    mypy --config-file=tox.ini . src/
    bandit --recursive --exclude=./.tox/** --skip=B101,B110,B310 --quiet .
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove future statements 478690528
519720166 https://github.com/pydata/xarray/pull/3194#issuecomment-519720166 https://api.github.com/repos/pydata/xarray/issues/3194 MDEyOklzc3VlQ29tbWVudDUxOTcyMDE2Ng== Zac-HD 12229877 2019-08-08T23:14:19Z 2019-08-08T23:14:19Z CONTRIBUTOR

The author pushes pre-commit pretty hard, but that shouldn't be a problem for Xarray as we're already using it. Then pre-commit run --all-files will execute it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove future statements 478690528
519713454 https://github.com/pydata/xarray/pull/3194#issuecomment-519713454 https://api.github.com/repos/pydata/xarray/issues/3194 MDEyOklzc3VlQ29tbWVudDUxOTcxMzQ1NA== Zac-HD 12229877 2019-08-08T22:42:12Z 2019-08-08T22:42:12Z CONTRIBUTOR

In all seriousness, pyupgrade will do this and more automatically with the --py3-plus option. I'd add it to CI!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove future statements 478690528
519711707 https://github.com/pydata/xarray/issues/3092#issuecomment-519711707 https://api.github.com/repos/pydata/xarray/issues/3092 MDEyOklzc3VlQ29tbWVudDUxOTcxMTcwNw== Zac-HD 12229877 2019-08-08T22:34:33Z 2019-08-08T22:34:33Z CONTRIBUTOR

@max-sixty - closed by #3142 I think?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  black formatting 466750687
518929174 https://github.com/pydata/xarray/pull/3142#issuecomment-518929174 https://api.github.com/repos/pydata/xarray/issues/3142 MDEyOklzc3VlQ29tbWVudDUxODkyOTE3NA== Zac-HD 12229877 2019-08-07T03:54:03Z 2019-08-07T03:54:03Z CONTRIBUTOR

Sadly if there's a way to automatically fix up merge conflicts I don't know of it 😭. However I'd be happy to help anyone who is having trouble rebasing their open PR.

On a different note, have you considered adding the pyupgrade and autoflake autoformatters? They go further than Black in that they do change the AST, but it's a nice way to ensure consistency of style, automatically remove unused imports and variables for flake8, and so on.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Black 469871658
469067311 https://github.com/pydata/xarray/pull/2777#issuecomment-469067311 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2OTA2NzMxMQ== Zac-HD 12229877 2019-03-03T21:41:21Z 2019-03-03T21:41:21Z CONTRIBUTOR

No objection to going with #2792; I'm just happy to have the change merged 😄

It would be nice for someone to cherry-pick 63da214d697345ebdd0ecc0967c72eafc70bcb0d before releasing 0.12 though, just to fix that warning.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
467667604 https://github.com/pydata/xarray/pull/2777#issuecomment-467667604 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NzY2NzYwNA== Zac-HD 12229877 2019-02-27T00:07:34Z 2019-02-27T00:07:34Z CONTRIBUTOR

maybe you have a syntax error?

...yep, an unmatched paren. 😥

If a and b have the same name, then you'd get an index with the two duplicate entries in the second case but not the first case. [which would be bad]

I think it's impossible to avoid this when using inference in the general case. Two options I think would be decent-if-unsatisfying:

  1. Explicitly manage this in the new combining functions, e.g. clear the concat dim coords if they are not unique and the input arrays did not have coords in that dimension.
  2. Add an argument to xr.concat to enable or disable this, e.g. infer_coords=True, and disable it when calling xr.concat from other combining functions.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
466587431 https://github.com/pydata/xarray/pull/2777#issuecomment-466587431 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NjU4NzQzMQ== Zac-HD 12229877 2019-02-22T23:47:30Z 2019-02-22T23:47:30Z CONTRIBUTOR

@shoyer - don't worry about the docs build, I'm pretty sure that was just a flaky network from Travis and it's working now in any case.

I've left tutorial.load_dataset in, just changed "removed in 0.12" to "removed in a future version".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
466369284 https://github.com/pydata/xarray/pull/2777#issuecomment-466369284 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NjM2OTI4NA== Zac-HD 12229877 2019-02-22T11:41:37Z 2019-02-22T11:41:37Z CONTRIBUTOR

OK! @shoyer, I've got everything passing and it's ready for review.

Even the accidental tutorial/docs fixes :smile:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
465843922 https://github.com/pydata/xarray/pull/2784#issuecomment-465843922 https://api.github.com/repos/pydata/xarray/issues/2784 MDEyOklzc3VlQ29tbWVudDQ2NTg0MzkyMg== Zac-HD 12229877 2019-02-21T03:05:33Z 2019-02-21T03:05:33Z CONTRIBUTOR

This function is deprecated and planned for removal in Xarray 0.12 (i.e. very soon, see #2776), e.g. by https://github.com/pydata/xarray/pull/2777/commits/d6e15cc1647909cebde734a5d6b92f262b234112. Merging both will just create a merge conflict, but if you want to pull that commit into this PR that would be fine with me!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix typos in tutorial.py (#2783) 412694880
465812738 https://github.com/pydata/xarray/pull/2777#issuecomment-465812738 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NTgxMjczOA== Zac-HD 12229877 2019-02-21T00:32:10Z 2019-02-21T00:32:10Z CONTRIBUTOR

Hmm, it looks like the failure to download the naturalearth coastlines.zip wasn't so transient after all - but it does work on my machine!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
465802643 https://github.com/pydata/xarray/pull/2777#issuecomment-465802643 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NTgwMjY0Mw== Zac-HD 12229877 2019-02-20T23:51:04Z 2019-02-20T23:51:15Z CONTRIBUTOR

The docs build failed due to a (transient) http error when loading tutorial data for the docs, so I've also finalised the planned conversion from xarray.tutorial.load_dataset to xarray.tutorial.open_dataset.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
465535523 https://github.com/pydata/xarray/issues/2776#issuecomment-465535523 https://api.github.com/repos/pydata/xarray/issues/2776 MDEyOklzc3VlQ29tbWVudDQ2NTUzNTUyMw== Zac-HD 12229877 2019-02-20T11:25:23Z 2019-02-20T11:25:23Z CONTRIBUTOR

I'd love to get #2777 merged - it's a small patch, but really useful in a remote sensing niche.

My students are going to be graded on how well their figures are labelled, and I really don't want to have to teach them matplotlib 😅

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  0.12.0 release 411738552
465029759 https://github.com/pydata/xarray/pull/2777#issuecomment-465029759 https://api.github.com/repos/pydata/xarray/issues/2777 MDEyOklzc3VlQ29tbWVudDQ2NTAyOTc1OQ== Zac-HD 12229877 2019-02-19T08:11:56Z 2019-02-19T08:11:56Z CONTRIBUTOR

Thanks for the support and quick review @shoyer!

Any idea when Xarray 0.12 might be out? I'm teaching some remote sensing workshops in mid-March and would love to have this merged, as a colleague's review of those notebooks prompted this PR 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Improved default behavior when concatenating DataArrays 411755105
431718845 https://github.com/pydata/xarray/issues/2499#issuecomment-431718845 https://api.github.com/repos/pydata/xarray/issues/2499 MDEyOklzc3VlQ29tbWVudDQzMTcxODg0NQ== Zac-HD 12229877 2018-10-22T00:50:22Z 2018-10-22T00:50:22Z CONTRIBUTOR

I'd also try to find a way to use a groupby or apply_along_axis without stacking and unstacking the data, and to choose chunks that match the layout on disk - i.e. try lon=1 if the order is time, lat, lon. If the time observations are not contiguous in memory, it's probably worth reshaping the whole array and writing it back to disk up front.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Tremendous slowdown when using dask integration 372244156
431657200 https://github.com/pydata/xarray/issues/2499#issuecomment-431657200 https://api.github.com/repos/pydata/xarray/issues/2499 MDEyOklzc3VlQ29tbWVudDQzMTY1NzIwMA== Zac-HD 12229877 2018-10-21T10:30:23Z 2018-10-21T10:30:23Z CONTRIBUTOR

dataset = xr.open_dataset(netcdf_precip, chunks={'lat': 1})

This makes me really suspicious - lat=1 is a very very small chunk size, and completely unchunked in time and lon. Without knowing anything else, I'd try chunks=dict(lat=200, lon=200) or higher depending on the time dim - Dask is most efficient with chunks of around 10MB for most workloads.
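The difference can be illustrated with a small in-memory example (array sizes here are made up; the real dataset would be far larger):

```python
import numpy as np
import xarray as xr

# The same array chunked as lat=1 slivers versus 200x200 tiles - the tiled
# layout gives Dask a handful of large chunks instead of hundreds of tiny
# ones, which dramatically cuts scheduling overhead.
precip = xr.DataArray(
    np.zeros((400, 400), dtype="float32"), dims=("lat", "lon"), name="precip"
)
sliver = precip.chunk({"lat": 1})               # 400 chunks, one row each
tiled = precip.chunk({"lat": 200, "lon": 200})  # 4 chunks of ~160 kB
```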

This all also depends on the data layout on disk too - can you share repr(xr.open_dataset(netcdf_precip))? What does ncdump say?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Tremendous slowdown when using dask integration 372244156
424916084 https://github.com/pydata/xarray/pull/2442#issuecomment-424916084 https://api.github.com/repos/pydata/xarray/issues/2442 MDEyOklzc3VlQ29tbWVudDQyNDkxNjA4NA== Zac-HD 12229877 2018-09-27T00:35:46Z 2018-09-27T00:35:46Z CONTRIBUTOR

No problem!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use Hypothesis profile mechanism, not no-op mutation 364247513
424902226 https://github.com/pydata/xarray/issues/2441#issuecomment-424902226 https://api.github.com/repos/pydata/xarray/issues/2441 MDEyOklzc3VlQ29tbWVudDQyNDkwMjIyNg== Zac-HD 12229877 2018-09-26T23:21:33Z 2018-09-26T23:21:33Z CONTRIBUTOR

Wow, this is embarrassing - I strengthened that check because I thought people might be relying on a silent no-op, and the first case I hear about... was me. Um. #2442 fixes it, at least?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  hypothesis tests are failing on master 364148137
396381891 https://github.com/pydata/xarray/pull/2204#issuecomment-396381891 https://api.github.com/repos/pydata/xarray/issues/2204 MDEyOklzc3VlQ29tbWVudDM5NjM4MTg5MQ== Zac-HD 12229877 2018-06-11T20:47:21Z 2018-06-11T20:47:21Z CONTRIBUTOR

@jhamman - also closes #2211. Note though that you have to put the word "closes" (or "fixes") before each issue reference; GitHub doesn't parse comma-separated lists, so those issues don't get closed, and that's an easy way to end up with zombie issues that cause confusion later.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  update minimum versions and associated code cleanup 327905732
387052622 https://github.com/pydata/xarray/issues/2093#issuecomment-387052622 https://api.github.com/repos/pydata/xarray/issues/2093 MDEyOklzc3VlQ29tbWVudDM4NzA1MjYyMg== Zac-HD 12229877 2018-05-07T12:43:41Z 2018-05-07T12:43:41Z CONTRIBUTOR

With the benefit of almost a year's worth of procrastination, I think the best approach is to take the heuristics from #1440, but only support chunks=True - if a decent default heuristic isn't good enough, the user can specify exact chunks.

The underlying logic for this issue would be identical to that of #1440, so supporting both is "just" a matter of plumbing it in correctly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Default chunking in GeoTIFF images 318950038
374573132 https://github.com/pydata/xarray/pull/1972#issuecomment-374573132 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3NDU3MzEzMg== Zac-HD 12229877 2018-03-20T12:03:13Z 2018-03-20T12:03:13Z CONTRIBUTOR

I would say we should merge it now and iterate in future PRs.

Merge away then!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
373545890 https://github.com/pydata/xarray/pull/1972#issuecomment-373545890 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MzU0NTg5MA== Zac-HD 12229877 2018-03-15T22:41:20Z 2018-03-15T22:41:20Z CONTRIBUTOR

@shoyer & @fmaussion - I've just given up on the plotting tests as being more effort than they're worth. Are there any:

  • blockers to merging this as-is?
  • other APIs you think it would be reasonably easy to write property tests for? Here's a nice list of properties to test :smile:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371796211 https://github.com/pydata/xarray/pull/1972#issuecomment-371796211 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MTc5NjIxMQ== Zac-HD 12229877 2018-03-09T12:11:02Z 2018-03-09T12:11:02Z CONTRIBUTOR

@jklymak - I'm getting RuntimeError: Invalid DISPLAY variable in the Qt backend now that I've added plt.clf(). It works on my machine now (:tada:, thanks!) - any suggestions for Travis?

(I'm also getting Zarr errors, but I assume those will go away soon as I didn't cause them)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371722392 https://github.com/pydata/xarray/pull/1972#issuecomment-371722392 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MTcyMjM5Mg== Zac-HD 12229877 2018-03-09T06:05:46Z 2018-03-09T06:05:46Z CONTRIBUTOR

...that also explains why I was having trouble reproducing the error, whoops. I'll see how it goes with those problems excluded later tonight!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371689343 https://github.com/pydata/xarray/pull/1972#issuecomment-371689343 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MTY4OTM0Mw== Zac-HD 12229877 2018-03-09T02:05:18Z 2018-03-09T02:05:39Z CONTRIBUTOR

@shoyer - that depends mostly on whether you want to run these tests as part of a standard run. A test-time dependency on Hypothesis is very cheap compared to the other dev dependencies, so I'd think more about the impact on e.g. CI resources than on contributors.

Upside is more power and coverage of edge-cases with odd data; downside is that they take a lot longer by virtue of trying hundreds of examples, and in this case generating arrays also takes a while (~log(elements) average via sparse filling).


@tacaswell - I would be delighted to write a test suite like this for matplotlib! The only reason I haven't is because I thought it would be rude to report so many bugs that I don't have time to help fix. If we can get a group project going though I'd be very enthusiastic 😄

  • For this suite I was passing in 2D arrays of unsigned ints, signed ints, or floats (in all sizes), with edge sizes between 2 and 10.
  • The failing outputs were reported as 2x2 arrays (~30 times), or 3x2 (once - there's a cap on the number of shrinks, which I think I hit there).
  • Values tended to be either uniform small integers - 0 or 1, dtype int8 or uint8 - or, in some cases, an array of floats with one corner zero and the others very large.

I didn't keep the exact tracebacks, but I remember seeing many come from overflow in tick spacing calculations. Again, happy to write a test suite and make more detailed reports upstream if people want to fix this - in which case let's open an issue there!
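The array generation described above can be sketched with Hypothesis's NumPy extras. This is a minimal illustration, not the actual suite from this PR: the dtype list is abbreviated, and the assertions stand in for the real plotting calls.

```python
import numpy as np
from hypothesis import given, settings
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays, array_shapes

# 2D arrays of unsigned ints, signed ints, or floats, with edge
# sizes between 2 and 10, as described above.
dtypes = st.sampled_from(
    [np.uint8, np.uint16, np.int8, np.int32, np.float32, np.float64]
)

@settings(max_examples=20)
@given(arrays(dtype=dtypes,
              shape=array_shapes(min_dims=2, max_dims=2, min_side=2, max_side=10)))
def test_2d_array_properties(arr):
    # In the real suite this is where e.g. DataArray(arr).plot() would go.
    assert arr.ndim == 2
    assert all(2 <= s <= 10 for s in arr.shape)

# A decorated Hypothesis test runs all its examples when called directly.
test_2d_array_properties()
```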

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371671971 https://github.com/pydata/xarray/pull/1972#issuecomment-371671971 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MTY3MTk3MQ== Zac-HD 12229877 2018-03-09T00:26:13Z 2018-03-09T00:26:13Z CONTRIBUTOR

> This looks like a great start to me -- thank you!

You're welcome! Same questions as above though, plus "is there anything else you need in this initial PR?". If not, can we merge it?

> It's impressive that it's possible to break every plotting type with matplotlib :).

As much as I love matplotlib, it's a steaming pile of hacks and I want to avoid it more than I want it cleaned up 😥 (entirely because the process is dysfunctional, not the code)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371403712 https://github.com/pydata/xarray/pull/1972#issuecomment-371403712 https://api.github.com/repos/pydata/xarray/issues/1972 MDEyOklzc3VlQ29tbWVudDM3MTQwMzcxMg== Zac-HD 12229877 2018-03-08T07:29:18Z 2018-03-08T07:29:18Z CONTRIBUTOR

Ping @fmaussion / @shoyer - would love your opinions on this, including high-value targets to test.

(btw Appveyor had an internal error; build is otherwise green)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Starter property-based test suite 303103716
371011595 https://github.com/pydata/xarray/pull/1967#issuecomment-371011595 https://api.github.com/repos/pydata/xarray/issues/1967 MDEyOklzc3VlQ29tbWVudDM3MTAxMTU5NQ== Zac-HD 12229877 2018-03-07T03:34:37Z 2018-03-07T03:34:37Z CONTRIBUTOR

Not any more :wink: - it links to the docs that "[xarray.plot.plot()] calls an xarray plotting function based on the dimensions of darray.squeeze()".

I'd actually like to keep the magic in plot() - the whole point of this is that we can do something with the data, not that we do anything in particular. For interactive use, this is valuable enough - in my view - to justify keeping an inconsistent API; particularly when there is an obvious non-magic version to use in a script. In any case, that's a topic for another pull.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix RGB imshow with X or Y dim of size one 302695966
370974503 https://github.com/pydata/xarray/pull/1967#issuecomment-370974503 https://api.github.com/repos/pydata/xarray/issues/1967 MDEyOklzc3VlQ29tbWVudDM3MDk3NDUwMw== Zac-HD 12229877 2018-03-07T00:06:15Z 2018-03-07T03:08:31Z CONTRIBUTOR

Ah, I see what you mean but don't think we need any change or additional test.

There's a catch though - xarray.plot() is special, because it (and only it) squeezes the dimensions of the array before plotting it. Therefore, the following calls all produce the same plot:

xr.DataArray(np.arange(9).reshape((3,3))).plot()
xr.DataArray(np.arange(9).reshape((1,3,3))).plot()
xr.DataArray(np.arange(9).reshape((1,1,3,1,1,1,1,3,1,1))).plot()

My view is that the test you linked to is sufficient for the test you're asking for - imshow is a special case because it can accept 3D input for RGB plots.

TLDR - working as intended IMO, it's just that nobody reads the docs. Changing the API would avoid this but at cost of convenience which is the whole point of DataArray.plot().
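The squeeze behaviour described above can be checked with plain NumPy: any number of length-1 dimensions collapses to the same 2D data before plot() dispatches.

```python
import numpy as np

# All three reshapes in the comment above contain the same nine values;
# .squeeze() drops every length-1 axis, leaving identical (3, 3) arrays.
a = np.arange(9).reshape((1, 1, 3, 1, 1, 1, 1, 3, 1, 1))
b = np.arange(9).reshape((3, 3))

assert a.squeeze().shape == (3, 3)
assert np.array_equal(a.squeeze(), b)
```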

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix RGB imshow with X or Y dim of size one 302695966
370813600 https://github.com/pydata/xarray/pull/1967#issuecomment-370813600 https://api.github.com/repos/pydata/xarray/issues/1967 MDEyOklzc3VlQ29tbWVudDM3MDgxMzYwMA== Zac-HD 12229877 2018-03-06T15:13:43Z 2018-03-06T15:13:43Z CONTRIBUTOR

No problem - I had an hour free and no open pulls waiting on me, so the timing was good.

As a regression test it's specific to imshow, so I'm not sure what you'd want here (or whether it would work at all on the 2d mixin). More details please?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix RGB imshow with X or Y dim of size one 302695966
370774614 https://github.com/pydata/xarray/issues/1966#issuecomment-370774614 https://api.github.com/repos/pydata/xarray/issues/1966 MDEyOklzc3VlQ29tbWVudDM3MDc3NDYxNA== Zac-HD 12229877 2018-03-06T13:04:03Z 2018-03-06T13:08:54Z CONTRIBUTOR

Ugh, this is literally a one-line fix: I shouldn't have put .squeeze() here. If you also ensure that the dtype is OK (specify uint8, or divide by nine) the example then works.

It's also exactly the kind of thing that Hypothesis would catch for us via #1846. Work (moved to less coding :cry:) and the upcoming major release of Hypothesis have priority, but I'll try to find time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  imshow should work with third dimension of len 1 302679890
369117775 https://github.com/pydata/xarray/pull/1919#issuecomment-369117775 https://api.github.com/repos/pydata/xarray/issues/1919 MDEyOklzc3VlQ29tbWVudDM2OTExNzc3NQ== Zac-HD 12229877 2018-02-28T04:27:16Z 2018-02-28T04:27:16Z CONTRIBUTOR

Last time I used it Hound didn't update the PR status, so if so that removes one of my objections 😄 As a maintainer I prefer automating checks - I make mistakes, especially when tired, and it's nice knowing that the system has my back.

Hound only runs checks in the diff range though, so I think it's still valuable to keep running flake8 ourselves. Trying both for a few months before making a final decision sounds like a good idea to me!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove flake8 from travis 297794911
369111117 https://github.com/pydata/xarray/pull/1919#issuecomment-369111117 https://api.github.com/repos/pydata/xarray/issues/1919 MDEyOklzc3VlQ29tbWVudDM2OTExMTExNw== Zac-HD 12229877 2018-02-28T03:38:47Z 2018-02-28T03:39:07Z CONTRIBUTOR

home-assistant uses https://houndci.com, and it seems to work pretty well for them - it just leaves a review with any flake8 comments at the relevant lines of the pull.

Note that while HoundCI is used to provide faster and more readable feedback to contributors, they still check flake8 in Travis to ensure everything is fixed before merging.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove flake8 from travis 297794911
369106762 https://github.com/pydata/xarray/issues/1912#issuecomment-369106762 https://api.github.com/repos/pydata/xarray/issues/1912 MDEyOklzc3VlQ29tbWVudDM2OTEwNjc2Mg== Zac-HD 12229877 2018-02-28T03:09:46Z 2018-02-28T03:09:46Z CONTRIBUTOR

Per comment on the pull: my experience is that bots are a great complement, but can't replace style checks in CI.

Only remove flake8 from Travis if you're OK with merging pulls that don't meet the style guide (and then amend the contributing guide to match) - that's a valid decision, but also a weaker position than Xarray has previously taken.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Code review bots? 297452821
369106191 https://github.com/pydata/xarray/pull/1919#issuecomment-369106191 https://api.github.com/repos/pydata/xarray/issues/1919 MDEyOklzc3VlQ29tbWVudDM2OTEwNjE5MQ== Zac-HD 12229877 2018-02-28T03:05:48Z 2018-02-28T03:05:48Z CONTRIBUTOR

I would be very disappointed to remove the flake8 check from Travis - in my experience, style guides are only applied consistently if they are enforced by CI.

Ensuring that each pull maintains our standards is much less burdensome than regular "big push" efforts to get back to the standard (see eg #1741 and #1824 - it took two people!), and avoids periods of lower code quality.

In short, I think that Travis is the right place to put these checks - review bots are a complement, not a substitute.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Remove flake8 from travis 297794911
368686954 https://github.com/pydata/xarray/issues/1821#issuecomment-368686954 https://api.github.com/repos/pydata/xarray/issues/1821 MDEyOklzc3VlQ29tbWVudDM2ODY4Njk1NA== Zac-HD 12229877 2018-02-26T23:20:45Z 2018-02-26T23:20:45Z CONTRIBUTOR

First: thanks, everyone, for such a prompt and helpful response! I'm excited both to have 10.1 (:tada:), and by the prospect of faster/automated releases in future.

Reading over the releasing instructions, I think there are three parts we need to work on to go fully automated. By fully automated, I mean "no maintainer action whatsoever beyond merging pulls, which are not release-specific":

  • Most things can be automated with continuous deployment scripts like Hypothesis'. We deploy a patch or minor release for every pull request, but if that's undesirable you could run the deployment from a weekly "cron" job on Travis.
  • Changelog management (for weekly rather than per-PR releases) can be automated with https://github.com/hawkowl/towncrier - there's a substitute for all of us :smile:
  • Xarray would need some novel scripts for uploading Github releases and driving the conda-forge feedstock. (Which Hypothesis would borrow in turn - it's lower priority for us but still nice to have)

In short, my advice is to be creative, and if release processes can't be automated - change them!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  v0.10.1 Release 287852184
368303298 https://github.com/pydata/xarray/issues/1821#issuecomment-368303298 https://api.github.com/repos/pydata/xarray/issues/1821 MDEyOklzc3VlQ29tbWVudDM2ODMwMzI5OA== Zac-HD 12229877 2018-02-25T11:55:31Z 2018-02-25T11:55:31Z CONTRIBUTOR

@jhamman & @shoyer - I think Xarray has a release frequency problem.

  • #1782 has been merged, but I still can't plot timeseries with an all-nan step

  • #1796, #1819, and #1893 have been merged - but I still can't plot RGB images

  • #1840 has been merged - but loading CF-int-encoded data still uses double the memory it needs.

  • All of these daily tasks for myself and my colleagues are harder than they should be.

It's been more than three months now without a patch release. This is really, really frustrating as an Xarray contributor, user, and advocate - getting my work merged upstream literally isn't worth anything until it's released, my colleagues have trouble using it (and go back to Matlab or IDL!), and it's harder to ask for anything in meetings with eg @opendatacube.

Moving to weekly patch releases would fix all of these problems.

Maintainer availability doesn't need to be a limiting factor, either - for example, @HypothesisWorks has a deployment pipeline where the only human involvement is to click 'merge', and I'd be happy to help out if you'd like to set up a similar system.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  v0.10.1 Release 287852184
308928211 https://github.com/pydata/xarray/issues/1440#issuecomment-308928211 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwODkyODIxMQ== Zac-HD 12229877 2017-06-16T04:11:10Z 2018-02-10T07:13:11Z CONTRIBUTOR

@matt-long, I think that's a separate issue. Please open a new pull request, including a link to data that will let us reproduce the problem.

@jhamman - [updated] I was always keen to work on this if I could make time, but have since changed jobs. However I'd still be happy to help anyone who wants to work on it with design and review.

I definitely want to preserve the exact present semantics of dict arguments (so users have exact control, with a warning if it's incompatible with disk chunks). I may interpret int arguments as a (deprecated) hint though, as that's what it's mostly used for, and will add a fairly limited hints API to start with - more advanced users can just specify exact chunks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
364392462 https://github.com/pydata/xarray/issues/1789#issuecomment-364392462 https://api.github.com/repos/pydata/xarray/issues/1789 MDEyOklzc3VlQ29tbWVudDM2NDM5MjQ2Mg== Zac-HD 12229877 2018-02-09T10:13:43Z 2018-02-09T10:13:43Z CONTRIBUTOR

Related to rtfd/readthedocs.org#1820, I think. Searching reveals that there have been a variety of problems with the 'Edit on Github' link over time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Broken link to github on ReadTheDocs  282916278
362976003 https://github.com/pydata/xarray/pull/1787#issuecomment-362976003 https://api.github.com/repos/pydata/xarray/issues/1787 MDEyOklzc3VlQ29tbWVudDM2Mjk3NjAwMw== Zac-HD 12229877 2018-02-05T04:01:16Z 2018-02-05T04:01:16Z CONTRIBUTOR

This is now far enough down my list of priorities that I'm unlikely to get to it any time soon, and (it turns out) there are also some design issues that would need to be resolved.

I therefore suggest that anyone wishing to take this on opens an issue first to clarify the scope of any such change, and then make a PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Include units (if set) in plot labels 282369945
362902142 https://github.com/pydata/xarray/issues/1880#issuecomment-362902142 https://api.github.com/repos/pydata/xarray/issues/1880 MDEyOklzc3VlQ29tbWVudDM2MjkwMjE0Mg== Zac-HD 12229877 2018-02-04T12:11:27Z 2018-02-04T12:11:59Z CONTRIBUTOR

Ouch, this is definitely a bug in the code I wrote - that's terrible 😭

Probably related to matplotlib/matplotlib/pull/10220, where I'm looking at clipping instead of wrapping colours from the other side. (not as in code overlap or causing a bug, but dealing with similar logic)

Fixing this issue may be as simple as setting alpha to 255 instead of 1 for integer dtypes, or it might be more involved. Either way I'll try to open a pull within the next few days.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should imshow() recognise 0-255 images? 293858326
359168533 https://github.com/pydata/xarray/pull/1840#issuecomment-359168533 https://api.github.com/repos/pydata/xarray/issues/1840 MDEyOklzc3VlQ29tbWVudDM1OTE2ODUzMw== Zac-HD 12229877 2018-01-20T12:39:30Z 2018-01-20T12:39:30Z CONTRIBUTOR

Added tests; float32 not upcast, float16 ("intended for storage of many floating-point values where higher precision is not needed, not for performing arithmetic") is upcast but only to float32.

I'll open a new issue to add a basic suite of property-based tests to Xarray 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read small integers as float32, not float64 289853579
358882878 https://github.com/pydata/xarray/pull/1840#issuecomment-358882878 https://api.github.com/repos/pydata/xarray/issues/1840 MDEyOklzc3VlQ29tbWVudDM1ODg4Mjg3OA== Zac-HD 12229877 2018-01-19T06:58:14Z 2018-01-19T06:58:14Z CONTRIBUTOR

Thanks - I was actually writing up an issue and decided it would be easier to demonstrate the proposed fix in a PR, but I'll open an issue first next time.

The checkbox about flake8 could be removed from the issue template now - since #1824 we run flake8 on everything in CI so if tests pass flake8 is passing too.

Re: tests: what do you (and @shoyer) think about using Hypothesis for some property-based tests of variable coding? "encoding then decoding is a no-op" is a classic property 😄 Upside: more powerful and better at finding edge cases; downside: slower, simply because it checks more cases (a configurable number).
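That round-trip property can be sketched in plain Python. This is an illustration only: encode/decode here are hypothetical CF-style scale/offset packing helpers, not xarray's actual coders, and the real suite would have Hypothesis supply the values via @given rather than a fixed list.

```python
# "encoding then decoding is a no-op": decode(encode(x)) == x.

def encode(values, scale_factor=0.5, add_offset=10.0):
    # CF-style packing: stored = (physical - offset) / scale
    return [(v - add_offset) / scale_factor for v in values]

def decode(packed, scale_factor=0.5, add_offset=10.0):
    # Unpacking: physical = stored * scale + offset
    return [p * scale_factor + add_offset for p in packed]

data = [10.0, 10.5, 12.0, 20.0]
assert decode(encode(data)) == data
```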

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Read small integers as float32, not float64 289853579
358831413 https://github.com/pydata/xarray/pull/1819#issuecomment-358831413 https://api.github.com/repos/pydata/xarray/issues/1819 MDEyOklzc3VlQ29tbWVudDM1ODgzMTQxMw== Zac-HD 12229877 2018-01-19T00:48:21Z 2018-01-19T00:48:21Z CONTRIBUTOR

Ping @jhamman?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Normalisation for RGB imshow 287747803
358511917 https://github.com/pydata/xarray/pull/1819#issuecomment-358511917 https://api.github.com/repos/pydata/xarray/issues/1819 MDEyOklzc3VlQ29tbWVudDM1ODUxMTkxNw== Zac-HD 12229877 2018-01-18T01:54:35Z 2018-01-18T01:54:35Z CONTRIBUTOR

Method extracted, default for missing bounds fixed, comment added, import moved. Also added a check that you haven't accidentally reversed the bounds while supplying only one.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Normalisation for RGB imshow 287747803
358167385 https://github.com/pydata/xarray/pull/1819#issuecomment-358167385 https://api.github.com/repos/pydata/xarray/issues/1819 MDEyOklzc3VlQ29tbWVudDM1ODE2NzM4NQ== Zac-HD 12229877 2018-01-17T01:44:45Z 2018-01-17T01:44:45Z CONTRIBUTOR

Ping @shoyer, @jhamman - I think we're out of blockers and ready to merge 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Normalisation for RGB imshow 287747803
357812702 https://github.com/pydata/xarray/issues/1829#issuecomment-357812702 https://api.github.com/repos/pydata/xarray/issues/1829 MDEyOklzc3VlQ29tbWVudDM1NzgxMjcwMg== Zac-HD 12229877 2018-01-15T23:35:22Z 2018-01-15T23:35:22Z CONTRIBUTOR

Related: when does Xarray plan to drop Python 2? IMO we should at least join python3statement.org and drop it by 2020, and clearly document the timeline whatever it happens to be.

I'd be keen for an earlier date TBH - keyword-only arguments are great, compatibility shims kinda suck, and dependencies are moving to py3-only at an increasing rate (including matplotlib 3.0, scheduled for July) - but can see the other argument too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Drop support for Python 3.4 288465429
357550651 https://github.com/pydata/xarray/pull/1819#issuecomment-357550651 https://api.github.com/repos/pydata/xarray/issues/1819 MDEyOklzc3VlQ29tbWVudDM1NzU1MDY1MQ== Zac-HD 12229877 2018-01-14T23:08:16Z 2018-01-14T23:08:16Z CONTRIBUTOR

@jhamman - all done 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Normalisation for RGB imshow 287747803
357494081 https://github.com/pydata/xarray/pull/1824#issuecomment-357494081 https://api.github.com/repos/pydata/xarray/issues/1824 MDEyOklzc3VlQ29tbWVudDM1NzQ5NDA4MQ== Zac-HD 12229877 2018-01-14T07:37:53Z 2018-01-14T07:37:53Z CONTRIBUTOR

@shoyer; done 🎉

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make `flake8 xarray` pass 288322322
356903275 https://github.com/pydata/xarray/pull/1819#issuecomment-356903275 https://api.github.com/repos/pydata/xarray/issues/1819 MDEyOklzc3VlQ29tbWVudDM1NjkwMzI3NQ== Zac-HD 12229877 2018-01-11T11:10:22Z 2018-01-11T11:10:22Z CONTRIBUTOR

CC @shoyer - hopefully I'm in time for Xarray 0.10.1 😉

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Normalisation for RGB imshow 287747803
356498223 https://github.com/pydata/xarray/pull/1796#issuecomment-356498223 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1NjQ5ODIyMw== Zac-HD 12229877 2018-01-10T04:42:44Z 2018-01-10T13:55:50Z CONTRIBUTOR

@shoyer makes some good points about things that should be fixed upstream in Matplotlib - namely normalization of RGB images, but I'm also going to move my color fixes (clipping instead of modulo) upstream instead. Timeline:

  1. I edit this PR, it gets merged, Xarray 10.1 (ETA soonish?) can display RGB images just like Matplotlib
  2. I finish my upstream PR, Matplotlib 2.2 (ETA later this month (!)) supports normalization of RGB images and fixes the modulo/clip bug
  3. Once the matplotlib API is merged, I make a new PR against Xarray to support the same interface (including on older matplotlib versions)
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
356519422 https://github.com/pydata/xarray/pull/1796#issuecomment-356519422 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1NjUxOTQyMg== Zac-HD 12229877 2018-01-10T07:14:16Z 2018-01-10T07:14:16Z CONTRIBUTOR

Looks like TestDatetimePlot.test_datetime_line_plot is failing, but I'm pretty sure that's nothing to do with me (and it passed twice on Travis!).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
356272643 https://github.com/pydata/xarray/pull/1796#issuecomment-356272643 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1NjI3MjY0Mw== Zac-HD 12229877 2018-01-09T12:38:28Z 2018-01-09T12:38:28Z CONTRIBUTOR

Okay... I've now seen three Travis runs. In every one there's been a pass, a fail, and three errors... but different jobs each time. At this point I'm ready to give up and wait for the Travis team to fix it 😕

The code is ready to go though! 🎉

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
356223193 https://github.com/pydata/xarray/pull/1796#issuecomment-356223193 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1NjIyMzE5Mw== Zac-HD 12229877 2018-01-09T09:07:58Z 2018-01-09T09:07:58Z CONTRIBUTOR

More review fixes 😄

I'm pretty sure the build failure is just Travis being flaky - it passes on my machine. Could someone restart it?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
355815405 https://github.com/pydata/xarray/pull/1787#issuecomment-355815405 https://api.github.com/repos/pydata/xarray/issues/1787 MDEyOklzc3VlQ29tbWVudDM1NTgxNTQwNQ== Zac-HD 12229877 2018-01-07T11:21:25Z 2018-01-07T11:21:25Z CONTRIBUTOR
  • Including units in labels more generally sounds good to me, but I'll hold off until the general concept is approved 😉
  • I don't think there's an objectively correct way to label units, but parentheses seem to be fairly standard and least prone to misinterpretation.
  • Happy to add tests, though again I'll wait for in-principle approval. Adding a units attribute to some of the test data is easy enough!
  • units is the convention, and if this is unsatisfactory it's quite trivial to override already - I'd hold off on custom options until someone has an actual use-case for the feature.
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Include units (if set) in plot labels 282369945
355814550 https://github.com/pydata/xarray/issues/37#issuecomment-355814550 https://api.github.com/repos/pydata/xarray/issues/37 MDEyOklzc3VlQ29tbWVudDM1NTgxNDU1MA== Zac-HD 12229877 2018-01-07T11:03:37Z 2018-01-07T11:03:37Z CONTRIBUTOR

Closed by #1750?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Functions for converting DataArrays to and from iris.Cubes 28596269
355743363 https://github.com/pydata/xarray/pull/1796#issuecomment-355743363 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1NTc0MzM2Mw== Zac-HD 12229877 2018-01-06T12:18:56Z 2018-01-06T13:33:48Z CONTRIBUTOR

OK, @shoyer & @fmaussion - I think I'm done again!

The only thing I haven't done is change the handling of robust=True, as anything beyond "scale and clip 2nd-to-98th percentiles to [0, 1]" seems fraught with complications. If this is not what the user intended, it's at least the obvious default behaviour and should be simple to understand.

Hopefully this covers the substantial changes; but let me know if there's anything else as I'd really like this merged before the 0.10.1 release - and ideally released in time for demos at my summer school in January 😄
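The "scale and clip 2nd-to-98th percentiles to [0, 1]" behaviour mentioned above can be sketched in a few lines of NumPy. This is an illustration of the idea, not xarray's implementation; robust_scale is a hypothetical helper name.

```python
import numpy as np

def robust_scale(arr):
    # Map the 2nd-98th percentile range onto [0, 1], clipping outliers.
    vmin, vmax = np.nanpercentile(arr, [2, 98])
    return np.clip((arr - vmin) / (vmax - vmin), 0, 1)

img = np.array([[-100.0, 0.0], [50.0, 1000.0]])
scaled = robust_scale(img)
assert scaled.min() == 0.0 and scaled.max() == 1.0
```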

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
355745453 https://github.com/pydata/xarray/pull/1787#issuecomment-355745453 https://api.github.com/repos/pydata/xarray/issues/1787 MDEyOklzc3VlQ29tbWVudDM1NTc0NTQ1Mw== Zac-HD 12229877 2018-01-06T12:59:31Z 2018-01-06T12:59:31Z CONTRIBUTOR

Ping @fmaussion for review? (the build error is Travis download trouble, not failing tests)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Include units (if set) in plot labels 282369945
353773905 https://github.com/pydata/xarray/pull/1796#issuecomment-353773905 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1Mzc3MzkwNQ== Zac-HD 12229877 2017-12-24T09:08:10Z 2018-01-06T12:06:44Z CONTRIBUTOR

Thanks for the review @shoyer! I'll do what I can in the next few days, but that might not be much at all before I get back from a no-internet camping trip around Jan 8th. Items:

  • [x] add validation and tests for rgb argument
  • [x] don't mutate the input data! (I think it doesn't, but obviously this needs to be checked)
  • [x] simplify order heuristic, to "use last dimension of size 3 or 4 as rgb; warn if there are multiple possible"
  • [x] update rasterio example in gallery
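The simplified heuristic in the checklist above ("use last dimension of size 3 or 4 as rgb; warn if there are multiple possible") can be sketched like so. guess_rgb_dim is an illustrative name, not xarray's actual function.

```python
import warnings

def guess_rgb_dim(dims, shape):
    # Candidate band dimensions are those of size 3 (RGB) or 4 (RGBA).
    candidates = [d for d, s in zip(dims, shape) if s in (3, 4)]
    if not candidates:
        raise ValueError("no dimension of size 3 or 4 to use as RGB[A]")
    if len(candidates) > 1:
        warnings.warn("several dimensions could be RGB[A]; using the last")
    return candidates[-1]

assert guess_rgb_dim(("y", "x", "band"), (512, 512, 3)) == "band"
```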
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
353351362 https://github.com/pydata/xarray/pull/1796#issuecomment-353351362 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1MzM1MTM2Mg== Zac-HD 12229877 2017-12-21T13:31:28Z 2017-12-21T13:31:28Z CONTRIBUTOR

Done - now with an optional argument and a nicer commit history. Anything else?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
353260924 https://github.com/pydata/xarray/pull/1796#issuecomment-353260924 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1MzI2MDkyNA== Zac-HD 12229877 2017-12-21T05:21:41Z 2017-12-21T05:21:41Z CONTRIBUTOR

Optional argument coming up then 😄

Anything else I need to do?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
353225923 https://github.com/pydata/xarray/pull/1796#issuecomment-353225923 https://api.github.com/repos/pydata/xarray/issues/1796 MDEyOklzc3VlQ29tbWVudDM1MzIyNTkyMw== Zac-HD 12229877 2017-12-21T00:51:35Z 2017-12-21T00:53:02Z CONTRIBUTOR

> I like this. My main concern is that I'd like a fully explicit way to opt into this behavior. Perhaps rgb='band' and/or rgba='band'?

I have a pretty strong preference for RGB images 'just working' - requiring a new argument would mean imshow has a different signature to all other plot methods without adding any information. If it's an optional argument, it barely does anything - in the rare case where the heuristic fails, you can supply x and y dims instead.

In many ways showing an RGB image is a special case, but this is the least-special-case I could make work - and it's consistent with the obvious meaning and upstream behavior of imshow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support RGB[A] arrays in plot.imshow() 283566613
351963684 https://github.com/pydata/xarray/pull/1787#issuecomment-351963684 https://api.github.com/repos/pydata/xarray/issues/1787 MDEyOklzc3VlQ29tbWVudDM1MTk2MzY4NA== Zac-HD 12229877 2017-12-15T09:56:32Z 2017-12-15T22:23:59Z CONTRIBUTOR

The exact key units is required by the CF Conventions and most similar conventions, and very common in other landscape and hydrology data I work with.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Include units (if set) in plot labels 282369945
352110041 https://github.com/pydata/xarray/pull/1782#issuecomment-352110041 https://api.github.com/repos/pydata/xarray/issues/1782 MDEyOklzc3VlQ29tbWVudDM1MjExMDA0MQ== Zac-HD 12229877 2017-12-15T20:57:36Z 2017-12-15T20:57:36Z CONTRIBUTOR

No problem! Do you have any idea when this might be released? I can install from git in my own environment, but I probably can't convince GA to do that in their production environment 😄

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Plot nans 282087995
351701244 https://github.com/pydata/xarray/issues/1780#issuecomment-351701244 https://api.github.com/repos/pydata/xarray/issues/1780 MDEyOklzc3VlQ29tbWVudDM1MTcwMTI0NA== Zac-HD 12229877 2017-12-14T12:46:08Z 2017-12-14T12:46:16Z CONTRIBUTOR

> I investigated a bit and it seems that in the case of all nan data you would probably have to force add_colorbar to False and do other things for matplotlib to accept your all nan data.

I don't need to do this, but I'd like to for another feature (imshow RGB images) - the question is how! Passing add_colorbar=False to plot.imshow just gives me errors, and I can't find anywhere in the internals where this would be valid either - it always gets passed through to the matplotlib Artist and then raises an exception. Any ideas?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.plot raises exception if contents are all NaN 282000017
351675151 https://github.com/pydata/xarray/issues/1780#issuecomment-351675151 https://api.github.com/repos/pydata/xarray/issues/1780 MDEyOklzc3VlQ29tbWVudDM1MTY3NTE1MQ== Zac-HD 12229877 2017-12-14T10:45:31Z 2017-12-14T10:45:31Z CONTRIBUTOR

I'd argue that Xarray should handle these cases - it already does for lower dimensions, eg xr.DataArray(np.full(1, np.nan)).plot() works, and this would be really nice for exploratory analysis as well as consistency.

This is sufficiently annoying to my research group that I'm willing to write the patch, if that helps! I also have some ideas for tests using Hypothesis to ferret out some other problems - for example there's a similar failure if plotting an array with size one in some dimension.
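A minimal sketch of the guard I have in mind (pure Python; the helper name and fallback limits are hypothetical, not xarray API): compute colour limits while ignoring NaNs, and fall back to arbitrary-but-valid limits when everything is NaN, so matplotlib never sees an invalid range.

```python
import math

def robust_limits(values, fallback=(0.0, 1.0)):
    """Return (vmin, vmax) ignoring NaNs; use the fallback for all-NaN data.

    Hypothetical helper sketching the guard a plotting routine could apply
    before handing limits to matplotlib.
    """
    finite = [v for v in values if not math.isnan(v)]
    if not finite:
        # All-NaN array: any valid range works, nothing will be drawn anyway.
        return fallback
    return min(finite), max(finite)
```

With a guard like this, an all-NaN array produces an empty-but-valid plot instead of an exception.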

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  DataArray.plot raises exception if contents are all NaN 282000017
351649842 https://github.com/pydata/xarray/issues/1604#issuecomment-351649842 https://api.github.com/repos/pydata/xarray/issues/1604 MDEyOklzc3VlQ29tbWVudDM1MTY0OTg0Mg== Zac-HD 12229877 2017-12-14T09:03:58Z 2017-12-14T09:03:58Z CONTRIBUTOR

Closed by #1496, I think.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Where functionality in xarray including else case (dask compability) 262696381
338564810 https://github.com/pydata/xarray/issues/1405#issuecomment-338564810 https://api.github.com/repos/pydata/xarray/issues/1405 MDEyOklzc3VlQ29tbWVudDMzODU2NDgxMA== Zac-HD 12229877 2017-10-23T06:59:06Z 2017-10-23T06:59:14Z CONTRIBUTOR

I think this was closed by #1473.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Using uint64 for Dataset indexing gives ValueError 228023777
338557287 https://github.com/pydata/xarray/issues/1541#issuecomment-338557287 https://api.github.com/repos/pydata/xarray/issues/1541 MDEyOklzc3VlQ29tbWVudDMzODU1NzI4Nw== Zac-HD 12229877 2017-10-23T06:12:40Z 2017-10-23T06:43:34Z CONTRIBUTOR

I'll take this one on 😄 I've also spotted a few things in the sphinx config that could be updated and will keep an eye out for others.

  • adding Numba to the intersphinx mapping
  • pointing at python /3/ instead of /3.5/ docs
  • using ISO8601 date formats
  • keeping the copyright year up to date
  • removing warning filter on seaborn import
  • remove IPython image location workaround
  • loop over modules to import rather than the unrolled style
{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Need small updates of docs 254368462
309353545 https://github.com/pydata/xarray/issues/1440#issuecomment-309353545 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwOTM1MzU0NQ== Zac-HD 12229877 2017-06-19T06:50:57Z 2017-07-14T02:35:04Z CONTRIBUTOR

I've just had a meeting at NCI which has helped clarify what I'm trying to do and how to tell if it's working. This comment is mostly for my own notes, and public for anyone interested. I'll refer to dask chunks as 'blocks' (as in 'blocked algorithms') and netcdf chunks in a file as 'chunks', to avoid confusion.

The approximate algorithm I'm thinking about is outlined in this comment above. Considerations, in rough order of performance impact, are:

  • No block should include data from multiple files (near-absolute, due to locking - though concurrent read is supported on lower levels?)
  • Contiguous access is critical in un-chunked files - reading the fastest-changing dimension in small parts murders IO performance. It may be important in chunked files too, at the level of chunks, but that's mostly down to Dask and user access patterns.
  • Each chunk, if the file is chunked, must fall entirely into a single block.
  • Blocks should usually be around 100MB, to balance scheduler overhead with memory usage of intermediate results.
  • Blocks should be of even(ish) size - if a dimension is of size 100 with chunks of 30, better to have blocks of 60 at the expense of relative shape than have blocks of 90 with one almost empty.
  • Chunks are cached by underlying libraries, so benchmarks need to clear the caches each run for valid results. Note that this affects IO but typically not decompression of chunks.
  • Contra contiguous access (above), it might be good to prevent very skewed block shapes (ie 1*X*Y or T*1*1). Possibly limiting the lowest edge length of a block (10px?), or limiting the edge ratio of a block (20:1?)
  • If a dimension hint is given (eg 'my analysis will be spatial'), making blocks larger along other dimensions will make whole-file processing more efficient, and subsetting less efficient. (Unless Dask can optimise away loading of part of a block?)

Bottom line, I could come up with something pretty quickly but would prefer to take a little longer to write and explore some benchmarks.
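The "even(ish) size" point above can be sketched as a small scoring function (a hypothetical helper, not part of xarray or of any concrete proposal): among block lengths that are multiples of the on-disk chunk length, prefer one near the target that avoids an almost-empty trailing block - e.g. a dimension of 100 with chunks of 30 gets blocks of 60 rather than 90 + 10.

```python
def even_block_len(dim_len, chunk_len, target_len):
    """Pick a block length that is a multiple of chunk_len, close to
    target_len, and leaves the trailing block reasonably full.

    Hypothetical sketch of the 'even(ish) blocks' consideration.
    """
    best, best_score = chunk_len, None
    for mult in range(1, dim_len // chunk_len + 1):
        block = mult * chunk_len
        n_blocks = -(-dim_len // block)  # ceiling division
        last = dim_len - (n_blocks - 1) * block
        # Penalise distance from the target and an almost-empty last block.
        score = abs(block - target_len) + (block - last)
        if best_score is None or score < best_score:
            best, best_score = block, score
    return best
```

For the example above, `even_block_len(100, 30, 90)` picks 60, trading block size for evenness exactly as described.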

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
308926818 https://github.com/pydata/xarray/pull/1457#issuecomment-308926818 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkyNjgxOA== Zac-HD 12229877 2017-06-16T03:57:57Z 2017-06-16T03:57:57Z CONTRIBUTOR

The tests for Hypothesis take almost twice as long to run on Travis at certain times of day, so I certainly wouldn't use it for benchmarking anything!

Also concerned that a dedicated benchmarking machine may lead to software (accidentally!) optimized for a particular architecture or balance of machine resources without due consideration. Maybe @wesm could investigate fault injection to (eg) slow down disk access or add latency for some sets of benchmarks?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
308923548 https://github.com/pydata/xarray/pull/1457#issuecomment-308923548 https://api.github.com/repos/pydata/xarray/issues/1457 MDEyOklzc3VlQ29tbWVudDMwODkyMzU0OA== Zac-HD 12229877 2017-06-16T03:29:12Z 2017-06-16T03:29:12Z CONTRIBUTOR

I like the idea of benchmarks, but have some serious concerns. For Dask and IO-bound work in general, benchmark results will vary widely depending on the hardware and (if relevant) network properties. Results will be noncomparable between SSD and HDD, local and remote network access, and in general depend heavily on the specific IO patterns and storage/compute relationship of the computer.

This isn't a reason not to benchmark though, just a call for very cautious interpretation - it's clearly useful to catch some of the subtle-but-pathological performance problems that have cropped up. In short, I think benchmarks should have a very clear warnings section in the documentation, and no decision should be taken to change code without benchmarking on a variety of computers (SSD/HDD, PC/cluster, local/remote data...).

Also JSON cannot include comments, and there are a number of entries that you need to update, but that's a passing concern.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Feature/benchmark 236347050
307074048 https://github.com/pydata/xarray/issues/1440#issuecomment-307074048 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNzA3NDA0OA== Zac-HD 12229877 2017-06-08T11:16:30Z 2017-06-08T11:16:30Z CONTRIBUTOR

🎉

My view is actually that anyone who can beat the default heuristic should just specify their chunks - you'd already need a good sense for the data and your computation (and the heuristic!). IMO, the few cases where tuning is desirable - but manual chunks are impractical - don't justify adding yet another kwarg to the fairly busy interfaces.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
307002325 https://github.com/pydata/xarray/issues/1440#issuecomment-307002325 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNzAwMjMyNQ== Zac-HD 12229877 2017-06-08T05:28:04Z 2017-06-08T05:28:04Z CONTRIBUTOR

I love a real-world example 😄 This sounds pretty similar to how I'm thinking of doing it, with a few caveats - mostly that cate assumes that the data is 3D, has lat and lon, a single time step, and spatial dimensions wholly divisible by some small N. Obviously this is fine for CCI data, but not generally true of things Xarray might open.

Taking a step back for a moment, chunks are great for avoiding out-of-memory errors, faster processing of reorderable operations, and efficient indexing. The overhead is not great when data is small or chunks are small, it's bad when a single on-disk chunk is on multiple dask chunks, and very bad when a dask chunk includes several files. (of course all of these are generalisations with pathological cases, but IMO good enough to build some heuristics on)

With that in mind, here's how I'd decide whether to use the heuristic:

  • If chunks is a dict, never use heuristic (always use explicit user chunks)
  • If chunks is a hint, eg set or list as discussed above, or a later proposal, always use heuristic mode - guided by the hint, of course. Files which may otherwise default to non-heuristic or non-chunking mode (eg in mfdataset) could use eg. the empty set to activate the heuristics without hints.
  • If chunks is None, and the uncompressed data is above a size threshold (eg 500MB, 1GB), use chunks given by the heuristic

Having decided to use a heuristic, we know the array shape and dimensions, the chunk shape if any, and the hint if any:

  • Start by selecting a maximum nbytes for the chunk, eg 250 MB
  • If the total array nbytes is <= max_nbytes, use a single chunk for the whole thing; return
  • If the array is stored as (a, b, ...) chunks on disk, our dask chunks must be (m.a, n.b, ...), ie each dimension has some independent integer multiple.
  • Loop over dimensions not to chunk (per above, either those not in the set, or those in a string), adding one to the respective multiplier. Alternatively if the file is now five or fewer dask chunks across, simply divide the on-disk chunks into 4, 3, 2, or 1 dask chunks (avoiding a dimension of 1.1 chunks, etc)
  • For each step, if this would make the chunk larger than max_nbytes, return.
  • Repeat the loop for remaining dimensions.
  • If the array is not chunked on disk, increase a divisor for each dimension to chunk until the chunk size is <= max_nbytes and return.

It's probably a good idea to constrain this further, so that the ratio of chunk edge length along dimensions should not exceed the greater of 100:1 or four times the ratio of chunks on disk (I don't have universal profiling to back this up, but it's always worked well for me). This will mitigate the potentially-very-large effects of dimension order, especially in unchunked files or large chunks.

For datasets (as opposed to arrays), I'd calculate chunks once for the largest dtype and just reuse that shape.
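As a rough illustration of the growth loop above (the function name and the size cap are placeholders, not xarray API, and the round-robin order stands in for the hint-guided dimension ordering): grow dask chunks as integer multiples of the on-disk chunks until adding another multiple along any dimension would exceed max_nbytes.

```python
from math import prod

def guess_chunks(shape, disk_chunks, itemsize, max_nbytes=250 * 2**20):
    """Sketch of the heuristic: dask chunks as integer multiples of
    on-disk chunks, grown until the byte cap is reached.

    All names are hypothetical; a real version would order dimensions
    by the user's hint rather than round-robin.
    """
    # Small array: a single chunk for the whole thing.
    if prod(shape) * itemsize <= max_nbytes:
        return tuple(shape)
    mult = [1] * len(shape)
    grew = True
    while grew:
        grew = False
        for i, (dim, disk) in enumerate(zip(shape, disk_chunks)):
            trial = mult[:]
            trial[i] += 1
            candidate = [min(m * c, d)
                         for m, c, d in zip(trial, disk_chunks, shape)]
            # Grow this dimension only if it stays under the cap and
            # has not already covered the whole dimension.
            if prod(candidate) * itemsize <= max_nbytes and mult[i] * disk < dim:
                mult[i] += 1
                grew = True
    return tuple(min(m * c, d) for m, c, d in zip(mult, disk_chunks, shape))
```

With a 64 kB cap, a (100, 100, 100) float64 array stored as (10, 10, 10) chunks comes out as (20, 20, 20) blocks - each block an exact multiple of the on-disk chunks.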

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
306688091 https://github.com/pydata/xarray/issues/1440#issuecomment-306688091 https://api.github.com/repos/pydata/xarray/issues/1440 MDEyOklzc3VlQ29tbWVudDMwNjY4ODA5MQ== Zac-HD 12229877 2017-06-07T05:09:06Z 2017-06-07T05:09:06Z CONTRIBUTOR

I'd certainly support a warning when dask chunks do not align with the on-disk chunks.

This sounds like a very good idea to me 👍

I think its unavoidable that users understand how their data will be processed (e.g., whether operations will be mapped over time or space). But maybe some sort of heuristics (if not a fully automated solution) are possible.

I think that depends on the size of the data - a very common workflow in our group is to open some national-scale collection, select a small (MB to low GB) section, and proceed with that. At this scale we only use chunks because many of the input files are larger than memory, and shape is basically irrelevant - chunks avoid loading anything until after selecting the subset (I think this is related to #1396).

It's certainly good to know the main processing dimensions though, and user-guided chunk selection heuristics could take us a long way - I actually think a dimension hint and good heuristics are likely to perform better than most users (who are not experts and have not profiled their performance).

The set notation is also very elegant, but I wonder about the interpretation. With chunks=, I specify how to break up the data - and any omitted dimensions are not chunked. For the hint, I'd expect to express which dimension(s) to keep - ie {'lat', 'lon'} should indicate that my analysis is mostly spatial, rather than mostly not. Maybe we could use a string (eg time for timeseries or lat lon for spatial) instead of a set to specify large chunk dimensions?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  If a NetCDF file is chunked on disk, open it with compatible dask chunks 233350060
303005011 https://github.com/pydata/xarray/pull/1260#issuecomment-303005011 https://api.github.com/repos/pydata/xarray/issues/1260 MDEyOklzc3VlQ29tbWVudDMwMzAwNTAxMQ== Zac-HD 12229877 2017-05-22T05:49:37Z 2017-05-22T05:49:37Z CONTRIBUTOR

I would also favour doing nothing (ie 3), because most users will already have some solution. It's also easier to change later if we don't do anything now - no need to think at all about backwards compatibility, and the design can be guided by how people are using the existing parts.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add RasterIO backend 206905158
290390365 https://github.com/pydata/xarray/pull/1322#issuecomment-290390365 https://api.github.com/repos/pydata/xarray/issues/1322 MDEyOklzc3VlQ29tbWVudDI5MDM5MDM2NQ== Zac-HD 12229877 2017-03-30T12:03:28Z 2017-03-30T12:03:28Z CONTRIBUTOR

@shoyer - my preference is to process and truncate all strings, including object reprs. If you want to overrule that feel free; otherwise I think it's ready to merge.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Shorter repr for attributes 216611104
289344566 https://github.com/pydata/xarray/pull/1322#issuecomment-289344566 https://api.github.com/repos/pydata/xarray/issues/1322 MDEyOklzc3VlQ29tbWVudDI4OTM0NDU2Ng== Zac-HD 12229877 2017-03-27T03:18:29Z 2017-03-27T03:18:29Z CONTRIBUTOR

Ping @shoyer

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Shorter repr for attributes 216611104
289185538 https://github.com/pydata/xarray/pull/1322#issuecomment-289185538 https://api.github.com/repos/pydata/xarray/issues/1322 MDEyOklzc3VlQ29tbWVudDI4OTE4NTUzOA== Zac-HD 12229877 2017-03-25T03:30:38Z 2017-03-25T03:30:38Z CONTRIBUTOR

If the logic is "clever", time to ditch it. I've rebased to a clearer version which simply handles indentation and replaces tabs and newlines with their backslash representations.

What properties of this would you want to test?
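For reference, a minimal sketch of the escaping-and-truncation behaviour described above (the helper name and length limit are hypothetical, not the PR's actual code): render the value on one line, replace tabs and newlines with their backslash representations, and clip long results with an ellipsis.

```python
def summarize_attr(value, maxlen=80):
    """Render an attribute value as a single truncated line.

    Hypothetical sketch: tabs/newlines become backslash escapes so the
    repr stays one line per attribute, and long values are clipped.
    """
    text = str(value).replace("\t", "\\t").replace("\n", "\\n")
    if len(text) > maxlen:
        text = text[: maxlen - 3] + "..."
    return text
```

Obvious properties to test: the result never contains a literal newline or tab, and never exceeds maxlen characters.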

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Shorter repr for attributes 216611104
288898100 https://github.com/pydata/xarray/issues/1319#issuecomment-288898100 https://api.github.com/repos/pydata/xarray/issues/1319 MDEyOklzc3VlQ29tbWVudDI4ODg5ODEwMA== Zac-HD 12229877 2017-03-24T00:15:06Z 2017-03-24T00:15:06Z CONTRIBUTOR

Sure, I'd be happy to. The above example will look much nicer, especially in wrapping environments:

<xarray.Dataset>
Dimensions:  (time: 246, x: 4000, y: 4000)
Coordinates:
  * y        (y) float64 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 -3.9e+06 ...
  * x        (x) float64 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 1.5e+06 ...
  * time     (time) datetime64[ns] 1999-07-16T23:49:39 1999-07-25T23:43:07 ...
Data variables:
    crs      int32 ...
    blue     (time, y, x) float64 ...
    green    (time, y, x) float64 ...
    red      (time, y, x) float64 ...
    nir      (time, y, x) float64 ...
    swir1    (time, y, x) float64 ...
    swir2    (time, y, x) float64 ...
Attributes:
    date_created:           2017-03-07T11:57:26.511217
    Conventions:            CF-1.6, ACDD-1.3
    history:                2017-03-07T11:57:26.511307+11:00 adh547 datacube...
    geospatial_bounds:      POLYGON ((148.49626113888138 -34.828378308133452...
    geospatial_bounds_crs:  EPSG:4326
    geospatial_lat_min:     -35.7203183267
    geospatial_lat_max:     -34.7089119078
    geospatial_lat_units:   degrees_north
    geospatial_lon_min:     148.496261139
    geospatial_lon_max:     149.734176111
    geospatial_lon_units:   degrees_east
    comment:                - Ground Control Points (GCP): new GCP chips ...
    product_suite:          Surface Reflectance NBAR+T 25m
    publisher_email:        earth.observation@ga.gov.au
    keywords_vocabulary:    GCMD
    product_version:        2
    cdm_data_type:          Grid
    references:             - Berk, A., Anderson, G.P., Acharya, P.K., Ho...
    platform:               LANDSAT-7
    keywords:               AU/GA,NASA/GSFC/SED/ESD/LANDSAT,REFLECTANCE,ETM+...
    publisher_name:         Section Leader, Operations Section, NEMO, Geosci...
    institution:            Commonwealth of Australia (Geoscience Australia)
    acknowledgment:         Landsat data is provided by the United States Ge...
    license:                CC BY Attribution 4.0 International License
    title:                  Surface Reflectance NBAR+T 25 v2
    summary:                Surface Reflectance (SR) is a suite of Earth Obs...
    instrument:             ETM
    source:                 LANDSAT 7 ETM+ surface observation
    publisher_url:          http://www.ga.gov.au

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Truncate long lines in repr of Dataset.attrs 216329175

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 33.156ms · About: xarray-datasette