issues


125 rows where repo = 13221727, state = "closed" and user = 6213168 sorted by updated_at descending


type
  • pull 75
  • issue 50

state
  • closed · 125

repo
  • xarray · 125
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2161133346 PR_kwDOAMm_X85oSZw7 8797 tokenize() should ignore difference between None and {} attrs crusaderky 6213168 closed 0 crusaderky 6213168   1 2024-02-29T12:22:24Z 2024-03-01T11:15:30Z 2024-03-01T03:29:51Z MEMBER   0 pydata/xarray/pulls/8797
  • Closes #8788
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8797/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2088095900 PR_kwDOAMm_X85kaiOH 8618 Re-enable mypy checks for parse_dims unit tests crusaderky 6213168 closed 0 crusaderky 6213168   1 2024-01-18T11:32:28Z 2024-01-19T15:49:33Z 2024-01-18T15:34:23Z MEMBER   0 pydata/xarray/pulls/8618

As per https://github.com/pydata/xarray/pull/8606#discussion_r1452680454

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8618/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2079054085 PR_kwDOAMm_X85j77Os 8606 Clean up Dims type annotation crusaderky 6213168 closed 0 crusaderky 6213168   1 2024-01-12T15:05:40Z 2024-01-18T18:14:15Z 2024-01-16T10:26:08Z MEMBER   0 pydata/xarray/pulls/8606  
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8606/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1678587031 I_kwDOAMm_X85kDTSX 7777 xarray minimum versions policy is more aggressive than NEP-29 crusaderky 6213168 closed 0     1 2023-04-21T14:06:15Z 2023-05-01T22:26:57Z 2023-05-01T22:26:57Z MEMBER      

What is your issue?

In #4179 / #4907, the xarray policy on minimum supported versions of dependencies was changed, with the reasoning that the previous policy (based on NEP-29) was too aggressive. Ironically, this caused xarray to drop Python 3.8 on Jan 26th (#7461), 3 months before the date NEP-29 recommends (Apr 14th). This is hard to defend - and in fact it sparked discontent (see late comments in #7461).

Regardless of what policy xarray decides to use internally, it should never be more aggressive than NEP-29. The xarray documentation is also incorrect, as it states "Python: 24 months (NEP-29)" which is not, in fact, in NEP-29.
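For context, NEP-29 recommends supporting a Python release for 42 months after it first ships; a quick sanity check of the dates cited above (a sketch, using the Python 3.8.0 release date):

```python
import datetime

py38_release = datetime.date(2019, 10, 14)  # Python 3.8.0 release date
# NEP-29 drop date: 42 months after the initial release
months = py38_release.year * 12 + (py38_release.month - 1) + 42
drop_date = datetime.date(months // 12, months % 12 + 1, py38_release.day)
print(drop_date)  # 2023-04-14 -- the "Apr 14th" mentioned above
```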

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7777/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1683335751 PR_kwDOAMm_X85PHLmT 7785 Remove pandas<2 pin crusaderky 6213168 closed 0     1 2023-04-25T14:55:12Z 2023-04-26T17:51:53Z 2023-04-25T15:03:10Z MEMBER   0 pydata/xarray/pulls/7785

XREF #7650

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7785/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1140046499 PR_kwDOAMm_X84y7YhY 6282 Remove xfail from tests decorated by @gen_cluster crusaderky 6213168 closed 0     1 2022-02-16T13:47:56Z 2023-04-25T14:53:35Z 2022-02-16T16:32:35Z MEMBER   0 pydata/xarray/pulls/6282

@gen_cluster has now been fixed upstream.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6282/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
309691307 MDU6SXNzdWUzMDk2OTEzMDc= 2028 slice using non-index coordinates crusaderky 6213168 closed 0     21 2018-03-29T09:53:33Z 2023-02-08T19:47:22Z 2022-10-03T10:38:57Z MEMBER      

It should be relatively straightforward to allow slicing on coordinates that are not backed by an IndexVariable, or in other words coordinates that are on a dimension with a different name, as long as they are 1-dimensional (unsure about the multidimensional case).

E.g. given this array:

```python
a = xarray.DataArray(
    [10, 20, 30],
    dims=['country'],
    coords={
        'country': ['US', 'Germany', 'France'],
        'currency': ('country', ['USD', 'EUR', 'EUR'])
    })
```

```
<xarray.DataArray (country: 3)>
array([10, 20, 30])
Coordinates:
  * country   (country) <U7 'US' 'Germany' 'France'
    currency  (country) <U3 'USD' 'EUR' 'EUR'
```

This is currently not possible:

```python
>>> a.sel(currency='EUR')
ValueError: dimensions or multi-index levels ['currency'] do not exist
```

It should be interpreted as a shorthand for:

```python
>>> a.sel(country=a.currency == 'EUR')
<xarray.DataArray (country: 2)>
array([20, 30])
Coordinates:
  * country   (country) <U7 'Germany' 'France'
    currency  (country) <U3 'EUR' 'EUR'
```
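A minimal sketch of the proposed fallback, written as a user-level helper rather than a change to sel() itself (hypothetical function, assuming 1-dimensional non-index coordinates):

```python
import xarray

def sel_by_coord(obj, **indexers):
    # Hypothetical helper: route non-dimension 1D coordinates through a
    # boolean mask along their dimension; defer everything else to sel().
    out = obj
    for name, value in indexers.items():
        if name in out.coords and name not in out.dims and out[name].ndim == 1:
            dim = out[name].dims[0]
            out = out.sel({dim: out[name] == value})
        else:
            out = out.sel({name: value})
    return out

# sel_by_coord(a, currency='EUR') would then behave like
# a.sel(country=a.currency == 'EUR')
```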

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2028/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
166441031 MDU6SXNzdWUxNjY0NDEwMzE= 907 unstack() treats string coords as objects crusaderky 6213168 closed 0     7 2016-07-19T21:33:28Z 2022-09-27T12:11:36Z 2022-09-27T12:11:35Z MEMBER      

unstack() should be smart enough to recognise that all labels in a coord are strings, and convert them to numpy strings. This is particularly relevant e.g. if you want to dump the xarray to netcdf and then read it with a non-python library.

```python
import xarray

a = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['x1', 'x2'], 'y': ['y1', 'y2']})
a
```

```
<xarray.DataArray (x: 2, y: 2)>
array([[1, 2],
       [3, 4]])
Coordinates:
  * y        (y) <U2 'y1' 'y2'
  * x        (x) <U2 'x1' 'x2'
```

```python
a.stack(s=['x', 'y']).unstack('s')
```

```
<xarray.DataArray (x: 2, y: 2)>
array([[1, 2],
       [3, 4]])
Coordinates:
  * x        (x) object 'x1' 'x2'
  * y        (y) object 'y1' 'y2'
```
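A hedged user-side workaround until this is fixed: cast the affected coordinates back to fixed-width numpy strings after unstacking (illustrative snippet, not part of xarray):

```python
# Cast object-dtype coords back to numpy strings so the result survives
# a round-trip through netCDF and non-Python readers.
b = a.stack(s=['x', 'y']).unstack('s')
b = b.assign_coords({name: b[name].astype(str) for name in ('x', 'y')})
```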

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/907/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
264509098 MDU6SXNzdWUyNjQ1MDkwOTg= 1624 Improve documentation and error validation for set_options(arithmetic_join) crusaderky 6213168 closed 0     7 2017-10-11T09:05:49Z 2022-06-25T20:01:07Z 2022-06-25T20:01:07Z MEMBER      

The documentation for set_options laconically says:

arithmetic_join: DataArray/Dataset alignment in binary operations. Default: 'inner'.

leaving the user wondering what the other options are. Also, the set_options code does not perform any kind of domain check on the possible values. From scanning the code I gathered that the valid values (and their meanings) should be the same as align(join=...), but I'd like confirmation on that...
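A sketch of the kind of validation being asked for, assuming the valid values do mirror align(join=...) (hypothetical names, not the actual xarray code):

```python
_ARITHMETIC_JOIN_OPTIONS = frozenset({'inner', 'outer', 'left', 'right'})

def _validate_arithmetic_join(value):
    # Fail fast on unknown values instead of silently misbehaving later.
    if value not in _ARITHMETIC_JOIN_OPTIONS:
        raise ValueError(
            'arithmetic_join must be one of %s, got %r'
            % (sorted(_ARITHMETIC_JOIN_OPTIONS), value))
```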

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1624/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
502130982 MDU6SXNzdWU1MDIxMzA5ODI= 3370 Hundreds of Sphinx errors crusaderky 6213168 closed 0     14 2019-10-03T15:17:09Z 2022-04-17T20:33:05Z 2022-04-17T20:33:05Z MEMBER      

sphinx-build emits a ton of errors that need to be polished out:

https://readthedocs.org/projects/xray/builds/ -> latest -> open last step

Options for the long term:

  • Change the "Docs" azure pipelines job to crash if there are new failures. From past experience though, this should come together with a sensible way to whitelist errors that can't be fixed. This will severely slow down development as PRs will systematically fail on such a check.
  • Add a task in the release process where, immediately before closing a release, the maintainer needs to manually go through the sphinx-build log and fix any new issues. This would be a major extra piece of work for the maintainer.

I am honestly not excited by either of the above. Alternative suggestions are welcome.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3370/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
505550120 MDU6SXNzdWU1MDU1NTAxMjA= 3391 map_blocks doesn't work when dask isn't installed crusaderky 6213168 closed 0     1 2019-10-10T22:53:55Z 2021-11-24T17:25:24Z 2021-11-24T17:25:24Z MEMBER      

Iterative improvement on #3276 @dcherian

map_blocks crashes with ImportError if dask isn't installed, even if it's legal to run it on a DataArray/Dataset without any dask variables. This forces writers of extension libraries to either not use map_blocks, add dask as a strict requirement, or write a switch in their own code.

Please change the code so that it works without dask (you'll need to write a stub of dask.is_dask_collection that always returns False) and add relevant tests to be triggered in our py36-bare-minimum CI environment.
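A minimal sketch of the stub described above (assumed placement; the real shim would live in an xarray compatibility module):

```python
# Fall back to a no-op stub when dask is absent, so map_blocks can still
# run eagerly on plain numpy-backed objects.
try:
    from dask.base import is_dask_collection
except ImportError:
    def is_dask_collection(x):
        # Without dask installed, nothing can be a dask collection.
        return False
```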

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3391/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
980223048 MDExOlB1bGxSZXF1ZXN0NzIwNTAxNTkz 5740 Remove ad-hoc handling of NEP18 libraries in CI crusaderky 6213168 closed 0     1 2021-08-26T13:04:36Z 2021-09-04T10:53:39Z 2021-08-31T10:14:35Z MEMBER   0 pydata/xarray/pulls/5740

sparse and pint are mature enough that it is no longer necessary to have a separate CI environment for them.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5740/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
945560052 MDExOlB1bGxSZXF1ZXN0NjkwODcyNTk1 5610 Fix gen_cluster failures; dask_version tweaks crusaderky 6213168 closed 0     5 2021-07-15T16:26:21Z 2021-07-15T18:04:00Z 2021-07-15T17:25:43Z MEMBER   0 pydata/xarray/pulls/5610
  • fixes one of the issues reported in #5600
  • distributed.utils_test.gen_cluster no longer accepts timeout=None for the sake of robustness
  • deleted ancient dask backwards compatibility code
  • clean up code around dask.__version__
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5610/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
817271773 MDExOlB1bGxSZXF1ZXN0NTgwNzkxNTcy 4965 Support for dask.graph_manipulation crusaderky 6213168 closed 0     1 2021-02-26T11:19:09Z 2021-03-05T09:24:17Z 2021-03-05T09:24:14Z MEMBER   0 pydata/xarray/pulls/4965

Second iteration upon https://github.com/pydata/xarray/pull/4884. CI is currently failing vs. dask git tip because of https://github.com/dask/dask/issues/7263 (unrelated to this PR).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4965/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
804694945 MDExOlB1bGxSZXF1ZXN0NTcwNDE5NjIz 4884 Compatibility with dask 2021.02.0 crusaderky 6213168 closed 0     0 2021-02-09T16:12:02Z 2021-02-11T18:33:03Z 2021-02-11T18:32:59Z MEMBER   0 pydata/xarray/pulls/4884

Closes #4860 Reverts #4873

Restore compatibility with dask 2021.02.0 by avoiding improper assumptions on the implementation details of da.Array.__dask_postpersist__().

This PR does not align xarray to the new dask collection spec (https://github.com/dask/dask/issues/7093), as I just realized that Datasets violate the rule of having all dask keys with the same name if they contain more than one dask variable - and cannot do otherwise. So I have to change the dask collection spec again to accommodate them.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4884/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
671216158 MDExOlB1bGxSZXF1ZXN0NDYxNDM4MDIz 4297 Lazily load resource files crusaderky 6213168 closed 0 crusaderky 6213168   4 2020-08-01T21:31:36Z 2020-09-22T05:32:38Z 2020-08-02T07:05:15Z MEMBER   0 pydata/xarray/pulls/4297
  • Marginal speed-up and RAM footprint reduction when not running in Jupyter Notebook
  • Closes #4294
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4297/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
671108068 MDExOlB1bGxSZXF1ZXN0NDYxMzM1NDAx 4296 Increase support window of all dependencies crusaderky 6213168 closed 0 crusaderky 6213168   7 2020-08-01T18:55:54Z 2020-08-14T09:52:46Z 2020-08-14T09:52:42Z MEMBER   0 pydata/xarray/pulls/4296

Closes #4295

Increase width of the sliding window for minimum supported versions:

  • setuptools: from a 6 months sliding window to hardcoded >= 38.4, and to a 42 months sliding window starting from July 2021
  • dask and distributed: from a 6 months sliding window to hardcoded >= 2.9, and to a 12 months sliding window starting from January 2021
  • all other libraries: from a 6 months to a 12 months sliding window

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4296/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
671561223 MDExOlB1bGxSZXF1ZXN0NDYxNzY2OTA1 4299 Support PyCharm deployment over SSH crusaderky 6213168 closed 0     3 2020-08-02T06:19:09Z 2020-08-03T19:41:36Z 2020-08-03T19:41:29Z MEMBER   0 pydata/xarray/pulls/4299

Fix pip install . when no .git directory exists; namely when the xarray source directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4299/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
555752381 MDExOlB1bGxSZXF1ZXN0MzY3NjM3MzUw 3724 setuptools-scm (3) crusaderky 6213168 closed 0     3 2020-01-27T18:26:11Z 2020-02-14T12:07:22Z 2020-01-27T18:51:50Z MEMBER   0 pydata/xarray/pulls/3724

Fix https://github.com/pydata/xarray/pull/3714#issuecomment-578626605. @shoyer, I have no way of testing whether this fixes GitHub - please see for yourself after merging to master.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3724/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
557020666 MDExOlB1bGxSZXF1ZXN0MzY4Njg4MTAz 3727 Python 3.8 CI crusaderky 6213168 closed 0     6 2020-01-29T17:50:52Z 2020-02-10T09:41:07Z 2020-01-31T15:52:19Z MEMBER   0 pydata/xarray/pulls/3727
  • Run full-fat suite of tests for Python 3.8
  • Move asv, MacOSX tests, readthedocs, binder, and more to Python 3.8
  • Test windows against latest numpy version
  • Windows tests remain on Python 3.7 because of a couple of Python 3.8 tests that fail exclusively in CI. Will investigate later.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3727/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
557012230 MDExOlB1bGxSZXF1ZXN0MzY4NjgxMjgw 3726 Avoid unsafe use of pip crusaderky 6213168 closed 0     3 2020-01-29T17:33:48Z 2020-01-30T12:23:05Z 2020-01-29T23:39:40Z MEMBER   0 pydata/xarray/pulls/3726

Closes #3725

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3726/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
554662467 MDExOlB1bGxSZXF1ZXN0MzY2Nzc1ODIz 3721 Add isort to CI crusaderky 6213168 closed 0     9 2020-01-24T10:41:54Z 2020-01-30T12:22:53Z 2020-01-28T19:41:52Z MEMBER   0 pydata/xarray/pulls/3721
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3721/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
553518018 MDExOlB1bGxSZXF1ZXN0MzY1ODM1MjQ3 3714 setuptools-scm and one-liner setup.py crusaderky 6213168 closed 0     12 2020-01-22T12:46:43Z 2020-01-27T07:42:36Z 2020-01-22T15:40:34Z MEMBER   0 pydata/xarray/pulls/3714
  • Closes #3369
  • Replace versioneer with setuptools-scm
  • Replace setup.py with setup.cfg
  • Drop pytest-runner as instructed by deprecation notice on the project webpage
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3714/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
554647652 MDExOlB1bGxSZXF1ZXN0MzY2NzYzMzQ3 3720 setuptools-scm and isort tweaks crusaderky 6213168 closed 0     2 2020-01-24T10:12:03Z 2020-01-24T15:34:34Z 2020-01-24T15:28:48Z MEMBER   0 pydata/xarray/pulls/3720

Follow-up on https://github.com/pydata/xarray/pull/3714

  • Fix regression in mypy if pip creates a zipped archive
  • Avoid breakage in the extremely unlikely event that setuptools is not installed
  • Guarantee that xarray.__version__ is always PEP440-compatible. This prevents a breakage if you run pandas without xarray installed and with the xarray sources folder in PYTHONPATH (see the sketch after this list).
  • Apply isort to xarray.__init__
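A hedged sketch of the kind of fallback described in the third bullet (assumed logic, not necessarily the PR's exact code):

```python
# Resolve __version__ from package metadata; degrade to a PEP 440-compliant
# placeholder when running from a bare source tree without metadata.
try:
    import pkg_resources
    __version__ = pkg_resources.get_distribution('xarray').version
except Exception:
    __version__ = '999'  # still parses as a valid PEP 440 version
```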
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3720/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
502082831 MDU6SXNzdWU1MDIwODI4MzE= 3369 Define a process to test the readthedocs CI before merging into master crusaderky 6213168 closed 0     3 2019-10-03T13:56:02Z 2020-01-22T15:40:34Z 2020-01-22T15:40:33Z MEMBER      

This is an offshoot of #3358.

The readthedocs CI has a bad habit of failing even after the Azure Pipelines job "Docs" has succeeded.

After major changes that impact the documentation, and before merging everything into master, it would be advisable to explicitly verify that RTD builds correctly.

So far I tried to:

  1. create my own readthedocs project, https://readthedocs.org/projects/crusaderky-xarray/
  2. point it to my fork https://github.com/crusaderky/xarray/
  3. enable build for the branch I want to merge

This is currently failing because of an issue with versioneer, which incorrectly sets xarray.__version__ to 0+untagged.111.g6d60700. This in turn causes a failure in a minimum version check in pandas.DataFrame.to_xarray() on pandas>=0.25.

In the master RTD project https://readthedocs.org/projects/xray/, I can instead read xarray: 0.13.0+20.gdd2b803a.

So far the only workaround I could find was to downgrade pandas to 0.24 in ci/requirements/doc.yml.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3369/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
551532886 MDExOlB1bGxSZXF1ZXN0MzY0MjM4MTM2 3703 hardcoded xarray.__all__ crusaderky 6213168 closed 0     4 2020-01-17T17:09:45Z 2020-01-18T00:58:06Z 2020-01-17T20:42:25Z MEMBER   0 pydata/xarray/pulls/3703

Closes #3695

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3703/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
551544665 MDExOlB1bGxSZXF1ZXN0MzY0MjQ3NjE3 3705 One-off isort run crusaderky 6213168 closed 0     4 2020-01-17T17:36:10Z 2020-01-17T22:59:26Z 2020-01-17T21:00:24Z MEMBER   0 pydata/xarray/pulls/3705
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3705/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
551544199 MDExOlB1bGxSZXF1ZXN0MzY0MjQ3MjQz 3704 Bump mypy to v0.761 crusaderky 6213168 closed 0     1 2020-01-17T17:35:09Z 2020-01-17T22:59:19Z 2020-01-17T18:51:51Z MEMBER   0 pydata/xarray/pulls/3704
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3704/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
522935511 MDExOlB1bGxSZXF1ZXN0MzQxMDM3NTg5 3533 2x~5x speed up for isel() in most cases crusaderky 6213168 closed 0     7 2019-11-14T15:34:24Z 2019-12-05T16:45:40Z 2019-12-05T16:39:40Z MEMBER   0 pydata/xarray/pulls/3533

Yet another major improvement for #2799.

Achieve a 2x to 5x boost in isel performance when slicing small arrays by int, slice, list of int, scalar ndarray, or 1-dimensional ndarray.

```python
import xarray

da = xarray.DataArray([[1, 2], [3, 4]], dims=['x', 'y'])
v = da.variable
a = da.variable.values
ds = da.to_dataset(name="d")

ds_with_idx = xarray.Dataset({
    'x': [10, 20],
    'y': [100, 200],
    'd': (('x', 'y'), [[1, 2], [3, 4]]),
})
da_with_idx = ds_with_idx.d
```

before -> after

```
%timeit a[0]                                # 121 ns
%timeit v[0]                                # 7 µs
%timeit v.isel(x=0)                         # 10 µs
%timeit da[0]                               # 65 µs -> 15 µs
%timeit da.isel(x=0)                        # 63 µs -> 13 µs
%timeit ds.isel(x=0)                        # 48 µs -> 24 µs
%timeit da_with_idx[0]                      # 209 µs -> 82 µs
%timeit da_with_idx.isel(x=0, drop=False)   # 135 µs -> 34 µs
%timeit da_with_idx.isel(x=0, drop=True)    # 101 µs -> 34 µs
%timeit ds_with_idx.isel(x=0, drop=False)   # 90 µs -> 49 µs
%timeit ds_with_idx.isel(x=0, drop=True)    # 65 µs -> 49 µs
```

Marked as WIP because this requires running the asv suite to verify there are no regressions for large arrays. (On a separate note, we really need to add the small size cases to asv - as discussed in #3382.)

This profoundly alters one of the most important methods in xarray and I must confess it makes me nervous, particularly as I am unsure if the test coverage of DataArray.isel() is as thorough as that for Dataset.isel().

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3533/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
525689517 MDExOlB1bGxSZXF1ZXN0MzQzMjYxNDg0 3551 Clarify conda environments for new contributors crusaderky 6213168 closed 0     1 2019-11-20T09:47:15Z 2019-11-20T14:50:48Z 2019-11-20T09:47:57Z MEMBER   0 pydata/xarray/pulls/3551
  • [x] Closes #3549
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3551/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
510915725 MDU6SXNzdWU1MTA5MTU3MjU= 3434 v0.14.1 Release crusaderky 6213168 closed 0     18 2019-10-22T21:08:15Z 2019-11-19T23:44:52Z 2019-11-19T23:44:52Z MEMBER      

I think with the multiple recent breakages we've just had due to dependency upgrades, we should push out a patch release with some haste.

Please comment/add/object

Must have

  • [x] numpy 1.18 support #3409
  • [x] pseudonetcdf 3.1.0 support #3409, #3420
  • [x] require cftime != 1.0.4 #3463
  • [x] groupby reduce regression fix #3403
  • [x] pandas master support #3440

Nice to have

  • [x] ellipsis (...) work #1081, #3414, #3418, #3421, #3423, #3424
  • [x] HTML repr #3425 (really mouth-watering, but I'm unsure about how far it is from completion)
  • [x] groupby drop nan groups #3406
  • [x] deprecate allow_lazy #3435
  • [x] __dask_tokenize__ #3446
  • [x] dask name equality #3453
  • [x] Leave empty slot when not using accessors #3531
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3434/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
523438384 MDExOlB1bGxSZXF1ZXN0MzQxNDQyMTI4 3537 Numpy 1.18 support crusaderky 6213168 closed 0     13 2019-11-15T12:17:32Z 2019-11-19T14:06:50Z 2019-11-19T14:06:46Z MEMBER   0 pydata/xarray/pulls/3537

Fix mean() and nanmean() for datetime64 arrays on numpy backend when upgrading from numpy 1.17 to 1.18. All other nan-reductions on datetime64s were broken before and remain broken. mean() on datetime64 and dask was broken before and remains broken.

  • [x] Closes #3409
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3537/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
522780826 MDExOlB1bGxSZXF1ZXN0MzQwOTEwMjQ3 3531 Leave empty slot when not using accessors crusaderky 6213168 closed 0     1 2019-11-14T10:54:55Z 2019-11-15T17:43:57Z 2019-11-15T17:43:54Z MEMBER   0 pydata/xarray/pulls/3531

Save a few bytes and nanoseconds for the overwhelming majority of the users that don't use accessors. Lay the groundwork for potential future use of @pandas.utils.cache_readonly.

xref https://github.com/pydata/xarray/issues/3514

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3531/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
521842949 MDExOlB1bGxSZXF1ZXN0MzQwMTQ1OTg0 3515 Recursive tokenization crusaderky 6213168 closed 0 crusaderky 6213168   1 2019-11-12T22:35:13Z 2019-11-13T00:54:32Z 2019-11-13T00:53:27Z MEMBER   0 pydata/xarray/pulls/3515

After misreading the dask documentation https://docs.dask.org/en/latest/custom-collections.html#deterministic-hashing, I was under the impression that the output of __dask_tokenize__ would be recursively parsed, like it happens for __getstate__ or __reduce__. That's not the case - the output of __dask_tokenize__ is just fed into a str() function so it has to be made explicitly recursive!
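To illustrate the distinction (a hypothetical wrapper class, not xarray's code): the tuple returned below must already contain fully-normalized members, because dask will not descend into it.

```python
from dask.base import normalize_token

class Wrapper:
    def __init__(self, inner):
        self.inner = inner

    def __dask_tokenize__(self):
        # Not enough: str() of this tuple may not uniquely describe `inner`.
        # return (type(self).__name__, self.inner)
        # Explicitly recursive: normalize members before returning them.
        return (type(self).__name__, normalize_token(self.inner))
```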

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3515/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
329251342 MDU6SXNzdWUzMjkyNTEzNDI= 2214 Simplify graph of DataArray.chunk() crusaderky 6213168 closed 0     2 2018-06-04T23:30:19Z 2019-11-10T04:34:58Z 2019-11-10T04:34:58Z MEMBER      

```python
>>> dict(xarray.DataArray([1, 2]).chunk().__dask_graph__())
{
    ('xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b', 0): (
        <function dask.array.core.getter(a, b, asarray=True, lock=None)>,
        'xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b',
        (slice(0, 2, None),)),
    'xarray-<this-array>-7e885b8e329090da3fe58d4483c0cf8b':
        ImplicitToExplicitIndexingAdapter(array=NumpyIndexingAdapter(array=array([1, 2])))
}
```

There is no reason why this should be any more complicated than da.from_array:

```python
>>> dict(da.from_array(np.array([1, 2]), chunks=2).__dask_graph__())
{
    ('array-de932becc43e72c010bc91ffefe42af1', 0): (
        <function dask.array.core.getter(a, b, asarray=True, lock=None)>,
        'array-original-de932becc43e72c010bc91ffefe42af1',
        (slice(0, 2, None),)),
    'array-original-de932becc43e72c010bc91ffefe42af1': array([1, 2])
}
```

da.from_array itself should be simplified - see twin issue https://github.com/dask/dask/issues/3556

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2214/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
510527025 MDExOlB1bGxSZXF1ZXN0MzMwODg1Mzk2 3429 minor lint tweaks crusaderky 6213168 closed 0     4 2019-10-22T09:15:03Z 2019-10-24T12:53:24Z 2019-10-24T12:53:21Z MEMBER   0 pydata/xarray/pulls/3429
  • Ran pyflakes 2.1.1
  • Some f-string tweaks
  • Ran black -t py36
  • Ran mypy 0.740. We'll need to skip it and jump directly to 0.750 once it's released because of https://github.com/python/mypy/issues/7735
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3429/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
511869575 MDExOlB1bGxSZXF1ZXN0MzMxOTg2MzUw 3442 pandas-dev workaround crusaderky 6213168 closed 0     0 2019-10-24T10:59:55Z 2019-10-24T11:43:42Z 2019-10-24T11:43:36Z MEMBER   0 pydata/xarray/pulls/3442

Temporary hack around #3440 to get green CI

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3442/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
510974193 MDExOlB1bGxSZXF1ZXN0MzMxMjU4MjQx 3436 MAGA (Make Azure Green Again) crusaderky 6213168 closed 0     3 2019-10-22T22:56:21Z 2019-10-24T09:57:59Z 2019-10-23T01:06:10Z MEMBER   0 pydata/xarray/pulls/3436

Let all CI tests become green again to avoid hindering developers who are working on PRs unrelated to the present incompatibilities (numpy=1.18, cftime=1.0.4, pseudonetcdf=3.1.0).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3436/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
509655174 MDExOlB1bGxSZXF1ZXN0MzMwMTYwMDQy 3420 Restore crashing CI tests on pseudonetcdf-3.1 crusaderky 6213168 closed 0     5 2019-10-20T21:26:40Z 2019-10-21T01:32:54Z 2019-10-20T22:42:36Z MEMBER   0 pydata/xarray/pulls/3420

Related to #3409

The crashes caused by pseudonetcdf-3.1 are blocking all PRs. Sorry I don't know anything about pseudonetcdf. This PR takes the issue out of the critical path so that whoever knows about the library can deal with it in due time.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3420/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
506885041 MDU6SXNzdWU1MDY4ODUwNDE= 3397 "How Do I..." formatting issues crusaderky 6213168 closed 0     4 2019-10-14T21:32:27Z 2019-10-16T21:41:06Z 2019-10-16T21:41:06Z MEMBER      

@dcherian The new page http://xarray.pydata.org/en/stable/howdoi.html (#3357) is somewhat painful to read on readthedocs. The table goes out of the screen and one is forced to scroll left and right non-stop.

Maybe a better alternative could be with Sphinx definitions syntax (which allows for automatic reflowing)?

```rst
How do I ...
============

Add variables from other datasets to my dataset?
    :py:meth:`Dataset.merge`
```

(that's a 4 spaces indent)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3397/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
506216396 MDExOlB1bGxSZXF1ZXN0MzI3NDg0OTQ2 3395 Annotate LRUCache crusaderky 6213168 closed 0     0 2019-10-12T17:44:43Z 2019-10-12T20:05:36Z 2019-10-12T20:05:33Z MEMBER   0 pydata/xarray/pulls/3395

Very minor type annotations work

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3395/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
503163130 MDExOlB1bGxSZXF1ZXN0MzI1MDc2MzQ5 3375 Speed up isel and __getitem__ crusaderky 6213168 closed 0 crusaderky 6213168   5 2019-10-06T21:27:42Z 2019-10-10T09:21:56Z 2019-10-09T18:01:30Z MEMBER   0 pydata/xarray/pulls/3375

First iterative improvement for #2799.

Speed up Dataset.isel up to 33% and DataArray.isel up to 25% (when there are no indices and the numpy array is small). 15% speedup when there are indices.

Benchmarks can be found in #2799.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3375/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
500582648 MDExOlB1bGxSZXF1ZXN0MzIzMDIwOTY1 3358 Rolling minimum dependency versions policy crusaderky 6213168 closed 0 crusaderky 6213168   24 2019-09-30T23:50:39Z 2019-10-09T02:02:29Z 2019-10-08T21:23:47Z MEMBER   0 pydata/xarray/pulls/3358

Closes #3222 Closes #3293

  • Drop support for Python 3.5
  • Upgrade numpy to 1.14 (24 months old)
  • Upgrade pandas to 0.24 (12 months old)
  • Downgrade scipy to 1.0 (policy allows for 1.2, but it breaks numpy=1.14)
  • Downgrade dask to 1.2 (6 months old)
  • Other upgrades/downgrades to comply with the policy
  • CI tool to verify that the minimum dependencies requirements in CI are compliant with the policy
  • Overhaul CI environment for readthedocs

Out of scope:

  • Purge away all OrderedDict's
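A sketch of the sort of compliance check the CI tool performs, under assumed policy numbers (hypothetical table and logic, not the PR's actual script):

```python
import datetime

# Hypothetical policy: package -> support window in months.
POLICY_MONTHS = {'python': 42, 'numpy': 24, 'pandas': 12}
DEFAULT_MONTHS = 6

def minimum_cutoff(package, today=None):
    """Oldest release date that a pinned minimum version may have."""
    today = today or datetime.date.today()
    months = POLICY_MONTHS.get(package, DEFAULT_MONTHS)
    # Approximate a month as 30 days; good enough for a CI policy warning.
    return today - datetime.timedelta(days=months * 30)
```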

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3358/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
481250429 MDU6SXNzdWU0ODEyNTA0Mjk= 3222 Minimum versions for optional libraries crusaderky 6213168 closed 0     12 2019-08-15T17:18:16Z 2019-10-08T21:23:47Z 2019-10-08T21:23:47Z MEMBER      

In CI there are:

  • tests for all the latest versions of all libraries, mandatory and optional (py36, py37, py37-windows)
  • tests for the minimum versions of the mandatory libraries only (py35-min)

There are no tests for legacy versions of the optional libraries.

Today I tried downgrading dask in the py37 environment to dask=1.1.2, which is 6 months old...

...it's a bloodbath. 383 errors of the most diverse kind.

In the codebase I found mentions of much older minimum versions: installing.rst mentions dask >=0.16.1, and Dataset.chunk() even asks for dask>=0.9.

I think we should add CI tests for old versions of the optional dependencies. What policy should we adopt when we find an incompatibility? How old should a library be before we don't bother fixing bugs and simply require a newer version? I personally would go for an aggressive 6 months' worth of backwards compatibility; less if the time it takes to fix the issues is excessive. The tests should run on py36 because py35 builds are becoming very scarce in anaconda.

This has the outlook of being an exercise in extreme frustration. I'm afraid I personally hold zero interest towards packages older than the latest available in the anaconda official repo, so I'm not volunteering for this one (sorry).

I'd like to hear other people's opinions and/or offers of self-immolation... :)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3222/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
502530652 MDExOlB1bGxSZXF1ZXN0MzI0NTkyODE1 3373 Lint crusaderky 6213168 closed 0     1 2019-10-04T09:29:46Z 2019-10-04T22:18:48Z 2019-10-04T22:17:57Z MEMBER   0 pydata/xarray/pulls/3373

Minor cosmetic changes

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3373/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
470714103 MDU6SXNzdWU0NzA3MTQxMDM= 3154 pynio causes dependency conflicts in py36 CI build crusaderky 6213168 closed 0     9 2019-07-20T21:00:43Z 2019-10-03T15:22:17Z 2019-10-03T15:22:17Z MEMBER      

On Saturday night, all Python 3.6 CI builds started failing. Python 3.7 is unaffected. See https://dev.azure.com/xarray/xarray/_build/results?buildId=362&view=logs

MacOSX py36:

```
UnsatisfiableError: The following specifications were found to be in conflict:
  - pynio
  - python=3.6
  - rasterio
```

Linux py36:

```
UnsatisfiableError: The following specifications were found to be in conflict:
  - cfgrib[version='>=0.9.2']
  - h5netcdf
  - pynio
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3154/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
495221393 MDExOlB1bGxSZXF1ZXN0MzE4ODA4Njgy 3318 Allow weakref crusaderky 6213168 closed 0     2 2019-09-18T13:19:09Z 2019-10-03T13:39:35Z 2019-09-18T15:53:51Z MEMBER   0 pydata/xarray/pulls/3318
  • [x] Closes #3317
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3318/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
501461219 MDExOlB1bGxSZXF1ZXN0MzIzNzI5Mjkx 3365 Demo: CI offline? crusaderky 6213168 closed 0     0 2019-10-02T12:34:38Z 2019-10-02T17:32:18Z 2019-10-02T17:32:13Z MEMBER   0 pydata/xarray/pulls/3365
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3365/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
501461397 MDU6SXNzdWU1MDE0NjEzOTc= 3366 CI offline? crusaderky 6213168 closed 0     2 2019-10-02T12:35:00Z 2019-10-02T17:32:03Z 2019-10-02T17:32:03Z MEMBER      

Azure pipelines is not being triggered by PRs this morning. See https://github.com/pydata/xarray/pull/3358 and https://github.com/pydata/xarray/pull/3365.

Last run was 12 hours ago.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3366/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
500777641 MDExOlB1bGxSZXF1ZXN0MzIzMTc1OTk0 3359 Revisit # noqa annotations crusaderky 6213168 closed 0     1 2019-10-01T09:35:15Z 2019-10-01T18:13:59Z 2019-10-01T18:13:56Z MEMBER   0 pydata/xarray/pulls/3359

Revisit all # noqa annotations. Remove useless ones; replace blanket ones with specific error codes. Work around https://github.com/PyCQA/pyflakes/issues/453.

note: # noqa: F811 on the @overload'ed functions works around a pyflakes bug already fixed in git master (https://github.com/PyCQA/pyflakes/pull/435) but not in a release yet, so it has to stay for now.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3359/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
500912288 MDExOlB1bGxSZXF1ZXN0MzIzMjg1ODgw 3360 WIP: Fix codecov.io upload on Windows crusaderky 6213168 closed 0     1 2019-10-01T13:53:19Z 2019-10-01T15:13:21Z 2019-10-01T14:11:22Z MEMBER   0 pydata/xarray/pulls/3360

Closes #3354

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3360/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
498399866 MDExOlB1bGxSZXF1ZXN0MzIxMzM5MjE1 3346 CI test suites with pinned minimum dependencies crusaderky 6213168 closed 0     2 2019-09-25T16:38:44Z 2019-09-26T09:38:59Z 2019-09-26T09:38:47Z MEMBER   0 pydata/xarray/pulls/3346

Second step towards resolving #3222. Added two suites of CI tests:

  • Pinned minimum versions for all optional dependencies, except NEP18-dependent ones
  • Pinned minimum versions for NEP18 optional dependencies - at the moment only sparse; soon also pint (#3238)

All versions are the frozen snapshot of what py36.yml deploys today. This PR ensures that we won't have accidental breakages from this moment on. I made no effort to try downgrading to sensible obsolete versions, as that would require a completely different order of magnitude of work. I would suggest to proceed with the downgrades (and consequent bugfixes) over several small, iterative future PRs that build upon this framework.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3346/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
497945632 MDExOlB1bGxSZXF1ZXN0MzIwOTgwNzIw 3340 CI environments overhaul crusaderky 6213168 closed 0     7 2019-09-24T22:01:10Z 2019-09-25T01:50:08Z 2019-09-25T01:40:55Z MEMBER   0 pydata/xarray/pulls/3340

Propaedeutic CI work to #3222.

  • py36 and py37 are now identical
  • Many optional dependencies were missing in one test suite or another (see details below)
  • Tests that require hypothesis now always run if hypothesis is installed
  • py37-windows.yml requirements file has been rebuilt starting from py37.yml
  • Sorted requirements files alphabetically for better maintainability
  • Added black. This is not needed by CI, but I personally use these yaml files to deploy my dev environment and I would expect many more developers to do the same. Alternatively, we could go the other way around and remove flake8 from everywhere and mypy from py36 and py37-windows. IMHO the marginal speedup would not be worth the complication.

Added packages to py36.yml (net of changes in order):

  • black
  • hypothesis
  • nc-time-axis
  • numba
  • numbagg
  • pynio (https://github.com/pydata/xarray/issues/3154 seems to be now fixed upstream)
  • sparse

Added packages to py37.yml (net of changes in order):

  • black
  • cdms2
  • hypothesis
  • iris>=1.10
  • numba (previously implicitly installed from pip by numbagg; now installed from conda)
  • pynio

Added packages to py37-windows.yml (net of changes in order):

  • black
  • bottleneck
  • flake8
  • hypothesis
  • iris>=1.10
  • lxml
  • mypy==0.720
  • numba
  • numbagg
  • pseudonetcdf>=3.0.1
  • pydap
  • sparse
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3340/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
478886013 MDExOlB1bGxSZXF1ZXN0MzA1OTA3Mzk2 3196 One-off isort run crusaderky 6213168 closed 0     5 2019-08-09T09:17:39Z 2019-09-09T08:28:05Z 2019-08-23T20:33:04Z MEMBER   0 pydata/xarray/pulls/3196

A one-off, manually vetted and tweaked isort run

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3196/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
484499801 MDExOlB1bGxSZXF1ZXN0MzEwMzYxOTMz 3250 __slots__ crusaderky 6213168 closed 0     10 2019-08-23T12:16:44Z 2019-08-30T12:13:28Z 2019-08-29T17:14:20Z MEMBER   0 pydata/xarray/pulls/3250

What changes:

  • Most classes now define __slots__
  • Removed the _initialized property
  • Enforced checks that all subclasses must also define __slots__. For third-party subclasses, this is for now a DeprecationWarning and should be changed into a hard crash later on.
  • 22% reduction in RAM usage
  • 5% performance speedup for a DataArray method that performs a _to_temp_dataset roundtrip

DISCUSS: support for third party subclasses is very poor at the moment (#1097). Should we skip the deprecation altogether?

Performance benchmark:

```python
import timeit
import psutil
import xarray

a = xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
RUNS = 10000
t = timeit.timeit("a.roll(x=1, roll_coords=True)", globals=globals(), number=RUNS)
print("{:.0f} us".format(t / RUNS * 1e6))

p = psutil.Process()
N = 100000
rss0 = p.memory_info().rss
x = [
    xarray.DataArray([1, 2], dims=['x'], coords={'x': [10, 20]})
    for _ in range(N)
]
rss1 = p.memory_info().rss
print("{:.0f} bytes".format((rss1 - rss0) / N))
```

Output:

| test                     | env      | master     | slots      |
|:------------------------:|:--------:|:----------:|-----------:|
| DataArray.roll           | py35-min | 332 us     | 360 us     |
| DataArray.roll           | py37     | 354 us     | 337 us     |
| RAM usage of a DataArray | py35-min | 2755 bytes | 2074 bytes |
| RAM usage of a DataArray | py37     | 1970 bytes | 1532 bytes |

The performance degradation on Python 3.5 is caused by the deprecation mechanism - see changes to common.py.

I honestly never realised that xarray objects are measured in kilobytes (vs. 32 bytes of underlying buffers!)
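A minimal sketch of the subclass enforcement mechanism described above (simplified; the base class name is illustrative, and the real code in common.py also has to support Python 3.5):

```python
import warnings

class SlottedBase:
    __slots__ = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        if '__slots__' not in cls.__dict__:
            # Grace period for third-party subclasses; to become a hard
            # crash in a later release.
            warnings.warn(
                '%s does not define __slots__' % cls.__name__,
                DeprecationWarning,
                stacklevel=2,
            )
```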

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3250/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
479587855 MDExOlB1bGxSZXF1ZXN0MzA2NDQ4ODIw 3207 Annotations for .data_vars() and .coords() crusaderky 6213168 closed 0     0 2019-08-12T11:08:45Z 2019-08-13T04:01:26Z 2019-08-12T20:49:02Z MEMBER   0 pydata/xarray/pulls/3207
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3207/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
479359871 MDExOlB1bGxSZXF1ZXN0MzA2Mjc0MTUz 3203 Match mypy version between CI and pre-commit hook crusaderky 6213168 closed 0     0 2019-08-11T11:30:36Z 2019-08-12T21:03:11Z 2019-08-11T22:32:41Z MEMBER   0 pydata/xarray/pulls/3203

Pre-commit hook is currently failing because of an issue detected by mypy 0.720 but not by mypy 0.650

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3203/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
479359010 MDExOlB1bGxSZXF1ZXN0MzA2MjczNTY3 3202 chunk sparse arrays crusaderky 6213168 closed 0 crusaderky 6213168   4 2019-08-11T11:19:16Z 2019-08-12T21:02:31Z 2019-08-12T21:02:25Z MEMBER   0 pydata/xarray/pulls/3202

Closes #3191

@shoyer I completely disabled wrapping in ImplicitToExplicitIndexingAdapter for sparse arrays, cupy arrays, etc. I'm not sure if it's desirable; the chief problem is that I don't think I understand the purpose of ImplicitToExplicitIndexingAdapter to begin with... some enlightenment would be appreciated.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3202/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
478343417 MDU6SXNzdWU0NzgzNDM0MTc= 3191 DataArray.chunk() from sparse array produces malformed dask array crusaderky 6213168 closed 0     1 2019-08-08T09:08:56Z 2019-08-12T21:02:24Z 2019-08-12T21:02:24Z MEMBER      

#3117 by @nvictus introduces support for sparse in plain xarray.

dask already supports it.

Running with:

  • xarray git head
  • dask 2.2.0
  • numpy 1.16.4
  • sparse 0.7.0
  • NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1

```python
>>> import numpy, sparse, xarray, dask.array
>>> s = sparse.COO(numpy.array([1, 2]))
>>> da1 = dask.array.from_array(s)
>>> da1._meta
<COO: shape=(0,), dtype=int64, nnz=0, fill_value=0>
>>> da1.compute()
<COO: shape=(2,), dtype=int64, nnz=2, fill_value=0>
>>> da2 = xarray.DataArray(s).chunk().data
>>> da2._meta
array([], dtype=int64)  # Wrong
>>> da2.compute()
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3191/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
478891507 MDExOlB1bGxSZXF1ZXN0MzA1OTExODA2 3197 Enforce mypy compliance in CI crusaderky 6213168 closed 0     6 2019-08-09T09:29:55Z 2019-08-11T08:49:02Z 2019-08-10T09:48:33Z MEMBER   0 pydata/xarray/pulls/3197
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3197/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
478969353 MDExOlB1bGxSZXF1ZXN0MzA1OTc1NjM4 3198 Ignore example.grib.0112.idx crusaderky 6213168 closed 0     0 2019-08-09T12:47:12Z 2019-08-09T12:49:02Z 2019-08-09T12:48:08Z MEMBER   0 pydata/xarray/pulls/3198

open_dataset("<name>.grib", engine="cfgrib") creates a new file in the same directory called <name>.grib.<numbers>.idx

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3198/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
477814538 MDExOlB1bGxSZXF1ZXN0MzA1MDYyMzUw 3190 pyupgrade one-off run crusaderky 6213168 closed 0     2 2019-08-07T09:32:57Z 2019-08-09T08:50:22Z 2019-08-07T17:26:01Z MEMBER   0 pydata/xarray/pulls/3190

A one-off, manually vetted and tweaked run of pyupgrade

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3190/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
476218350 MDExOlB1bGxSZXF1ZXN0MzAzODE4ODg3 3177 More annotations crusaderky 6213168 closed 0     6 2019-08-02T14:49:50Z 2019-08-09T08:50:13Z 2019-08-06T01:19:36Z MEMBER   0 pydata/xarray/pulls/3177
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3177/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
202423683 MDU6SXNzdWUyMDI0MjM2ODM= 1224 fast weighted sum crusaderky 6213168 closed 0     5 2017-01-23T00:29:19Z 2019-08-09T08:36:11Z 2019-08-09T08:36:11Z MEMBER      

In my project I'm struggling with weighted sums of 2000-4000 dask-based xarrays. The time to reach the final dask-based array, the size of the final dask dict, and the time to compute the actual result are horrendous.

So I wrote the below which - as laborious as it may look - gives a performance boost nothing short of miraculous. At the bottom you'll find some benchmarks as well.

https://gist.github.com/crusaderky/62832a5ffc72ccb3e0954021b0996fdf

In my project, this deflated the size of the final dask dict from 5.2 million keys to 3.3 million and cut a 30% from the time required to define it.

I think it's generic enough to be a good addition to the core xarray module. Impressions?
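For context, the naive formulation being improved upon looks something like this (a sketch of the pattern, not the gist's optimized code); each term adds a layer to the dask graph, which is what blows up the final dict for thousands of arrays:

```python
def naive_weighted_sum(arrays, weights):
    # One graph node per multiply and per add: the final dask dict grows
    # linearly with the number of input arrays.
    total = arrays[0] * weights[0]
    for arr, w in zip(arrays[1:], weights[1:]):
        total = total + arr * w
    return total
```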

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1224/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
466750687 MDU6SXNzdWU0NjY3NTA2ODc= 3092 black formatting crusaderky 6213168 closed 0     14 2019-07-11T08:43:55Z 2019-08-08T22:34:53Z 2019-08-08T22:34:53Z MEMBER      

I, like many others, have irreversibly fallen in love with black. Can we apply it to the existing codebase and as an enforced CI test? The only (big) problem is that developers will need to manually apply it to any open branches and then merge from master - and even then, merging likely won't be trivial. How did the dask project tackle the issue?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3092/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
475599589 MDU6SXNzdWU0NzU1OTk1ODk= 3174 CI failure downloading external data crusaderky 6213168 closed 0     2 2019-08-01T10:21:36Z 2019-08-07T08:41:13Z 2019-08-07T08:41:13Z MEMBER      

The 'Docs' CI project is failing because http://naciscdn.org is unresponsive:

https://dev.azure.com/xarray/xarray/_build/results?buildId=408&view=logs&jobId=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36

Excerpt:

```
/usr/share/miniconda/envs/xarray-tests/lib/python3.7/site-packages/cartopy/io/__init__.py:260: DownloadWarning: Downloading: http://naciscdn.org/naturalearth/110m/physical/ne_110m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)

Exception occurred:
  File "/usr/share/miniconda/envs/xarray-tests/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
The full traceback has been saved in /tmp/sphinx-err-nq73diee.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at https://github.com/sphinx-doc/sphinx/issues. Thanks!

##[error]Bash exited with code '2'.
##[section]Finishing: Build HTML docs
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3174/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
466886456 MDExOlB1bGxSZXF1ZXN0Mjk2NjQ1MTgy 3095 Fix regression: IndexVariable.copy(deep=True) casts dtype=U to object crusaderky 6213168 closed 0     6 2019-07-11T13:16:16Z 2019-08-02T14:37:52Z 2019-08-02T14:02:50Z MEMBER   0 pydata/xarray/pulls/3095
  • [x] Closes #3094
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3095/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
466815556 MDU6SXNzdWU0NjY4MTU1NTY= 3094 REGRESSION: copy(deep=True) casts unicode indices to object crusaderky 6213168 closed 0     3 2019-07-11T10:46:28Z 2019-08-02T14:02:50Z 2019-08-02T14:02:50Z MEMBER      

Dataset.copy(deep=True) and DataArray.copy(deep=True/False) accidentally cast IndexVariable's with dtype='<U*' to object. The same applies to copy.copy() and copy.deepcopy().

This is a regression in xarray >= 0.12.2. xarray 0.12.1 and earlier are unaffected.

```python
In [1]: ds = xarray.Dataset(
   ...:     coords={'x': ['foo'], 'y': ('x', ['bar'])},
   ...:     data_vars={'z': ('x', ['baz'])})

In [2]: ds
Out[2]:
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [3]: ds.copy()
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [4]: ds.copy(deep=True)
Out[4]:
<xarray.Dataset>
Dimensions:  (x: 1)
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'
Data variables:
    z        (x) <U3 'baz'

In [5]: ds.z
Out[5]:
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) <U3 'foo'
    y        (x) <U3 'bar'

In [6]: ds.z.copy()
Out[6]:
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'

In [7]: ds.z.copy(deep=True)
Out[7]:
<xarray.DataArray 'z' (x: 1)>
array(['baz'], dtype='<U3')
Coordinates:
  * x        (x) object 'foo'
    y        (x) <U3 'bar'
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3094/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
475571573 MDExOlB1bGxSZXF1ZXN0MzAzMjk0OTEx 3173 Fix distributed.Client.compute applied to DataArray crusaderky 6213168 closed 0     1 2019-08-01T09:22:39Z 2019-08-02T05:04:51Z 2019-08-01T21:43:11Z MEMBER   0 pydata/xarray/pulls/3173
  • [x] Closes #3171
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3173/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
475244610 MDU6SXNzdWU0NzUyNDQ2MTA= 3171 distributed.Client.compute fails on DataArray crusaderky 6213168 closed 0     2 2019-07-31T16:33:01Z 2019-08-01T21:43:11Z 2019-08-01T21:43:11Z MEMBER      

As of:
  • dask 2.1.0
  • distributed 2.1.0
  • xarray 0.12.1 or git head (didn't try older versions)

```python
>>> import xarray
>>> import distributed
>>> client = distributed.Client(set_as_default=False)
>>> ds = xarray.Dataset({'d': ('x', [1, 2])}).chunk(1)
>>> client.compute(ds).result()
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    d        (x) int64 1 2

>>> client.compute(ds.d).result()
distributed.worker - WARNING - Compute Failed
Function: _dask_finalize
args: ([[array([1]), array([2])]], <function Dataset._dask_postcompute at 0x316a1db70>, ([(True, <this-array>, (<function Variable._dask_finalize at 0x3168f7f28>, (<function finalize at 0x1166bb8c8>, (), ('x',), OrderedDict(), None)))], set(), {'x': 2}, None, None, None, None), 'd')
kwargs: {}
Exception: KeyError(<this-array>)

KeyError                                  Traceback (most recent call last)
<ipython-input-8-2dbfe1b2ff17> in <module>
----> 1 client.compute(ds.d).result()

/anaconda3/lib/python3.7/site-packages/distributed/client.py in result(self, timeout)
    226         result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
    227         if self.status == "error":
--> 228             six.reraise(*result)
    229         elif self.status == "cancelled":
    230             raise result

/anaconda3/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    690             value = tp()
    691         if value.__traceback__ is not tb:
--> 692             raise value.with_traceback(tb)
    693         raise value
    694     finally:

~/PycharmProjects/xarray/xarray/core/dataarray.py in _dask_finalize()
    706     def _dask_finalize(results, func, args, name):
    707         ds = func(results, *args)
--> 708         variable = ds._variables.pop(_THIS_ARRAY)
    709         coords = ds._variables
    710         return DataArray(variable, coords, name=name, fastpath=True)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3171/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
467756080 MDExOlB1bGxSZXF1ZXN0Mjk3MzQwNTEy 3112 More annotations in Dataset crusaderky 6213168 closed 0     10 2019-07-13T19:06:49Z 2019-08-01T10:41:51Z 2019-07-31T17:48:00Z MEMBER   0 pydata/xarray/pulls/3112
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3112/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
252548859 MDU6SXNzdWUyNTI1NDg4NTk= 1524 (trivial) xarray.quantile silently resolves dask arrays crusaderky 6213168 closed 0     9 2017-08-24T09:54:11Z 2019-07-23T00:18:06Z 2017-08-28T17:31:57Z MEMBER      

In variable.py, line 1116, you're missing a raise statement:

```
if isinstance(self.data, dask_array_type):
    TypeError("quantile does not work for arrays stored as dask "
              "arrays. Load the data via .compute() or .load() prior "
              "to calling this method.")
```
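For clarity, the fix is simply to add the missing raise (a sketch):

```
if isinstance(self.data, dask_array_type):
    # `raise` was missing, so the guard silently did nothing
    raise TypeError("quantile does not work for arrays stored as dask "
                    "arrays. Load the data via .compute() or .load() prior "
                    "to calling this method.")
```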

Currently looking into extending dask.percentile() to support arrays with more than 1 dimension, and then using it in xarray too.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1524/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
466765652 MDExOlB1bGxSZXF1ZXN0Mjk2NTQ1MjA4 3093 Increase minimum Python version to 3.5.3 crusaderky 6213168 closed 0     2 2019-07-11T09:12:02Z 2019-07-13T23:54:48Z 2019-07-13T21:58:31Z MEMBER   0 pydata/xarray/pulls/3093

Closes #3089

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3093/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
465984161 MDU6SXNzdWU0NjU5ODQxNjE= 3089 Python 3.5.0-3.5.1 support crusaderky 6213168 closed 0     5 2019-07-09T21:04:28Z 2019-07-13T21:58:31Z 2019-07-13T21:58:31Z MEMBER      

Python 3.5.0 has disappeared from the conda-forge repository; 3.5.1 is still there... for now. The anaconda repository starts directly from 3.5.4. 3.5.0 and 3.5.1 are a colossal pain in the back for typing support. Is this a good time to increase the requirement to >= 3.5.2? I honestly can't see how anybody could be unable to upgrade to the latest available 3.5 with minimal effort...

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3089/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
264517839 MDU6SXNzdWUyNjQ1MTc4Mzk= 1625 Option for arithmetics to ignore nans created by alignment crusaderky 6213168 closed 0     3 2017-10-11T09:33:34Z 2019-07-11T09:48:07Z 2019-07-11T09:48:07Z MEMBER      

Can anybody tell me who actually benefits from this behaviour? I can't think of any good use cases.

```
wallet = xarray.DataArray([50, 70], dims=['currency'],
                          coords={'currency': ['EUR', 'USD']})
restaurant_bill = xarray.DataArray([30], dims=['currency'],
                                   coords={'currency': ['USD']})
with xarray.set_options(arithmetic_join="outer"):
    print(wallet - restaurant_bill)

<xarray.DataArray (currency: 2)>
array([ nan,  40.])
Coordinates:
  * currency  (currency) object 'EUR' 'USD'
```

While it is fairly clear why nan + non-nan = nan can be a desirable default when the nan is already present in one of the input arrays, when the nan is introduced as part of an automatic align the result is much less intuitive.

Proposal:
  • add a parameter to xarray.align, fillvalue=numpy.nan, which determines what will appear in the newly created array elements
  • change __add__, __sub__ etc. to invoke xarray.align(fillvalue=0)
  • change __mul__, __truediv__ etc. to invoke xarray.align(fillvalue=1)
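To illustrate the proposal on the example above (the fillvalue parameter is hypothetical; today align always fills with NaN):

```
import xarray

wallet = xarray.DataArray([50, 70], dims=['currency'],
                          coords={'currency': ['EUR', 'USD']})
restaurant_bill = xarray.DataArray([30], dims=['currency'],
                                   coords={'currency': ['USD']})

# Hypothetical: fill missing labels with the neutral element of the
# operation (0 for subtraction) instead of NaN
w, b = xarray.align(wallet, restaurant_bill, join='outer', fillvalue=0)
print(w - b)
# Expected under the proposal:
# <xarray.DataArray (currency: 2)>
# array([50, 40])
# Coordinates:
#   * currency  (currency) object 'EUR' 'USD'
```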

In theory the setting could be left as an opt-in as set_options(arithmetic_align_fillvalue='neutral'), yet I wonder who would actually want the current behaviour?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1625/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
466004569 MDExOlB1bGxSZXF1ZXN0Mjk1OTM1Nzg2 3090 WIP: more annotations crusaderky 6213168 closed 0     3 2019-07-09T22:02:44Z 2019-07-11T08:40:34Z 2019-07-11T04:20:56Z MEMBER   0 pydata/xarray/pulls/3090
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3090/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 2,
    "eyes": 0
}
    xarray 13221727 pull
442159309 MDExOlB1bGxSZXF1ZXN0Mjc3MzMxMjQx 2950 Base classes in Python 3 don't need to subclass object crusaderky 6213168 closed 0     3 2019-05-09T10:14:38Z 2019-07-09T20:06:21Z 2019-05-09T16:01:37Z MEMBER   0 pydata/xarray/pulls/2950
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2950/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
462401539 MDExOlB1bGxSZXF1ZXN0MjkzMTAxODQx 3065 kwargs.pop() cleanup crusaderky 6213168 closed 0     7 2019-06-30T12:47:07Z 2019-07-09T20:06:13Z 2019-07-01T01:58:50Z MEMBER   0 pydata/xarray/pulls/3065
  • Clean up everywhere the pattern def my_func(*args, **kwargs): my_optional_arg = kwargs.pop('my_optional_arg', None), which was inherited from not being able to put named keyword arguments after *args in Python 2 (see the sketch after this list).

  • Fix bug in SplineInterpolator where the __init__ method would write to the class attributes of BaseInterpolator.

  • map_dataarray was unintentionally and subtly relying on _process_cmap_cbar_kwargs to modify the kwargs in place. _process_cmap_cbar_kwargs is now strictly read-only and the modifications in kwargs have been made explicit in the caller function.
  • Rename all 'kwds' to 'kwargs' for the sake of consistency
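A minimal sketch of the pattern being removed and its Python 3 replacement (function names are illustrative):

```
# Python 2 style: optional arguments had to be smuggled through **kwargs,
# because named keyword arguments could not follow *args
def my_func_py2(*args, **kwargs):
    my_optional_arg = kwargs.pop('my_optional_arg', None)
    return my_optional_arg

# Python 3: keyword-only arguments can follow *args directly
def my_func_py3(*args, my_optional_arg=None):
    return my_optional_arg
```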
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3065/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
464929212 MDExOlB1bGxSZXF1ZXN0Mjk1MDg4MjMx 3088 More annotations crusaderky 6213168 closed 0     3 2019-07-07T08:40:15Z 2019-07-09T20:04:37Z 2019-07-09T16:23:12Z MEMBER   0 pydata/xarray/pulls/3088

A little incremental addition to type annotations. By no means complete, but it should be ready for merge in its own right nonetheless.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3088/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
438421176 MDExOlB1bGxSZXF1ZXN0Mjc0NDU1ODQz 2929 Typing for DataArray/Dataset crusaderky 6213168 closed 0     25 2019-04-29T17:19:35Z 2019-06-30T10:08:39Z 2019-06-25T22:03:40Z MEMBER   0 pydata/xarray/pulls/2929

Status:
  • I'm generally not pleased with the amount of added verbosity. Happy to accept suggestions on how to improve.
  • Switching all variable names from str to Hashable. However, without proper unit tests (out of scope), non-string hashables are expected not to work most of the time. My preference would still be to stay limited to str...
  • DataArray done.
  • Dataset not done (except where it was hindering DataArray).
  • mypy passes, with the only error being "Mapping[...]" has no attribute "copy". This is due to the fact that I can't see a way to use typing.OrderedDict without breaking compatibility with Python < 3.7.2.
  • py.test should be successful

@shoyer any early feedback is appreciated

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2929/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
430214243 MDExOlB1bGxSZXF1ZXN0MjY4MTUyODIw 2877 WIP: type annotations crusaderky 6213168 closed 0     12 2019-04-08T00:55:31Z 2019-04-24T14:54:07Z 2019-04-10T18:41:50Z MEMBER   0 pydata/xarray/pulls/2877

Fixes #2869

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2877/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
341355638 MDU6SXNzdWUzNDEzNTU2Mzg= 2289 DataArray.to_csv() crusaderky 6213168 closed 0     6 2018-07-15T21:56:20Z 2019-03-12T15:01:18Z 2019-03-12T15:01:18Z MEMBER      

I'm using xarray to aggregate 38 GB worth of NetCDF data into a bunch of CSV reports. I have two problems:

  1. The reports are 500,000 rows by 2,000 columns. Before somebody says "if you're using CSV for this size of data you're doing it wrong" - yes, I know, but it was the only way to make the data accessible to a bunch of people that only know how to use Excel and VBA. :tired_face: The sheer size of the reports means that (1) it's unsavory to keep the whole thing in RAM (2) pandas to_csv will take ages to complete (as it's single-threaded). The slowness is compounded by the fact that I have to compress everything with gzip.
  2. I have to produce up to 40 reports from the exact same NetCDF files. I use dask to perform the computation, and different reports share a large amount of intermediate graph nodes. So I need to do everything in a single invocation to dask.compute() to allow the dask scheduler to de-duplicate the nodes.

To solve both problems, I wrote a new function: http://xarray-extras.readthedocs.io/en/latest/api/csv.html

And now my high level wrapper code looks like this:

```
# Dataset from 200 .nc files, with a total of 500000 points on the 'row' dimension
nc = xarray.open_mfdataset('inputs.*.nc')

reports = [
    # DataArrays with shape (500000, 2000), with the rows split in 200 chunks
    gen_report0(nc),
    gen_report1(nc),
    ....
    gen_report39(nc),
]
futures = [
    # dask.delayed objects
    to_csv(reports[0], 'report0.csv.gz', compression='gzip'),
    to_csv(reports[1], 'report1.csv.gz', compression='gzip'),
    ....
    to_csv(reports[39], 'report39.csv.gz', compression='gzip'),
]
dask.compute(futures)
```

The function is currently production quality in xarray-extras, but it would be very easy to refactor it as a method of xarray.DataArray in the main library.
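For the record, the rough idea behind such a to_csv (a sketch of mine, not the actual xarray-extras code): serialize each row-chunk to CSV bytes in its own dask task, then concatenate the pieces in a final sequential write.

```
import dask
import pandas

@dask.delayed
def _chunk_to_csv(values, index, columns, header):
    # Serialize one row-chunk with (single-threaded) pandas; many of
    # these tasks run in parallel under the dask scheduler
    df = pandas.DataFrame(values, index=index, columns=columns)
    return df.to_csv(header=header).encode()

@dask.delayed
def _write(path, *csv_chunks):
    # Final step: concatenate the serialized chunks to disk
    with open(path, 'wb') as fh:
        for chunk in csv_chunks:
            fh.write(chunk)
```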

Opinions?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2289/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
166439490 MDU6SXNzdWUxNjY0Mzk0OTA= 906 unstack() sorts data alphabetically crusaderky 6213168 closed 0     14 2016-07-19T21:25:26Z 2019-02-23T12:47:00Z 2019-02-23T12:47:00Z MEMBER      

DataArray.unstack() sorts the data alphabetically by label. Besides being bad for performance, this is very problematic whenever the order matters and the labels are not in alphabetical order to begin with.

``` python
import xarray
import pandas

index = [
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x1', 'third' ],
    ['x1', 'fourth'],
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x0', 'fourth'],
]
index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
s = pandas.Series(list(range(8)), index)
a = xarray.DataArray(s)
a
```

```
<xarray.DataArray (dim_0: 8)>
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)
Coordinates:
  * dim_0    (dim_0) object ('x1', 'first') ('x1', 'second') ('x1', 'third') ...
```

``` python
a.unstack('dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[4, 7, 5, 6],
       [0, 3, 1, 2]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'fourth' 'second' 'third'
```
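In the meantime, a workaround sketch: reindex back to the original label order after unstacking.

``` python
b = a.unstack('dim_0')
b = b.reindex(x=['x1', 'x0'],
              count=['first', 'second', 'third', 'fourth'])
```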

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/906/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
168469112 MDU6SXNzdWUxNjg0NjkxMTI= 926 stack() on dask array produces inefficient chunking crusaderky 6213168 closed 0     4 2016-07-30T14:12:34Z 2019-02-01T16:04:43Z 2019-02-01T16:04:43Z MEMBER      

When the stack() method is used on an xarray object with a dask backend, one would expect every output chunk to be produced by exactly 1 input chunk.

This is not the case, as stack() actually produces an extremely fragmented dask array: https://gist.github.com/crusaderky/07991681d49117bfbef7a8870e3cba67

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/926/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
193294729 MDU6SXNzdWUxOTMyOTQ3Mjk= 1152 Scalar coords seep into index coords crusaderky 6213168 closed 0     8 2016-12-03T15:43:53Z 2019-02-01T16:02:12Z 2019-02-01T16:02:12Z MEMBER      

Is this by design? I can't make any sense of it:

```
>>> a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'x': [1, 2, 3], 'y': 10})
>>> a.coords['x']
<xarray.DataArray 'x' (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) int64 1 2 3
    y        int64 10
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1152/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
172291585 MDU6SXNzdWUxNzIyOTE1ODU= 979 align() should align chunks crusaderky 6213168 closed 0     4 2016-08-20T21:25:01Z 2019-01-24T17:19:30Z 2019-01-24T17:19:30Z MEMBER      

In the xarray docs I read

With the current version of dask, there is no automatic alignment of chunks when performing operations between dask arrays with different chunk sizes. If your computation involves multiple dask arrays with different chunks, you may need to explicitly rechunk each array to ensure compatibility.

While chunk auto-alignment could be done within the dask library, that would be limited to arrays with the same dimensionality and same dims order. For example, it would not be possible to have a dask library call align the chunks on xarrays with the following dims:
  • (time, latitude, longitude)
  • (time)
  • (longitude, latitude)

even if it makes perfect sense in xarray.

I think xarray.align() should take care of it automatically.

A safe algorithm would be to always scale down the chunksize when in conflict. This would prevent having chunks larger than expected, and should minimise (in a greedy way) the number of operations. It's also a good idea on dask.distributed, where merging two chunks could cause one of them to travel on the network - which is very expensive.

e.g. to reconcile the chunksizes a: (5, 10, 6) and b: (5, 7, 9), the algorithm would rechunk both arrays to (5, 7, 3, 6).
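The reconciliation above is just the common refinement of the two sets of chunk boundaries; a minimal sketch (helper name mine, assuming both arrays share the same total length along the dim):

```
import numpy as np

def common_refinement(*chunkss):
    # Union of the cumulative chunk boundaries, then diff back to sizes
    cuts = sorted(set().union(*(np.cumsum(c) for c in chunkss)))
    return tuple(int(c) for c in np.diff([0] + cuts))

print(common_refinement((5, 10, 6), (5, 7, 9)))  # (5, 7, 3, 6)
```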

Finally, when served with a numpy-based array and a dask-based array, align() should convert the numpy array to dask. The critical use case that would benefit from this behaviour is when align() is invoked inside a broadcast() between a tiny constant you just loaded from csv/pandas/pure python list/whatever - e.g. dims=(time, ) shape=(100, ) - and a huge dask-backed array e.g. dims=(time, scenario) shape=(100, 2**30) chunks=(25, 2**20).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
296927704 MDU6SXNzdWUyOTY5Mjc3MDQ= 1909 Failure in test_cross_engine_read_write_netcdf3 crusaderky 6213168 closed 0     3 2018-02-13T23:48:44Z 2019-01-13T20:56:14Z 2019-01-13T20:56:14Z MEMBER      

Two unit tests are failing in the latest git master:
  • GenericNetCDFDataTest.test_cross_engine_read_write_netcdf3
  • GenericNetCDFDataTestAutocloseTrue.test_cross_engine_read_write_netcdf3

Both with the message:

```
xarray/tests/test_backends.py:1558:

xarray/backends/api.py:286: in open_dataset
    autoclose=autoclose)
xarray/backends/netCDF4_.py:275: in open
    ds = opener()
xarray/backends/netCDF4_.py:199: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
netCDF4/_netCDF4.pyx:2015: in netCDF4._netCDF4.Dataset.__init__
    ???

???
E   OSError: [Errno -36] NetCDF: Invalid argument: b'/tmp/tmpwp675lnc/temp-1069.nc'

netCDF4/_netCDF4.pyx:1636: OSError
```

Attaching conda list: conda.txt

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1909/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
339611449 MDU6SXNzdWUzMzk2MTE0NDk= 2273 to_netcdf uses deprecated and unnecessary dask call crusaderky 6213168 closed 0     4 2018-07-09T21:20:20Z 2018-07-31T20:03:41Z 2018-07-31T19:42:20Z MEMBER      

```
>>> ds = xarray.Dataset({'x': 1})
>>> ds.to_netcdf('foo.nc')
dask/utils.py:1010: UserWarning: Deprecated, see dask.base.get_scheduler instead
```

Stack trace:

```
xarray/backends/common.py(44)get_scheduler()
     43     from dask.utils import effective_get
---> 44     actual_get = effective_get(get, collection)
```

There are two separate problems here:

  • dask recently changed API from get(get=callable) to get(scheduler=str). Should we
    • just increase the minimum version of dask (I doubt anybody will complain)
    • go through the hoops of dynamically invoking a different API depending on the dask version :sweat: (sketched below)
    • silence the warning now, and then increase the minimum version of dask the day that dask removes the old API entirely (risky)?
  • xarray is calling dask even when it's unnecessary, as none of the variables in the example Dataset had a dask backend. I don't think there are any CI suites for NetCDF without dask. I'm also wondering if they would bring any actual added value, as dask is small, has no exotic dependencies, and is pure Python; so I doubt anybody will have problems installing it whatever their setup is.
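For illustration, a sketch of the version-dependent dispatch (the 0.18.0 cutoff is my assumption, and the two functions are not drop-in equivalents, so a real shim would need more care):

```
import dask
from distutils.version import LooseVersion

if LooseVersion(dask.__version__) >= LooseVersion('0.18.0'):
    from dask.base import get_scheduler  # new API
else:
    # deprecated API, only roughly equivalent
    from dask.utils import effective_get as get_scheduler
```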

@shoyer opinion?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2273/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
328734143 MDExOlB1bGxSZXF1ZXN0MTkyMTkyMDYw 2212 Trivial documentation fix crusaderky 6213168 closed 0     1 2018-06-02T10:47:20Z 2018-06-07T23:49:44Z 2018-06-02T12:15:33Z MEMBER   0 pydata/xarray/pulls/2212
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2212/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
324066244 MDExOlB1bGxSZXF1ZXN0MTg4NzY5OTUx 2150 WIP: Fix regression in to_netcdf encoding parameter crusaderky 6213168 closed 0     0 2018-05-17T15:09:16Z 2018-06-02T10:06:11Z 2018-06-02T10:06:11Z MEMBER   0 pydata/xarray/pulls/2150

Fixes #2149

DONE: unit tests triggering the issue. The new tests are all successful against 0.10.3.
TODO: actual bugfix

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2150/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
324040111 MDU6SXNzdWUzMjQwNDAxMTE= 2149 [REGRESSION] to_netcdf doesn't accept dtype=S1 encoding anymore crusaderky 6213168 closed 0     5 2018-05-17T14:09:15Z 2018-06-01T01:09:38Z 2018-06-01T01:09:38Z MEMBER      

In xarray 0.10.4, the dtype encoding in to_netcdf has stopped working, for all engines:

```
>>> import xarray
>>> ds = xarray.Dataset({'x': ['foo', 'bar', 'baz']})
>>> ds.to_netcdf('test.nc', encoding={'x': {'dtype': 'S1'}})
[...]

xarray/backends/netCDF4_.py in _extract_nc4_variable_encoding(variable, raise_on_invalid, lsd_okay, h5py_okay, backend, unlimited_dims)
    196     if invalid:
    197         raise ValueError('unexpected encoding parameters for %r backend: '
--> 198                          ' %r' % (backend, invalid))
    199     else:
    200         for k in list(encoding):

ValueError: unexpected encoding parameters for 'netCDF4' backend: ['dtype']
```

I'm still trying to figure out how the regression tests didn't pick it up and what change introduced it.

@shoyer I'm working on this as my top priority. Do you agree this is serious enough for an emergency re-release? (0.10.4.1 or 0.10.5, your choice)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2149/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
324410381 MDU6SXNzdWUzMjQ0MTAzODE= 2161 Regression: Dataset.update(Dataset) crusaderky 6213168 closed 0     0 2018-05-18T13:26:58Z 2018-05-29T04:34:47Z 2018-05-29T04:34:47Z MEMBER      

```
>>> Dataset().update(Dataset())
FutureWarning: iteration over an xarray.Dataset will change in xarray v0.11 to only include data variables, not coordinates. Iterate over the Dataset.variables property instead to preserve existing behavior in a forwards compatible manner.
```

This is a regression in xarray 0.10.4. @shoyer this isn't serious enough to warrant an immediate release on its own, but we're already doing one so we might as well include it.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2161/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
324409064 MDU6SXNzdWUzMjQ0MDkwNjQ= 2160 pandas-0.23 breaks stack with duplicated indices crusaderky 6213168 closed 0     3 2018-05-18T13:23:26Z 2018-05-26T03:29:46Z 2018-05-26T03:29:46Z MEMBER      

In this script:

```
import pandas
import xarray

df = pandas.DataFrame(
    [[1, 2], [3, 4]],
    index=['foo', 'foo'],
    columns=['bar', 'baz'])
print(df.stack())

a = xarray.DataArray(df)
print(a.stack(s=a.dims))
```

The first part works with both pandas 0.22 and 0.23. The second part works in xarray 0.10.4 + pandas 0.22, and crashes with pandas 0.23:

File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/xarray/core/dataarray.py", line 1115, in stack ds = self._to_temp_dataset().stack(**dimensions) File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/xarray/core/dataset.py", line 2123, in stack result = result._stack_once(dims, new_dim) File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/xarray/core/dataset.py", line 2092, in _stack_once idx = utils.multiindex_from_product_levels(levels, names=dims) File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/xarray/core/utils.py", line 96, in multiindex_from_product_levels return pd.MultiIndex(levels, labels, sortorder=0, names=names) File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 240, in __new__ result._verify_integrity() File "/mnt/resource/tmp/anaconda_guido/lib/python3.6/site-packages/pandas/core/indexes/multi.py", line 283, in _verify_integrity level=i)) ValueError: Level values must be unique: ['foo', 'foo'] on level 0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2160/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
320443090 MDExOlB1bGxSZXF1ZXN0MTg2MTEzNzM5 2106 xarray.dot to pass **kwargs to einsum crusaderky 6213168 closed 0     3 2018-05-04T22:01:04Z 2018-05-17T13:54:41Z 2018-05-14T21:06:38Z MEMBER   0 pydata/xarray/pulls/2106

Late addition to #2089

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2106/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
297631403 MDExOlB1bGxSZXF1ZXN0MTY5NTEyMjU1 1915 h5netcdf new API support crusaderky 6213168 closed 0     13 2018-02-15T23:15:55Z 2018-05-11T23:49:00Z 2018-05-08T02:25:40Z MEMBER   0 pydata/xarray/pulls/1915

Closes #1536

Support arbitrary compression plugins through the h5netcdf new API.

Done:
  • public API and docstrings (untested)
  • implementation
  • unit tests
  • What's New

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1915/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
253476466 MDU6SXNzdWUyNTM0NzY0NjY= 1536 Better compression algorithms for NetCDF crusaderky 6213168 closed 0     28 2017-08-28T22:35:31Z 2018-05-08T02:25:40Z 2018-05-08T02:25:40Z MEMBER      

As of today, Dataset.to_netcdf() exclusively allows writing uncompressed or compressed with zlib. zlib was absolutely revolutionary when it was released... in 1995. Time has passed, and much better compression algorithms have appeared over time. Good news is, h5py supports LZF out of the box, and is extensible with plugins to support theoretically any other algorithm. h5netcdf exposes such interface through its new (non-legacy) API; however Dataset.to_netcdf(engine='h5netcdf') supports the legacy API exclusively.

I already tested that, once you manage to write to disk with LZF (using h5netcdf directly), open_dataset(engine='h5netcdf') transparently opens the compressed store.

Options:
  • write a new engine for Dataset.to_netcdf() to support the new h5netcdf API.
  • switch the whole engine='h5netcdf' to the new API. Drop support for the old parameters in to_netcdf(). This is less bad than it sounds, as people can switch to another engine in case of trouble. This is the cleanest solution, but also the most disruptive one.
  • switch the whole engine='h5netcdf' to the new API; have to_netcdf() accept both new and legacy parameters, and implement a translation layer of parameters from the legacy API to the new API. The benefit here is that, as long as the user sticks to the legacy API, he can hop between engines transparently. On the other hand I have a hard time believing anybody would care.
  • ?
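Whichever option wins, a sketch of what the user-facing result could look like (the compression encoding key is hypothetical here, pending one of the options above):

```
import xarray

ds = xarray.Dataset({'x': ('t', [1.0, 2.0, 3.0])})

# Hypothetical: forward h5py-style compression settings through `encoding`
ds.to_netcdf('compressed.nc', engine='h5netcdf',
             encoding={'x': {'compression': 'lzf'}})

# Reading back is already transparent today:
ds2 = xarray.open_dataset('compressed.nc', engine='h5netcdf')
```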

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1536/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
317421267 MDU6SXNzdWUzMTc0MjEyNjc= 2079 New feature: interp1d crusaderky 6213168 closed 0     8 2018-04-24T22:45:03Z 2018-05-06T19:30:32Z 2018-05-06T19:30:32Z MEMBER      

I've written a series of wrappers for the 1-dimensional scipy interpolators.

Prototype code and colourful demo plots: https://gist.github.com/crusaderky/b0aa6b8fdf6e036cb364f6f40476cc67

Features

  • Interpolate a ND array on any arbitrary dimension
  • Nearest-neighbour, linear, quadratic, cubic, Akima, PCHIP, and custom interpolators are supported
  • dask supported on both on the interpolated array and x_new
  • Supports ND x_new arrays
  • The CPU-heavy interpolator generation (splrep) is executed only once and then can be applied to multiple x_new (splev)
  • Pickleable and distributed-friendly

Design hacks

  • Depends on dask module, even when all inputs are based on plain numpy.
  • Abuses attrs and the ability to invoke a.attrname to get the user experience of a new DataArray method.
  • Abuses the fact that the chunks of a dask.array.Array can contain anything and you won't notice until you compute them.

Limitations

  • Can't dump to netcdf. Not solvable without hacking into the implementation details of scipy.
  • Datasets are not supported. Trivial to fix after solving #1699.
  • Chunks are not supported on x_new. Trivial to fix after solving #1995.
  • Chunks are not supported along the interpolated dimension. This is very complicated to solve. If x and x_new were always monotonic ascending,it would be (not trivially) solvable with dask.array.ghost.ghost. If you make no assumptions about monotonicity, things become way more complicated. A solution would need to go in the dask module, and then be invoked trivially from here with dask='allowed'.
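For reference, a minimal numpy-only sketch of the core idea (interpolating an ND array along one arbitrary dim via apply_ufunc); this is my illustration, not the gist's implementation, and it has none of the dask/pickling features listed above:

```
import numpy as np
import scipy.interpolate
import xarray

def interp1d_da(a, dim, x_new, kind='linear'):
    # Build a 1D interpolator per slice and evaluate it on x_new;
    # vectorize=True loops over all the non-interpolated dims
    xn = xarray.DataArray(np.asarray(x_new), dims=['x_new'])
    out = xarray.apply_ufunc(
        lambda y, x, xn_: scipy.interpolate.interp1d(x, y, kind=kind)(xn_),
        a, a[dim], xn,
        input_core_dims=[[dim], [dim], ['x_new']],
        output_core_dims=[['x_new']],
        vectorize=True,
    )
    return out.assign_coords(x_new=xn)

a = xarray.DataArray(np.random.rand(3, 10), dims=['y', 'x'],
                     coords={'x': np.arange(10.0)})
print(interp1d_da(a, 'x', np.linspace(0.0, 9.0, 25)))
```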
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2079/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
318574154 MDExOlB1bGxSZXF1ZXN0MTg0NzQ2MTgy 2089 xarray.dot is now based on da.einsum crusaderky 6213168 closed 0     3 2018-04-27T23:09:32Z 2018-05-04T21:51:01Z 2018-05-04T21:51:00Z MEMBER   0 pydata/xarray/pulls/2089

Closes #2074 Requires dask >= 0.17.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2089/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
316618290 MDU6SXNzdWUzMTY2MTgyOTA= 2074 xarray.dot() dask problems crusaderky 6213168 closed 0     10 2018-04-22T22:18:10Z 2018-05-04T21:51:00Z 2018-05-04T21:51:00Z MEMBER      

xarray.dot() has comparable performance with numpy.einsum. However, when it uses a dask backend, it's much slower than the new dask.array.einsum function (https://github.com/dask/dask/pull/3412). The performance gap widens when the dimension upon which you are reducing is chunked.

Also, for some reason dot(a<s, t>, b<t>, dims=[t]) and dot(a<s,t>, a<s,t>, dims=[s,t]) do work (very slowly) when s and t are chunked, while dot(a<s, t>, a<s, t>, dims=[t]) crashes complaining it can't operate on a chunked core dim (related discussion: https://github.com/pydata/xarray/issues/1995).

The proposed solution is to simply wait for https://github.com/dask/dask/pull/3412 to reach the next release and then reimplement xarray.dot to use dask.array.einsum. This means that dask users will lose the ability to use xarray.dot if they upgrade their xarray version but not their dask version, but I believe it shouldn't be a big problem for most?

```
import numpy
import dask.array
import xarray

def bench(tchunk, a_by_a, dims, iis):
    print(f"\nbench({tchunk}, {a_by_a}, {dims}, {iis})")

    a = xarray.DataArray(
        dask.array.random.random((500000, 100), chunks=(50000, tchunk)),
        dims=['s', 't'])
    if a_by_a:
        b = a
    else:
        b = xarray.DataArray(
            dask.array.random.random((100, ), chunks=tchunk),
            dims=['t'])

    print("xarray.dot(numpy backend):")
    %timeit xarray.dot(a.compute(), b.compute(), dims=dims)
    print("numpy.einsum:")
    %timeit numpy.einsum(iis, a, b)
    print("xarray.dot(dask backend):")
    try:
        %timeit xarray.dot(a, b, dims=dims).compute()
    except ValueError as e:
        print(e)
    print("dask.array.einsum:")
    %timeit dask.array.einsum(iis, a, b).compute()

bench(100, False, ['t'], '...i,...i')
bench( 20, False, ['t'], '...i,...i')
bench(100, True, ['t'], '...i,...i')
bench( 20, True, ['t'], '...i,...i')
bench(100, True, ['s', 't'], '...ij,...ij')
bench( 20, True, ['s', 't'], '...ij,...ij')
```

Output:

```
bench(100, False, ['t'], ...i,...i)
xarray.dot(numpy backend):
195 ms ± 3.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
numpy.einsum:
205 ms ± 2.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
xarray.dot(dask backend):
356 ms ± 44.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
dask.array.einsum:
244 ms ± 10.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

bench(20, False, ['t'], ...i,...i)
xarray.dot(numpy backend):
297 ms ± 16.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
254 ms ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
732 ms ± 74.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
dask.array.einsum:
274 ms ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

bench(100, True, ['t'], ...i,...i)
xarray.dot(numpy backend):
438 ms ± 43.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
415 ms ± 17.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
633 ms ± 31.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
dask.array.einsum:
431 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

bench(20, True, ['t'], ...i,...i)
xarray.dot(numpy backend):
457 ms ± 17.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
463 ms ± 24.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
dimension 't' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single dask array chunk along this dimension, i.e., .rechunk({'t': -1}), but beware that this may significantly increase memory usage.
dask.array.einsum:
485 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

bench(100, True, ['s', 't'], ...ij,...ij)
xarray.dot(numpy backend):
418 ms ± 14.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
444 ms ± 43.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
384 ms ± 57.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
dask.array.einsum:
415 ms ± 19.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

bench(20, True, ['s', 't'], ...ij,...ij)
xarray.dot(numpy backend):
489 ms ± 2.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
numpy.einsum:
443 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
xarray.dot(dask backend):
585 ms ± 64.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
dask.array.einsum:
455 ms ± 13.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2074/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
320104170 MDU6SXNzdWUzMjAxMDQxNzA= 2103 An elegant way to guarantee single chunk along dim crusaderky 6213168 closed 0     2 2018-05-03T22:40:48Z 2018-05-04T20:11:30Z 2018-05-04T20:10:50Z MEMBER      

Algorithms that are wrapped by xarray.apply_ufunc(dask='parallelized'), and in general most algorithms which aren't embarrassingly parallel and for which there isn't a sophisticated dask function that allows for multiple chunks, cannot have multiple chunks on their core dimensions.

I have lost count of how many times I prefixed my invocations of apply_ufunc on a DataArray with the same blurb, over and over again:

```
if x.chunks:
    x = x.chunk({dim: x.shape[x.dims.index(dim)]})
```

The reason why it looks so awful is that DataArray.shape, DataArray.dims, Variable.shape and Variable.dims are positional.

I can see a few possible solutions to the problem:

Design 1

Change DataArray.chunk etc. to accept a special chunk size, e.g. -1, which means "whatever the size of that dim is". The above would become:

```
if x.chunks:
    x = x.chunk({dim: -1})
```

which is much more bearable. One could argue that the implementation would need to happen in dask.array.rechunk; on the other hand in dask it would feel silly, because already today you can do it in a very synthetic way:

```
x = x.rechunk({axis: x.shape[axis]})
```

I'm not overly fond of this solution as it would be rather obscure for anybody who isn't super familiar with the API documentation.

Design 2

Add properties to DataArray and Variable, ddims and dshape (happy to hear suggestions about better names), which would return dims and shape as an OrderedDict, just like Dataset.dims and Dataset.shape.

The above would become:

```
if x.chunks:
    x = x.chunk({dim: x.dshape[dim]})
```

Design 3

Change dask.array.rechunk to accept numpy.inf / math.inf as the chunk size. This makes sense, as the function already accepts chunk sizes that are larger than the shape - however, it's currently limited to int. This is probably my personal favourite, and trivial to implement too.

The above would become:

```
if x.chunks:
    x = x.chunk({dim: np.inf})
```

Design 4

Introduce a convenience method for DataArray, Dataset, and Variable, ensure_single_chunk(*dims). Below a prototype:

```
def ensure_single_chunk(a, *dims):
    """If a has dask backend and two or more chunks on dims, rechunk it so
    that they become single-chunked. This is typically a prerequisite for
    computing any algorithm along dim that is not embarrassingly parallel
    (short of sophisticated implementations such as those found in the dask
    module).

    :param a:
        any xarray object
    :param str dims:
        one or more dims of a to rechunk
    :returns:
        copy of a, where all listed dims are guaranteed to be on a single dask chunk.
        if a has numpy backend, return a shallow copy of it.
    """
    if isinstance(a, xarray.Dataset):
        dims = set(dims)
        unknown_dims = dims - a.dims.keys()
        if unknown_dims:
            raise ValueError("dim(s) %s not found" % ",".join(unknown_dims))
        a = a.copy(deep=False)
        for k, v in a.variables.items():
            if v.chunks:
                a[k] = ensure_single_chunk(v, *(set(v.dims) & dims))
        return a

    if not isinstance(a, (xarray.DataArray, xarray.Variable)):
        raise TypeError('a must be a DataArray, Dataset, or Variable')

    if not a.chunks:
        # numpy backend
        return a.copy(deep=False)

    return a.chunk({
        dim: a.shape[a.dims.index(dim)]
        for dim in dims
    })
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2103/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
