
issues


223 rows where user = 2443309 sorted by updated_at descending




type 2

  • pull 144
  • issue 79

state 2

  • closed 206
  • open 17

repo 1

  • xarray 223
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
602256880 MDU6SXNzdWU2MDIyNTY4ODA= 3981 [Proposal] Expose Variable without Pandas dependency jhamman 2443309 open 0     23 2020-04-17T22:00:10Z 2024-04-24T17:19:55Z   MEMBER      

This issue proposes exposing Xarray's Variable class as a stand-alone array class with named axes (dims) and arbitrary metadata (attrs) but without coordinates (indexes). Yes, this already exists, but the Variable class is currently inseparable from our Pandas dependency, despite not utilizing any of its functionality. What would this entail?
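For illustration, a minimal sketch (under today's import paths) of what using Variable stand-alone looks like:

```python
import numpy as np
from xarray import Variable

# named dimensions and arbitrary metadata, but no coordinates/indexes
v = Variable(dims=("x", "y"), data=np.zeros((3, 4)), attrs={"units": "m"})
v.mean(dim="x")        # reductions over named dimensions
v.transpose("y", "x")  # dimension-aware reshaping
```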

The biggest change would be in making Pandas an optional dependency and isolating any imports. This change could be confined to the Variable object or could be propagated further as the Explicit Indexes work proceeds (#1603).

Why?

Within Xarray, the Variable class is a vital building block for many of our internal data structures. Recently, the utility of a simple array with named dimensions has been highlighted by a few potential user communities:

  • Scikit-learn: https://github.com/scikit-learn/enhancement_proposals/pull/18
  • PyTorch: (https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html, http://nlp.seas.harvard.edu/NamedTensor)

An example from the above-linked SLEP of why users may not want Pandas as a dependency in Xarray:

@amueller: ...If we go this route, I think we need to make xarray, and therefore pandas, a mandatory dependency...

@adrinjalali: ...And we still do have the option of making a NamedArray. xarray uses the pandas' index classes for the indexing and stuff, which is something we really don't need...

Since we already have a class that meets these applications' use cases, it seems only prudent to evaluate the feasibility of exposing the Variable as a low-level API object.

In conclusion, I'm not sure this is currently worth the effort, but it's probably worth exploring at this point.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3981/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
606530049 MDU6SXNzdWU2MDY1MzAwNDk= 4001 [community] Bi-weekly community developers meeting jhamman 2443309 open 0     14 2020-04-24T19:22:01Z 2024-03-27T15:33:28Z   MEMBER      

Hello Xarray Community and @pydata/xarray,

Starting next week, we will be hosting a bi-weekly 30-minute community/developers meeting. The goal of this meeting is to help coordinate Xarray development efforts and better connect the user/developer community.

When

Every other Wednesday at 8:30a PT (11:30a ET) beginning April 29th, 2020.

Calendar options:
  • Google Calendar
  • Ical format

Where

https://us02web.zoom.us/j/87503265754?pwd=cEFJMzFqdTFaS3BMdkx4UkNZRk1QZz09

Rolling agenda and meeting notes

We'll keep a rolling agenda and set of meeting notes:
  • Through Sept. 2022
  • Starting October 2022 (requires sign-in)
  • Starting March 2024

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4001/reactions",
    "total_count": 5,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 5,
    "eyes": 0
}
    xarray 13221727 issue
2089084562 PR_kwDOAMm_X85kd6jT 8622 Update min deps in docs jhamman 2443309 closed 0     0 2024-01-18T21:35:49Z 2024-01-19T00:12:08Z 2024-01-19T00:12:07Z MEMBER   0 pydata/xarray/pulls/8622

Follow up to https://github.com/pydata/xarray/pull/8586

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8622/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1564627108 I_kwDOAMm_X85dQlCk 7495 Deprecate open_zarr in favor of open_dataset(..., engine='zarr') jhamman 2443309 open 0     2 2023-01-31T16:21:07Z 2023-12-12T18:00:15Z   MEMBER      

What is your issue?

We have discussed many times deprecating xarray.open_zarr in favor of xarray.open_dataset(..., engine='zarr'). This issue tracks that process and is a place for us to discuss any issues that may arise as a result of the change.
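For reference, the change on the user side is a one-liner (a sketch of the intended migration):

```python
import xarray as xr

# to be deprecated:
ds = xr.open_zarr("store.zarr")

# preferred:
ds = xr.open_dataset("store.zarr", engine="zarr")
```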

xref: https://github.com/pydata/xarray/issues/2812, https://github.com/pydata/xarray/issues/7293

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7495/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1564661430 PR_kwDOAMm_X85I7qzk 7496 deprecate open_zarr jhamman 2443309 open 0     13 2023-01-31T16:40:38Z 2023-10-27T05:14:02Z   MEMBER   0 pydata/xarray/pulls/7496

This PR deprecates open_zarr in favor of open_dataset(..., engine='zarr').

  • [x] Closes #7495
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7496/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1953088785 PR_kwDOAMm_X85dUY1- 8346 Bump minimum numpy version jhamman 2443309 closed 0     3 2023-10-19T21:31:58Z 2023-10-19T22:16:23Z 2023-10-19T22:16:22Z MEMBER   0 pydata/xarray/pulls/8346

I believe this was missed in v2023.08.0 (Aug 18, 2023).

xref: https://github.com/conda-forge/xarray-feedstock/pull/97

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8346/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
    xarray 13221727 pull
33637243 MDU6SXNzdWUzMzYzNzI0Mw== 131 Dataset summary methods jhamman 2443309 closed 0   0.2 650893 10 2014-05-16T00:17:56Z 2023-09-28T12:42:34Z 2014-05-21T21:47:29Z MEMBER      

Add summary methods to the Dataset object. For example, it would be great if you could summarize an entire dataset in a single line.

(1) Mean of all variables in a dataset:

```python
mean_ds = ds.mean()
```

(2) Mean of all variables in a dataset along a dimension:

```python
time_mean_ds = ds.mean(dim='time')
```

In the case where a dimension is specified and there are variables that don't use that dimension, I'd imagine you would just pass that variable through unchanged.
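A small example of that pass-through behavior (variable names here are illustrative):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "temp": (("time", "x"), np.zeros((4, 3))),
    "elevation": ("x", np.zeros(3)),  # does not use the 'time' dimension
})

time_mean_ds = ds.mean(dim="time")  # 'temp' is reduced; 'elevation' is unchanged
```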

Related to #122.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/131/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1562712670 PR_kwDOAMm_X85I1FYF 7488 Attempt to reproduce #7079 in CI jhamman 2443309 closed 0     1 2023-01-30T15:57:44Z 2023-09-20T00:11:39Z 2023-09-19T23:52:20Z MEMBER   0 pydata/xarray/pulls/7488
  • [x] towards understanding #7079
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7488/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1673579421 I_kwDOAMm_X85jwMud 7765 Revisiting Xarray's Minimum dependency versions policy jhamman 2443309 open 0     9 2023-04-18T17:46:03Z 2023-09-19T15:54:09Z   MEMBER      

What is your issue?

We have recently had a few reports expressing frustration with our minimum dependency version policy. This issue aims to discuss if changes to our policy are needed.

Background

  1. Our current minimum dependency versions policy reads:

    Minimum dependency versions

    Xarray adopts a rolling policy regarding the minimum supported version of its dependencies:

    • Python: 24 months (NEP-29)
    • numpy: 18 months (NEP-29)
    • all other libraries: 12 months

    This means the latest minor (X.Y) version from N months prior. Patch versions (x.y.Z) are not pinned, and only the latest available at the moment of publishing the xarray release is guaranteed to work.

    You can see the actual minimum tested versions:

    pydata/xarray

  2. We have a script that checks versions and dates and advises us on when to bump minimum versions.

    https://github.com/pydata/xarray/blob/main/ci/min_deps_check.py
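For illustration, a simplified sketch of the kind of check the script performs (the data structures here are assumptions, not the script's actual internals):

```python
from datetime import datetime, timedelta

POLICY_MONTHS = {"python": 24, "numpy": 18}  # all other libraries: 12 months

def policy_minimum(package, releases, today=None):
    """Latest minor version released before the policy window opened.

    ``releases`` maps version tuples like (1, 21) to release dates.
    """
    today = today or datetime.now()
    cutoff = today - timedelta(days=POLICY_MONTHS.get(package, 12) * 30)
    old_enough = [v for v, date in releases.items() if date <= cutoff]
    return max(old_enough) if old_enough else min(releases)
```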

Diagnosis

  1. Our policy and min_deps_check.py script have greatly reduced our deliberations on which versions to support and the maintenance burden of supporting outdated versions of dependencies.
  2. We likely need to update our policy and min_deps_check.py script to properly account for Python's SemVer bugfix releases. Depending on how you interpret the policy, we may have prematurely dropped Python 3.8 (see below for a potential action item).

Discussion questions

  1. Is the policy working as designed, are the support windows documented above still appropriate for where Xarray is at today?
  2. Is this policy still in line with how our peer libraries are operating?

Action items

  1. There is likely a bug in the patch-version comparison for the minimum Python version check. Moreover, we don't differentiate between bugfix and security releases. I suggest we adopt a special policy for our minimum supported Python version that reads something like: > Python: 24 months from the last bugfix release (security releases are not considered).

xref: https://github.com/pydata/xarray/issues/4179, https://github.com/pydata/xarray/pull/7461

Moderator's note: I suspect a number of folks will want to comment on this issue with "Please support Python 3.8 for longer...". If that is the nature of your comment, please just give this a ❤️ reaction rather than filling up the discussion.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7765/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  reopened xarray 13221727 issue
95114700 MDU6SXNzdWU5NTExNDcwMA== 475 API design for pointwise indexing jhamman 2443309 open 0     39 2015-07-15T06:04:47Z 2023-08-23T12:37:23Z   MEMBER      

There have been a number of threads discussing possible improvements/extensions to xray indexing. The current indexing behavior for isel is orthogonal indexing - in other words, each coordinate is treated independently (see #214 and #411 for more discussion).

So the question: what is the best way to incorporate diagonal or pointwise indexing in xray? I see two main goals / applications:

  1. Support a simple form of numpy-style integer array indexing.
  2. Support pointwise array indexing along coordinates via computation of nearest-neighbor indexes; I think this can also be thought of as a form of resampling.
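For context, a sketch contrasting the two behaviors using DataArray indexers (the vectorized-indexing pattern xarray later adopted):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# orthogonal: every combination of the two index lists -> a 2x2 block
block = da.isel(x=[0, 1], y=[0, 1])

# pointwise: elements (0, 0) and (1, 1), paired along a new 'points' dimension
pts = da.isel(x=xr.DataArray([0, 1], dims="points"),
              y=xr.DataArray([0, 1], dims="points"))
```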

Input from @WeatherGod, @wholmgren, and @shoyer would be great.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/475/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1822860755 PR_kwDOAMm_X85Wd1dG 8022 (chore) min versions bump jhamman 2443309 closed 0     1 2023-07-26T17:31:12Z 2023-07-27T04:27:44Z 2023-07-27T04:27:40Z MEMBER   0 pydata/xarray/pulls/8022
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8022/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1383037028 I_kwDOAMm_X85Sb3hk 7071 Should Xarray have a read_csv method? jhamman 2443309 open 0     5 2022-09-22T21:28:46Z 2023-06-13T01:45:33Z   MEMBER      

Is your feature request related to a problem?

Most users of Xarray/Pandas start with an IO call of some sort. In Xarray, our open_dataset(..., engine=engine) interface provides an extensible interface to more complex backends (NetCDF, Zarr, GRIB, etc.). For tabular data types, we have traditionally pointed users to Pandas. While this works for users who are comfortable with Pandas, it is an added hurdle for users getting started with Xarray.

Describe the solution you'd like

It should be easy and obvious how a user can get a CSV (or other tabular data) into Xarray. Ideally, we wouldn't force the user to use a third-party library.

Describe alternatives you've considered

I can think of three possible solutions:

  1. We expose a new function read_csv, which might do something like this:

```python
def read_csv(filepath_or_buffer, **kwargs):
    df = pd.read_csv(filepath_or_buffer, **kwargs)
    ds = xr.Dataset.from_dataframe(df)
    return ds
```

  2. We develop a storage backend to support reading CSV-like data:

```python
ds = open_dataset(filepath, engine='csv')
```

  3. We take (1) as an example and put it in Xarray's documentation, explicitly showing how to use Pandas to produce a Dataset from a CSV.
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7071/reactions",
    "total_count": 5,
    "+1": 5,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1705857851 PR_kwDOAMm_X85QS3VM 7836 Fix link to xarray twitter page jhamman 2443309 closed 0     0 2023-05-11T13:53:14Z 2023-05-11T23:00:36Z 2023-05-11T23:00:35Z MEMBER   0 pydata/xarray/pulls/7836
  • [x] Closes #7835

Thanks @pierre-manchon for the report!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7836/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 0
}
    xarray 13221727 pull
1699112787 PR_kwDOAMm_X85P8LbF 7825 test: Fix test_write_read_select_write for Zarr V3 jhamman 2443309 closed 0     1 2023-05-07T15:26:56Z 2023-05-10T02:43:22Z 2023-05-10T02:43:22Z MEMBER   0 pydata/xarray/pulls/7825

Previously, the first context manager in this test was closed before accessing the data. This resulted in key errors when trying to access the opened dataset.

  • [x] Fixes the Zarr V3 parts of https://github.com/pydata/xarray/issues/7707
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7825/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1575494367 I_kwDOAMm_X85d6CLf 7515 Aesara as an array backend in Xarray jhamman 2443309 open 0     11 2023-02-08T05:15:35Z 2023-05-01T14:40:39Z   MEMBER      

Is your feature request related to a problem?

I recently learned about a meta-tensor library called Aesara which got me wondering if it would be a good array backend for Xarray.

Aesara is a Python library that allows you to define, optimize/rewrite, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is composed of different parts:
  • Symbolic representation of mathematical operations on arrays
  • Speed and stability optimization
  • Efficient symbolic differentiation
  • Powerful rewrite system to programmatically modify your models
  • Extendable backends. Aesara currently compiles to C, JAX and Numba.

xref: https://github.com/aesara-devs/aesara/issues/352, @OriolAbril, @twiecki

Has anyone looked into this yet?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7515/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
1550109629 PR_kwDOAMm_X85ILNM- 7461 bump minimum versions, drop py38 jhamman 2443309 closed 0     18 2023-01-19T23:38:42Z 2023-04-21T14:07:09Z 2023-01-26T16:57:10Z MEMBER   0 pydata/xarray/pulls/7461

This updates our minimum versions based on our 24/18/12 month policy.

Details are shown below.

  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
```
❯ ./ci/min_deps_check.py ./ci/requirements/min-all-deps.yml
...
Package           Required              Policy                Status
----------------- --------------------  --------------------  ------
python            3.9    (2020-10-07)   3.9    (2020-10-07)   =
boto3             1.20   (2021-11-08)   1.20   (2021-11-08)   =
bottleneck        1.3    (2021-01-20)   1.3    (2021-01-20)   =
cartopy           0.20   (2021-09-17)   0.20   (2021-09-17)   =
cdms2             3.1    (-         )   -      (-         )   (!)
cfgrib            0.9    (2019-02-25)   0.9    (2019-02-25)   =
cftime            1.5    (2021-05-20)   1.5    (2021-05-20)   =
dask-core         2022.1 (2022-01-14)   2022.1 (2022-01-14)   =
distributed       2022.1 (2022-01-14)   2022.1 (2022-01-14)   =
flox              0.5    (2022-05-02)   0.3    (2021-12-28)   > (!)
h5netcdf          0.13   (2022-01-12)   0.13   (2022-01-12)   =
h5py              3.6    (2021-11-17)   3.6    (2021-11-17)   =
hdf5              1.12   (2021-01-01)   1.12   (2021-01-01)   =
iris              3.1    (2021-11-23)   3.1    (2021-11-23)   =
lxml              4.7    (2021-12-14)   4.7    (2021-12-14)   =
matplotlib-base   3.5    (2021-11-17)   3.5    (2021-11-17)   =
nc-time-axis      1.4    (2021-10-23)   1.4    (2021-10-23)   =
netcdf4           1.5.7  (2021-04-19)   1.5    (2021-04-19)   = (w)
numba             0.55   (2022-01-14)   0.55   (2022-01-14)   =
numpy             1.21   (2021-06-22)   1.21   (2021-06-22)   =
packaging         21.3   (2021-11-18)   21.3   (2021-11-18)   =
pandas            1.3    (2021-07-02)   1.3    (2021-07-02)   =
pint              0.18   (2021-10-26)   0.18   (2021-10-26)   =
pseudonetcdf      3.2    (2021-10-16)   3.2    (2021-10-16)   =
pydap             3.2    (2020-10-13)   3.2    (2020-10-13)   =
rasterio          1.2    (2021-09-02)   1.2    (2021-09-02)   =
scipy             1.7    (2021-06-27)   1.7    (2021-06-27)   =
seaborn           0.11   (2020-09-19)   0.11   (2020-09-19)   =
sparse            0.13   (2021-08-28)   0.13   (2021-08-28)   =
toolz             0.11   (2020-09-23)   0.11   (2020-09-23)   =
typing_extensions 4.0    (2021-11-17)   4.0    (2021-11-17)   =
zarr              2.10   (2021-09-19)   2.10   (2021-09-19)   =

Errors:
-------
1. not found in conda: cdms2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7461/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
110820316 MDU6SXNzdWUxMTA4MjAzMTY= 620 Don't squeeze DataArray before plotting jhamman 2443309 open 0     5 2015-10-10T22:26:51Z 2023-04-08T17:20:50Z   MEMBER      

As was discussed in #608, we should honor the shape of the DataArray when selecting plot methods. Currently, we're squeezing the DataArray before plotting. This ends up plotting a line plot for a DataArray with shape (N, 1). We should find a way to plot a pcolormesh or imshow plot in this case. The trick will be figuring out what to do in _infer_interval_breaks.
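A minimal reproducer of the described behavior (shape chosen for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(10, 1), dims=("x", "y"))
da.plot()  # squeezed to 1-D, so this draws a line rather than a pcolormesh
```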

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/620/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1651243130 PR_kwDOAMm_X85Nclrx 7708 deprecate encoding setters jhamman 2443309 open 0     0 2023-04-03T02:59:15Z 2023-04-03T22:12:31Z   MEMBER   0 pydata/xarray/pulls/7708
  • [x] Toward #6323
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7708/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1644566201 PR_kwDOAMm_X85NGfRt 7693 add to_zarr method to dataarray jhamman 2443309 closed 0     0 2023-03-28T19:49:00Z 2023-04-03T15:53:39Z 2023-04-03T15:53:35Z MEMBER   0 pydata/xarray/pulls/7693

This PR adds the to_zarr method to Xarray's DataArray objects. This allows users to roundtrip named and unnamed DataArrays to Zarr without having to first convert to a Dataset.

  • [x] Closes #7692
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7693/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1644429340 I_kwDOAMm_X85iBAAc 7692 Feature proposal: DataArray.to_zarr() jhamman 2443309 closed 0     5 2023-03-28T18:00:24Z 2023-04-03T15:53:37Z 2023-04-03T15:53:37Z MEMBER      

Is your feature request related to a problem?

It would be nice to mimic the behavior of DataArray.to_netcdf for the Zarr backend.

Describe the solution you'd like

This should be possible:

```python
xr.open_dataarray('file.nc').to_zarr('store.zarr')
```

Describe alternatives you've considered

None.

Additional context

xref DataArray.to_netcdf issue/PR: #915 / #990

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7692/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1642922680 PR_kwDOAMm_X85NA9uq 7689 add reset_encoding to dataset/dataarray/variable jhamman 2443309 closed 0     6 2023-03-27T22:34:27Z 2023-03-30T21:28:53Z 2023-03-30T21:09:16Z MEMBER   0 pydata/xarray/pulls/7689
  • [x] Closes #7686
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7689/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1642635191 I_kwDOAMm_X85h6J-3 7686 Add reset_encoding to Dataset and DataArray objects jhamman 2443309 closed 0     2 2023-03-27T18:51:39Z 2023-03-30T21:09:17Z 2023-03-30T21:09:17Z MEMBER      

Is your feature request related to a problem?

Xarray maintains the encoding of datasets read from most of its supported backend formats (e.g. NetCDF, Zarr, etc.). This is very useful when you want a perfect roundtrip, but it often gets in the way, causing conflicts when writing a modified dataset or when appending to another dataset. Most of the time, the solution is to just remove the encoding from the dataset and continue on. The following code sample is found in a number of issues that reference this problem.

```python
for v in list(ds.coords.keys()):
    if ds.coords[v].dtype == object:
        ds[v].encoding.clear()

for v in list(ds.variables.keys()):
    if ds[v].dtype == object:
        ds[v].encoding.clear()
```

A sample of issues that show variants of this problem.

  • https://github.com/pydata/xarray/issues/3476
  • https://github.com/pydata/xarray/issues/3739
  • https://github.com/pydata/xarray/issues/4380
  • https://github.com/pydata/xarray/issues/5219
  • https://github.com/pydata/xarray/issues/5969
  • https://github.com/pydata/xarray/issues/6329
  • https://github.com/pydata/xarray/issues/6352

Describe the solution you'd like

In many cases, the solution to these problems is to leave the original dataset encoding behind and either use Xarray's default encoding (or the backend's default) or specify one's own encoding options. Both cases would benefit from a convenience method to reset the original encoding. Something like the following would serve this purpose:

```python
ds = xr.open_dataset(...).reset_encoding()
```

Describe alternatives you've considered

Variations on the API above could also be considered:

```python
xr.open_dataset(..., keep_encoding=False)
```

or even:

```python
with xr.set_options(keep_encoding=False):
    ds = xr.open_dataset(...)
```

We can/should also do a better job of surfacing inconsistent encoding in our backends (e.g. to_netcdf).

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7686/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 2,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1624835973 PR_kwDOAMm_X85MEd7D 7631 Remove incomplete sentence in IO docs jhamman 2443309 closed 0     0 2023-03-15T06:22:21Z 2023-03-15T12:04:08Z 2023-03-15T12:04:06Z MEMBER   0 pydata/xarray/pulls/7631
  • [x] Closes #7624
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7631/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1558497871 I_kwDOAMm_X85c5MpP 7479 Use NumPy's SupportsDType jhamman 2443309 closed 0     0 2023-01-26T17:21:32Z 2023-02-28T23:23:47Z 2023-02-28T23:23:47Z MEMBER      

What is your issue?

Now that we've bumped our minimum NumPy version to 1.21, we can address this comment:

https://github.com/pydata/xarray/blob/b21f62ee37eea3650a58e9ffa3a7c9f4ae83006b/xarray/core/types.py#L57-L62

I decided not to tackle this as part of #7461 but we may be able to do something like this:

```python
from numpy.typing._dtype_like import _DTypeLikeNested, _ShapeLike, _SupportsDType
```

xref: #6834 cc @headtr1ck

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7479/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1549639421 PR_kwDOAMm_X85IJnRV 7458 Lint with ruff jhamman 2443309 closed 0     1 2023-01-19T17:40:47Z 2023-01-30T18:12:18Z 2023-01-30T18:12:13Z MEMBER   0 pydata/xarray/pulls/7458

This switches our primary linter to Ruff. As advertised, Ruff is very fast. Plus, we get the benefit of using a single tool that combines the previous functionality of pyflakes, isort, and pyupgrade.

  • [x] Closes https://twitter.com/TEGNicholasCode/status/1613226956887056385
  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst

cc @max-sixty, @TomNicholas

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7458/reactions",
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 1,
    "eyes": 1
}
    xarray 13221727 pull
1532648441 PR_kwDOAMm_X85HWTes 7436 pin scipy version in doc environment jhamman 2443309 closed 0     1 2023-01-13T17:08:50Z 2023-01-13T17:37:59Z 2023-01-13T17:37:59Z MEMBER   0 pydata/xarray/pulls/7436

This should fix our doc build.

  • [x] Closes #7434
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7436/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
681291824 MDU6SXNzdWU2ODEyOTE4MjQ= 4348 maximum recursion with dask and pydap backend jhamman 2443309 open 0     2 2020-08-18T19:47:26Z 2022-12-15T18:47:38Z   MEMBER      

What happened:

I'm getting a maximum recursion error when using the Pydap backend with Dask distributed. It seems that we're failing to successfully pickle the pydap backend store.

What you expected to happen:

Successful parallel loading of opendap dataset.

Minimal Complete Verifiable Example:

```python
import xarray as xr
from dask.distributed import Client

client = Client()

ds = xr.open_dataset(
    'http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_pet_1958_CurrentYear_GLOBE.nc',
    engine='pydap',
    chunks={'lat': 1024, 'lon': 1024, 'time': 12},
).load()
```

yields:

Killed worker on the client:

```
---------------------------------------------------------------------------
KilledWorker                              Traceback (most recent call last)
<ipython-input-4-713e4114ee96> in <module>
      4 client = Client()
      5
----> 6 ds = xr.open_dataset('http://thredds.northwestknowledge.net:8080/thredds/dodsC/agg_terraclimate_pet_1958_CurrentYear_GLOBE.nc',
      7                      engine='pydap', chunks={'lat': 1024, 'lon': 1024, 'time': 12}).load()

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    652
    653         # evaluate all the dask arrays simultaneously
--> 654         evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    655
    656         for k, data in zip(lazy_data, evaluated_data):

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/dask/base.py in compute(*args, **kwargs)
    435     keys = [x.__dask_keys__() for x in collections]
    436     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 437     results = schedule(dsk, keys, **kwargs)
    438     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    439

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2594             should_rejoin = False
   2595         try:
-> 2596             results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2597         finally:
   2598             for f in futures.values():

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1886         else:
   1887             local_worker = None
-> 1888         return self.sync(
   1889             self._gather,
   1890             futures,

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    775             return future
    776         else:
--> 777             return sync(
    778                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    779             )

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    346     if error[0]:
    347         typ, exc, tb = error[0]
--> 348         raise exc.with_traceback(tb)
    349     else:
    350         return result[0]

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/utils.py in f()
    330             if callback_timeout is not None:
    331                 future = asyncio.wait_for(future, callback_timeout)
--> 332             result[0] = yield future
    333         except Exception as exc:
    334             error[0] = sys.exc_info()

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/tornado/gen.py in run(self)
    733
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

~/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1751                             exc = CancelledError(key)
   1752                         else:
-> 1753                             raise exception.with_traceback(traceback)
   1754                         raise exc
   1755                 if errors == "skip":

KilledWorker: ('open_dataset-54c87cd25bf4e9df37cb3030e6602974pet-d39db76f8636f3803611948183e52c13', <Worker 'tcp://127.0.0.1:57343', name: 0, memory: 0, processing: 1>)
```

and the above mentioned recursion error on the workers:

```
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Registered to: tcp://127.0.0.1:57334
distributed.worker - INFO - -------------------------------------------------
distributed.worker - ERROR - maximum recursion depth exceeded
Traceback (most recent call last):
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/worker.py", line 931, in handle_scheduler
    await self.handle_stream(
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/core.py", line 455, in handle_stream
    msgs = await comm.read()
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/tcp.py", line 211, in read
    msg = await from_frames(
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/utils.py", line 75, in from_frames
    res = _from_frames()
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/comm/utils.py", line 60, in _from_frames
    return protocol.loads(
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/core.py", line 130, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 269, in deserialize
    return loads(header, frames)
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 59, in pickle_loads
    return pickle.loads(b"".join(frames))
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 59, in loads
    return pickle.loads(x)
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__
    return self.attributes[attr]
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__
    return self.attributes[attr]
  File "/Users/jhamman/miniconda3/envs/carbonplan38/lib/python3.8/site-packages/pydap/model.py", line 235, in __getattr__
    return self.attributes[attr]
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
distributed.worker - INFO - Connection to scheduler broken. Reconnecting...
```

Anything else we need to know?:

I've found this to be reproducible with a few kinds of Dask clusters. Setting Client(processes=False) does correct the problem, at the expense of multiprocessing.
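For reference, the workaround mentioned above:

```python
from dask.distributed import Client

# a threads-only cluster: nothing is pickled between worker processes,
# so the unpicklable pydap store is never serialized
client = Client(processes=False)
```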

Environment:

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.8.2 | packaged by conda-forge | (default, Mar 5 2020, 16:54:44) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 19.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.1
pandas: 1.0.3
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.1.1.2
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.0.28
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.13.0
distributed: 2.13.0
matplotlib: 3.2.1
cartopy: 0.17.0
seaborn: 0.10.0
numbagg: installed
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: installed
pytest: 5.4.1
IPython: 7.13.0
sphinx: 3.1.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4348/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1456026667 PR_kwDOAMm_X85DQfj3 7301 deprecate pynio backend jhamman 2443309 closed 0     3 2022-11-19T00:15:11Z 2022-11-26T15:41:07Z 2022-11-26T15:40:36Z MEMBER   0 pydata/xarray/pulls/7301

This PR finally deprecates the PyNIO backend. PyNIO is technically in maintenance mode, but it hasn't had any maintenance in 4+ years, and its conda packages cannot be installed in any of our test environments. I have added a FutureWarning to the NioDataStore.__init__ method and noted the deprecation in the IO docs.
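A simplified sketch of the added warning (the real class inherits from AbstractDataStore, and the message text here is approximate):

```python
import warnings

class NioDataStore:
    def __init__(self, *args, **kwargs):
        warnings.warn(
            "The PyNIO backend is deprecated and will be removed in a future "
            "release of xarray.",
            FutureWarning,
            stacklevel=2,
        )
```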

  • [x] Closes #4491
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7301/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1455786576 PR_kwDOAMm_X85DPqH_ 7300 bump min deps jhamman 2443309 closed 0     2 2022-11-18T20:53:45Z 2022-11-19T04:15:23Z 2022-11-19T04:15:23Z MEMBER   0 pydata/xarray/pulls/7300

The min versions checks are failing in #6475. This hopefully fixes those failures.

  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7300/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1217821452 PR_kwDOAMm_X8425iyT 6530 Doc index update jhamman 2443309 closed 0     2 2022-04-27T20:00:10Z 2022-05-31T18:28:13Z 2022-05-31T18:28:13Z MEMBER   0 pydata/xarray/pulls/6530

In light of the new splash page site (https://xarray.dev), this PR updates the documentation site's index page to simply provide pointers to key parts of Xarray's documentation.

TODOs:
  • [x] Get feedback on the content and layout
  • [x] Update the icon SVGs (these, along with the layout, were borrowed in part from Pandas).

cc @andersy005, @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6530/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1247083449 PR_kwDOAMm_X844ZETT 6635 Feature/to dict encoding jhamman 2443309 closed 0     0 2022-05-24T20:21:24Z 2022-05-26T19:50:53Z 2022-05-26T19:17:35Z MEMBER   0 pydata/xarray/pulls/6635

This adds an encoding option to Xarray's to_dict methods.

  • [x] Closes #6634
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6635/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1247014308 I_kwDOAMm_X85KU-2k 6634 Optionally include encoding in Dataset to_dict jhamman 2443309 closed 0     0 2022-05-24T19:10:01Z 2022-05-26T19:17:35Z 2022-05-26T19:17:35Z MEMBER      

Is your feature request related to a problem?

When using Xarray's to_dict methods to record a Dataset's schema, it would be useful to (optionally) include encoding in the output.

Describe the solution you'd like

The feature request may be resolved by simply adding an encoding keyword argument. Usage may look like this:

```python
ds = xr.Dataset(...)
ds.to_dict(data=False, encoding=True)
```

Describe alternatives you've considered

It is currently possible to manually extract encoding attributes but this is a less desirable solution.

xref: https://github.com/pangeo-forge/pangeo-forge-recipes/issues/256

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6634/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
636449225 MDU6SXNzdWU2MzY0NDkyMjU= 4139 [Feature request] Support file-like objects in open_rasterio jhamman 2443309 closed 0     2 2020-06-10T18:11:26Z 2022-04-19T17:15:21Z 2022-04-19T17:15:20Z MEMBER      

With some acrobatics, it is possible to open file-like objects with rasterio. It would be useful if xarray supported this workflow, particularly for working with cloud-optimized GeoTIFFs and fsspec.

MCVE Code Sample

```python
with open('my_data.tif', 'rb') as f:
    da = xr.open_rasterio(f)
```

Expected Output

DataArray -> equivalent to xr.open_rasterio('my_data.tif')

Problem Description

We currently only allow str, rasterio.DatasetReader, or rasterio.WarpedVRT as inputs to open_rasterio.

Versions

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 2a288f6ed4286910fcf3ab9895e1e9cbd44d30b4
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 07:56:27) [Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: None
libnetcdf: None
xarray: 0.15.2.dev68+gb896a68f
pandas: 1.0.4
numpy: 1.18.5
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.18.1
distributed: 2.18.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.1
conda: None
pytest: 5.4.3
IPython: 7.13.0
sphinx: 3.0.3
```

xref: https://github.com/pangeo-data/pangeo-datastore/issues/109

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4139/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1118974427 PR_kwDOAMm_X84x0GoS 6214 update HOW_TO_RELEASE.md jhamman 2443309 closed 0     2 2022-01-31T05:01:14Z 2022-03-03T13:05:04Z 2022-01-31T18:35:27Z MEMBER   0 pydata/xarray/pulls/6214

This PR updates our step-by-step guide for releasing Xarray. It makes a few minor changes to account for #6206 and officially documents the switch to CalVer. This should be clearly documented in whats-new.rst as part of the first release using CalVer.

Also, note that this should probably wait until we make the 0.20.1 patch release.

  • [x] Closes #6176, #6206
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6214/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1108564253 I_kwDOAMm_X85CE1kd 6176 Xarray versioning to switch to CalVer jhamman 2443309 closed 0     10 2022-01-19T21:09:45Z 2022-03-03T04:32:10Z 2022-01-31T18:35:27Z MEMBER      

Xarray is planning to switch to Calendar versioning (calver). This issue serves as a general announcement.

The idea has come up in multiple developer meetings (#4001) and is part of a larger effort to increase our release cadence (#5927). Today's developer meeting included unanimous consent for the change. Other projects in Xarray's ecosystem have also made this change recently (e.g. https://github.com/dask/community/issues/100). While it is likely we will make this change in the next release or two, users and developers should feel free to voice objections here.

The proposed CalVer implementation follows the same schema as the Dask project, that is, YYYY.MM.X (four-digit year, two-digit month, one-digit zero-indexed micro version). For example, the code block below provides a comparison of the current and future version tags:

```python
In [1]: import xarray as xr

# current
In [2]: xr.__version__
Out[2]: '0.19.1'

# proposed
In [2]: xr.__version__
Out[2]: '2022.01.0'
```

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6176/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1129263296 PR_kwDOAMm_X84yVrKT 6262 [docs] update urls throughout documentation jhamman 2443309 closed 0     0 2022-02-10T00:41:54Z 2022-02-10T19:44:57Z 2022-02-10T19:44:52Z MEMBER   0 pydata/xarray/pulls/6262

We are in the process of moving our documentation url from https://xarray.pydata.org to https://docs.xarray.dev. This PR makes that change throughout the documentation. Additionally, I corrected some broken links and fixed some missing https urls in the process.

cc @andersy005

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6262/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
636451398 MDExOlB1bGxSZXF1ZXN0NDMyNjIxMjgy 4140 support file-like objects in xarray.open_rasterio jhamman 2443309 closed 0     6 2020-06-10T18:15:18Z 2021-12-03T19:22:14Z 2021-11-15T16:17:59Z MEMBER   0 pydata/xarray/pulls/4140
  • [x] Closes #4139
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API

cc @scottyhq and @martindurant

xref: https://github.com/pangeo-data/pangeo-datastore/issues/109

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4140/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1047795001 PR_kwDOAMm_X84uPpLm 5956 Create CITATION.cff jhamman 2443309 closed 0     1 2021-11-08T18:40:15Z 2021-11-09T20:56:25Z 2021-11-09T18:15:01Z MEMBER   0 pydata/xarray/pulls/5956

This adds a new file to the root of the Xarray repository, CITATION.cff. GitHub recently added support for citation files and adding this file will add a UI feature to the Xarray GitHub repo.

The author list is based on the latest Zenodo release (0.20.1) and I did my best to find everyone's ORCIDs.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5956/reactions",
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 3,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
139064764 MDU6SXNzdWUxMzkwNjQ3NjQ= 787 Add Groupby and Rolling methods to docs jhamman 2443309 closed 0     2 2016-03-07T19:10:26Z 2021-11-08T19:51:00Z 2021-11-08T19:51:00Z MEMBER      

The injected apply/reduce methods for the GroupBy and Rolling objects are not shown in the API documentation page. While there is obviously a fair bit of overlap with the similar DataArray/Dataset methods, it would help users to know what methods are available on the GroupBy and Rolling objects if we explicitly listed them in the documentation. Suggestions on the best format to show these methods (e.g. Rolling.mean) are welcome.
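For concreteness, examples of the kind of injected methods in question (the data here is illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="time",
                  coords={"time": pd.date_range("2000-01-01", periods=10)})

da.rolling(time=3).mean()           # injected Rolling reduce method
da.groupby("time.dayofweek").sum()  # injected GroupBy reduce method
```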

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/787/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
985498976 MDExOlB1bGxSZXF1ZXN0NzI0Nzg1NjIz 5759 update development roadmap jhamman 2443309 closed 0     1 2021-09-01T18:50:15Z 2021-09-07T15:30:49Z 2021-09-07T15:03:06Z MEMBER   0 pydata/xarray/pulls/5759
  • [x] Passes pre-commit run --all-files

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5759/reactions",
    "total_count": 2,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
663968779 MDU6SXNzdWU2NjM5Njg3Nzk= 4253 [community] Backends refactor meeting jhamman 2443309 closed 0     13 2020-07-22T18:39:19Z 2021-03-11T20:42:33Z 2021-03-11T20:42:33Z MEMBER      

In today's dev call, we opted to schedule a separate meeting to discuss the backends refactor that BOpen (@alexamici and his team) is beginning to work on. This issue is meant to coordinate the scheduling of this meeting. To that end, I've created the following Doodle Poll to help choose a time: https://doodle.com/poll/4mtzxncka7gee4mq

Anyone from @pydata/xarray should feel free to join if there is interest. At a minimum, I'm hoping to have @alexamici, @aurghs, @shoyer, and @rabernat there.

Please respond to the poll by COB tomorrow so I can quickly get the meeting on the books. Thanks!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4253/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
473795509 MDExOlB1bGxSZXF1ZXN0MzAxODY2NzAx 3166 [Feature] Backend entrypoint jhamman 2443309 closed 0     3 2019-07-28T23:01:47Z 2021-01-12T16:41:23Z 2021-01-12T16:41:23Z MEMBER   0 pydata/xarray/pulls/3166

In this PR, I'm experimenting with using the entrypoints package to support third-party backends. This does not attempt to solidify the API for what the store is; I feel that should happen in a second PR. Here's how it would work...

In @rabernat's xmitgcm package, there is a _MDSDataStore that inherits from xarray.backends.common.AbstractDataStore. To allow reading mds datasets directly in xarray.open_dataset, xmitgcm would add the following lines to its setup.py file:

```python
setup(
    ...
    entry_points={
        'xarray.backends': [
            'mds = xmitgcm.mds_store:_MDSDataStore',
            ...
        ]
    }
)
```

Xarray would then be able to discover this backend at runtime and users could use the store directly in open_dataset calls like this:

```python
ds = xr.open_dataset('./path/to/file.mds', engine='mds', backend_kwargs={...})
```
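Under the hood, discovery could look something like this minimal sketch (using the entrypoints package; the helper name is an assumption):

```python
import entrypoints

# collect all registered backends at import time
ENGINES = {ep.name: ep.load()
           for ep in entrypoints.get_group_all('xarray.backends')}

def get_store(engine):
    try:
        return ENGINES[engine]
    except KeyError:
        raise ValueError(f'unrecognized engine: {engine}')
```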

Note: I recognize that xmitgcm.open_mdsdataset has a bunch of other user options that I'm likely ignoring here but this is meant just as an illustration.

Now a list of caveats and things to consider:

  1. I have only done this for open_dataset, not for to_netcdf. We may want to consider a more generic serialization method that allows for plug-able writers.
  2. open_dataset has some special handling for some readers (lock and group selection, file-like objects, etc.). We should work toward moving as much of that logic into the Store objects as possible.
  3. We should decide what to do when a 3rd party plugin conflicts with an existing backend. For example, someone could include an entrypoint with the key of netcdf4.

  • [x] Partially closes #1970
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3166/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
110807626 MDU6SXNzdWUxMTA4MDc2MjY= 619 Improve plot aspect handling when using cartopy jhamman 2443309 open 0     5 2015-10-10T17:43:55Z 2021-01-03T16:17:29Z   MEMBER      

This applies to single plots and FacetGrids.

The current plotting behavior when using a projection that changes the plot aspect is as follows:

```python
from xray.tutorial import load_dataset

ds = load_dataset('air_temperature')

ax = plt.subplot(projection=ccrs.LambertConformal())
ds.air.isel(time=0).plot(transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines()
```

```python
fg = ds.air.isel(time=slice(0, 9)).plot(col='time', col_wrap=3,
                                        transform=ccrs.PlateCarree(),
                                        subplot_kws=dict(projection=ccrs.LambertConformal()))
for ax in fg.axes.flat:
    ax.coastlines()
    ax.gridlines()
```

There are two problems here, I think both related to the aspect of the subplot:

  1. In the single case, the subplot aspect is correct but the colorbar is not scaled appropriately.
  2. In the FacetGrid case, the subplot aspects are not correct but the colorbar is.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/619/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
140264913 MDU6SXNzdWUxNDAyNjQ5MTM= 792 ENH: Don't infer pcolormesh interval breaks for unevenly spaced coordiantes jhamman 2443309 open 0     7 2016-03-11T19:06:30Z 2020-12-29T17:50:33Z   MEMBER      

Based on discussion in #781 and #782, it seems like a bad idea to infer (guess) the spacing of coordinates when they are unevenly spaced. As @ocefpaf points out:

guessing should be an active user choice, not the automatic behavior.

So the options moving forward are to:

  1. never infer the interval breaks and be okay with pcolormesh and imshow producing dissimilar plots, or
  2. only infer the interval breaks when the coordinates are evenly spaced.
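A sketch of option 2, refusing to guess for unevenly spaced coordinates (a simplification of what _infer_interval_breaks could do):

```python
import numpy as np

def infer_interval_breaks(coord, rtol=1e-5):
    """Guess pcolormesh cell boundaries, but only for evenly spaced coordinates."""
    coord = np.asarray(coord, dtype=float)
    deltas = np.diff(coord)
    if not np.allclose(deltas, deltas[0], rtol=rtol):
        raise ValueError("coordinate is not evenly spaced; "
                         "provide explicit cell boundaries instead")
    mids = coord[:-1] + deltas / 2
    return np.concatenate([[coord[0] - deltas[0] / 2], mids,
                           [coord[-1] + deltas[-1] / 2]])
```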

cc @clarkfitzg

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/792/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
302806158 MDU6SXNzdWUzMDI4MDYxNTg= 1970 API Design for Xarray Backends jhamman 2443309 open 0     9 2018-03-06T18:02:05Z 2020-10-06T06:15:56Z   MEMBER      

It has come time to formalize the API for Xarray backends. We now have the following backends implemented in xarray:

| Backend        | Read | Write |
|----------------|------|-------|
| netcdf4-python | x    | x     |
| h5netcdf       | x    | x     |
| pydap          | x    |       |
| pynio          | x    |       |
| scipy          | x    | x     |
| rasterio*      | x    |       |
| zarr           | x    | x     |

* currently does not inherit from backends.AbstractDataStore

And there are conversations about adding additional backends, for example:

  • TileDB: https://github.com/pangeo-data/storage-benchmarks/issues/6
  • PseudoNetCDF: #1905

However, as anyone who has worked on implementing or optimizing any of our current backends can attest, the existing DataStore API is not particularly user/developer friendly. @shoyer asked me to open an issue to discuss what a more user-friendly backend API would look like, so that is what this issue will be. I have left out a thorough description of the current API because, well, I don't think it can be done in a succinct manner (that's the problem).

Note that @shoyer started down a API refactor some time ago in #1087 but that effort has stalled, presumably because we don't have a well defined set of development goals here.

cc @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1970/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
287223508 MDU6SXNzdWUyODcyMjM1MDg= 1815 apply_ufunc(dask='parallelized') with multiple outputs jhamman 2443309 closed 0     17 2018-01-09T20:40:52Z 2020-08-19T06:57:55Z 2020-08-19T06:57:55Z MEMBER      

I have an application where I'd like to use apply_ufunc with dask on a function that requires multiple inputs and outputs. This was left as a TODO item in #1517. However, it's not clear to me, looking at the code, how this can be done given the current form of dask's atop. I'm hoping @shoyer has already thought of a clever solution here...

Code Sample, a copy-pastable example if possible

```python
def func(foo, bar):
    assert foo.shape == bar.shape
    spam = np.zeros_like(bar)
    spam2 = np.full_like(bar, 2)

    return spam, spam2

foo = xr.DataArray(np.zeros((10, 10))).chunk()
bar = xr.DataArray(np.zeros((10, 10))).chunk() + 5

xrfunc = xr.apply_ufunc(func, foo, bar,
                        output_core_dims=[[], []],
                        dask='parallelized')
```

Problem description

This currently raises a NotImplementedError.

Expected Output

Multiple dask arrays. In my example above, two dask arrays.

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.86+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.0+dev.c92020a
pandas: 0.22.0
numpy: 1.13.3
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
Nio: None
zarr: 2.2.0a2.dev176
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
distributed: 1.20.2+36.g7387410
matplotlib: 2.1.1
cartopy: None
seaborn: None
setuptools: 38.4.0
pip: 9.0.1
conda: 4.3.29
pytest: 3.3.2
IPython: 6.2.1
sphinx: None
```

cc @mrocklin, @arbennett

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1815/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
588165025 MDExOlB1bGxSZXF1ZXN0MzkzOTY0MzE4 3897 expose a few zarr backend functions as semi-public api jhamman 2443309 closed 0     3 2020-03-26T05:24:22Z 2020-08-10T15:20:31Z 2020-03-27T22:37:26Z MEMBER   0 pydata/xarray/pulls/3897
  • [x] Fixes #3851
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3897/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
663962183 MDExOlB1bGxSZXF1ZXN0NDU1MjgyNTI2 4252 update docs to point to xarray-contrib and xarray-tutorial jhamman 2443309 closed 0     1 2020-07-22T18:27:29Z 2020-07-23T16:34:18Z 2020-07-23T16:34:10Z MEMBER   0 pydata/xarray/pulls/4252
  • [x] Closes #1850
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4252/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
264049503 MDU6SXNzdWUyNjQwNDk1MDM= 1614 Rules for propagating attrs and encoding jhamman 2443309 open 0     15 2017-10-09T22:56:02Z 2020-04-05T19:12:10Z   MEMBER      

We need to come up with some clear rules for when and how xarray should propagate metadata (attrs/encoding). This has come up routinely (e.g. #25, #138, #442, #688, #828, #988, #1009, #1271, #1297, #1586) and we don't have a clear direction as to when to keep/drop metadata.

I'll take a first cut:

| operation  | attrs      | encoding   | status      |
|------------|------------|------------|-------------|
| reduce     | drop       | drop       |             |
| arithmetic | drop       | drop       | implemented |
| copy       | keep       | keep       |             |
| concat     | keep first | keep first | implemented |
| slice      | keep       | drop       |             |
| where      | keep       | keep       |             |
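For concreteness, a minimal illustration of a few of these defaults as they behave today (a sketch; exact defaults can vary by operation and version):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.ones(3), dims='x', attrs={'units': 'm'})

print((da + 1).attrs)   # {} -- arithmetic drops attrs
print(da.mean().attrs)  # {} -- reductions drop attrs by default
print(da.copy().attrs)  # {'units': 'm'} -- copy keeps attrs
print(da[:2].attrs)     # {'units': 'm'} -- slicing keeps attrs
```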

cc @shoyer (following up on https://github.com/pydata/xarray/issues/1586#issuecomment-334954046)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1614/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
318988669 MDU6SXNzdWUzMTg5ODg2Njk= 2094 Drop win-32 platform CI from appveyor matrix? jhamman 2443309 closed 0     3 2018-04-30T18:29:17Z 2020-03-30T20:30:58Z 2020-03-24T03:41:24Z MEMBER      

Conda-forge has dropped support for 32-bit windows builds (https://github.com/conda-forge/cftime-feedstock/issues/2#issuecomment-385485144). Do we want to continue testing against this environment? The point becomes moot after #1876 gets wrapped up in ~7 months.

xref: https://github.com/pydata/xarray/pull/1252

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2094/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
578017585 MDU6SXNzdWU1NzgwMTc1ODU= 3851 Exposing Zarr backend internals as semi-public API jhamman 2443309 closed 0     3 2020-03-09T16:04:49Z 2020-03-27T22:37:26Z 2020-03-27T22:37:26Z MEMBER      

We recently built a prototype REST API for serving xarray datasets via a Fast-API application (see #3850 for more details). In the process of doing this, we needed to use a few internal functions in Xarray's Zarr backend:

```python
from xarray.backends.zarr import (
    _DIMENSION_KEY,
    _encode_zarr_attr_value,
    _extract_zarr_variable_encoding,
    encode_zarr_variable,
)
from xarray.core.pycompat import dask_array_type
from xarray.util.print_versions import get_sys_info, netcdf_and_hdf5_versions
```

Obviously, none of these imports are really meant for use outside of Xarray's backends so I'd like to discuss how we may go about exposing these functions (or variables) as semi-public (advanced use) API features. Thoughts?

cc @rabernat

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3851/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
197920258 MDU6SXNzdWUxOTc5MjAyNTg= 1188 Should we deprecate the compat and encoding constructor arguments? jhamman 2443309 closed 0     5 2016-12-28T21:41:26Z 2020-03-24T14:34:37Z 2020-03-24T14:34:37Z MEMBER      

In https://github.com/pydata/xarray/pull/1170#discussion_r94078121, @shoyer writes:

...I would consider deprecating the encoding argument to DataArray instead. It would also make sense to get rid of the compat argument to Dataset.

These extra arguments are not part of the fundamental xarray data model and thus are a little distracting, especially to new users.

@pydata/xarray and others, what do we think about deprecating the compat argument to the Dataset constructor and the encoding argument to the DataArray (and Dataset via #1170)?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1188/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
578005145 MDExOlB1bGxSZXF1ZXN0Mzg1NjY1Nzk1 3850 Add xpublish to related projects jhamman 2443309 closed 0     0 2020-03-09T15:46:14Z 2020-03-10T06:06:08Z 2020-03-10T06:06:08Z MEMBER   0 pydata/xarray/pulls/3850

We've recently released Xpublish. This PR adds the project to the related projects page in the Xarray documentation. To find out more about Xpublish, check out the docs or the release announcement blogpost.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3850/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
508743579 MDU6SXNzdWU1MDg3NDM1Nzk= 3413 Can apply_ufunc be used on arrays with different dimension sizes jhamman 2443309 closed 0     2 2019-10-17T22:04:00Z 2019-12-11T22:32:23Z 2019-12-11T22:32:23Z MEMBER      

We have an application where we want to use apply_ufunc to apply a function that takes two 1-D arrays and returns a scalar value (basically a reduction over the only axis). We start with two DataArrays that share all the same dimensions - except for the lengths of the dimension we'll be reducing along (t in this case):

```python
import numpy as np
import xarray as xr


def diff_mean(X, y):
    '''a function that only works on 1-D arrays that are different lengths'''
    assert X.ndim == 1, X.ndim
    assert y.ndim == 1, y.ndim
    assert len(X) != len(y), X
    return X.mean() - y.mean()


X = np.random.random((10, 4, 5))
y = np.random.random((6, 4, 5))

Xda = xr.DataArray(X, dims=('t', 'x', 'y')).chunk({'t': -1, 'x': 2, 'y': 2})
yda = xr.DataArray(y, dims=('t', 'x', 'y')).chunk({'t': -1, 'x': 2, 'y': 2})
```

Then, we'd like to use apply_ufunc to apply our function (e.g. diff_mean):

```python
out = xr.apply_ufunc(
    diff_mean,
    Xda,
    yda,
    vectorize=True,
    dask="parallelized",
    output_dtypes=[np.float],
    input_core_dims=[['t'], ['t']],
)
```

This fails with an error when aligning the t dimensions:

```python-traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e90cf6fba482> in <module>
      9     dask="parallelized",
     10     output_dtypes=[np.float],
---> 11     input_core_dims=[['t'], ['t']],
     12 )

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
   1042             join=join,
   1043             exclude_dims=exclude_dims,
-> 1044             keep_attrs=keep_attrs
   1045         )
   1046     elif any(isinstance(a, Variable) for a in args):

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    222     if len(args) > 1:
    223         args = deep_align(
--> 224             args, join=join, copy=False, exclude=exclude_dims, raise_on_invalid=False
    225         )
    226

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/alignment.py in deep_align(objects, join, copy, indexes, exclude, raise_on_invalid, fill_value)
    403         indexes=indexes,
    404         exclude=exclude,
--> 405         fill_value=fill_value
    406     )
    407

~/miniconda3/envs/xarray-ml/lib/python3.7/site-packages/xarray/core/alignment.py in align(join, copy, indexes, exclude, fill_value, *objects)
    321                 "arguments without labels along dimension %r cannot be "
    322                 "aligned because they have different dimension sizes: %r"
--> 323                 % (dim, sizes)
    324             )
    325

ValueError: arguments without labels along dimension 't' cannot be aligned because they have different dimension sizes: {10, 6}
```

https://nbviewer.jupyter.org/gist/jhamman/0e52d9bb29f679e26b0878c58bb813d2

I'm curious if this can be made to work with apply_ufunc or if we should pursue other options here. Advice and suggestions appreciated.
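One avenue worth noting (a hedged sketch, not tested here): apply_ufunc has an exclude_dims argument that skips alignment and broadcasting for the named core dimensions, which is exactly the alignment failure above:

```python
# exclude_dims tells apply_ufunc not to align/broadcast along 't',
# so the two inputs may have different lengths there
out = xr.apply_ufunc(
    diff_mean,
    Xda,
    yda,
    vectorize=True,
    dask="parallelized",
    output_dtypes=[float],
    input_core_dims=[['t'], ['t']],
    exclude_dims={'t'},
)
```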

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: None libnetcdf: None xarray: 0.14.0 pandas: 0.25.1 numpy: 1.17.1 scipy: 1.3.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.3.2 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.3.0 distributed: 2.3.2 matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None setuptools: 41.2.0 pip: 19.2.3 conda: None pytest: 5.0.1 IPython: 7.8.0 sphinx: 2.2.0
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3413/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
527830145 MDExOlB1bGxSZXF1ZXN0MzQ1MDAzOTU4 3568 add environment file for binderized examples jhamman 2443309 closed 0     1 2019-11-25T04:00:59Z 2019-11-25T15:57:19Z 2019-11-25T15:57:19Z MEMBER   0 pydata/xarray/pulls/3568
  • [x] Closes #3563
  • [ ] Tests added
  • [ ] Passes black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3568/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
132774456 MDU6SXNzdWUxMzI3NzQ0NTY= 757 Ordered Groupby Keys jhamman 2443309 open 0     6 2016-02-10T18:05:08Z 2019-11-20T16:12:41Z   MEMBER      

The current behavior of xarray's Groupby.groups property is to return a standard (unordered) dictionary. This is fine for most cases but leads to odd orderings in use cases like this one, where I am using xarray's FacetGrid plotting:

```python
plot_kwargs = dict(col='season', vmin=15, vmax=35, levels=12, extend='both')

da_obs = ds_obs.SALT.isel(depth=0).groupby('time.season').mean('time')
da_obs.plot(**plot_kwargs)
```

Note that MAM and JJA are out of order.

I think this could be easily fixed by using an OrderedDict in xarray.core.Groupby.groups.
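For illustration, a minimal reproduction with synthetic data (the keys come back in sorted alphabetical order rather than seasonal order):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=365)
da = xr.DataArray(np.arange(365.0), coords={'time': times}, dims='time')

seasonal_means = da.groupby('time.season').mean('time')
print(seasonal_means.season.values)  # ['DJF' 'JJA' 'MAM' 'SON'], not DJF/MAM/JJA/SON
```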

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/757/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
280385592 MDU6SXNzdWUyODAzODU1OTI= 1769 Extend to_masked_array to support dask MaskedArrays jhamman 2443309 open 0     5 2017-12-08T06:22:56Z 2019-11-08T17:19:44Z   MEMBER      

Following @shoyer's comment, it will be pretty straightforward to support creating dask masked arrays within the to_masked_array method. My thought is that data arrays that use dask would be converted to dask masked arrays, rather than to numpy arrays as they are currently.

Two kinks:

1. The dask masked array feature requires dask 0.15.3 or newer.
2. I'm not sure how to test if an object is a dask.array.ma.MaskedArray (Dask doesn't have a MaskedArray class; one possible heuristic is sketched below). @mrocklin - thoughts?
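On kink (2), one possible heuristic (version-dependent; sketched under the assumption of a dask new enough to attach a `_meta` array to its collections):

```python
import dask.array as da
import numpy as np

x = da.ma.masked_less(da.random.random((4, 4), chunks=2), 0.5)

# dask has no MaskedArray subclass, but newer dask versions carry a small
# `_meta` array whose type reflects the chunk type
is_dask_masked = isinstance(getattr(x, '_meta', None), np.ma.MaskedArray)
print(is_dask_masked)
```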

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1769/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
505409694 MDExOlB1bGxSZXF1ZXN0MzI2ODQ4ODk1 3389 OrderedDict --> dict, some python3.5 cleanup too jhamman 2443309 closed 0     9 2019-10-10T17:30:43Z 2019-10-23T07:07:10Z 2019-10-12T21:33:34Z MEMBER   0 pydata/xarray/pulls/3389
  • [x] Toward https://github.com/pydata/xarray/issues/3380#issuecomment-539224341
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

See below for inline comments where I could use some input from @shoyer and @crusaderky

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3389/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
503700649 MDU6SXNzdWU1MDM3MDA2NDk= 3380 [Release] 0.14 jhamman 2443309 closed 0     19 2019-10-07T21:28:28Z 2019-10-15T01:08:11Z 2019-10-14T21:26:59Z MEMBER      

#3358 is going to make some fairly major changes to the minimum supported versions of required and optional dependencies. We also have a few bug fixes that have landed since releasing 0.13 that would be good to get out.

From what I can tell, the following pending PRs are close enough to get into this release.

  • [ ] ~tests for arrays with units #3238~
  • [x] map_blocks #3276
  • [x] Rolling minimum dependency versions policy #3358
  • [x] Remove all OrderedDict's (#3389)
  • [x] Speed up isel and __getitem__ #3375
  • [x] Fix concat bug when concatenating unlabeled dimensions. #3362
  • [ ] ~Add hypothesis test for netCDF4 roundtrip #3283~
  • [x] Fix groupby reduce for dataarray #3338
  • [x] Need a fix for https://github.com/pydata/xarray/issues/3377

Am I missing anything else that needs to get in?

I think we should aim to wrap this release up soon (this week). I can volunteer to go through the release steps once we're ready.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3380/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
505617351 MDExOlB1bGxSZXF1ZXN0MzI3MDEzMDQx 3392 fix for #3377 jhamman 2443309 closed 0     1 2019-10-11T03:32:19Z 2019-10-11T11:30:52Z 2019-10-11T11:30:51Z MEMBER   0 pydata/xarray/pulls/3392
  • [x] Closes #3377
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3392/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
406035264 MDExOlB1bGxSZXF1ZXN0MjQ5ODQ1MTAz 2737 add h5netcdf+dask tests jhamman 2443309 closed 0     7 2019-02-02T23:50:20Z 2019-02-12T06:31:01Z 2019-02-12T05:39:19Z MEMBER   0 pydata/xarray/pulls/2737
  • [x] Closes #1571
  • [x] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2737/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
407548101 MDExOlB1bGxSZXF1ZXN0MjUwOTk3NTYx 2750 remove references to cyordereddict jhamman 2443309 closed 0     0 2019-02-07T05:32:27Z 2019-02-07T18:30:01Z 2019-02-07T18:30:01Z MEMBER   0 pydata/xarray/pulls/2750
  • [x] Closes #2744
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2750/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
406049155 MDExOlB1bGxSZXF1ZXN0MjQ5ODUzNTA1 2738 reintroduce pynio/rasterio/iris to py36 test env jhamman 2443309 closed 0     1 2019-02-03T03:43:31Z 2019-02-07T00:08:49Z 2019-02-07T00:08:17Z MEMBER   0 pydata/xarray/pulls/2738
  • [x] Closes #1910
  • [x] Tests added

xref: #2683

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2738/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
297227247 MDU6SXNzdWUyOTcyMjcyNDc= 1910 Pynio tests are being skipped on TravisCI jhamman 2443309 closed 0     3 2018-02-14T20:03:31Z 2019-02-07T00:08:17Z 2019-02-07T00:08:17Z MEMBER      

Problem description

Currently on Travis, the Pynio tests are being skipped. The py27-cdat+iris+pynio build is supposed to be running tests for each of these packages, but it is not.

https://travis-ci.org/pydata/xarray/jobs/341426116#L2429-L2518

I can't look at this right now in depth but I'm wondering if this is related to #1531.

reported by @WeatherGod

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1910/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
406187700 MDExOlB1bGxSZXF1ZXN0MjQ5OTQyODM1 2741 remove xfail from test_cross_engine_read_write_netcdf4 jhamman 2443309 closed 0     0 2019-02-04T05:35:18Z 2019-02-06T22:49:19Z 2019-02-04T14:50:16Z MEMBER   0 pydata/xarray/pulls/2741

This is passing in my local test environment. We'll see on CI...

  • [x] Closes #535
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2741/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
400841236 MDExOlB1bGxSZXF1ZXN0MjQ1OTM1OTA4 2691 try no rasterio in py36 env jhamman 2443309 closed 0     4 2019-01-18T18:35:58Z 2019-02-03T03:44:11Z 2019-01-18T21:47:44Z MEMBER   0 pydata/xarray/pulls/2691

As described in #2683, our test suite is failing on Travis with an unfortunate segfault. For now, I've just taken rasterio (and therefore GDAL) out of the offending environment. I'll use this PR to test a few other options.

cc @max-sixty

  • [x] Closes #2683
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2691/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
406023579 MDExOlB1bGxSZXF1ZXN0MjQ5ODM4MTA3 2736 remove bottleneck dev build from travis jhamman 2443309 closed 0     0 2019-02-02T21:18:29Z 2019-02-03T03:32:38Z 2019-02-03T03:32:21Z MEMBER   0 pydata/xarray/pulls/2736

This dev build is failing due to problems with bottleneck's setup script. Generally, the bottleneck package seems to be missing some maintenance effort, so until a new release is issued, I don't think we need to be testing against its dev state.

  • [x] Closes #1109

xref: #2661

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2736/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
405955807 MDExOlB1bGxSZXF1ZXN0MjQ5Nzk2MzQx 2735 add tests for handling of empty pandas objects in constructors jhamman 2443309 closed 0     3 2019-02-02T06:54:42Z 2019-02-02T23:18:21Z 2019-02-02T07:47:58Z MEMBER   0 pydata/xarray/pulls/2735
  • [x] Closes #697
  • [x] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2735/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
405038519 MDExOlB1bGxSZXF1ZXN0MjQ5MDg2NjYx 2730 improve error message for invalid encoding jhamman 2443309 closed 0     1 2019-01-31T01:20:49Z 2019-01-31T17:27:03Z 2019-01-31T17:26:54Z MEMBER   0 pydata/xarray/pulls/2730

Improved error message for invalid encodings.

  • [x] Closes #2728
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2730/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
395431629 MDExOlB1bGxSZXF1ZXN0MjQxODg3MjU2 2645 Remove py2 compat jhamman 2443309 closed 0     14 2019-01-03T01:20:51Z 2019-01-25T16:46:22Z 2019-01-25T16:38:45Z MEMBER   0 pydata/xarray/pulls/2645

I was feeling particularly zealous today, so I decided to see what it would take to strip out all the Python 2 compatibility code in xarray. I expect some will feel it's too soon to merge this, so I'm mostly putting this up for show-and-tell and to highlight some of the knots we've tied ourselves into over the years.

  • [x] Closes #1876
  • [ ] Tests added
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2645/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
302930480 MDU6SXNzdWUzMDI5MzA0ODA= 1971 Should we be testing against multiple dask schedulers? jhamman 2443309 closed 0     5 2018-03-07T01:25:37Z 2019-01-13T20:58:21Z 2019-01-13T20:58:20Z MEMBER      

Almost all of our unit tests run against dask's default scheduler (usually dask.threaded). While it is true that the beauty of dask is that one can separate the scheduler from the logical implementation, there are a few idiosyncrasies to consider, particularly in xarray's backends. To that end, we have a few tests covering the integration of the distributed scheduler with xarray's backends, but the test coverage is not particularly complete.

If nothing more, I think it is worth considering tests that use the threaded, multiprocessing, and distributed schedulers for a larger subset of the backends tests (those that use dask).

Note, I'm bringing this up because I'm seeing some failing tests in #1793 that are unrelated to my code change but do appear to be related to dask and possibly a different default scheduler (example failure).
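To make this concrete, here is a minimal sketch of how such a matrix could be parametrized with pytest (the test body and names are hypothetical, and the modern dask.config API is assumed):

```python
import dask
import pytest
import xarray as xr


@pytest.mark.parametrize("scheduler", ["threads", "processes", "synchronous"])
def test_netcdf_roundtrip_under_scheduler(tmp_path, scheduler):
    # hypothetical test body: round-trip a small chunked dataset to netCDF
    # under each single-machine scheduler (distributed would need a Client)
    path = str(tmp_path / "test.nc")
    ds = xr.Dataset({"foo": ("x", list(range(10)))}).chunk({"x": 5})
    with dask.config.set(scheduler=scheduler):
        ds.to_netcdf(path)
        xr.open_dataset(path, chunks={"x": 5}).load()
```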

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1971/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
395004129 MDExOlB1bGxSZXF1ZXN0MjQxNTgxMjY0 2637 DEP: drop python 2 support and associated ci mods jhamman 2443309 closed 0     3 2018-12-31T16:35:59Z 2019-01-02T04:52:18Z 2019-01-02T04:52:04Z MEMBER   0 pydata/xarray/pulls/2637

This is a WIP. I expect the CI changes to take a few iterations.

  • [x] Closes #1876
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2637/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
293414745 MDU6SXNzdWUyOTM0MTQ3NDU= 1876 DEP: drop Python 2.7 support jhamman 2443309 closed 0     2 2018-02-01T06:11:07Z 2019-01-02T04:52:04Z 2019-01-02T04:52:04Z MEMBER      

The timeline for dropping Python 2.7 support for new Xarray releases is the end of 2018.

This issue can be used to track the necessary documentation and code changes to make that happen.

xref: #1830

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1876/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
377423603 MDExOlB1bGxSZXF1ZXN0MjI4MzcwMzUz 2545 Expand test environment for Python 3.7 jhamman 2443309 closed 0     2 2018-11-05T14:27:50Z 2018-11-06T16:29:35Z 2018-11-06T16:22:46Z MEMBER   0 pydata/xarray/pulls/2545

Just adding a full environment for python 3.7.

  • [x] Extends #2271
  • [x] Tests added
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2545/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
377075253 MDExOlB1bGxSZXF1ZXN0MjI4MTMwMzQx 2538 Stop loading tutorial data by default jhamman 2443309 closed 0     6 2018-11-03T17:24:26Z 2018-11-05T15:36:17Z 2018-11-05T15:36:17Z MEMBER   0 pydata/xarray/pulls/2538
  • [x] Tests added
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

In working on an xarray/dask tutorial, I've come to realize we eagerly load the tutorial datasets in xarray.tutorial.load_dataset. I'm going to just say that I don't think we should do that, but I could be missing some rationale. I didn't open an issue, so please feel free to share thoughts here.

One option would be to create a new function (xr.tutorial.open_dataset) that does what I'm suggesting and then slowly deprecate tutorial.load_dataset. Thoughts?
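A minimal sketch of the proposed split (the dataset name is just illustrative):

```python
import xarray as xr

# proposed: open lazily, leaving values on disk until they are accessed
ds = xr.tutorial.open_dataset('air_temperature')

# existing behavior, candidate for slow deprecation: load values eagerly
ds_eager = xr.tutorial.load_dataset('air_temperature')
```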

xref: https://github.com/dask/dask-examples/pull/51

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2538/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
362913084 MDExOlB1bGxSZXF1ZXN0MjE3NDkyNDIy 2432 switch travis language to generic jhamman 2443309 closed 0     3 2018-09-23T04:37:38Z 2018-09-26T23:27:55Z 2018-09-26T23:27:54Z MEMBER   0 pydata/xarray/pulls/2432

Following up on #2271, this switches the language setting in our Travis-CI config from "python" to "generic". Since we don't use any of the Travis Python utilities, we never really needed the python setting, and the generic setting gives a few benefits:

  • smaller base image which should give a bit faster spin-up time
  • build matrix without reliance on python version, instead we just point to the conda environment file

  • [x] Tests passed (for all non-documentation changes)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2432/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
339197312 MDExOlB1bGxSZXF1ZXN0MTk5OTI1NDg3 2271 dev/test build for python 3.7 jhamman 2443309 closed 0     3 2018-07-08T05:02:19Z 2018-09-22T23:09:43Z 2018-09-22T20:13:28Z MEMBER   0 pydata/xarray/pulls/2271
  • [x] Tests added
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2271/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
323765896 MDU6SXNzdWUzMjM3NjU4OTY= 2142 add CFTimeIndex enabled date_range function jhamman 2443309 closed 0     1 2018-05-16T20:02:08Z 2018-09-19T20:24:40Z 2018-09-19T20:24:40Z MEMBER      

Pandas' date_range function is a fast and flexible way to create DatetimeIndex objects. Now that we have a functioning CFTimeIndex, it would be great to add a version of the date_range function that supports other calendars and dates out of range for Pandas.

Code Sample and expected output

```python
In [1]: import xarray as xr

In [2]: xr.date_range('2000-02-26', '2000-03-02')
Out[2]:
DatetimeIndex(['2000-02-26', '2000-02-27', '2000-02-28', '2000-02-29',
               '2000-03-01', '2000-03-02'],
              dtype='datetime64[ns]', freq='D')

In [3]: xr.date_range('2000-02-26', '2000-03-02', calendar='noleap')
Out[3]:
CFTimeIndex(['2000-02-26', '2000-02-27', '2000-02-28', '2000-03-01',
             '2000-03-02'],
            dtype='cftime.datetime', freq='D')
```
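For comparison, a closely related API that did land in xarray is cftime_range; a minimal sketch, assuming xarray >= 0.11 with cftime installed:

```python
import xarray as xr

# noleap calendar: note the absence of 2000-02-29
idx = xr.cftime_range('2000-02-26', '2000-03-02', freq='D', calendar='noleap')
print(idx)
```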

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2142/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
361453268 MDExOlB1bGxSZXF1ZXN0MjE2NDIxMTE3 2421 Update NumFOCUS donate link jhamman 2443309 closed 0     1 2018-09-18T19:40:53Z 2018-09-19T05:59:28Z 2018-09-19T05:59:28Z MEMBER   0 pydata/xarray/pulls/2421
  • [ ] Closes #xxxx (remove if there is no corresponding issue, which should only be the case for minor changes)
  • [ ] Tests added (for all bug fixes or enhancements)
  • [ ] Tests passed (for all non-documentation changes)
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2421/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
357720579 MDExOlB1bGxSZXF1ZXN0MjEzNjY4MTgz 2403 add some blurbs about numfocus sponsorship to docs jhamman 2443309 closed 0     3 2018-09-06T15:54:06Z 2018-09-19T05:37:34Z 2018-09-11T02:14:18Z MEMBER   0 pydata/xarray/pulls/2403

Xarray is now a fiscally sponsored project of NumFOCUS. This PR adds a few blurbs of text highlighting that on the main readme and index page of the docs.

TODO:

  • Update flipcause to an xarray-specific donation page

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2403/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
358870903 MDExOlB1bGxSZXF1ZXN0MjE0NTAwNjk5 2409 Numfocus jhamman 2443309 closed 0     0 2018-09-11T03:15:52Z 2018-09-11T05:13:51Z 2018-09-11T05:13:51Z MEMBER   0 pydata/xarray/pulls/2409

Follow-up PR fixing two small typos in my previous PR.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2409/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
345300237 MDExOlB1bGxSZXF1ZXN0MjA0NDg4NDI2 2320 Fix for zarr encoding bug jhamman 2443309 closed 0     1 2018-07-27T17:05:27Z 2018-08-14T03:46:37Z 2018-08-14T03:46:34Z MEMBER   0 pydata/xarray/pulls/2320
  • [x] Closes #2278
  • [x] Tests added
  • [ ] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2320/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
340489812 MDExOlB1bGxSZXF1ZXN0MjAwODg4Mzc0 2282 fix dask get_scheduler warning jhamman 2443309 closed 0     1 2018-07-12T05:01:02Z 2018-07-14T16:19:58Z 2018-07-14T16:19:53Z MEMBER   0 pydata/xarray/pulls/2282
  • [x] Closes #2238
  • [ ] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [ ] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2282/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
327905732 MDExOlB1bGxSZXF1ZXN0MTkxNTg1ODU4 2204 update minimum versions and associated code cleanup jhamman 2443309 closed 0   0.11 2856429 6 2018-05-30T21:27:14Z 2018-07-08T00:55:36Z 2018-07-08T00:55:32Z MEMBER   0 pydata/xarray/pulls/2204
  • [x] closes #2200, closes #1829, closes #2203
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This updates the following minimum versions:

  • numpy: 1.11 (Mar 27, 2016) --> 1.12 (Jan 15, 2017)
  • pandas: 0.18 (Mar 11, 2016) --> 0.19 (Oct 2, 2016)
  • dask: 0.9 (May 10, 2016) --> 0.16

and drops our tests for python 3.4.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2204/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
288465429 MDU6SXNzdWUyODg0NjU0Mjk= 1829 Drop support for Python 3.4 jhamman 2443309 closed 0   0.11 2856429 13 2018-01-15T02:38:19Z 2018-07-08T00:55:32Z 2018-07-08T00:55:32Z MEMBER      

Python 3.7-final is due out in June (PEP 537). When do we want to deprecate 3.4, and when should we drop support altogether? @maxim-lian brought this up in a PR he's working on: https://github.com/pydata/xarray/pull/1828#issuecomment-357562144.

For reference, we dropped Python 3.3 in #1175 (12/20/2016).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1829/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
327893262 MDU6SXNzdWUzMjc4OTMyNjI= 2203 Update minimum version of dask jhamman 2443309 closed 0     6 2018-05-30T20:47:57Z 2018-07-08T00:55:32Z 2018-07-08T00:55:32Z MEMBER      

Xarray currently states that it supports dask version 0.9 and later. However, 1) I don't think this is true (a quick test shows that some of our tests fail using dask 0.9), and 2) we have a growing number of tests that are being skipped for older dask versions:

```
$ grep -irn "dask.__version__" xarray/tests/*py
xarray/tests/__init__.py:90:    if LooseVersion(dask.__version__) < '0.18':
xarray/tests/test_computation.py:755:    if LooseVersion(dask.__version__) < LooseVersion('0.17.3'):
xarray/tests/test_computation.py:841:    if not use_dask or LooseVersion(dask.__version__) > LooseVersion('0.17.4'):
xarray/tests/test_dask.py:211:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:223:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:284:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:296:    @pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:387:    if LooseVersion(dask.__version__) == LooseVersion('0.15.3'):
xarray/tests/test_dask.py:784:    pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:802:    pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_dask.py:818:@pytest.mark.skipif(LooseVersion(dask.__version__) <= '0.15.4',
xarray/tests/test_variable.py:1664:    if LooseVersion(dask.__version__) <= LooseVersion('0.15.1'):
xarray/tests/test_variable.py:1670:    if LooseVersion(dask.__version__) <= LooseVersion('0.15.1'):
```

I'd like to see xarray bump the minimum version number of dask to something around 0.15.4 (Oct. 2017) or 0.16 (Nov. 2017).

cc @mrocklin, @pydata/xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2203/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
327875183 MDU6SXNzdWUzMjc4NzUxODM= 2200 DEPS: drop numpy < 1.12 jhamman 2443309 closed 0     0 2018-05-30T19:52:40Z 2018-07-08T00:55:31Z 2018-07-08T00:55:31Z MEMBER      

Pandas is dropping Numpy 1.11 and earlier in their 0.24 release. It is probably easiest to follow suit with xarray.

xref: https://github.com/pandas-dev/pandas/issues/21242

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2200/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
331752926 MDExOlB1bGxSZXF1ZXN0MTk0NDA3MzU5 2228 fix zarr chunking bug jhamman 2443309 closed 0     2 2018-06-12T21:04:10Z 2018-06-13T13:07:58Z 2018-06-13T05:51:36Z MEMBER   0 pydata/xarray/pulls/2228
  • [x] Closes #2225
  • [x] Tests added
  • [x] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2228/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
331415995 MDU6SXNzdWUzMzE0MTU5OTU= 2225 Zarr Backend: check for non-uniform chunks is too strict jhamman 2443309 closed 0     3 2018-06-12T02:36:05Z 2018-06-13T05:51:36Z 2018-06-13T05:51:36Z MEMBER      

I think the following block of code is more strict than either dask or zarr requires:

https://github.com/pydata/xarray/blob/6c3abedf906482111b06207b9016ea8493c42713/xarray/backends/zarr.py#L80-L89

It should be possible to have uneven chunks in the last position of multiple dimensions in a zarr dataset.
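For illustration, a small sketch of the relaxed check this would imply (an illustrative helper, not xarray code): along each dimension the chunks must be uniform except possibly a smaller final chunk.

```python
def zarr_compatible_chunks(var_chunks):
    # illustrative helper: uniform chunks along each dimension, except that
    # the last chunk of each dimension may be smaller
    return all(
        len(set(dim_chunks[:-1])) <= 1 and dim_chunks[-1] <= dim_chunks[0]
        for dim_chunks in var_chunks
    )


print(zarr_compatible_chunks(((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))))  # True
```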

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr

In [2]: import dask.array as dsa

In [3]: da = xr.DataArray(dsa.random.random((8, 7, 11), chunks=(3, 3, 3)), dims=('x', 'y', 't'))

In [4]: da
Out[4]:
<xarray.DataArray 'da.random.random_sample-1aed3ea2f9dd784ec947cb119459fa56' (x: 8, y: 7, t: 11)>
dask.array<shape=(8, 7, 11), dtype=float64, chunksize=(3, 3, 3)>
Dimensions without coordinates: x, y, t

In [5]: da.data.chunks
Out[5]: ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2))

In [6]: da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')
/Users/jhamman/anaconda/bin/ipython:1: FutureWarning: the order of the arguments on DataArray.to_dataset has changed; you now need to supply name as a keyword argument
  #!/Users/jhamman/anaconda/bin/python

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-32fa9a7d0276> in <module>()
----> 1 da.to_dataset('varname').to_zarr('/Users/jhamman/workdir/test_chunks.zarr')

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in to_zarr(self, store, mode, synchronizer, group, encoding, compute)
   1185         from ..backends.api import to_zarr
   1186         return to_zarr(self, store=store, mode=mode, synchronizer=synchronizer,
-> 1187                        group=group, encoding=encoding, compute=compute)
   1188
   1189     def unicode(self):

~/anaconda/lib/python3.6/site-packages/xarray/backends/api.py in to_zarr(dataset, store, mode, synchronizer, group, encoding, compute)
    856     # I think zarr stores should always be sync'd immediately
    857     # TODO: figure out how to properly handle unlimited_dims
--> 858     dataset.dump_to_store(store, sync=True, encoding=encoding, compute=compute)
    859
    860     if not compute:

~/anaconda/lib/python3.6/site-packages/xarray/core/dataset.py in dump_to_store(self, store, encoder, sync, encoding, unlimited_dims, compute)
   1073
   1074         store.store(variables, attrs, check_encoding,
-> 1075                     unlimited_dims=unlimited_dims)
   1076         if sync:
   1077             store.sync(compute=compute)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in store(self, variables, attributes, *args, **kwargs)
    341     def store(self, variables, attributes, *args, **kwargs):
    342         AbstractWritableDataStore.store(self, variables, attributes,
--> 343                                         *args, **kwargs)
    344
    345     def sync(self, compute=True):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, unlimited_dims)
    366         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    367         self.set_variables(variables, check_encoding_set,
--> 368                            unlimited_dims=unlimited_dims)
    369
    370     def set_attributes(self, attributes):

~/anaconda/lib/python3.6/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, unlimited_dims)
    403             check = vn in check_encoding_set
    404             target, source = self.prepare_variable(
--> 405                 name, v, check, unlimited_dims=unlimited_dims)
    406
    407             self.writer.add(source, target)

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    325
    326         encoding = _extract_zarr_variable_encoding(
--> 327             variable, raise_on_invalid=check_encoding)
    328
    329         encoded_attrs = OrderedDict()

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _extract_zarr_variable_encoding(variable, raise_on_invalid)
    181
    182     chunks = _determine_zarr_chunks(encoding.get('chunks'), variable.chunks,
--> 183                                     variable.ndim)
    184     encoding['chunks'] = chunks
    185     return encoding

~/anaconda/lib/python3.6/site-packages/xarray/backends/zarr.py in _determine_zarr_chunks(enc_chunks, var_chunks, ndim)
     87             "Zarr requires uniform chunk sizes excpet for final chunk."
     88             " Variable %r has incompatible chunks. Consider "
---> 89             "rechunking using chunk()." % (var_chunks,))
     90     # last chunk is allowed to be smaller
     91     last_var_chunk = all_var_chunks[-1]

ValueError: Zarr requires uniform chunk sizes excpet for final chunk. Variable ((3, 3, 2), (3, 3, 1), (3, 3, 3, 2)) has incompatible chunks. Consider rechunking using chunk().
```


Expected Output

IIUC, Zarr allows multiple dims to have uneven chunks, so long as they are all in the last position:

```python
In [9]: import zarr

In [10]: z = zarr.zeros((8, 7, 11), chunks=(3, 3, 3), dtype='i4')

In [11]: z.chunks
Out[11]: (3, 3, 3)
```

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.7 pandas: 0.22.0 numpy: 1.14.3 scipy: 1.1.0 netCDF4: 1.3.1 h5netcdf: 0.5.1 h5py: 2.7.1 Nio: None zarr: 2.2.0 bottleneck: 1.2.1 cyordereddict: None dask: 0.17.2 distributed: 1.21.6 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.0.1 pip: 9.0.3 conda: 4.5.4 pytest: 3.5.1 IPython: 6.3.1 sphinx: 1.7.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2225/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
323017930 MDExOlB1bGxSZXF1ZXN0MTg3OTc4ODg2 2131 Feature/pickle rasterio jhamman 2443309 closed 0     13 2018-05-14T23:38:59Z 2018-06-08T05:00:59Z 2018-06-07T18:02:56Z MEMBER   0 pydata/xarray/pulls/2131
  • [x] Closes #2121
  • [x] Tests added
  • [x] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

cc @rsignell-usgs

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2131/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
322445312 MDU6SXNzdWUzMjI0NDUzMTI= 2121 rasterio backend should use DataStorePickleMixin (or something similar) jhamman 2443309 closed 0     2 2018-05-11T21:51:59Z 2018-06-07T18:02:56Z 2018-06-07T18:02:56Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_rasterio('RGB.byte.tif')

In [3]: ds
Out[3]:
<xarray.DataArray (band: 3, y: 718, x: 791)>
[1703814 values with dtype=uint8]
Coordinates:
  * band     (band) int64 1 2 3
  * y        (y) float64 2.827e+06 2.826e+06 2.826e+06 2.826e+06 2.826e+06 ...
  * x        (x) float64 1.021e+05 1.024e+05 1.027e+05 1.03e+05 1.033e+05 ...
Attributes:
    transform:   (101985.0, 300.0379266750948, 0.0, 2826915.0, 0.0, -300.0417...
    crs:         +init=epsg:32618
    res:         (300.0379266750948, 300.041782729805)
    is_tiled:    0
    nodatavals:  (0.0, 0.0, 0.0)

In [4]: import pickle

In [5]: pickle.dumps(ds)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-a165c2473431> in <module>()
----> 1 pickle.dumps(ds)

TypeError: can't pickle rasterio._io.RasterReader objects
```

Problem description

Originally reported by @rsignell-usgs in https://github.com/pangeo-data/pangeo/issues/249#issuecomment-388445370, the rasterio backend is not pickle-able. This obviously causes problems when using dask-distributed. We probably need to use DataStorePickleMixin or something similar on rasterio datasets to allow multiple readers of the same dataset.
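For reference, the general shape of the fix, sketched as the generic reopen-on-unpickle pattern that DataStorePickleMixin embodies (class and attribute names here are illustrative, not xarray internals):

```python
import rasterio


class PicklableRasterReader:
    """Illustrative wrapper that reopens its file handle after unpickling."""

    def __init__(self, path):
        self._path = path
        self._handle = rasterio.open(path)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_handle']  # the rasterio handle itself can't be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._handle = rasterio.open(self._path)  # reopen in the new process
```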

Expected Output

```python
pickle.dumps(ds)
```

returns a pickled dataset.

Output of xr.show_versions()

xr.show_versions() /Users/jhamman/anaconda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`. from ._conv import register_converters as _register_converters INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Darwin OS-release: 17.5.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.3 pandas: 0.22.0 numpy: 1.14.2 scipy: 1.0.1 netCDF4: 1.3.1 h5netcdf: 0.5.1 h5py: 2.7.1 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.17.2 distributed: 1.21.6 matplotlib: 2.2.2 cartopy: 0.16.0 seaborn: 0.8.1 setuptools: 39.0.1 pip: 9.0.3 conda: 4.5.1 pytest: 3.5.1 IPython: 6.3.1 sphinx: 1.7.4
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2121/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
324204749 MDExOlB1bGxSZXF1ZXN0MTg4ODc1NDU3 2154 fix unlimited dims bug jhamman 2443309 closed 0     1 2018-05-17T22:13:51Z 2018-05-25T00:32:02Z 2018-05-18T14:48:11Z MEMBER   0 pydata/xarray/pulls/2154
  • [x] Closes #2134
  • [x] Tests added
  • [x] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2154/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
324544072 MDExOlB1bGxSZXF1ZXN0MTg5MTI4NzY0 2163 Versioneer jhamman 2443309 closed 0     2 2018-05-18T20:35:39Z 2018-05-20T23:14:03Z 2018-05-20T23:14:03Z MEMBER   0 pydata/xarray/pulls/2163
  • [x] Closes #1300 (in a more portable way)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

This eliminates the need to edit setup.py before / after release and is a nice step towards simplifying xarray's release process.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2163/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
323732892 MDExOlB1bGxSZXF1ZXN0MTg4NTE4Nzg2 2141 expose CFTimeIndex to public API jhamman 2443309 closed 0     0 2018-05-16T18:19:59Z 2018-05-16T19:48:00Z 2018-05-16T19:48:00Z MEMBER   0 pydata/xarray/pulls/2141
  • [x] Closes #2140
  • ~[ ] Tests added (for all bug fixes or enhancements)~
  • [ ] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

cc @spencerkclark and @shoyer

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2141/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
286542795 MDExOlB1bGxSZXF1ZXN0MTYxNTA4MzMx 1811 WIP: Compute==False for to_zarr and to_netcdf jhamman 2443309 closed 0     17 2018-01-07T05:01:42Z 2018-05-16T15:06:51Z 2018-05-16T15:05:03Z MEMBER   0 pydata/xarray/pulls/1811

Review of this can wait until after #1800 is merged.

  • [x] Closes #1784
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API (remove if this change should not be visible to users, e.g., if it is an internal clean-up, or if this is part of a larger project that will be documented later)

cc @mrocklin

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1811/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
304589831 MDExOlB1bGxSZXF1ZXN0MTc0NTMxNTcy 1983 Parallel open_mfdataset jhamman 2443309 closed 0     18 2018-03-13T00:44:35Z 2018-04-20T12:04:31Z 2018-04-20T12:04:23Z MEMBER   0 pydata/xarray/pulls/1983
  • [x] Closes #1981
  • [x] Tests added
  • [x] Tests passed
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

I'm sharing this in the hopes of getting comments from @mrocklin and @pydata/xarray.

What this does:

  • implements a dask.bag map/apply on the xarray open_dataset and preprocess steps in open_mfdataset
  • adds a new parallel option to open_mfdataset
  • provides about a 40% speedup in opening a multifile dataset when using the distributed scheduler (I tested on 1000 netcdf files that took about 9 seconds to open/concatenate in the default configuration)

What it does not do (yet):

  • check that autoclose=True when multiple processes are being used (multiprocessing/distributed scheduler)
  • provide any speedup with the multiprocessing backend (I do not understand why this is)

Benchmark Example

```python
In [1]: import xarray as xr
   ...: import dask
   ...: import dask.threaded
   ...: import dask.multiprocessing
   ...: from dask.distributed import Client

In [2]: c = Client()
   ...: c
Out[2]: <Client: scheduler='tcp://127.0.0.1:59576' processes=4 cores=4>

In [4]: %%time
   ...: with dask.set_options(get=dask.multiprocessing.get):
   ...:     ds = xr.open_mfdataset('../test_files/test_netcdf_*nc', autoclose=True, parallel=True)
   ...:
CPU times: user 4.76 s, sys: 201 ms, total: 4.96 s
Wall time: 7.74 s

In [5]: %%time
   ...: with dask.set_options(get=c.get):
   ...:     ds = xr.open_mfdataset('../test_files/test_netcdf_*nc', autoclose=True, parallel=True)
   ...:
CPU times: user 1.88 s, sys: 60.6 ms, total: 1.94 s
Wall time: 4.41 s

In [6]: %%time
   ...: with dask.set_options(get=dask.threaded.get):
   ...:     ds = xr.open_mfdataset('../test_files/test_netcdf_*nc')
   ...:
CPU times: user 7.77 s, sys: 247 ms, total: 8.02 s
Wall time: 8.17 s

In [7]: %%time
   ...: with dask.set_options(get=dask.threaded.get):
   ...:     ds = xr.open_mfdataset('../test_files/test_netcdf_*nc', autoclose=True)
   ...:
CPU times: user 7.89 s, sys: 202 ms, total: 8.09 s
Wall time: 8.21 s

In [8]: ds
Out[8]:
<xarray.Dataset>
Dimensions:  (lat: 45, lon: 90, time: 1000)
Coordinates:
  * lon      (lon) float64 0.0 4.045 8.09 12.13 16.18 20.22 24.27 28.31 ...
  * lat      (lat) float64 -90.0 -85.91 -81.82 -77.73 -73.64 -69.55 -65.45 ...
  * time     (time) datetime64[ns] 1970-01-01 1970-01-02 1970-01-11 ...
Data variables:
    foo      (time, lon, lat) float64 dask.array<shape=(1000, 90, 45), chunksize=(1, 90, 45)>
    bar      (time, lon, lat) float64 dask.array<shape=(1000, 90, 45), chunksize=(1, 90, 45)>
    baz      (time, lon, lat) float32 dask.array<shape=(1000, 90, 45), chunksize=(1, 90, 45)>
Attributes:
    history: created for xarray benchmarking
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1983/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
304201107 MDU6SXNzdWUzMDQyMDExMDc= 1981 use dask to open datasets in parallel jhamman 2443309 closed 0     5 2018-03-11T22:33:52Z 2018-04-20T12:04:23Z 2018-04-20T12:04:23Z MEMBER      

Code Sample, a copy-pastable example if possible

```python
xr.open_mfdataset('path/to/many/files*.nc', method='parallel')
```

Problem description

We have many issues describing the less than stellar performance of open_mfdataset (e.g. #511, #893, #1385, #1788, #1823). The problem can be broken into three pieces: 1) open each file, 2) decode/preprocess each dataset, and 3) merge/combine/concat the collection of datasets. We can perform (1) and (2) in parallel (performance improvements to (3) would be a separate task). Lately, I'm finding that for large numbers of files, it can take many seconds to many minutes just to open all the files in a multi-file dataset of mine.

I'm proposing that we use something like dask.bag to parallelize steps (1) and (2). I've played around with this a bit and it "works" almost right out of the box, provided you are using the "autoclose=True" option. A concrete example:

We could change the line:

```python
datasets = [open_dataset(p, **open_kwargs) for p in paths]
```

to

```python
import dask.bag as db

paths_bag = db.from_sequence(paths)
datasets = paths_bag.map(open_dataset, **open_kwargs).compute()
```

I'm curious what others think of this idea and what the potential downfalls may be.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1981/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
283388962 MDExOlB1bGxSZXF1ZXN0MTU5Mjg2OTk0 1793 fix distributed writes jhamman 2443309 closed 0   0.10.3 3008859 35 2017-12-19T22:24:41Z 2018-03-13T15:32:54Z 2018-03-10T15:43:18Z MEMBER   0 pydata/xarray/pulls/1793
  • [x] Closes #1464
  • [x] Tests added
  • [x] Tests passed
  • [x] Passes git diff upstream/master **/*py | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Right now, I've just modified the dask distributed integration tests so we can all see the failing tests.

I'm happy to push this further but I thought I'd see if either @shoyer or @mrocklin have an idea where to start?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1793/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
304097233 MDExOlB1bGxSZXF1ZXN0MTc0MTg1NDI5 1980 Fix for failing zarr test jhamman 2443309 closed 0     2 2018-03-10T19:26:37Z 2018-03-12T05:37:09Z 2018-03-12T05:37:02Z MEMBER   0 pydata/xarray/pulls/1980
  • [x] Closes #1979 and #1955
  • [x] Tests added
  • [x] Tests passed
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1980/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
298854863 MDExOlB1bGxSZXF1ZXN0MTcwMzg1ODI4 1933 Use conda-forge netcdftime wherever netcdf4 was tested jhamman 2443309 closed 0     8 2018-02-21T06:22:08Z 2018-03-09T19:22:34Z 2018-03-09T19:22:20Z MEMBER   0 pydata/xarray/pulls/1933
  • [x] Closes #1920
  • [x] Tests added (for all bug fixes or enhancements)
  • [x] Tests passed (for all non-documentation changes)
  • [x] Fully documented: see #1920
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1933/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
