
issues


22 rows where comments = 3, repo = 13221727, and user = 35968931, sorted by updated_at descending


type

  • issue 11
  • pull 11

state

  • closed 15
  • open 7

repo

  • xarray 22
id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2120340151 PR_kwDOAMm_X85mHqI0 8714 Avoid coercing to numpy in `as_shared_dtypes` TomNicholas 35968931 open 0     3 2024-02-06T09:35:22Z 2024-03-28T18:31:50Z   MEMBER   0 pydata/xarray/pulls/8714
  • [x] Solves the problem in https://github.com/pydata/xarray/pull/8712#issuecomment-1929037299
  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8714/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
2098882374 I_kwDOAMm_X859GmdG 8660 dtype encoding ignored during IO? TomNicholas 35968931 closed 0     3 2024-01-24T18:50:47Z 2024-02-05T17:35:03Z 2024-02-05T17:35:02Z MEMBER      

What happened?

When I set the .encoding['dtype'] attribute before saving to disk, the actual on-disk representation appears to store a record of the dtype encoding, but when opening it back up in xarray I get the same dtype I had before, not the one specified in the encoding. Is that what's supposed to happen? How does this work? (This happens with both zarr and netCDF.)

What did you expect to happen?

I expected that setting .encoding['dtype'] would mean that once I open the data back up, it would be in the new dtype that I set in the encoding.

Minimal Complete Verifiable Example

```python
import xarray as xr

air = xr.tutorial.open_dataset('air_temperature')

air['air'].dtype  # returns dtype('float32')

air['air'].encoding['dtype']  # returns dtype('int16'), which already seems weird

air.to_zarr('air.zarr')  # I would assume here that the encoding actually does something during IO

# Now if I check the zarr .zarray metadata for the air variable, it says "dtype": "<i2"

air2 = xr.open_dataset('air.zarr', engine='zarr')  # open it back up

air2['air'].dtype  # returns dtype('float32'), but I expected dtype('int16')

# (the same thing happens also with saving to netCDF instead of Zarr)
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

I know I didn't explicitly cast with .astype, but I'm still confused about what the relationship between the dtype and the encoding is supposed to be here.

I am probably just misunderstanding how this is supposed to work, but then this is arguably a docs issue, because here it says "[the encoding dtype field] controls the type of the data written on disk", which I would have thought also affects the data you get back when you open it up again?
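
If my reading is right, the encoding only governs the on-disk representation, and decoding converts the values back on read. A minimal way to test that assumption (mask_and_scale=False should skip the decoding step):

```python
# Sketch of my assumption: the on-disk dtype really is int16, and decoding
# (mask_and_scale) is what converts it back to float32 on read.
import xarray as xr

raw = xr.open_dataset('air.zarr', engine='zarr', mask_and_scale=False)
print(raw['air'].dtype)  # presumably int16, i.e. the undecoded on-disk dtype
```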

Environment

main branch of xarray

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8660/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
2116695961 I_kwDOAMm_X85-KjeZ 8699 Wrapping a `kerchunk.Array` object directly with xarray TomNicholas 35968931 open 0     3 2024-02-03T22:15:07Z 2024-02-04T21:15:14Z   MEMBER      

What is your issue?

In https://github.com/fsspec/kerchunk/issues/377 the idea came up of using the xarray API to concatenate arrays which represent parts of a zarr store - i.e. using xarray to kerchunk a large set of netCDF files instead of using kerchunk.combine.MultiZarrToZarr.

The idea is to make something like this work for kerchunking sets of netCDF files into zarr stores

```python
ds = xr.open_mfdataset(
    '/my/files*.nc',
    engine='kerchunk',  # kerchunk registers an xarray IO backend that returns zarr.Array objects
    combine='nested',   # 'by_coords' would require actually reading coordinate data
    parallel=True,      # would use dask.delayed to generate reference dicts for each file in parallel
)

ds  # now wraps a bunch of zarr.Array / kerchunk.Array objects, no need for dask arrays

# kerchunk defines an xarray accessor that extracts the zarr arrays and serializes them
# (which could also be done in parallel if writing to parquet)
ds.kerchunk.to_zarr(store='out.zarr')
```

I had a go at doing this in this notebook, and in doing so discovered a few potential issues with xarray's internals.

For this to work xarray has to:

  • Wrap a kerchunk.Array object which barely defines any array API methods, including basically not supporting indexing at all,
  • Store all the information present in a kerchunked Zarr store but without ever loading any data,
  • Not create any indexes by default during dataset construction or during xr.concat,
  • Not try to do anything else that can't be defined for a kerchunk.Array,
  • Possibly have the lazy indexing classes support concatenation (https://github.com/pydata/xarray/issues/4628).

It's an interesting exercise in using xarray as an abstraction, with no access to real numerical values at all.
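
As a rough sketch of the constraint (a hypothetical stand-in class, not kerchunk's actual API), this is the kind of metadata-only object xarray would have to wrap without ever coercing it:

```python
# Hypothetical stand-in for kerchunk.Array: pure metadata, no loadable values.
import numpy as np
import xarray as xr

class ReferenceArray:
    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = np.dtype(dtype)

    @property
    def ndim(self):
        return len(self.shape)

    @property
    def size(self):
        return int(np.prod(self.shape))

    # Stubs so xarray treats this as a duck array rather than coercing via np.asarray
    # (the exact set of required dunders may vary by xarray version).
    def __array_function__(self, func, types, args, kwargs):
        raise NotImplementedError("no values to compute on")

    def __array_ufunc__(self, ufunc, method, *args, **kwargs):
        raise NotImplementedError("no values to compute on")

    def __getitem__(self, key):
        raise NotImplementedError("purely a reference; nothing to index")

# Wrapping in a Variable only needs shape/dtype metadata; anything that actually
# indexes the data, or builds an index over it, would raise.
var = xr.Variable(dims=("t", "x"), data=ReferenceArray((4, 10), "f4"))
print(var.sizes)  # Frozen({'t': 4, 'x': 10})
```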

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8699/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 1
}
    xarray 13221727 issue
2027528985 PR_kwDOAMm_X85hQBHP 8525 Remove PR labeler bot TomNicholas 35968931 closed 0     3 2023-12-06T02:31:56Z 2023-12-06T02:45:46Z 2023-12-06T02:45:41Z MEMBER   0 pydata/xarray/pulls/8525

RIP

  • [x] Closes #8524
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8525/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1806973709 PR_kwDOAMm_X85VoNVM 7992 Docs page on interoperability TomNicholas 35968931 closed 0     3 2023-07-17T05:02:29Z 2023-10-26T16:08:56Z 2023-10-26T16:04:33Z MEMBER   0 pydata/xarray/pulls/7992

Builds upon #7991 by adding a page to the internals enumerating all the different ways in which xarray is interoperable.

Would be nice if https://github.com/pydata/xarray/pull/6975 were merged so that I could link to it from this new page.

  • [x] Addresses comment in https://github.com/pydata/xarray/pull/6975#issuecomment-1246487152
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7992/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1812811751 I_kwDOAMm_X85sDU_n 8008 "Deep linking" disparate documentation resources together TomNicholas 35968931 open 0     3 2023-07-19T22:18:55Z 2023-10-12T18:36:52Z   MEMBER      

What is your issue?

Our docs have a general issue with having lots of related resources that are not necessarily linked together in a useful way. This results in users (including myself!) getting "stuck" in one part of the docs and being unaware of material that would help them solve their specific issue.

To give a concrete example, if a user wants to know about coarsen, there is relevant material:

  • In the coarsen class docstring
  • On the reshaping page
  • On the computations page
  • On the "how do I?" page
  • On the tutorial repository

Different types of material are great, but only some of these resources are linked to others. Coarsen is actually pretty well covered overall, but for other functions there might be no useful linking at all, or no examples in the docstrings.


The biggest missed opportunity here is the way all the great content on the tutorial.xarray.dev repository is not linked from anywhere on the main documentation site (I believe). To address that we could either (a) integrate the tutorial.xarray.dev material into the main site or (b) add a lot more cross-linking between the two sites.

Identifying sections that could be linked and adding links would be a great task for new contributors.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8008/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
663235664 MDU6SXNzdWU2NjMyMzU2NjQ= 4243 Manually drop DataArray from memory? TomNicholas 35968931 closed 0     3 2020-07-21T18:54:40Z 2023-09-12T16:17:12Z 2023-09-12T16:17:12Z MEMBER      

Is it possible to deliberately drop data associated with a particular DataArray from memory?

Obviously da.close() exists, but what happens if you do, for example:

```python
ds = open_dataset(file)
da = ds[var]
da.compute()    # something that loads da into memory
da.close()      # is the memory freed up again now?
ds.something()  # what about now?
```

Also does calling python's built-in garbage collector (i.e. gc.collect()) do anything in this instance?

The context of this question is that I'm trying to resave some massive variables (~65GB each) that were loaded from thousands of files into just a few files for each variable. I would love to use @rabernat 's new rechunker package but I'm not sure how easily I can convert my current netCDF data to Zarr, and I'm interested in this question no matter how I end up solving the problem.

I don't currently have a particularly good understanding of file I/O and memory management in xarray, but would like to improve it. Can anyone recommend a tool I can use to answer this kind of question myself on my own machine? I suppose it would need to be able to tell me the current memory usage of specific objects, not just the total memory usage.
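
For instance (a sketch assuming the psutil package, with placeholder file and variable names), watching process-level RSS at least shows whether the memory comes back:

```python
# A crude probe: process-level RSS before/after load and after dropping the reference.
# "file.nc" and "var" are placeholders for your own data.
import gc
import os

import psutil
import xarray as xr

proc = psutil.Process(os.getpid())

def rss_mb():
    return proc.memory_info().rss / 1e6

ds = xr.open_dataset("file.nc")
print("opened:", rss_mb())
da = ds["var"].compute()  # forces the values into memory
print("loaded:", rss_mb())
del da
gc.collect()
print("dropped + collected:", rss_mb())  # does RSS fall back down?
```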

(@johnomotani you might be interested)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4243/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1807782455 I_kwDOAMm_X85rwJI3 7996 Stable docs build not showing latest changes after release TomNicholas 35968931 closed 0     3 2023-07-17T13:24:58Z 2023-07-17T20:48:19Z 2023-07-17T20:48:19Z MEMBER      

What happened?

I released xarray version v2023.07.0 last night, but I'm not seeing changes to the documentation reflected in the https://docs.xarray.dev/en/stable/ build. (In particular the Internals section now should have an entire extra page on wrapping chunked arrays.) I can however see the newest additions on https://docs.xarray.dev/en/latest/ build. Is that how it's supposed to work?

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7996/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1779880070 PR_kwDOAMm_X85UMTE7 7951 Chunked array docs TomNicholas 35968931 closed 0     3 2023-06-28T23:01:42Z 2023-07-05T20:33:33Z 2023-07-05T20:08:19Z MEMBER   0 pydata/xarray/pulls/7951

Builds upon #7911

  • [x] Documentation for #7019
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7951/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1694956396 I_kwDOAMm_X85lBvts 7813 Task naming for general chunkmanagers TomNicholas 35968931 open 0     3 2023-05-03T22:56:46Z 2023-05-05T10:30:39Z   MEMBER      

What is your issue?

(Follow-up to #7019)

When you create a dask graph of xarray operations, the tasks in the graph get useful names according to the name of the DataArray they operate on, or whether they represent an open_dataset call.
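
For example (a hedged illustration; the exact key format is a dask/xarray implementation detail):

```python
# Small illustration: with dask, graph key names embed the variable name,
# so schedulers and dashboards can label tasks meaningfully.
import numpy as np
import xarray as xr

temps = xr.DataArray(np.ones((4, 4)), dims=("x", "y"), name="temperature").chunk({"x": 2})
result = (temps * 2).mean("y")
# Key prefixes typically include "temperature":
print({key[0] for key in result.__dask_graph__() if isinstance(key, tuple)})
```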

Currently this doesn't work for cubed; see for example the graph in https://github.com/pangeo-data/distributed-array-examples/issues/2#issuecomment-1533852877.

cc @tomwhite @dcherian

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1512290017 I_kwDOAMm_X85aI7bh 7403 Zarr error when trying to overwrite part of existing store TomNicholas 35968931 open 0     3 2022-12-28T00:40:16Z 2023-01-11T21:26:10Z   MEMBER      

What happened?

to_zarr threw an error when I tried to overwrite part of an existing zarr store.

What did you expect to happen?

With mode w I was expecting it to overwrite part of the store with no complaints.

I expected that because that's what the docstring of to_zarr says:

mode ({"w", "w-", "a", "r+", None}, optional) – Persistence mode: “w” means create (overwrite if exists); “w-” means create (fail if exists); “a” means override existing variables (create if does not exist);

The default mode is "w", so I was expecting it to overwrite.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

np.random.seed(0)

ds = xr.Dataset()
ds["data"] = (['x', 'y'], np.random.random((100, 100)))
ds.to_zarr("test.zarr")
print(ds["data"].mean().compute())
# returns array(0.49645889) as expected

ds = xr.open_dataset("test.zarr", engine='zarr', chunks={})
print(ds["data"].mean().compute())
# still returns array(0.49645889) as expected

ds.to_zarr("test.zarr", mode="a")
```

Output:

```
<xarray.DataArray 'data' ()>
array(0.49645889)
<xarray.DataArray 'data' ()>
array(0.49645889)
Traceback (most recent call last):
  File "/home/tom/Documents/Work/Code/experimentation/bugs/datatree_nans/mwe_xarray.py", line 16, in <module>
    ds.to_zarr("test.zarr")
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/core/dataset.py", line 2091, in to_zarr
    return to_zarr(  # type: ignore
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/api.py", line 1628, in to_zarr
    zstore = backends.ZarrStore.open_group(
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/xarray/backends/zarr.py", line 420, in open_group
    zarr_group = zarr.open_group(store, **open_kwargs)
  File "/home/tom/miniconda3/envs/xrdev3.9/lib/python3.9/site-packages/zarr/hierarchy.py", line 1389, in open_group
    raise ContainsGroupError(path)
zarr.errors.ContainsGroupError: path '' contains a group
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

I would like to know what the intended result is supposed to be here, so that I can make sure datatree behaves the same way, see https://github.com/xarray-contrib/datatree/issues/168.
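
As a sanity check of the docstring's "w" behaviour, here is a sketch against a separate store name so it doesn't clobber the MVCE's:

```python
# Same shape of experiment as the MVCE, but with mode="w" passed explicitly;
# per the docstring this should recreate the store rather than error.
import numpy as np
import xarray as xr

ds = xr.Dataset({"data": (['x', 'y'], np.random.random((100, 100)))})
ds.to_zarr("test2.zarr")            # create
ds.to_zarr("test2.zarr", mode="w")  # overwrite wholesale: no ContainsGroupError expected
```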

Environment

Main branch of xarray, zarr v2.13.3

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7403/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
906023492 MDExOlB1bGxSZXF1ZXN0NjU3MDYxODI5 5400 Multidimensional histogram TomNicholas 35968931 open 0     3 2021-05-28T20:38:53Z 2022-11-21T22:41:01Z   MEMBER   0 pydata/xarray/pulls/5400

Initial work on integrating the multi-dimensional dask-powered histogram functionality from xhistogram into xarray. Just working on the skeleton to fit around the histogram algorithm for now, to be filled in later.
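
For orientation, this is roughly what a histogram over one dimension looks like today with plain numpy plus apply_ufunc (a sketch only, not the blockwise dask algorithm this PR targets):

```python
# Sketch: per-row histogram over the "x" dimension, without this PR's machinery.
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(4, 100), dims=("y", "x"))
bins = np.linspace(-3, 3, 31)

counts = xr.apply_ufunc(
    lambda arr: np.stack([np.histogram(row, bins=bins)[0] for row in arr]),
    da,
    input_core_dims=[["x"]],    # histogram along x
    output_core_dims=[["bin"]],
)
print(counts.sizes)  # {'y': 4, 'bin': 30}
```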

  • [x] Closes #4610
  • [x] API skeleton
  • [x] Input checking
  • [ ] Internal blockwise algorithm from https://github.com/xgcm/xhistogram/pull/49
  • [x] Redirect plot.hist
  • [x] da.weighted().hist()
  • [ ] Tests added for results
  • [x] Hypothesis tests for different chunking patterns
  • [ ] Examples in documentation
  • [ ] Examples in docstrings
  • [x] Type hints (first time trying these so might be wrong)
  • [ ] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
  • [x] Range argument
  • [ ] Handle multidimensional bins (for a future PR? - See https://github.com/xgcm/xhistogram/pull/59)
  • [ ] Handle np.datetime64 dtypes by refactoring to use np.searchsorted (for a future PR? See discussion)
  • [ ] Fast path for uniform bin widths (for a future PR? See suggestion)

Question: da.hist() or da.histogram()?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5400/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1417378270 PR_kwDOAMm_X85BPGqR 7192 Example using Coarsen.construct to split map into regions TomNicholas 35968931 closed 0     3 2022-10-20T22:14:31Z 2022-10-21T18:14:59Z 2022-10-21T18:14:56Z MEMBER   0 pydata/xarray/pulls/7192

I realised there is very little documentation on Coarsen.construct, so I added this example.

Unsure whether it should instead live in the page on reshaping and reorganising data though, as it is essentially a reshape operation. EDIT: Now on the reshape page
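
For reference, a sketch in the spirit of the added example (not the PR's exact code):

```python
# Reshape a long "time" dimension into ("year", "month") windows via Coarsen.construct.
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(24), dims="time")
reshaped = da.coarsen(time=12).construct(time=("year", "month"))
print(reshaped.sizes)  # {'year': 2, 'month': 12}
```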

  • [ ] ~~Closes #xxxx~~
  • [ ] ~~Tests added~~
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~

cc @jbusecke @paigem

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7192/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1370416843 PR_kwDOAMm_X84-z6DG 7023 Remove dask_array_type checks TomNicholas 35968931 closed 0     3 2022-09-12T19:31:04Z 2022-09-13T00:35:25Z 2022-09-13T00:35:22Z MEMBER   0 pydata/xarray/pulls/7023
  • [ ] From https://github.com/pydata/xarray/pull/7019#discussion_r968606140
  • [ ] ~~Tests added~~
  • [ ] ~~User visible changes (including notable bug fixes) are documented in whats-new.rst~~
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7023/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
400289716 MDU6SXNzdWU0MDAyODk3MTY= 2686 Is `create_test_data()` public API? TomNicholas 35968931 open 0     3 2019-01-17T14:00:20Z 2022-04-09T01:48:14Z   MEMBER      

We want to encourage people to use and extend xarray, and we already provide testing functions as public API to help with this.

One function I keep using when writing code which uses xarray is xarray.tests.test_dataset.create_test_data(). This is very useful for quickly writing tests for the same reasons that it's useful in xarray's internal tests, but it's not explicitly public API. This means that there's no guarantee it won't change/disappear, which is not ideal if you're trying to write a test suite for separate software. But so many tests in xarray rely on it that presumably it's not going to get changed.

Is there any reason why it shouldn't be public API? Is there something I should use instead?
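
To illustrate the usage in question (the import path is internal and, as noted above, not guaranteed to stay stable):

```python
from xarray.tests.test_dataset import create_test_data

ds = create_test_data()
print(ds)  # a small Dataset with a representative mix of variables, handy as a test fixture
```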

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2686/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1012428149 PR_kwDOAMm_X84shL9H 5834 Combine by coords dataarray bugfix TomNicholas 35968931 closed 0     3 2021-09-30T17:17:00Z 2021-10-29T19:57:36Z 2021-10-29T19:57:36Z MEMBER   0 pydata/xarray/pulls/5834

Also reorganised the logic that deals with combining mixed sets of objects (i.e. named dataarrays, unnamed dataarrays, datasets) that was added in #4696.

TODO - same reorganisation / testing but for combine_nested as well as combine_by_coords. EDIT: I'm going to do this in a separate PR, so that this bugfix can be merged without it.

  • [x] Closes #5833
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] ~~New functions/methods are listed in api.rst~~
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5834/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1020555552 PR_kwDOAMm_X84s6zAH 5846 Change return type of DataArray.chunks and Dataset.chunks to a dict TomNicholas 35968931 closed 0     3 2021-10-08T00:02:20Z 2021-10-26T15:52:00Z 2021-10-26T15:51:59Z MEMBER   1 pydata/xarray/pulls/5846

Rectifies the issue in #5843 by making DataArray.chunks and Variable.chunks consistent with Dataset.chunks. This would obviously need a deprecation cycle before it could be merged.

Currently a WIP - I changed the behaviour but this obviously broke quite a few tests and I haven't looked at them yet.
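
A quick demonstration of the inconsistency being rectified, as I understand the current behaviour:

```python
# Dataset.chunks is a mapping, while DataArray.chunks is a tuple of tuples.
import numpy as np
import xarray as xr

ds = xr.Dataset({"a": (("x", "y"), np.zeros((4, 4)))}).chunk({"x": 2})
print(ds.chunks)       # mapping, e.g. Frozen({'x': (2, 2), 'y': (4,)})
print(ds["a"].chunks)  # tuple of tuples, e.g. ((2, 2), (4,))
```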

  • [x] Closes #5843
  • [ ] Tests added
  • [x] Passes pre-commit run --all-files
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5846/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
877944829 MDExOlB1bGxSZXF1ZXN0NjMxODI1Nzky 5274 Update release guide TomNicholas 35968931 closed 0     3 2021-05-06T19:50:53Z 2021-05-13T17:44:47Z 2021-05-13T17:44:47Z MEMBER   0 pydata/xarray/pulls/5274

Updated the release guide to account for what is now automated via github actions, and any other bits I felt could be clearer.

Now only 16 easy steps!

  • Motivated by #5232 and #5244
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5274/reactions",
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
474247717 MDU6SXNzdWU0NzQyNDc3MTc= 3168 apply_ufunc erroneously operating on an empty array when dask used TomNicholas 35968931 closed 0     3 2019-07-29T20:44:23Z 2020-03-30T15:08:16Z 2020-03-30T15:08:15Z MEMBER      

Problem description

apply_ufunc with dask='parallelized' appears to be trying to act on an empty numpy array when the computation is specified, but before .compute() is called. In other words, a ufunc which just prints the shape of its argument will print (0,0) then print the correct shape once .compute() is called.

Minimum working example

```python
import numpy as np
import xarray as xr

def example_ufunc(x):
    print(x.shape)
    return np.mean(x, axis=-1)

def new_mean(da, dim):
    result = xr.apply_ufunc(example_ufunc, da, input_core_dims=[[dim]],
                            dask='parallelized', output_dtypes=[da.dtype])
    return result

shape = {'t': 2, 'x': 3}
data = xr.DataArray(data=np.random.rand(*shape.values()), dims=shape.keys())
unchunked = data
chunked = data.chunk(shape)

actual = new_mean(chunked, dim='x')  # raises the warning
print(actual)

print(actual.compute())  # does the computation correctly
```

Result

```
(0, 0)
/home/tnichol/anaconda3/envs/py36/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
<xarray.DataArray (t: 2)>
dask.array<shape=(2,), dtype=float64, chunksize=(2,)>
Dimensions without coordinates: t
(2, 3)
<xarray.DataArray (t: 2)>
array([0.147205, 0.402913])
Dimensions without coordinates: t
```

Expected result

Same thing without the (0,0) or the numpy warning.
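
My working assumption is that dask probes the function once with a zero-sized array to infer the output dtype/meta, and that supplying meta explicitly avoids the probe. At the dask level, for example:

```python
# Assumed mechanism: dask calls the function on a 0-sized array to infer output
# dtype/meta. Providing meta up front declares the output type instead of probing.
import numpy as np
import dask.array as dsk

x = dsk.random.random((2, 3), chunks=(2, 3))
y = x.map_blocks(
    lambda block: np.mean(block, axis=-1),
    drop_axis=1,
    meta=np.empty((0,), dtype=float),
)
print(y.compute())  # no call on an empty array, so no "Mean of empty slice" warning
```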

Output of xr.show_versions()

(my xarray is up-to-date with master)

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.1
xarray: 0.12.3+23.g1d7bcbd
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.4.2
pydap: None
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.1.0
distributed: 2.1.0
matplotlib: 3.1.0
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.6.2
pip: 18.1
conda: None
pytest: 4.0.0
IPython: 7.1.1
sphinx: 1.8.2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3168/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
497184021 MDU6SXNzdWU0OTcxODQwMjE= 3334 plot.line fails when plot axis is a 1D coordinate TomNicholas 35968931 closed 0     3 2019-09-23T15:52:48Z 2019-09-26T08:51:59Z 2019-09-26T08:51:59Z MEMBER      

MCVE Code Sample

```python
import xarray as xr
import numpy as np

x_coord = xr.DataArray(data=[0.1, 0.2], dims=['x'])
t_coord = xr.DataArray(data=[10, 20], dims=['t'])

da = xr.DataArray(data=np.array([[0, 1], [5, 9]]),
                  dims=['x', 't'],
                  coords={'x': x_coord, 'time': t_coord})
print(da)

da.transpose('time', 'x')
```

Output:

```
<xarray.DataArray (x: 2, t: 2)>
array([[0, 1],
       [5, 9]])
Coordinates:
  * x        (x) float64 0.1 0.2
    time     (t) int64 10 20

Traceback (most recent call last):
  File "mwe.py", line 22, in <module>
    da.transpose('time', 'x')
  File "/home/tegn500/Documents/Work/Code/xarray/xarray/core/dataarray.py", line 1877, in transpose
    "permuted array dimensions (%s)" % (dims, tuple(self.dims))
ValueError: arguments to transpose (('time', 'x')) must be permuted array dimensions (('x', 't'))
```

As 'time' is a coordinate with only one dimension, this is an unambiguous operation that I want to perform. However, because .transpose() currently only accepts dimensions, this fails with that error.

This causes bugs in other parts of the code. For example, I found this by trying to plot this type of DataArray with `da.plot(x='time', hue='x')`, which gives the same error.

(You can get a similar error with `da.plot(y='time', hue='x')`.)

If the code which explicitly checks that the arguments to transpose are dims and not just coordinate dimensions is removed, then both of these examples work as expected.

I would like to generalise the transpose function to also accept dimension coordinates, is there any reason not to do this?
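
In the meantime, a hypothetical workaround helper (reusing `da` from the example above):

```python
# Resolve 1D non-dimension coordinates to the dimension they lie along, then transpose.
def transpose_by_coords(da, *names):
    dims = tuple(
        da[name].dims[0] if (name in da.coords and name not in da.dims) else name
        for name in names
    )
    return da.transpose(*dims)

print(transpose_by_coords(da, 'time', 'x'))  # succeeds where da.transpose('time', 'x') raised
```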

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3334/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
399389293 MDExOlB1bGxSZXF1ZXN0MjQ0ODI4NTY3 2678 Hotfix for #2662 TomNicholas 35968931 closed 0     3 2019-01-15T15:11:48Z 2019-02-02T23:50:40Z 2019-01-17T13:05:43Z MEMBER   0 pydata/xarray/pulls/2678
  • [x] Closes #2662

Explained in #2662. Also renamed some variables slightly for clarity.

Not sure how to add a test without refactoring the groupby into a separate function, as in its current form the problem only manifests as a (huge) slowdown.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2678/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
404383025 MDU6SXNzdWU0MDQzODMwMjU= 2725 Line plot with x=coord putting wrong variables on axes TomNicholas 35968931 closed 0     3 2019-01-29T16:43:18Z 2019-01-30T02:02:22Z 2019-01-30T02:02:22Z MEMBER      

When I try to plot the values in a 1D DataArray against the values in one of its coordinates, it does not behave at all as expected:

```python
import numpy as np
import matplotlib.pyplot as plt
from xarray import DataArray

current = DataArray(name='current',
                    data=np.array([5, 8, 14, 22, 30]),
                    dims=['time'],
                    coords={'time': (['time'], np.array([0.1, 0.2, 0.3, 0.4, 0.5])),
                            'voltage': (['time'], np.array([100, 200, 300, 400, 500]))})

print(current)

# Try to plot current against voltage
current.plot.line(x='voltage')
plt.show()
```

Output:

```
<xarray.DataArray 'current' (time: 5)>
array([ 5,  8, 14, 22, 30])
Coordinates:
  * time     (time) float64 0.1 0.2 0.3 0.4 0.5
    voltage  (time) int64 100 200 300 400 500
```

Problem description

Not only is 'voltage' not on the x axis, but 'current' isn't on the y axis either!

Expected Output

Based on the documentation (and common sense) I would have expected it to plot voltage on the x axis and current on the y axis.

(using a branch of xarray which is up-to-date with master)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2725/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);