home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

3 rows where state = "open" and user = 3958036 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

type 1

  • issue 3

state 1

  • open · 3 ✖

repo 1

  • xarray 3
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1798441216 I_kwDOAMm_X85rMgkA 7978 Clean up colormap code? johnomotani 3958036 open 0     1 2023-07-11T08:49:36Z 2023-07-11T15:47:52Z   CONTRIBUTOR      

In fixing some bugs with color bars (https://github.com/pydata/xarray/pull/3601), we had to do some clunky workarounds because of limitations of matplotlib's API for modifying colormaps - this prompted a matplotlib issue https://github.com/matplotlib/matplotlib/issues/16296#issuecomment-1629755861. That issue has now been closed, and apparently the limitations are now fixed, so it should be possible to tidy up some of the colorbar code (at least at some point, once the oldest xarray-supported matplotlib includes the new API).

I have no time to look into this myself, but opening this issue to flag the new matplotlib features in case someone is looking at refactoring colorbar or colormap code.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7978/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1022478180 I_kwDOAMm_X8488cdk 5852 Surprising behaviour of Dataset/DataArray.interp() with NaN entries johnomotani 3958036 open 0     0 2021-10-11T09:33:11Z 2021-10-11T09:33:11Z   CONTRIBUTOR      

I think this is due to documented 'undefined behaviour' of scipy.interpolate.interp1d, so not really a bug, but I think it would be more user-friendly if xarray gave an error in this case rather than producing an 'incorrect' result.

What happened:

If a DataArray contains a NaN value and is interpolated, output values that do not depend on the entry that was NaN may still be NaN.

What you expected to happen:

The docs for scipy.interpolate.interp1d say

Calling interp1d with NaNs present in input values results in undefined behaviour.

which explain the output below, and presumably mean it is not fixable on the xarray side (short of some ugly work-around). I think it would be good though to check for NaNs in DataArray/Dataset.interp(), and if they are present raise an exception (or possibly a warning?) about 'undefined behaviour'.

scipy.interpolate.interp2d has a similar note, while scipy.interpolate.interpn does not mention it (but has very limited information).

What I'd initially expected was an output would be valid at locations in the array that shouldn't depend on the NaN input: interpolating a 2d DataArray (with dims x and y) in the x-dimension, if only one y-index in the input has a NaN value, that y-index in the output might contain NaNs, but the others should be OK.

Minimal Complete Verifiable Example:

```python import numpy as np import xarray as xr

da = xr.DataArray(np.ones([3, 4]), dims=("x", "y"))

da[0, 0] = float("nan")

newx = np.linspace(0., 3., 5)

interp_da = da.interp(x=newx)

print(interp_da) ```

On my system, this gives output: <xarray.DataArray (x: 5, y: 4)> array([[nan, 1., 1., 1.], [nan, 1., 1., 1.], [ 1., 1., 1., 1.], [nan, nan, nan, nan], [nan, nan, nan, nan]]) Coordinates: * x (x) float64 0.0 0.75 1.5 2.25 3.0 Dimensions without coordinates: y [Surprisingly, I get the same output even using method="nearest".]

You might expect at least the following, with NaN only at y=0: <xarray.DataArray (x: 5, y: 4)> array([[nan, 1., 1., 1.], [nan, 1., 1., 1.], [ 1., 1., 1., 1.], [nan, 1., 1., 1.], [nan, 1., 1., 1.]]) Coordinates: * x (x) float64 0.0 0.75 1.5 2.25 3.0 Dimensions without coordinates: y

Environment:

Output of <tt>xr.show_versions()</tt> INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0] python-bits: 64 OS: Linux OS-release: 5.11.0-37-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: ('en_GB', 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.8.0 xarray: 0.19.0 pandas: 1.3.1 numpy: 1.21.1 scipy: 1.7.1 netCDF4: 1.5.7 pydap: None h5netcdf: None h5py: 3.3.0 Nio: None zarr: None cftime: 1.5.0 nc_time_axis: 1.3.1 PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.4.2 cartopy: None seaborn: 0.11.1 numbagg: None pint: 0.17 setuptools: 49.6.0.post20210108 pip: 21.2.4 conda: 4.10.3 pytest: 6.2.4 IPython: 7.26.0 sphinx: 4.1.2
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5852/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
940702754 MDU6SXNzdWU5NDA3MDI3NTQ= 5589 Call .compute() in all plot methods? johnomotani 3958036 open 0     3 2021-07-09T12:03:30Z 2021-07-09T15:57:27Z   CONTRIBUTOR      

I noticed what I think might be a performance bug: should .compute() be called on the input data in all the plotting methods (e.g. plot.pcolormesh()) like it is in .plot() here https://github.com/pydata/xarray/blob/bf27e2c1fd81f5e0afe1ef91c13f651db54b44bb/xarray/plot/plot.py#L166 See also discussion in https://github.com/pydata/xarray/issues/2390.

I was making plots from a large dataset of a quantity that is the output of quite a bit of computation. A script which made an animation of the full time-series (a couple of thousand time points) actually ran significantly faster than a script that made pcolormesh plots of just 3 time points (~2hrs compared to ~5hrs). The difference I can think of is that the animation script called .values before the animation function, but the plotting script called xarray.plot.pcolormesh() without calling .values/.load()/.compute() first. A modified version of the script that calls .load() before any plot calls reduced the run time to 30 mins even though I plotted 18 time points, not just 3.

2d plots might all be covered by adding a darray = darray.compute() call in newplotfunc()?https://github.com/pydata/xarray/blob/bf27e2c1fd81f5e0afe1ef91c13f651db54b44bb/xarray/plot/plot.py#L609 I guess the 1d plot functions would all need to be modified individually.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5589/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 23.879ms · About: xarray-datasette