issues


6 rows where user = 4502 (mjwillson), sorted by updated_at descending


Issue #7885: drop_indexes is reversed by assign_coords of unrelated coord

Opened by mjwillson on 2023-05-30 · closed as completed on 2023-08-29 · 2 comments

What happened?

I dropped an index on one coord, then later called assign_coords to change another, unrelated coord. The dropped index was silently created again.

What did you expect to happen?

I expected the index on the original coord to stay dropped.

Minimal Complete Verifiable Example

```python
import xarray
import numpy as np

ds = xarray.Dataset(
    {'foo': (('x', 'y'), np.ones((3, 5)))},
    coords={'x': [1, 2, 3], 'y': [4, 5, 6, 7, 8]})
ds = ds.drop_indexes('x')
assert 'x' not in ds.indexes
ds = ds.assign_coords(y=ds.y + 1)
assert 'x' not in ds.indexes  # Fails
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

In general it would be nice if xarray made it easier to avoid indexes being automatically created.

E.g. right now, as far as I can tell there's no way to avoid an index being created when you construct a DataArray or Dataset with a coordinate of the same name as a dimension.

Admittedly I have a slightly niche use case -- I'm using xarray with wrapped JAX arrays, which can't be converted into pandas indexes. Indexes being (re-)created in these cases isn't just an inconvenience; it actually causes a crash.
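To make the construction-time point concrete, here is a minimal sketch (an illustration of the behaviour described above, not new API):

```python
import numpy as np
import xarray

# A coordinate whose name matches a dimension automatically gets a
# pandas index at construction time; there is no way to opt out here.
ds = xarray.Dataset({'foo': (('x',), np.ones(3))}, coords={'x': [1, 2, 3]})
assert 'x' in ds.indexes
```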

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 (main, Dec 7 2022, 13:47:07) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.1.20-2rodete1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.8
libnetcdf: 4.9.0
xarray: 999
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.0
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.7.0
Nio: None
zarr: 2.13.6+ds
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.4
cfgrib: 0.9.10.3
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: 3.6.3
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip3: None
conda: None
pytest: 7.2.1
mypy: None
IPython: 8.5.0
sphinx: 5.3.0
```
Issue #6429: FacetGrid padding goes very bad when cartopy projection specified

Opened by mjwillson on 2022-03-31 · closed as completed on 2023-04-28 · 2 comments

What happened?

When doing a faceted plot and specifying a projection (via e.g. subplot_kws=dict(projection=ccrs.PlateCarree())), the padding becomes very weird (often unusable) and strangely unstable to changes in size and aspect. For example, this produces very bad results:

```python
data = xarray.DataArray(
    dims=('lat', 'lon', 'row', 'col'),
    data=np.ones((180, 360, 2, 2)),
    coords={'lon': np.arange(360), 'lat': np.arange(-90, 90)})
xarray.plot.pcolormesh(
    data,
    row='row',
    col='col',
    size=5,
    aspect=1.5,
    subplot_kws=dict(projection=ccrs.PlateCarree()),
)
```

whereas if you change size from 5 to 4, you suddenly get a much better (although still not quite right) layout.

What did you expect to happen?

I expected a layout closer to what you get if you comment out subplot_kws=dict(projection=ccrs.PlateCarree()) above.

Minimal Complete Verifiable Example

```python
import xarray
import cartopy.crs as ccrs
import numpy as np

data = xarray.DataArray(
    dims=('lat', 'lon', 'row', 'col'),
    data=np.ones((180, 360, 2, 2)),
    coords={'lon': np.arange(360), 'lat': np.arange(-90, 90)})
xarray.plot.pcolormesh(
    data,
    row='row',
    col='col',
    size=5,
    aspect=1.5,
    subplot_kws=dict(projection=ccrs.PlateCarree()),
)
```

Relevant log output

In the case where the layout isn't (as) broken, I see a warning:

```
.../xarray/plot/facetgrid.py:394: UserWarning: Tight layout not applied. tight_layout cannot make axes width small enough to accommodate all axes decorations
```

It seems that when tight_layout does manage to get applied, things really go wrong. Perhaps, at a minimum, we could have a way to disable tight_layout (which currently seems to be mandatory)?
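One possible workaround, sketched here as an assumption (not from the original report): build the subplot grid with plain matplotlib so that xarray's FacetGrid, and hence its internal tight_layout call, is never involved.

```python
import matplotlib.pyplot as plt

# Sketch: plot each facet into a manually created GeoAxes grid,
# bypassing FacetGrid and its mandatory tight_layout.
fig, axes = plt.subplots(
    2, 2, figsize=(15, 10),
    subplot_kw=dict(projection=ccrs.PlateCarree()))
for (r, c), ax in np.ndenumerate(axes):
    data.isel(row=r, col=c).plot.pcolormesh(
        ax=ax, transform=ccrs.PlateCarree(), add_colorbar=False)
```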

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.10 (stable, redacted, redacted) [Clang google3-trunk (e5b1b9edb8b6f6cd926c2ba3e1ad1b6f767021d6)]
python-bits: 64
OS: Linux
OS-release: 4.15.0-smp-920.39.0.0
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: None
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.6.1
xarray: 0.18.2
pandas: 1.1.5
numpy: 1.21.5
scipy: 1.2.1
netCDF4: 1.4.1
pydap: None
h5netcdf: 0.11.0
h5py: 3.2.1
Nio: None
zarr: 2.7.0
cftime: 1.0.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.4
cartopy: 0+unknown
seaborn: 0.11.1
numbagg: None
pint: None
setuptools: unknown
pip: None
conda: None
pytest: None
IPython: 3.2.3
sphinx: None
```

Issue #7189: combine_by_coords allows one overlapping coordinate value, but not more than one

Opened by mjwillson on 2022-10-19 · open · 1 comment

What happened?

combine_by_coords fails to reject the following case with coordinates [0, 1] and [1, 2], despite rejecting cases where the overlap is bigger (e.g. [0, 1, 2] and [1, 2, 3]). Instead it returns a DataArray with duplicate coordinates (see example below):

```python
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})
xarray.combine_by_coords([a, b])
```

What did you expect to happen?

I expected combine_by_coords to explicitly reject all cases where the coordinates overlap, producing a ValueError both for the case above with coords [0, 1] and [1, 2] and for cases with larger overlaps, e.g. [0, 1, 2] and [1, 2, 3].

Minimal Complete Verifiable Example

```python
# This overlap is caught, as expected:
a = xarray.DataArray(dims=('x',), data=np.ones((3,)), coords={'x': [0, 1, 2]})
b = xarray.DataArray(dims=('x',), data=np.ones((3,)), coords={'x': [1, 2, 3]})
xarray.combine_by_coords([a, b])
# => ValueError: Resulting object does not have monotonic global indexes along dimension x

# This overlap is not caught:
a = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [0, 1]})
b = xarray.DataArray(dims=('x',), data=np.ones((2,)), coords={'x': [1, 2]})
xarray.combine_by_coords([a, b])
# => <xarray.DataArray (x: 4)>
# array([1., 1., 1., 1.])
# Coordinates:
#   * x        (x) int64 0 1 1 2
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

As far as I can tell this happens because indexes.is_monotonic_increasing and indexes.is_monotonic_decreasing do not check for strict monotonicity, and so allow consecutive values to be equal.
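For illustration, the non-strict behaviour is easy to see directly with pandas (this is the pandas Index API, not xarray's internals); combining the monotonicity check with a uniqueness check would make it strict:

```python
import pandas as pd

idx = pd.Index([0, 1, 1, 2])  # the combined coords from the example above

idx.is_monotonic_increasing  # True: repeats allowed, i.e. non-strict
idx.is_unique                # False: the duplicated 1 is detectable
idx.is_monotonic_increasing and idx.is_unique  # False: a strict check
```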

I assume it wasn't intentional to allow overlaps like this. If so, do you think anyone is depending on this (I'd hope not...) and would you take a PR to fix it to produce a ValueError in this case?

If this behaviour is intentional or relied upon, could we have an option to do a strict check instead?

Also, for performance reasons, I'd propose doing some extra upfront checks to catch index overlap (e.g. in _infer_concat_order_from_coords), rather than doing a potentially large concat and only detecting duplicates afterwards.
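A rough sketch of what such an upfront check could look like (hypothetical helper, not xarray's actual code):

```python
def indexes_overlap(idx_a, idx_b):
    # Detect shared coordinate values between two 1-D pandas indexes
    # before attempting the (potentially large) concat.
    return not idx_a.intersection(idx_b).empty
```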

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.15 (stable, redacted, redacted) [Clang google3-trunk (11897708c0229c92802e747564e7c34b722f045f)]
python-bits: 64
OS: Linux
OS-release: 5.18.16-1rodete1-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: None
xarray: 2022.06.0
pandas: 1.1.5
numpy: 1.23.2
scipy: 1.8.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.2.1
Nio: None
zarr: 2.7.0
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.3.4
cartopy: None
seaborn: 0.11.2
numbagg: None
fsspec: 0.7.4
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
IPython: 3.2.3
sphinx: None
```
Issue #6680: Datatype for a 'shape specification' of a Dataset / DataArray

Opened by mjwillson on 2022-06-09 · open · 2 comments

Is your feature request related to a problem?

Often with xarray I find myself having to create a template Dataset or DataArray with dummy data in it just to specify the dimensions/sizes/coordinates/variable names that are required in some situation.

Describe the solution you'd like

It would be very useful to have a datatype that represents a shape specification (dimensions, sizes and coordinates) independently of the data, so that we can do things like the following (a rough sketch of such a type appears after the list):

  • Implement xarray equivalents of functions like np.ones, np.zeros, np.random.normal(size=...) that are given a shape specification which the return value should conform to. (I have some more sophisticated / less trivial examples of this too, functions which currently need to be given templates for the return value but only depend on the shape of the template)
  • Test if two DataArrays / Datasets have the same shape
  • Memoize or cache things based on shape (this implies the shape spec would need to be hashable)
  • Make it easier to use xarray with libraries like tree / PyTree that can be used to flatten and unflatten a Dataset into its underlying arrays together with some specification of the shape of the data structure that can be used to unflatten it back again. (Right now I have to implement my own shape specification objects to do this)
  • Manipulate shape specifications e.g. by adding or removing dimensions from them without having to manipulate dummy template data in slightly arbitrary ways (e.g. template.isel(dim_to_be_dropped=0, drop=True)) in order to do this.
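A rough sketch of what such a type might look like (entirely hypothetical; ShapeSpec is not an xarray API, and coordinates are omitted for brevity):

```python
from dataclasses import dataclass

import numpy as np
import xarray


@dataclass(frozen=True)  # frozen => hashable, so usable as a cache key
class ShapeSpec:
    """Dimension names and sizes, independent of any data."""

    sizes: tuple  # e.g. (('x', 3), ('y', 5))

    @classmethod
    def of(cls, obj):
        # Capture the shape of an existing DataArray / Dataset.
        return cls(tuple(sorted(obj.sizes.items())))

    def zeros(self):
        # An xarray analogue of np.zeros for this shape.
        dims, shape = zip(*self.sizes)
        return xarray.DataArray(np.zeros(shape), dims=list(dims))

    def drop(self, dim):
        # Manipulate the spec directly; no dummy template data involved.
        return ShapeSpec(tuple((d, s) for d, s in self.sizes if d != dim))
```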

Describe alternatives you've considered

I realise that using lazy dask arrays largely removes the performance overhead of manipulating fake data, but (A) it still feels kinda ugly and adds boilerplate to construct the fake data, and (B) not everyone wants to depend on dask.

Additional context

No response

Issue #6053: A broadcasting sum for xarray.Dataset

Opened by mjwillson on 2021-12-08 · open · 3 comments

I've found it useful to have a version of Dataset.sum which sums variables in a way that's consistent with what would happen if they were broadcast to the full Dataset dimensions.

The difference is in what it does with variables that don't contain some of the dimensions it's asked to sum over: the standard sum just ignores the summation over these dimensions for these variables, whereas broadcast_sum will multiply the variable by the product of the sizes of the missing dimensions, like so:

```python
def broadcast_sum(dataset, dims):
    def broadcast_sum_var(var):
        present_sum_dims = [dim for dim in dims if dim in var.dims]
        non_present_sum_dims = [dim for dim in dims if dim not in var.dims]
        return var.sum(present_sum_dims) * np.prod(
            [dataset.sizes[dim] for dim in non_present_sum_dims])
    return dataset.map(broadcast_sum_var)
```

This is consistent with mathematical sum notation, where the sum doesn't become a no-op just because the summand doesn't reference the index being summed over. E.g.:

$\sum_{n=1}^N x = N x$
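A quick worked example of the difference (illustrative, using broadcast_sum as defined above):

```python
import numpy as np
import xarray

ds = xarray.Dataset({'a': (('x', 'y'), np.ones((2, 3))),
                     'b': (('x',), np.ones(2))})

float(ds.sum(['x', 'y'])['b'])             # 2.0: summing over y is a no-op for b
float(broadcast_sum(ds, ['x', 'y'])['b'])  # 6.0: b is treated as constant over y
```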

I've found it useful when you need to do some broadcasting operations across different variables after the sum, and you want the summation done in a way that's consistent with the broadcasting logic that will be applied later.

Would you be open to adding this, and if so any preference how? (A separate method, an option to .sum ?)

Issue #6075: Broadcasting doesn't respect scalar coordinates

Opened by mjwillson on 2021-12-13 · open · 1 comment

Usually if I apply a broadcasting operation to two arrays, the result only includes values for coordinates present in both. A simple example:

```python
In [160]: data_array = xarray.DataArray(dims=("x",), data=[1, 2, 3], coords={"x": [1, 2, 3]})

In [161]: data_array.sel(x=[2]) * data_array
Out[161]:
<xarray.DataArray (x: 1)>
array([4])
Coordinates:
  * x        (x) int64 2
```

However, if I do the same thing but select a scalar value for the x coordinate (.sel(x=2)), yielding a scalar coordinate for x:

```python
In [164]: data_array.sel(x=2)
Out[164]:
<xarray.DataArray ()>
array(2)
Coordinates:
    x        int64 2

In [165]: data_array.sel(x=2) * data_array
Out[165]:
<xarray.DataArray (x: 3)>
array([2, 4, 6])
Coordinates:
  * x        (x) int64 1 2 3
```

Here the result includes values at all the coordinates [1, 2, 3], all of which have been broadcast against the scalar value taken at coordinate 2. This doesn't seem correct in general: values from different coordinates shouldn't be broadcast against each other by default, even if one of them is a scalar.

I would expect this either to result in an error or warning, or to select only the corresponding value at the scalar coordinate in question to broadcast against, resulting in e.g.:

```python
<xarray.DataArray ()>
array(4)
Coordinates:
    x        int64 2
```
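For comparison, an illustrative check (not in the original report): selecting the matching coordinate value on both sides does produce exactly that scalar result.

```python
# Illustrative only: align the scalar selection on both operands.
data_array.sel(x=2) * data_array.sel(x=2)
# <xarray.DataArray ()>
# array(4)
# Coordinates:
#     x        int64 2
```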


