issues

6 rows where state = "open" and user = 1828519 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2244518111 PR_kwDOAMm_X85suNEO 8946 Fix upcasting with python builtin numbers and numpy 2 djhoese 1828519 open 0     18 2024-04-15T20:07:42Z 2024-04-29T12:38:55Z   CONTRIBUTOR   0 pydata/xarray/pulls/8946

See #8402 for more discussion. Bottom line is that numpy 2 changes the rules for casting between two inputs. Due to this and xarray's preference for promoting python scalars to 0d arrays (scalar arrays), xarray objects are being upcast to higher data types when they previously didn't.
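The rule change can be reproduced with numpy alone. A minimal sketch (the result of the second call depends on the numpy version; under numpy 2's NEP 50 promotion rules it upcasts to int64):

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.uint16)

# A Python scalar participates in promotion "weakly", so the array's
# dtype wins and no upcast happens.
print(np.where(a != 0, a, 2).dtype)  # uint16

# A 0-d array is an ordinary operand; under numpy 2 the pair is
# promoted to a common dtype, upcasting the result.
print(np.where(a != 0, a, np.asarray(2)).dtype)
```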

I'm mainly opening this PR for further and more detailed discussion.

CC @dcherian

  • [ ] Closes #8402
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8946/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1974350560 I_kwDOAMm_X851rjLg 8402 `where` dtype upcast with numpy 2 djhoese 1828519 open 0     10 2023-11-02T14:12:49Z 2024-04-15T19:18:49Z   CONTRIBUTOR      

What happened?

I'm testing my code with numpy 2.0 and the current main branches of xarray and dask, and ran into a behavior change. It may be expected given the way xarray does things, but I want to make sure, as it could be unexpected for many users.

Calling DataArray.where on an integer array smaller than 64 bits, with a Python int as the new value, upcasts the array to 64-bit integers (Python's int). Older versions of numpy preserved the dtype of the array. As far as I can tell the relevant xarray code hasn't changed, so this seems to be a consequence of numpy making promotion rules more consistent.

The main problem seems to come down to:

https://github.com/pydata/xarray/blob/d933578ebdc4105a456bada4864f8ffffd7a2ced/xarray/core/duck_array_ops.py#L218

This converts my scalar input int to a numpy array. If it didn't do this array conversion, numpy would behave as expected. See the MCVE for the xarray-specific example, but here's the numpy equivalent:

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.uint16)

# what I'm intending to do with my xarray data_arr.where(cond, 2)
np.where(a != 0, a, 2).dtype
# dtype('uint16')

# equivalent to what xarray does:
np.where(a != 0, a, np.asarray(2)).dtype
# dtype('int64')

# workaround, cast my scalar to a specific numpy type
np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
# dtype('uint16')
```

From a numpy point of view, the second where call makes sense: two arrays should be upcast to a common dtype so they can be combined. But from an xarray user's point of view, I'm passing a scalar, so I expect the same result as the first where call above.

What did you expect to happen?

See above.

Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

data_arr = xr.DataArray(np.array([1, 2], dtype=np.uint16))
print(data_arr.where(data_arr == 2, 3).dtype)
# int64
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Numpy 1.x preserves the dtype.

```python
In [1]: import numpy as np

In [2]: np.asarray(2).dtype
Out[2]: dtype('int64')

In [3]: a = np.zeros((2, 2), dtype=np.uint16)

In [4]: np.where(a != 0, a, np.asarray(2)).dtype
Out[4]: dtype('uint16')

In [5]: np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
Out[5]: dtype('uint16')
```

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.4.6-76060406-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.2.dev21+gfcdc8102
pandas: 2.2.0.dev0+495.gecf449b503
numpy: 2.0.0.dev0+git20231031.42c33f3
scipy: 1.12.0.dev0+1903.18d0a2f
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7.post0.dev7
dask: 2023.10.1+4.g91098a63
distributed: 2023.10.1+5.g76dd8003
matplotlib: 3.9.0.dev0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8402/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1750685808 PR_kwDOAMm_X85SqoXL 7905 Add '.hdf' extension to 'netcdf4' backend djhoese 1828519 open 0     10 2023-06-10T00:45:15Z 2023-06-14T15:25:08Z   CONTRIBUTOR   0 pydata/xarray/pulls/7905

I'm helping @joleenf debug an issue where some old code that uses xr.open_dataset no longer works, as far as we can tell since the introduction of engines. The main issue is that her code assumes the NetCDF4 C library was compiled with HDF4 support (conda-forge builds enable this functionality), so netCDF4.Dataset("my_file.hdf") can actually read the HDF4 file through the NetCDF4 C library.

However, xr.open_dataset("my_file.hdf") will fail because xarray (or rather the 'netcdf4' engine) doesn't know that it could potentially read HDF4 files. This PR adds the .hdf extension to the 'netcdf4' engine so this works automatically, without needing engine='netcdf4' to be specified.

What do people think? I didn't want to put any more work into this until others weighed in.
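Conceptually, the change is a one-entry addition to an extension-to-engine mapping. A rough sketch of the idea (names here are illustrative, not xarray's actual internals):

```python
import os

# Hypothetical mapping from file extension to backend engine; the ".hdf"
# entry is the addition this PR proposes for the 'netcdf4' engine.
ENGINE_EXTENSIONS = {
    ".nc": "netcdf4",
    ".nc4": "netcdf4",
    ".hdf": "netcdf4",
}

def guess_engine(path):
    """Pick a backend engine based on the file's extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return ENGINE_EXTENSIONS[ext]
    except KeyError:
        raise ValueError(f"no engine registered for extension {ext!r}")

print(guess_engine("my_file.hdf"))  # netcdf4
```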

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7905/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
573031381 MDU6SXNzdWU1NzMwMzEzODE= 3813 Xarray operations produce read-only array djhoese 1828519 open 0     7 2020-02-28T22:07:59Z 2023-03-22T15:11:14Z   CONTRIBUTOR      

I've turned on testing my Satpy package with unstable or pre-release versions of some of our dependencies, including numpy and xarray. I've found one error so far: in previous versions of xarray it was possible to assign to the numpy array taken from a DataArray.

MCVE Code Sample

```python
import numpy as np
import dask.array as da
import xarray as xr

data = np.arange(15, 301, 15).reshape(2, 10)
data_arr = xr.DataArray(data, dims=('y', 'x'), attrs={'test': 'test'})
data_arr = data_arr.copy()
data_arr = data_arr.expand_dims('bands')
data_arr['bands'] = ['L']
n_arr = np.asarray(data_arr.data)
n_arr[n_arr == 45] = 5
```

Which results in:

```
ValueError                                Traceback (most recent call last)
<ipython-input-12-90dae37dd808> in <module>
----> 1 n_arr = np.asarray(data_arr.data); n_arr[n_arr == 45] = 5

ValueError: assignment destination is read-only
```

Expected Output

A writable array. No error.

Problem Description

If this is expected new behavior then so be it, but I wanted to check with the xarray devs before I tried to work around it.
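For reference, the read-only behavior and a copy-based workaround can be sketched with plain numpy (the `setflags` call simulates the read-only array that the xarray operations now produce):

```python
import numpy as np

arr = np.arange(15, 301, 15).reshape(2, 10)
arr.setflags(write=False)      # simulate the read-only array xarray returns

view = np.asarray(arr)         # asarray returns the same read-only array
print(view.flags.writeable)    # False

writable = np.array(arr)       # an explicit copy is writable again
writable[writable == 45] = 5
print(45 in writable)          # False
```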

Output of xr.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0] python-bits: 64 OS: Linux OS-release: 5.3.0-7629-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.3 xarray: 0.15.1.dev21+g20e6236f pandas: 1.1.0.dev0+630.gedcf1c8f8 numpy: 1.19.0.dev0+acba244 scipy: 1.5.0.dev0+f614064 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.0.4.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.1.3 cfgrib: None iris: None bottleneck: None dask: 2.11.0+13.gfcc500c2 distributed: 2.11.0+7.g0d7a31ad matplotlib: 3.2.0rc3 cartopy: 0.17.0 seaborn: None numbagg: None setuptools: 45.2.0.post20200209 pip: 20.0.2 conda: None pytest: 5.3.5 IPython: 7.12.0 sphinx: 2.4.3
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3813/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
341331807 MDU6SXNzdWUzNDEzMzE4MDc= 2288 Add CRS/projection information to xarray objects djhoese 1828519 open 0     45 2018-07-15T16:02:55Z 2022-10-14T20:27:26Z   CONTRIBUTOR      

Problem description

This issue is to start the discussion for a feature that would be helpful to a lot of people. It may not necessarily be best to put it in xarray, but let's figure that out. I'll try to describe things below to the best of my knowledge. I'm typically thinking of raster/image data when it comes to this stuff, but it could probably be used for GIS-like point data.

Geographic data can be projected (uniform grid) or unprojected (nonuniform). Unprojected data typically has longitude and latitude values specified per-pixel; I don't think I've ever seen non-uniform data in a projected space. Projected data can be specified by a CRS (PROJ.4), a number of pixels (shape), and extents/bbox in CRS units (xmin, ymin, xmax, ymax). It could also be specified in other ways, such as an origin (X, Y) plus a pixel size. Since xarray already computes all of the coordinate data, it makes sense to use extents and array shape. With this information on an xarray object, any library could check for these properties and know where to place the data on a map.
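The extents/shape-to-coordinates relationship can be sketched as follows (a hypothetical helper, assuming extents are (xmin, ymin, xmax, ymax) in CRS units and coordinates are pixel centers):

```python
import numpy as np

def pixel_centers(extents, shape):
    """Derive pixel-center x/y coordinate arrays from extents + shape."""
    xmin, ymin, xmax, ymax = extents
    ny, nx = shape
    dx = (xmax - xmin) / nx
    dy = (ymax - ymin) / ny
    x = xmin + dx * (np.arange(nx) + 0.5)
    y = ymax - dy * (np.arange(ny) + 0.5)  # rasters usually index y downward
    return x, y

x, y = pixel_centers((0.0, 0.0, 100.0, 50.0), (5, 10))
print(x[0], x[-1], y[0], y[-1])  # 5.0 95.0 45.0 5.0
```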

So the question is: Should these properties be standardized in xarray Dataset/DataArray objects and how?

Related libraries and developers

  • pyresample (me, @mraspaud, @pnuu)
  • verde and gmt-python (@leouieda)
  • metpy (@dopplershift)
  • geo-xarray (@andrewdhicks)
  • rasterio
  • cartopy

I know @WeatherGod also showed interest on gitter.

Complications and things to consider

  1. Other related coordinate systems like ECEF where coordinates are specified in three dimensions (X, Y, Z). Very useful for calculations like nearest neighbor of lon/lat points or for comparisons between two projected coordinate systems.
  2. Specifying what coords arrays are the CRS coordinates or geographic coordinates in general.
  3. If xarray should include these properties, where is the line drawn for what functionality xarray supports? Resampling/gridding, etc?
  4. How is the CRS object represented? PROJ.4 string, PROJ.4 dict, existing libraries CRS object, new CRS object, pyproj.Proj object?
  5. Affine versus geotransforms instead of extents: https://github.com/mapbox/rasterio/blob/master/docs/topics/migrating-to-v1.rst#affineaffine-vs-gdal-style-geotransforms
  6. Similar to 4, I never mentioned "rotation" parameters which some users may want and are specified in the affine/geotransform.
  7. Dynamically generated extents/affine objects so that slicing operations don't have to be handled specially.
  8. Center of pixel coordinates versus outer edge of pixel coordinates.
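On point 5, the two conventions can be illustrated with a tiny sketch (values are made up; a GDAL-style geotransform is (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)):

```python
def pixel_to_crs(gt, row, col):
    """Map a (row, col) pixel index to CRS (x, y) via a GDAL geotransform."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

# North-up grid: 10-unit pixels, no rotation, origin at the upper-left corner.
gt = (0.0, 10.0, 0.0, 50.0, 0.0, -10.0)
print(pixel_to_crs(gt, 0, 0))  # (0.0, 50.0)
print(pixel_to_crs(gt, 1, 2))  # (20.0, 40.0)
```

An affine object encodes the same mapping but composes with matrix algebra, which is the motivation behind rasterio's migration away from raw geotransform tuples.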
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2288/reactions",
    "total_count": 14,
    "+1": 14,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
449840662 MDU6SXNzdWU0NDk4NDA2NjI= 2996 Checking non-dimensional coordinates for equality djhoese 1828519 open 0     3 2019-05-29T14:24:41Z 2021-03-02T05:08:32Z   CONTRIBUTOR      

Code Sample, a copy-pastable example if possible

I'm working on a proof-of-concept for the geoxarray project where I'd like to store coordinate reference system (CRS) information in the coordinates of a DataArray or Dataset object. I'd like to avoid subclassing objects and instead depend completely on xarray accessors to implement any utilities I need.

I'm having trouble deciding where this CRS information best lives so that it benefits the user; .coords made the most sense. My hope was that adding two DataArrays with different crs coordinates would raise an error, but I found that since crs is not a dimension it isn't treated the same way, even when changing the join method to 'exact'.

```python
from pyproj import CRS
import xarray as xr
import dask.array as da

crs1 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25')
crs2 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=35 +lat_1=35')

a = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'),
                 coords={'y': da.arange(1, 6, chunks=3),
                         'x': da.arange(2, 7, chunks=3),
                         'crs': crs1, 'test': 1, 'test2': 2})

b = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'),
                 coords={'y': da.arange(1, 6, chunks=3),
                         'x': da.arange(2, 7, chunks=3),
                         'crs': crs2, 'test': 2, 'test2': 2})

a + b
```

Results in:

```
<xarray.DataArray 'zeros-e5723e7f9121b7ac546f61c19dabe786' (y: 5, x: 5)>
dask.array<shape=(5, 5), dtype=float64, chunksize=(2, 2)>
Coordinates:
  * y        (y) int64 1 2 3 4 5
  * x        (x) int64 2 3 4 5 6
    test2    int64 2
```

In the above code I was hoping that, because the crs coordinates are different (lat_0 and lat_1 differ, so crs1 != crs2), an exception would be raised.

Any ideas for how I might accomplish something like this? I'm not an expert on xarray/pandas indexes, but could they be another possible solution?

Edit: xr.merge with compat='no_conflicts' does detect this difference.
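A hypothetical sketch of the check I'm after, independent of how xarray would implement it (names are illustrative):

```python
def check_crs_compatible(coords_a, coords_b, name="crs"):
    """Raise if both mappings carry a non-dimension coordinate that differs."""
    a, b = coords_a.get(name), coords_b.get(name)
    if a is not None and b is not None and a != b:
        raise ValueError(f"conflicting {name!r} coordinates: {a!r} != {b!r}")

# Matching coordinates pass silently; a mismatch raises.
check_crs_compatible({"crs": "+proj=lcc +lat_0=25"}, {"crs": "+proj=lcc +lat_0=25"})
try:
    check_crs_compatible({"crs": "+proj=lcc +lat_0=25"}, {"crs": "+proj=lcc +lat_0=35"})
except ValueError as err:
    print("raised:", err)
```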

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2996/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);