id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1974350560,I_kwDOAMm_X851rjLg,8402,`where` dtype upcast with numpy 2,1828519,open,0,,,10,2023-11-02T14:12:49Z,2024-04-15T19:18:49Z,,CONTRIBUTOR,,,,"### What happened?

I'm testing my code with numpy 2.0 and current `main` xarray and dask and ran into a change that I guess is expected given the way xarray does things, but I want to make sure, as it could be unexpected for many users.

Doing `DataArray.where` with an integer array narrower than 64 bits and a plain Python `int` as the new value will upcast the array to 64-bit integers. With old versions of numpy this would preserve the dtype of the array. As far as I can tell the relevant xarray code hasn't changed, so this seems to be more about numpy making things more consistent. The main problem seems to come down to:

https://github.com/pydata/xarray/blob/d933578ebdc4105a456bada4864f8ffffd7a2ced/xarray/core/duck_array_ops.py#L218

This converts my scalar input `int` to a numpy array. If it didn't do this array conversion, numpy would work as expected. See the MCVE for the xarray-specific example, but here's the numpy equivalent:

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.uint16)

# what I'm intending to do with my xarray `data_arr.where(cond, 2)`
np.where(a != 0, a, 2).dtype
# dtype('uint16')

# equivalent to what xarray does:
np.where(a != 0, a, np.asarray(2)).dtype
# dtype('int64')

# workaround, cast my scalar to a specific numpy type
np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
# dtype('uint16')
```

From a numpy point of view, the second `where` call makes sense: two arrays should be upcast to a common dtype so they can be combined. But from an xarray user's point of view, I'm passing a scalar, so I expect it to behave like the first `where` call above.

### What did you expect to happen?

See above.

### Minimal Complete Verifiable Example

```Python
import xarray as xr
import numpy as np

data_arr = xr.DataArray(np.array([1, 2], dtype=np.uint16))
print(data_arr.where(data_arr == 2, 3).dtype)
# int64
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

### Relevant log output

_No response_

### Anything else we need to know?

Numpy 1.x preserves the dtype.

```python
In [1]: import numpy as np

In [2]: np.asarray(2).dtype
Out[2]: dtype('int64')

In [3]: a = np.zeros((2, 2), dtype=np.uint16)

In [4]: np.where(a != 0, a, np.asarray(2)).dtype
Out[4]: dtype('uint16')

In [5]: np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
Out[5]: dtype('uint16')
```
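A possible workaround at the xarray level (a minimal sketch for illustration, not from the original report, assuming the numpy 2 casting behavior described above): cast the fill value to the array's own dtype before passing it to `where`, mirroring the `np.asarray(np.uint16(2))` call in the numpy example.

```python
import numpy as np
import xarray as xr

data_arr = xr.DataArray(np.array([1, 2], dtype=np.uint16))

# Cast the scalar to the array's dtype so the implicit asarray() conversion
# inside xarray produces a uint16 array instead of a default int64 one.
fill = data_arr.dtype.type(3)
print(data_arr.where(data_arr == 2, fill).dtype)  # expected: uint16
```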
### Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.4.6-76060406-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.2.dev21+gfcdc8102
pandas: 2.2.0.dev0+495.gecf449b503
numpy: 2.0.0.dev0+git20231031.42c33f3
scipy: 1.12.0.dev0+1903.18d0a2f
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7.post0.dev7
dask: 2023.10.1+4.g91098a63
distributed: 2023.10.1+5.g76dd8003
matplotlib: 3.9.0.dev0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8402/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
573031381,MDU6SXNzdWU1NzMwMzEzODE=,3813,Xarray operations produce read-only array,1828519,open,0,,,7,2020-02-28T22:07:59Z,2023-03-22T15:11:14Z,,CONTRIBUTOR,,,,"I've turned on testing my Satpy package with unstable or pre-release versions of some of our dependencies, including numpy and xarray. I've found one error so far: in previous versions of xarray it was possible to assign to the numpy array taken from a DataArray.

#### MCVE Code Sample

```python
import numpy as np
import dask.array as da
import xarray as xr

data = np.arange(15, 301, 15).reshape(2, 10)
data_arr = xr.DataArray(data, dims=('y', 'x'), attrs={'test': 'test'})
data_arr = data_arr.copy()
data_arr = data_arr.expand_dims('bands')
data_arr['bands'] = ['L']
n_arr = np.asarray(data_arr.data)
n_arr[n_arr == 45] = 5
```

Which results in:

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 n_arr = np.asarray(data_arr.data); n_arr[n_arr == 45] = 5

ValueError: assignment destination is read-only
```

#### Expected Output

A writable array. No error.

#### Problem Description

If this is expected new behavior then so be it, but I wanted to check with the xarray devs before I tried to work around it.
#### Output of ``xr.show_versions()``

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:33:48) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.3.0-7629-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.1.dev21+g20e6236f
pandas: 1.1.0.dev0+630.gedcf1c8f8
numpy: 1.19.0.dev0+acba244
scipy: 1.5.0.dev0+f614064
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.3
cfgrib: None
iris: None
bottleneck: None
dask: 2.11.0+13.gfcc500c2
distributed: 2.11.0+7.g0d7a31ad
matplotlib: 3.2.0rc3
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 45.2.0.post20200209
pip: 20.0.2
conda: None
pytest: 5.3.5
IPython: 7.12.0
sphinx: 2.4.3
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3813/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
341331807,MDU6SXNzdWUzNDEzMzE4MDc=,2288,Add CRS/projection information to xarray objects,1828519,open,0,,,45,2018-07-15T16:02:55Z,2022-10-14T20:27:26Z,,CONTRIBUTOR,,,,"#### Problem description

This issue is to start the discussion for a feature that would be helpful to a lot of people. It may not necessarily be best to put it in xarray, but let's figure that out. I'll try to describe things below to the best of my knowledge. I'm typically thinking of raster/image data when it comes to this stuff, but it could probably be used for GIS-like point data.

Geographic data can be projected (uniform grid) or unprojected (nonuniform). Unprojected data typically has longitude and latitude values specified per-pixel. I don't think I've ever seen non-uniform data in a projected space. Projected data can be specified by a CRS (PROJ.4), a number of pixels (shape), and extents/bbox in CRS units (xmin, ymin, xmax, ymax). This could also be specified in different ways like origin (X, Y) and pixel size. Seeing as xarray already computes all `coords` data it makes sense for extents and array shape to be used. With this information provided in an xarray object any library could check for these properties and know where to place the data on a map.

So the question is: Should these properties be standardized in xarray Dataset/DataArray objects and how?

#### Related libraries and developers

* pyresample (me, @mraspaud, @pnuu)
* verde and gmt-python (@leouieda)
* metpy (@dopplershift)
* [geo-xarray](https://github.com/andrewdhicks/geo-xarray/wiki) (@andrewdhicks)
* rasterio
* cartopy

I know @WeatherGod also showed interest on gitter.

#### Complications and things to consider

1. Other related coordinate systems like [ECEF](https://en.wikipedia.org/wiki/ECEF) where coordinates are specified in three dimensions (X, Y, Z). Very useful for calculations like nearest neighbor of lon/lat points or for comparisons between two projected coordinate systems.
2. Specifying what coords arrays are the CRS coordinates or geographic coordinates in general.
3. If xarray should include these properties, where is the line drawn for what functionality xarray supports? Resampling/gridding, etc?
4. How is the CRS object represented? PROJ.4 string, PROJ.4 dict, existing libraries CRS object, new CRS object, `pyproj.Proj` object?
5. Affine versus geotransforms instead of extents: https://github.com/mapbox/rasterio/blob/master/docs/topics/migrating-to-v1.rst#affineaffine-vs-gdal-style-geotransforms
6. Similar to 4, I never mentioned ""rotation"" parameters which some users may want and are specified in the affine/geotransform.
7. Dynamically generated extents/affine objects so that slicing operations don't have to be handled specially.
8. Center of pixel coordinates versus outer edge of pixel coordinates.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2288/reactions"", ""total_count"": 14, ""+1"": 14, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
449840662,MDU6SXNzdWU0NDk4NDA2NjI=,2996,Checking non-dimensional coordinates for equality,1828519,open,0,,,3,2019-05-29T14:24:41Z,2021-03-02T05:08:32Z,,CONTRIBUTOR,,,,"#### Code Sample, a copy-pastable example if possible

I'm working on a proof-of-concept for the [`geoxarray` project](https://github.com/geoxarray/geoxarray) where I'd like to store coordinate reference system (CRS) information in the coordinates of a DataArray or Dataset object. I'd like to avoid subclassing objects and instead depend completely on xarray accessors to implement any utilities I need. I'm having trouble deciding what the best place is for this CRS information so that it benefits the user; `.coords` made the most sense.

My hope was that adding two DataArrays together with two different `crs` coordinates would cause an error, but I found out that since `crs` is not a dimension it doesn't get treated the same way, even when changing the `join` method to `'exact'`.

```python
from pyproj import CRS
import xarray as xr
import dask.array as da

crs1 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=25 +lat_1=25')
crs2 = CRS.from_string('+proj=lcc +datum=WGS84 +lon_0=-95 +lat_0=35 +lat_1=35')
a = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'y': da.arange(1, 6, chunks=3), 'x': da.arange(2, 7, chunks=3), 'crs': crs1, 'test': 1, 'test2': 2})
b = xr.DataArray(da.zeros((5, 5), chunks=2), dims=('y', 'x'), coords={'y': da.arange(1, 6, chunks=3), 'x': da.arange(2, 7, chunks=3), 'crs': crs2, 'test': 2, 'test2': 2})
a + b

# Results in:
#
# dask.array
# Coordinates:
# * y (y) int64 1 2 3 4 5
# * x (x) int64 2 3 4 5 6
# test2 int64 2
```

In the above code I was hoping that, because the `crs` coordinates are different (`lat_0` and `lat_1` differ, so `crs1 != crs2`), I could get it to raise an exception. Any ideas for how I might be able to accomplish something like this? I'm not an expert on xarray/pandas indexes, but could they be another possible solution?

Edit: `xr.merge` with `compat='no_conflicts'` does detect this difference.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2996/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue