

issues


3 rows where comments = 10 and user = 1828519 sorted by updated_at descending




#8402 `where` dtype upcast with numpy 2 (issue, open)

djhoese (1828519) · 10 comments · created 2023-11-02T14:12:49Z · updated 2024-04-15T19:18:49Z · CONTRIBUTOR · repo: xarray (13221727) · id 1974350560 · node_id I_kwDOAMm_X851rjLg

What happened?

I'm testing my code against numpy 2.0 and the current main branches of xarray and dask, and I ran into a behavior change. I suspect it is expected given the way xarray does things, but I want to make sure, since it could surprise many users.

Calling DataArray.where on an integer array narrower than 64 bits, with a plain Python int as the replacement value, now upcasts the result to 64-bit integers. Older versions of numpy preserved the array's dtype. As far as I can tell the relevant xarray code hasn't changed, so this seems to be numpy making its promotion rules more consistent.

The main problem seems to come down to:

https://github.com/pydata/xarray/blob/d933578ebdc4105a456bada4864f8ffffd7a2ced/xarray/core/duck_array_ops.py#L218

This line converts my scalar int input into a numpy array. Without that conversion, numpy behaves as expected. See the MCVE for the xarray-specific example; here's the numpy equivalent:

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.uint16)

# what I'm intending to do with my xarray data_arr.where(cond, 2)
np.where(a != 0, a, 2).dtype
# dtype('uint16')

# equivalent to what xarray does:
np.where(a != 0, a, np.asarray(2)).dtype
# dtype('int64')

# workaround: cast my scalar to a specific numpy type
np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
# dtype('uint16')
```

From numpy's point of view the second where call makes sense: two arrays must be promoted to a common dtype before they can be combined. But from an xarray user's point of view, I'm passing in a scalar, so I expect the same result as the first where call above.
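If the fix ends up in duck_array_ops, one possible shape for it is to skip the asarray conversion for Python scalars so numpy 2's weak scalar promotion (NEP 50) can preserve the array dtype. This is a minimal sketch, not xarray's actual implementation; the helper name `where_keep_dtype` is hypothetical and only a plain-numpy path is shown:

```python
import numpy as np

def where_keep_dtype(cond, x, y):
    """Hypothetical sketch: coerce only non-scalar inputs, leaving Python
    scalars untouched so numpy 2 treats them as "weak" (NEP 50) and
    preserves the other operand's dtype."""
    def coerce(value):
        return value if np.isscalar(value) else np.asarray(value)
    return np.where(cond, coerce(x), coerce(y))

a = np.zeros((2, 2), dtype=np.uint16)
print(where_keep_dtype(a != 0, a, 2).dtype)  # uint16, matching plain np.where
```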

What did you expect to happen?

See above.

Minimal Complete Verifiable Example

```python
import xarray as xr
import numpy as np

data_arr = xr.DataArray(np.array([1, 2], dtype=np.uint16))
print(data_arr.where(data_arr == 2, 3).dtype)
# int64
```
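Until this is resolved, the numpy workaround above also applies at the xarray level: pass an already-typed numpy scalar so the asarray conversion can't widen it. A sketch, assuming the behavior described above:

```python
import xarray as xr
import numpy as np

data_arr = xr.DataArray(np.array([1, 2], dtype=np.uint16))
# Passing np.uint16(3) instead of the Python int 3 keeps the dtype,
# because np.asarray(np.uint16(3)) is already a uint16 array.
print(data_arr.where(data_arr == 2, np.uint16(3)).dtype)  # uint16
```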

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

Numpy 1.x preserves the dtype.

```python
In [1]: import numpy as np

In [2]: np.asarray(2).dtype
Out[2]: dtype('int64')

In [3]: a = np.zeros((2, 2), dtype=np.uint16)

In [4]: np.where(a != 0, a, np.asarray(2)).dtype
Out[4]: dtype('uint16')

In [5]: np.where(a != 0, a, np.asarray(np.uint16(2))).dtype
Out[5]: dtype('uint16')
```

Environment

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.4 | packaged by conda-forge | (main, Jun 10 2023, 18:08:17) [GCC 12.2.0]
python-bits: 64
OS: Linux
OS-release: 6.4.6-76060406-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.2

xarray: 2023.10.2.dev21+gfcdc8102
pandas: 2.2.0.dev0+495.gecf449b503
numpy: 2.0.0.dev0+git20231031.42c33f3
scipy: 1.12.0.dev0+1903.18d0a2f
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.2.0
h5py: 3.10.0
Nio: None
zarr: 2.16.1
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7.post0.dev7
dask: 2023.10.1+4.g91098a63
distributed: 2023.10.1+5.g76dd8003
matplotlib: 3.9.0.dev0
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: 0.22
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.0.0
pip: 23.2.1
conda: None
pytest: 7.4.0
mypy: None
IPython: 8.14.0
sphinx: 7.1.2
```
Reactions: none (total_count 0).
#6092 DeprecationWarning regarding use of distutils Version classes (issue, closed: completed)

djhoese (1828519) · 10 comments · created 2021-12-21T16:11:08Z · updated 2023-09-05T06:39:39Z · closed 2021-12-24T14:50:48Z · CONTRIBUTOR · repo: xarray (13221727) · id 1085992113 · node_id I_kwDOAMm_X85Auuyx

What happened:

While working on some tests that catch and check for warnings in my library, I found that xarray, with new versions of Python (I think that's the trigger), produces a ton of DeprecationWarnings on import:

```
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/pycompat.py:22: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  duck_array_version = LooseVersion(duck_array_module.__version__)
```

What you expected to happen:

No warnings.

Minimal Complete Verifiable Example:

```python
import warnings
warnings.simplefilter("always")

import xarray as xr
```

Results in:

```
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/pycompat.py:22: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  duck_array_version = LooseVersion(duck_array_module.__version__)
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/pycompat.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  duck_array_version = LooseVersion("0.0.0")
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/pycompat.py:37: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  duck_array_version = LooseVersion("0.0.0")
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/npcompat.py:82: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(np.__version__) >= "1.20.0":
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/xarray/core/pdcompat.py:45: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(pd.__version__) < "0.25.0":
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  other = LooseVersion(other)
```
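The replacement the warning message itself recommends is mechanical. A minimal sketch of swapping the deprecated distutils LooseVersion for packaging.version at the comparison sites above:

```python
from packaging.version import Version
import numpy as np
import pandas as pd

# Equivalent checks to the warning sites above, using packaging.version
# instead of the deprecated distutils LooseVersion.
if Version(np.__version__) >= Version("1.20.0"):
    pass  # numpy >= 1.20 code path
if Version(pd.__version__) < Version("0.25.0"):
    pass  # old-pandas code path
```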

Anything else we need to know?:

Environment:

Output of xr.show_versions()

```
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/site-packages/_distutils_hack/__init__.py:35: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
/home/davidh/miniconda3/envs/satpy_py39/lib/python3.9/asyncio/base_events.py:681: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False>
  _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.5-76051505-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.2
pandas: 1.3.5
numpy: 1.20.3
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.12.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.12.0
distributed: 2021.12.0
matplotlib: 3.5.1
cartopy: 0.20.1
seaborn: None
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: None
setuptools: 60.0.3
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.30.1
sphinx: 4.3.2
```
Reactions: none (total_count 0).
#7905 Add '.hdf' extension to 'netcdf4' backend (pull request, open, not draft)

djhoese (1828519) · 10 comments · created 2023-06-10T00:45:15Z · updated 2023-06-14T15:25:08Z · CONTRIBUTOR · pull_request: pydata/xarray/pulls/7905 · repo: xarray (13221727) · id 1750685808 · node_id PR_kwDOAMm_X85SqoXL

I'm helping @joleenf debug an issue where some old code that uses xr.open_dataset no longer works, as far as we can tell since the introduction of engines. The main issue is that her code assumes the NetCDF4 C library was compiled with HDF4 support (e.g. conda-forge builds enable this functionality), in which case netCDF4.Dataset("my_file.hdf") can actually read the HDF4 file through the NetCDF4 C library.

However, xr.open_dataset("my_file.hdf") will fail, because xarray (or rather the netcdf4 engine) doesn't know that it could potentially read HDF4 files. This PR adds the .hdf extension to the 'netcdf4' engine so this works automatically, without needing engine='netcdf4' to be specified.
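For context, the behavior this PR automates is already reachable by naming the engine explicitly. A quick usage sketch, assuming (as described above) a netCDF4 C library built with HDF4 support:

```python
import xarray as xr

# Works without this PR, but only if the caller knows to force the engine;
# requires a netCDF4 C library compiled with HDF4 support (e.g. conda-forge).
ds = xr.open_dataset("my_file.hdf", engine="netcdf4")
```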

What do people think? I didn't want to put any more work into this until others weighed in.

  • [ ] Closes #xxxx
  • [ ] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
Reactions: none (total_count 0).


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
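As a sketch of running this page's query (comments = 10 and user = 1828519, sorted by updated_at descending) directly against a local SQLite copy of this database — the filename github.db is an assumption:

```python
import sqlite3

# Hypothetical local copy of the scraped GitHub database.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    select number, type, state, title
    from issues
    where comments = 10 and "user" = 1828519
    order by updated_at desc
    """
).fetchall()
for number, type_, state, title in rows:
    print(f"#{number} [{type_}/{state}] {title}")
```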