issues


3 rows where repo = 13221727, state = "closed" and user = 132147 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1773886549 I_kwDOAMm_X85pu1xV 7942 Numpy raises warning in `xarray.coding.times.cast_to_int_if_safe` mx-moth 132147 closed 0     2 2023-06-26T05:03:46Z 2023-09-17T08:15:27Z 2023-09-17T08:15:27Z CONTRIBUTOR      

What happened?

In recent versions of numpy, calling numpy.asarray(arr, dtype=numpy.int64) will raise a warning if the input array contains numpy.nan values. This line of code is used in xarray.coding.times.cast_to_int_if_safe(num):

```python
def cast_to_int_if_safe(num) -> np.ndarray:
    int_num = np.asarray(num, dtype=np.int64)
    if (num == int_num).all():
        num = int_num
    return num
```

The comparison still evaluates to the correct True/False value regardless of the warning, so the function's result is unaffected.
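To see the behaviour in isolation, the cast can be reproduced with numpy alone. This is a minimal sketch: the sample values are made up for illustration, and whether a `RuntimeWarning` is actually recorded depends on the numpy version and platform (the report above is for numpy 1.24.3).

```python
import warnings

import numpy as np

# Illustrative values only: a float array with NaN where the MCVE has NaT.
num = np.array([0.0, 1.0, np.nan, 3.0, np.nan])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The same cast that cast_to_int_if_safe performs.
    int_num = np.asarray(num, dtype=np.int64)

# NaN never compares equal to anything, so the check fails whenever
# a NaN is present, whatever integer the cast produced for it.
all_equal = bool((num == int_num).all())

runtime_warnings = [w for w in caught if issubclass(w.category, RuntimeWarning)]
print("values round-trip through int64:", all_equal)
print("RuntimeWarnings recorded:", len(runtime_warnings))
```

Because the NaN entries fail the equality test, the original float array is returned unchanged; the warning is noise rather than a sign of a wrong result.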

What did you expect to happen?

No warning to be printed

Minimal Complete Verifiable Example

```Python
import numpy
import xarray

one_day = numpy.timedelta64(1, 'D')
nat = numpy.timedelta64('nat')

timedelta_values = (numpy.arange(5) * one_day).astype('timedelta64[ns]')
timedelta_values[2] = nat
timedelta_values[4] = nat

dataset = xarray.Dataset(data_vars={
    'timedeltas': xarray.DataArray(data=timedelta_values, dims=['x'])
})
dataset.to_netcdf('out.nc')
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```
$ python3 safe_cast.py
/home/hea211/projects/emsarray/.conda/lib/python3.10/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
  int_num = np.asarray(num, dtype=np.int64)

$ ncdump out.nc
netcdf out {
dimensions:
    x = 5 ;
variables:
    double timedeltas(x) ;
        timedeltas:_FillValue = NaN ;
        timedeltas:units = "days" ;
data:

 timedeltas = 0, 1, _, 3, _ ;
}
```

Anything else we need to know?

I saw the numpy.can_cast function and tried to use that to solve the issue (see PR #7834), however this function did not do what I expected it to.
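A quick check on dtypes illustrates what `numpy.can_cast` reports. As the numpy documentation describes it, the function decides from the data types and the casting rule, not from the values stored in an array, which would explain why it could not replace the cast-and-compare check. The variable names below are illustrative.

```python
import numpy as np

# float64 -> int64 is never a "safe" cast, whatever the array holds.
float_to_int = np.can_cast(np.float64, np.int64)

# Widening an integer type is safe.
int32_to_int64 = np.can_cast(np.int32, np.int64)

# Under the "unsafe" rule any conversion is allowed.
float_to_int_unsafe = np.can_cast(np.float64, np.int64, casting="unsafe")

# An array of whole numbers still has dtype float64, so the answer is
# the same as for the bare dtype.
whole = np.array([0.0, 1.0, 2.0])
whole_to_int = np.can_cast(whole.dtype, np.int64)

print(float_to_int, int32_to_int64, float_to_int_unsafe, whole_to_int)
# prints: False True True False
```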

A search for other solutions to see whether an array of floating point values is representable as integers turned up Numpy: Check if float array contains whole numbers on Stack Overflow. There are a few solutions given in that question, although each has its drawbacks. The most complete solution appears to be is_integer_ufunc, which is a ufunc written in C. Unfortunately this is not installable via pip/conda, and is not included in numpy.
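One possible way to test for whole numbers with plain numpy, without performing the cast first, is sketched below. The helper names `is_integer_valued` and `cast_to_int_if_safe_sketch` are hypothetical, and this is not presented as the change xarray adopted. It shares the drawbacks of the Stack Overflow answers: it makes several passes over the data and allocates temporaries.

```python
import numpy as np


def is_integer_valued(values) -> bool:
    """Return True if every value is finite, whole, and fits in int64.

    Hypothetical helper for illustration only.
    """
    arr = np.asarray(values, dtype=np.float64)
    # NaN and +/-inf can never be represented as integers.
    if not np.isfinite(arr).all():
        return False
    # int64 covers [-2**63, 2**63); both bounds are exact in float64.
    in_range = (arr >= -(2.0**63)) & (arr < 2.0**63)
    # trunc() leaves whole numbers unchanged and emits no warning.
    return bool(in_range.all() and (arr == np.trunc(arr)).all())


def cast_to_int_if_safe_sketch(num) -> np.ndarray:
    """Cast to int64 only after checking that the cast is lossless."""
    arr = np.asarray(num)
    if is_integer_valued(arr):
        return arr.astype(np.int64)
    return arr


print(cast_to_int_if_safe_sketch([0.0, 1.0, 3.0]).dtype)     # int64
print(cast_to_int_if_safe_sketch([0.0, np.nan, 3.0]).dtype)  # float64
```

Checking before casting means the invalid cast is never attempted, so there is nothing for numpy to warn about.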

Environment

```
In [2]: import xarray as xr
   ...: xr.show_versions()
/home/hea211/projects/emsarray/.conda/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-73-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: None
netCDF4: 1.6.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.7
dask: 2023.4.1
distributed: 2023.4.1
matplotlib: 3.7.1
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.5.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: 7.3.1
mypy: 1.3.0
IPython: 8.12.0
sphinx: 4.3.2
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7942/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
1705163672 PR_kwDOAMm_X85QQiiY 7834 Use `numpy.can_cast` instead of casting and checking mx-moth 132147 closed 0     5 2023-05-11T06:36:06Z 2023-06-26T05:06:30Z 2023-06-26T05:06:29Z CONTRIBUTOR   1 pydata/xarray/pulls/7834

In numpy >= 1.24 unsafe casting raises a RuntimeWarning for an operation that xarray does often to check if casting is safe. numpy.can_cast looks like an alternative approach designed for this exact case.

  • [ ] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [ ] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7834/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1071806607 PR_kwDOAMm_X84va6lN 6049 Attempt datetime coding using cftime when pandas fails mx-moth 132147 closed 0     2 2021-12-06T07:12:35Z 2022-01-04T00:28:15Z 2021-12-24T11:48:22Z CONTRIBUTOR   0 pydata/xarray/pulls/6049

A netCDF4 dataset we use has a time variable defined as:

```
double time(time) ;
    time:axis = "T" ;
    time:bounds = "time_bnds" ;
    time:calendar = "gregorian" ;
    time:long_name = "time" ;
    time:standard_name = "time" ;
    time:units = "days since 1970-01-01 00:00:00 00" ;
```

Note the units attribute, specifically a timezone offset of 00 without any +- sign.

xarray can successfully open this dataset and parse the time units, producing a time variable with the expected values. However, when attempting to save the dataset (e.g. after slicing some geographic bounds or selecting a subset of variables), xarray raises an error while trying to reformat the time units.

This fix applies the same logic used in the decoding step to the encoding step - specifically, attempt to use pandas but if that fails then use cftime. The decoding step catches ValueError to do this, but ValueError was not caught in the encode workflow.
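The try-then-fall-back shape described above can be sketched with the standard library alone. Everything here is a hypothetical stand-in: `parse_reference_strict` plays the role of the pandas path and `parse_reference_lenient` the role of the cftime path. Neither reproduces how those libraries actually parse units, and the lenient parser simply ignores the trailing offset.

```python
from datetime import datetime


def parse_reference_strict(units: str) -> datetime:
    """Stand-in for the strict (pandas-like) path; hypothetical."""
    _, ref = units.split(" since ", 1)
    # Raises ValueError if anything follows the time of day.
    return datetime.strptime(ref, "%Y-%m-%d %H:%M:%S")


def parse_reference_lenient(units: str) -> datetime:
    """Stand-in for the lenient (cftime-like) path; hypothetical."""
    _, ref = units.split(" since ", 1)
    # Keep only the date and time tokens, dropping the bare offset.
    date_part, time_part = ref.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")


def parse_reference(units: str) -> datetime:
    """Try the strict parser first and fall back on ValueError."""
    try:
        return parse_reference_strict(units)
    except ValueError:
        return parse_reference_lenient(units)


print(parse_reference("days since 1970-01-01 00:00:00 00"))
# prints: 1970-01-01 00:00:00
```

The point is only that both the decode and the encode paths need the same `except ValueError` branch; if one of them lacks it, a dataset that opens cleanly can still fail on save.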

  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6049/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);