issues

3 rows where state = "open", type = "issue" and user = 6628425 sorted by updated_at descending

Issue #7184: Potentially add option to encode times using `longdouble` values
id: 1413075015 · node_id: I_kwDOAMm_X85UOdBH · opened by: spencerkclark (6628425) · author_association: MEMBER · state: open · comments: 0 · created_at: 2022-10-18T11:46:30Z · updated_at: 2022-10-18T11:47:00Z · repo: xarray (13221727) · type: issue

By default xarray will exactly roundtrip times saved to disk by encoding them using `int64` values. However, if a user specifies time encoding units that prevent this, `float64` values will be used, and this has the potential to cause roundtripping differences due to roundoff error. Recently, cftime added the ability to encode times using `longdouble` values (https://github.com/Unidata/cftime/pull/284). On some platforms this offers greater precision than `float64` values (though typically not full quad precision). Nevertheless, some users might be interested in encoding their times using such values.

The main thing that `longdouble` values have going for them is that they enable greater precision when using arbitrary units to encode the dates (with `int64` we are constrained to using units that allow for time intervals to be expressed with integers). That said, the more I think about this, the more I feel it may not be the best idea:

  • Since the meaning of `longdouble` can vary from platform to platform, I wonder what happens if you encode times using `longdouble` values on one machine and decode them on another?
  • `longdouble` values cannot be stored with all backends; for example, zarr supports them, but netCDF does not.
  • We already provide a robust way to exactly roundtrip any dates--i.e. encode them with `int64` values--so adding a less robust (if slightly more flexible in terms of units) option might just cause confusion.

It's perhaps still worth opening this issue for discussion in case others have thoughts that might allay those concerns.

cc: @jswhit @dcherian
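
For anyone who wants to experiment with the option discussed above, a minimal sketch, assuming the `longdouble` keyword that Unidata/cftime#284 added to `cftime.date2num` is available in the installed cftime release:

```
import cftime

# Encode two timestamps that differ by one microsecond using "days" units,
# with and without the `longdouble` keyword (assumed available, per
# https://github.com/Unidata/cftime/pull/284).
dates = [
    cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 0),
    cftime.DatetimeGregorian(2000, 1, 1, 0, 0, 0, 1),  # +1 microsecond
]
units = "days since 2000-01-01"

encoded_f64 = cftime.date2num(dates, units)                  # float64 values
encoded_ld = cftime.date2num(dates, units, longdouble=True)  # np.longdouble values

print(encoded_f64.dtype, encoded_ld.dtype)
# On many Linux/macOS x86-64 builds np.longdouble has 80-bit extended
# precision, so the longdouble encoding preserves more of the one-microsecond
# offset; on platforms where longdouble is identical to float64 there is no gain.
```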

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/7184/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue #6204: [Bug]: cannot chunk a DataArray that originated as a coordinate
id: 1117563249 · node_id: I_kwDOAMm_X85CnKlx · opened by: spencerkclark (6628425) · author_association: MEMBER · state: open · comments: 1 · created_at: 2022-01-28T15:56:44Z · updated_at: 2022-03-16T04:18:46Z · repo: xarray (13221727) · type: issue

What happened?

If I construct the following DataArray, and try to chunk its `"x"` coordinate, I get back a NumPy-backed DataArray:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
array([4, 5, 6])
Coordinates:
  * x        (x) int64 4 5 6
```

If I construct a copy of the `"x"` coordinate, things work as I would expect:

```
In [4]: x = xr.DataArray(a.x, dims=a.x.dims, coords=a.x.coords, name="x")

In [5]: x.chunk()
Out[5]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

What did you expect to happen?

I would expect the following to happen:

```
In [2]: a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

In [3]: a.x.chunk()
Out[3]:
<xarray.DataArray 'x' (x: 3)>
dask.array<xarray-<this-array>, shape=(3,), dtype=int64, chunksize=(3,), chunktype=numpy.ndarray>
Coordinates:
  * x        (x) int64 4 5 6
```

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

```
INSTALLED VERSIONS

commit: None
python: 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 15:59:12) [Clang 11.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.5
libnetcdf: 4.6.3

xarray: 0.20.1
pandas: 1.3.5
numpy: 1.19.4
scipy: 1.5.4
netCDF4: 1.5.5
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.7.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.22.0
distributed: None
matplotlib: 3.2.2
cartopy: 0.19.0.post1
seaborn: None
numbagg: None
fsspec: 2021.06.0
cupy: None
pint: 0.15
sparse: None
setuptools: 49.6.0.post20210108
pip: 20.2.4
conda: 4.10.1
pytest: 6.0.1
IPython: 7.27.0
sphinx: 3.2.1
```
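
A possible workaround sketch (not from the issue itself): rebuild the coordinate from a plain, non-index Variable before chunking. This assumes `Variable.to_base_variable()` behaves as in xarray releases from this era:

```
import xarray as xr

a = xr.DataArray([1, 2, 3], dims=["x"], coords=[[4, 5, 6]])

# a.x is backed by an IndexVariable, whose data is always kept loaded in
# memory, so .chunk() has no effect on it.  Rebuilding the coordinate from a
# plain (non-index) Variable sidesteps that.
x_plain = xr.DataArray(a.x.variable.to_base_variable(), name="x")
print(x_plain.chunk())  # dask-backed DataArray, as expected above
```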

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6204/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Issue #5107: Converting `cftime.datetime` objects to `np.datetime64` values through `astype`
id: 849771808 · node_id: MDU6SXNzdWU4NDk3NzE4MDg= · opened by: spencerkclark (6628425) · author_association: MEMBER · state: open · comments: 0 · created_at: 2021-04-04T01:02:55Z · updated_at: 2021-10-05T00:00:36Z · repo: xarray (13221727) · type: issue

The discussion of the use of the `indexes` property in #5102 got me thinking about this StackOverflow answer. For a while I have thought that my answer there isn't very satisfying, not only because it relies on this somewhat obscure `indexes` property, but also because it only works on dimension coordinates -- i.e. something that would be backed by an index.

Describe the solution you'd like

It would be better if we could do this conversion with `astype`, e.g. `da.astype("datetime64[ns]")`. This would allow conversion to `datetime64` values for all `cftime.datetime` DataArrays -- dask-backed or NumPy-backed, 1D or ND -- through a fairly standard and well-known method. To my surprise, while you do not get the nice calendar-switching warning that `CFTimeIndex.to_datetimeindex` provides, this actually already kind of seems to work (?!):

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range("2000", periods=6, calendar="noleap")

In [3]: da = xr.DataArray(times.values.reshape((2, 3)), dims=["a", "b"])

In [4]: da.astype("datetime64[ns]")
Out[4]:
<xarray.DataArray (a: 2, b: 3)>
array([['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
        '2000-01-03T00:00:00.000000000'],
       ['2000-01-04T00:00:00.000000000', '2000-01-05T00:00:00.000000000',
        '2000-01-06T00:00:00.000000000']], dtype='datetime64[ns]')
Dimensions without coordinates: a, b
```

NumPy obviously does not officially support this -- nor would I expect it to -- so I would be wary of simply documenting this behavior as is. Would it be reasonable for us to modify `xarray.core.duck_array_ops.astype` to explicitly implement this conversion ourselves for `cftime.datetime` arrays? This way we could ensure this was always supported, and we could include appropriate errors for out-of-bounds times (the NumPy method currently overflows in that case) and warnings for switching from non-standard calendars.
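
One way such a conversion could reuse existing machinery is to route it through `CFTimeIndex.to_datetimeindex`, which already emits the calendar-switching warning mentioned above. A rough sketch with a hypothetical helper (not the actual xarray implementation):

```
import numpy as np
import xarray as xr

def cftime_to_datetime64(values):
    """Hypothetical helper: convert an array of cftime.datetime objects to
    datetime64[ns] by round-tripping through CFTimeIndex, which warns when
    leaving a non-standard calendar."""
    values = np.asarray(values)
    index = xr.CFTimeIndex(values.ravel())
    return np.asarray(index.to_datetimeindex()).reshape(values.shape)

# Usage with the example from the issue body:
times = xr.cftime_range("2000", periods=6, calendar="noleap")
da = xr.DataArray(times.values.reshape((2, 3)), dims=["a", "b"])
converted = xr.DataArray(cftime_to_datetime64(da.values), dims=da.dims)
print(converted.dtype)  # datetime64[ns]
```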

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5107/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
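
The filter quoted near the top of the page (state = "open", type = "issue", user = 6628425, sorted by updated_at descending) can be reproduced against a local copy of this database. A minimal sketch, assuming the SQLite file has been saved locally as `github.db` (the filename is an assumption, not taken from this page):

```
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of the database
rows = conn.execute(
    """
    SELECT number, title, updated_at
    FROM issues
    WHERE state = 'open' AND type = 'issue' AND [user] = 6628425
    ORDER BY updated_at DESC
    """
).fetchall()

for number, title, updated_at in rows:
    print(f"#{number}  {title}  (updated {updated_at})")

conn.close()
```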