issues

6 rows where user = 923438 sorted by updated_at descending


Facets:

  • state: open 3, closed 3
  • type: issue 6
  • repo: xarray 6
#3232 · Use pytorch as backend for xarrays
id: 482543307 · node_id: MDU6SXNzdWU0ODI1NDMzMDc= · user: fjanoos (923438) · state: open (reopened) · comments: 49 · created: 2019-08-19T21:45:15Z · updated: 2022-07-20T18:01:56Z · author_association: NONE · repo: xarray (13221727) · type: issue

I would be interested in using PyTorch as a backend for xarray, because:

  • PyTorch is very similar to NumPy, so the conceptual overhead is small.
  • [most helpful] It would enable using a GPU as the underlying compute hardware, which would provide a non-trivial speed-up.
  • It would allow seamless integration with deep-learning algorithms and techniques.

Any thoughts on what the interest in such a feature might be? I would be open to implementing parts of it, so any suggestions on where I could start?

Thanks
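
For context, what is possible today without such a backend is an explicit copy in and out of torch rather than true torch-backed arrays; a minimal sketch of that round trip (all variable names illustrative):

```python
import numpy as np
import torch
import xarray as xr

da = xr.DataArray(np.random.rand(4, 3), dims=("time", "feature"))

t = torch.from_numpy(da.values)   # zero-copy on CPU
if torch.cuda.is_available():
    t = t.cuda()                  # device transfer; no longer shares memory

result = torch.tanh(t)            # any torch op, potentially GPU-accelerated

# Wrap the result back into a labelled array, reusing dims and coords.
out = xr.DataArray(result.cpu().numpy(), dims=da.dims, coords=da.coords)
```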

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3232/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3320 · Error saving xr.Dataset with timezone aware time index to netcdf format
id: 495382528 · node_id: MDU6SXNzdWU0OTUzODI1Mjg= · user: fjanoos (923438) · state: open · comments: 1 · created: 2019-09-18T18:20:42Z · updated: 2022-01-17T21:23:02Z · author_association: NONE · repo: xarray (13221727) · type: issue

When I try to save an xr.Dataset that was created from a pandas DataFrame with a tz-aware time index (see #3291), xarray converts the time index into int64 nanoseconds.

For example, this is what the converted dataset looks like:

```
<xarray.Dataset>
Dimensions:  (symbol: 3196, time: 4977)
Coordinates:
  * time     (time) object 946933200000000000 ... 1566334800000000000
  * symbol   (symbol) int64 0 1 2 3 4 5 6 ... 3189 3190 3191 3192 3193 3194 3195
Data variables:
    var_0    (time, symbol) float32 nan 4301510000.0 nan nan ... nan nan nan nan
    var_1    (time, symbol) object nan False nan nan nan ... nan nan nan nan nan
    var_2    (time, symbol) float32 nan 475.0 nan nan nan ... nan nan nan nan
    var_3    (time, symbol) float32 nan 475.0 nan nan nan ... nan nan nan nan
    var_5    (time, symbol) float32 nan 475.9 nan nan nan ... nan nan nan nan
    var_6    (time, symbol) float32 nan 475.9 nan nan nan ... nan nan nan nan
    var_7    (time, symbol) float32 nan 429.5 nan nan nan ... nan nan nan nan
    var_8    (time, symbol) float32 nan 429.5 nan nan nan ... nan nan nan nan
    var_10   (time, symbol) float32 nan -0.06736842 nan nan ... nan nan nan nan
    var_11   (time, symbol) float32 nan 0.05085102 nan nan ... nan nan nan nan
    var_12   (time, symbol) float32 nan 0.029103609 nan nan ... nan nan nan nan
    var_13   (time, symbol) float32 nan 0.048769474 nan nan ... nan nan nan nan
    var_14   (time, symbol) float32 nan 442.9 nan nan nan ... nan nan nan nan
    var_15   (time, symbol) float32 nan 442.9 nan nan nan ... nan nan nan nan
    var_16   (time, symbol) float32 nan nan nan nan nan ... nan nan nan nan nan
    var_17   (time, symbol) float32 nan nan nan nan nan ... nan nan nan nan nan
    var_18   (time, symbol) float32 nan nan nan nan nan ... nan nan nan nan nan
    var_19   (time, symbol) float32 nan nan nan nan nan ... nan nan nan nan nan
    var_20   (time, symbol) float32 nan nan nan nan nan ... nan nan nan nan nan
    var_21   (time, symbol) float32 nan 9501900.0 nan nan ... nan nan nan nan
    var_22   (time, symbol) float32 nan 9501900.0 nan nan ... nan nan nan nan
```

Now when I try to save this dataset using:

```python
pds.to_netcdf( ... )
```

I get the following error:

Dropping into pdb when this error is hit, it looks like the problem is with the time index.

After converting the time index into a regular int index:

```python
pds = pds.assign_coords(time=np.arange(len(pds.time)))
pds.to_netcdf( ... )
```

this works OK.

And this also works:

```python
pds = pds.assign_coords(time=pd.to_datetime(pds.time))
pds.to_netcdf( ... )
```

Note that pd.to_datetime(pds.time) drops the timezone from the index, so the issue is very much about saving tz-aware time indices.

Any ideas on what I can do about this?

Thanks! -firdaus
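
A condensed reproduction of the failure mode above (illustrative data; the original frame was far larger, and the behaviour is as of the xarray 0.12 / pandas 0.24 era reported below):

```python
import numpy as np
import pandas as pd
import xarray as xr

idx = pd.date_range("2000-01-03 16:00", periods=5, freq="D",
                    tz="EST", name="time")
df = pd.DataFrame({"var_0": np.random.rand(5)}, index=idx)

ds = xr.Dataset.from_dataframe(df)
print(ds["time"].dtype)  # object: the tz-aware index became int64 nanoseconds

# The workaround from above: drop the timezone before converting/saving.
df.index = df.index.tz_convert("UTC").tz_localize(None)
xr.Dataset.from_dataframe(df).to_netcdf("out.nc")
```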

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3320/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3218 · merge_asof functionality
id: 480786385 · node_id: MDU6SXNzdWU0ODA3ODYzODU= · user: fjanoos (923438) · state: closed (completed) · comments: 6 · created: 2019-08-14T16:57:22Z · updated: 2021-07-21T18:18:20Z · closed: 2021-07-21T18:18:20Z · author_association: NONE · repo: xarray (13221727) · type: issue

Would it be possible to add some functionality to xarray's merge that mimics pandas' merge_asof? This would be very useful when aligning time-series DataArrays that are misaligned.

Thanks.
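
An asof-style alignment can be approximated today with reindex, using method="pad" and a tolerance to take the most recent earlier value, much like pandas.merge_asof(direction="backward"); a minimal sketch with made-up series:

```python
import numpy as np
import pandas as pd
import xarray as xr

left = xr.DataArray(np.arange(4.0), dims="time",
                    coords={"time": pd.date_range("2019-01-01", periods=4, freq="H")})
right = xr.DataArray(np.arange(4.0), dims="time",
                     coords={"time": pd.date_range("2019-01-01 00:45", periods=4, freq="H")})

# Match each left timestamp to the most recent earlier right value,
# refusing matches further than 30 minutes away (NaN where none exists).
right_asof = right.reindex(time=left["time"], method="pad",
                           tolerance=pd.Timedelta("30min"))
merged = xr.Dataset({"left": left, "right": right_asof})
```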

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3218/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3330 · Feature requests for DataArray.rolling
id: 496688781 · node_id: MDU6SXNzdWU0OTY2ODg3ODE= · user: fjanoos (923438) · state: closed (completed) · comments: 1 · created: 2019-09-21T18:58:21Z · updated: 2021-07-08T16:29:18Z · closed: 2021-07-08T16:29:18Z · author_association: NONE · repo: xarray (13221727) · type: issue

In DataArray.rolling it would be really nice to have support for window sizes specified in the units of the dimension (especially time). For example, if da has dimensions (time, space, feature) with time as a DatetimeIndex, then it should be possible to specify

```python
da.rolling(time=pd.Timedelta(100, 'D'))
```

as a valid window. A workaround for the current API is sketched below.
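
Until something like that exists, one workaround for a regularly sampled time axis is to translate the Timedelta into an integer window length; a minimal sketch (illustrative data, assumes uniform spacing):

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2019-01-01", periods=365, freq="D")
da = xr.DataArray(np.random.rand(365, 2), dims=("time", "feature"),
                  coords={"time": time})

window = pd.Timedelta(100, "D")
step = da.indexes["time"][1] - da.indexes["time"][0]  # assumes uniform spacing
n = int(window / step)                                # 100 samples here

rolled = da.rolling(time=n).mean()
```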

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3330/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3291 · xr.DataSet.from_dataframe / xr.DataArray.from_series does not preserve DateTimeIndex with timezone
id: 490618213 · node_id: MDU6SXNzdWU0OTA2MTgyMTM= · user: fjanoos (923438) · state: open · comments: 4 · created: 2019-09-07T10:10:40Z · updated: 2021-04-21T21:00:41Z · author_association: NONE · repo: xarray (13221727) · type: issue

Problem Description

When using Dataset.from_dataframe (DataArray.from_series) to convert a pandas DataFrame whose DatetimeIndex has a timezone, xarray converts the datetimes into a nanosecond integer index rather than keeping a datetime index type.

MCVE Code Sample

```python
print(df.index)
```

```
DatetimeIndex(['2000-01-03 16:00:00-05:00', '2000-01-03 16:00:00-05:00',
               '2000-01-03 16:00:00-05:00', '2000-01-03 16:00:00-05:00',
               ...
               '2019-08-20 16:00:00-05:00', '2019-08-20 16:00:00-05:00'],
              dtype='datetime64[ns, EST]', name='time', length=12713014, freq=None)
```

```python
ds = xr.Dataset.from_dataframe(df.head(1000))
print(ds['time'])
```

```
<xarray.DataArray 'time' (time: 7)>
array([946933200000000000, 947019600000000000, 947106000000000000,
       947192400000000000, 947278800000000000, 947538000000000000,
       947624400000000000, ...], dtype=object)
Coordinates:
  * time     (time) object 946933200000000000 ... 947624400000000000
```

Expected Output

After removing the tz localization from the DatetimeIndex of the dataframe, the conversion to a Dataset preserves the time index (without converting it to nanoseconds):

```python
df.index = df.index.tz_convert('UTC').tz_localize(None)
ds = xr.Dataset.from_dataframe(df.head(1000))
print(ds['time'])
```

```
<xarray.DataArray 'time' (time: 7)>
array(['2000-01-03T21:00:00.000000000', '2000-01-04T21:00:00.000000000',
       '2000-01-05T21:00:00.000000000', '2000-01-06T21:00:00.000000000',
       '2000-01-07T21:00:00.000000000', '2000-01-10T21:00:00.000000000',
       '2000-01-11T21:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2000-01-03T21:00:00 ... 2000-01-11T21:00:00
```

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-9-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.12.3+81.g41fecd86
pandas: 0.24.2
numpy: 1.16.2
scipy: 1.2.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 1.1.4
distributed: 1.26.0
matplotlib: 3.0.3
cartopy: None
seaborn: 0.9.0
numbagg: None
setuptools: 40.8.0
pip: 19.0.3
conda: 4.7.11
pytest: 4.3.1
IPython: 7.4.0
sphinx: 1.8.5
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3291/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3332 · Memory usage of `da.rolling().construct`
id: 496809167 · node_id: MDU6SXNzdWU0OTY4MDkxNjc= · user: fjanoos (923438) · state: closed (completed) · comments: 5 · created: 2019-09-22T17:35:06Z · updated: 2021-02-16T15:00:37Z · closed: 2021-02-16T15:00:37Z · author_association: NONE · repo: xarray (13221727) · type: issue

If I were to do data_array.rolling(time=1000).construct('temp_time'), what is going on under the hood? Does it make 1000 physical copies of the original DataArray, or does it only return a view? I feel like it's the latter, but I'm seeing a memory spike (about a 20-30% increase in total process memory consumption) when I use it, so there might be something else going on. Any ideas / pointers would be appreciated. Thanks!
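
One way to probe this empirically (a sketch, not from the thread; sizes are illustrative, and the "view" claim assumes numpy-backed data):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(100_000), dims="time")
windows = da.rolling(time=1000).construct("temp_time")

# The windowed array is built with numpy stride tricks, so the window axis
# overlaps the time axis in memory instead of holding 1000 physical copies.
v = windows.values
print(v.shape)             # (100000, 1000)
print(v.strides)           # overlapping strides, e.g. (8, 8), not (8000, 8)
print(v.base is not None)  # True: a view onto another array's buffer
```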

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3332/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

Table schema
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);