issue_comments

8 rows where issue = 327089588 and user = 6628425 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
465289567 https://github.com/pydata/xarray/issues/2191#issuecomment-465289567 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDQ2NTI4OTU2Nw== spencerkclark 6628425 2019-02-19T20:06:15Z 2019-02-19T20:06:15Z MEMBER

@zzheng93 sure thing!

> I hope NCAR will support the next release of xarray.

I know you didn't ask for help with this, but I can't resist :) -- I recommend you set up your own Python environment on Cheyenne. This is nice because it gives you full control over the packages you install (so you don't need to wait until someone else installs them for you). A good place to start on how to do this is the "Getting started with Pangeo on HPC" page on the Pangeo website.

> A follow-up question is that when we using xarray to manipulate the large dataset such as <xarray.DataArray (time: 14600, lat: 192, lon: 288)> and want to save the results for further machine learning applications (e.g., using sklearn or XGBoost, even deep learning), what will be a good format to store the data on server or local machine that will be easily used by sklearn or XGBoost?

I think with some more specific details regarding what you are looking to do, this could potentially be a good question to ask in the (relatively new) pangeo-data/ml-workflow-examples repo, where they are discussing machine learning workflows connected to xarray.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
464949490 https://github.com/pydata/xarray/issues/2191#issuecomment-464949490 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDQ2NDk0OTQ5MA== spencerkclark 6628425 2019-02-19T02:04:39Z 2019-02-19T02:04:39Z MEMBER

@zzheng93 welcome! One way to install the development version is to clone this repo, and do an editable install:

    $ git clone https://github.com/pydata/xarray.git
    $ cd xarray
    $ pip install -e .

Then using resample with a daily frequency would look something like:

```
In [1]: import xarray as xr

In [2]: times = xr.cftime_range('2000', periods=4, freq='12H')

In [3]: times
Out[3]:
CFTimeIndex([2000-01-01 00:00:00, 2000-01-01 12:00:00, 2000-01-02 00:00:00,
             2000-01-02 12:00:00],
            dtype='object')

In [4]: da = xr.DataArray(range(4), [('time', times)])

In [5]: da.resample(time='D').mean()
Out[5]:
<xarray.DataArray (time: 2)>
array([0.5, 2.5])
Coordinates:
  * time     (time) object 2000-01-01 00:00:00 2000-01-02 00:00:00
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
464890837 https://github.com/pydata/xarray/issues/2191#issuecomment-464890837 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDQ2NDg5MDgzNw== spencerkclark 6628425 2019-02-18T21:43:34Z 2019-02-18T21:43:34Z MEMBER

@zzheng93 this will be possible in the next release of xarray, so not quite yet, but soon. If you're in a hurry you could install the development version.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
460046479 https://github.com/pydata/xarray/issues/2191#issuecomment-460046479 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDQ2MDA0NjQ3OQ== spencerkclark 6628425 2019-02-03T12:16:21Z 2019-02-03T12:16:21Z MEMBER

This has been implemented in #2593 🎉.

{
    "total_count": 2,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 2,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
395082238 https://github.com/pydata/xarray/issues/2191#issuecomment-395082238 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDM5NTA4MjIzOA== spencerkclark 6628425 2018-06-06T14:09:56Z 2018-10-19T19:38:56Z MEMBER

When the time coordinate contains np.datetime64 objects I recommend using resample directly, because the underlying index will be a pandas DatetimeIndex (so you just need some logic to detect if that's the case).

I think the most general workaround for right now would probably look something like the example below. This has the property that it preserves the underlying calendar type of the time index.

```python
import pandas as pd
import xarray as xr


def resample_ms_freq(ds, dim='time'):
    """Resample the dataset to 'MS' frequency regardless of the calendar used.

    Parameters
    ----------
    ds : Dataset
        Dataset to be resampled
    dim : str
        Dimension name associated with the time index

    Returns
    -------
    Dataset
    """
    index = ds.indexes[dim]
    if isinstance(index, pd.DatetimeIndex):
        return ds.resample(**{dim: 'MS'}).mean(dim)
    elif isinstance(index, xr.CFTimeIndex):
        date_type = index.date_type
        month_start = [date_type(date.year, date.month, 1) for date in ds[dim].values]
        ms = xr.DataArray(month_start, coords=ds[dim].coords)
        ds = ds.assign_coords(MS=ms)
        return ds.groupby('MS').mean(dim).rename({'MS': dim})
    else:
        raise TypeError(
            'Resampling to month start frequency requires using a time index of either '
            'type pd.DatetimeIndex or xr.CFTimeIndex.')


with xr.set_options(enable_cftimeindex=True):
    ds = xr.open_mfdataset(files)
    resampled = resample_ms_freq(ds)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
426334003 https://github.com/pydata/xarray/issues/2191#issuecomment-426334003 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDQyNjMzNDAwMw== spencerkclark 6628425 2018-10-02T16:10:51Z 2018-10-02T16:10:51Z MEMBER

Thanks @shoyer for getting things started! @huard your help would be very much appreciated in implementing this. As mentioned in https://github.com/pydata/xarray/issues/2437#issuecomment-424395224, this is one of the biggest remaining gaps in functionality between xarray objects indexed by a CFTimeIndex and xarray objects indexed by a DatetimeIndex.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
394898828 https://github.com/pydata/xarray/issues/2191#issuecomment-394898828 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDM5NDg5ODgyOA== spencerkclark 6628425 2018-06-06T00:07:10Z 2018-06-06T00:07:10Z MEMBER

Indeed what I had above is quite slow!

    In [6]: %%timeit
       ...: month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
       ...:
    1 loop, best of 3: 588 ms per loop

Iterating over the contents of da.time generates DataArray instances encapsulating single dates. We can iterate over the dates themselves directly, which is much (over 1000x) faster:

    In [7]: %%timeit
       ...: month_start = [DatetimeNoLeap(date.year, date.month, 1) for date in da.time.values]
       ...:
    1000 loops, best of 3: 302 µs per loop

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588
394839627 https://github.com/pydata/xarray/issues/2191#issuecomment-394839627 https://api.github.com/repos/pydata/xarray/issues/2191 MDEyOklzc3VlQ29tbWVudDM5NDgzOTYyNw== spencerkclark 6628425 2018-06-05T19:56:30Z 2018-06-05T19:56:30Z MEMBER

@naomi-henderson thanks! In the meantime here's a possible workaround, in case you haven't figured one out already:

```python
import numpy as np
import xarray as xr

from cftime import num2date, DatetimeNoLeap

times = num2date(np.arange(730), calendar='noleap', units='days since 0001-01-01')
da = xr.DataArray(np.arange(730), coords=[times], dims=['time'])

month_start = [DatetimeNoLeap(date.dt.year, date.dt.month, 1) for date in da.time]
da['MS'] = xr.DataArray(month_start, coords=da.time.coords)
resampled = da.groupby('MS').mean('time').rename({'MS': 'time'})
```
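The idea behind this workaround (label every timestamp with the first day of its month, then average within each label) can be sketched without xarray. The version below is only an illustration: `month_start_means` is a hypothetical helper, the sample data is made up, and it uses the standard-library Gregorian `date` type rather than a 'noleap' calendar.

```python
from collections import defaultdict
from datetime import date, timedelta


def month_start_means(dates, values):
    """Average values within each calendar month, keyed by the month's first day."""
    groups = defaultdict(list)
    for d, v in zip(dates, values):
        # Same labelling step as the 'MS' coordinate in the workaround.
        groups[date(d.year, d.month, 1)].append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(groups.items())}


# Hypothetical sample: 60 consecutive days starting 2001-01-01, with values 0..59.
days = [date(2001, 1, 1) + timedelta(days=i) for i in range(60)]
means = month_start_means(days, range(60))

for start, mean in means.items():
    print(start.isoformat(), mean)
```

The xarray version does the same thing through `groupby('MS')`, with the added benefit that the resulting labels keep the calendar type of the original dates.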

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Adding resample functionality to CFTimeIndex 327089588

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
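
The filter in the page heading (issue = 327089588 and user = 6628425, sorted by updated_at descending) corresponds to a plain SQL query against this schema. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table is a simplified copy of the schema above (foreign-key clauses and text columns omitted), and only three of the eight rows are loaded, with ids and timestamps copied from the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [created_at] TEXT,
        [updated_at] TEXT,
        [issue] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")

# (id, user, created_at, updated_at, issue), taken from three rows in the listing.
rows = [
    (395082238, 6628425, "2018-06-06T14:09:56Z", "2018-10-19T19:38:56Z", 327089588),
    (426334003, 6628425, "2018-10-02T16:10:51Z", "2018-10-02T16:10:51Z", 327089588),
    (460046479, 6628425, "2019-02-03T12:16:21Z", "2019-02-03T12:16:21Z", 327089588),
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?, ?)", rows)

query = """
    SELECT [id] FROM [issue_comments]
    WHERE [issue] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
"""
ordered_ids = [row[0] for row in conn.execute(query, (327089588, 6628425))]
print(ordered_ids)
```

Because the timestamps are stored as ISO 8601 text, ordering them as strings also orders them chronologically. This is why comment 395082238, created in June 2018 but edited on 2018-10-19, appears above comment 426334003 in the listing.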