issues


2 rows where repo = 13221727, type = "issue" and user = 7360639 sorted by updated_at descending


issue 3216: Feature request: time-based rolling window functionality
id: 480753417 · node_id: MDU6SXNzdWU0ODA3NTM0MTc= · opened by: snbentley (7360639) · state: open · comments: 11 · created_at: 2019-08-14T15:43:31Z · updated_at: 2023-04-06T20:34:47Z · author_association: NONE

Hi,

I was hoping you would consider extending the rolling window functionality to time-based windows. As far as I can tell, the rolling window functions currently operate on a fixed number of nearby points, not on the points within (say) an hour- or minute-long window. This means that I can't even compute a reliable rolling mean without writing code myself (and as I am relatively new to Python, that code inevitably ends up uselessly slow).

This would extend all rolling functionality to unevenly sampled data, and to buggy data with quality gaps. It would also allow me and others to fix such data gaps by averaging and downsampling where appropriate.

(Context: basically all space physics data and probably other fields too. Really, this would need to be a centred window - I think pandas has a non-centred time window but that doesn't help much.)

Thanks for reading this! And sorry if this is already available - I couldn't find any settings for it.

(PS the multidimensionality of xarray is so useful for me, I have so many vector observations in so many different co-ordinate systems!)
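For context, a minimal sketch of the kind of time-window rolling mean being requested here, using pandas (which xarray data can be converted to and from); the timestamps and values are made up for illustration:

```python
import pandas as pd

# Unevenly sampled series with a gap (illustrative data)
times = pd.to_datetime([
    "2019-01-01 00:00", "2019-01-01 00:10", "2019-01-01 00:45",
    "2019-01-01 01:00", "2019-01-01 02:30",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=times)

# Rolling mean over a 1-hour trailing window: the window is defined by
# elapsed time rather than a fixed number of points, so uneven sampling
# and data gaps are handled naturally.
hourly = s.rolling("1h").mean()
print(hourly.tolist())  # [1.0, 1.5, 2.0, 3.0, 5.0]
```

Later pandas releases also accept `center=True` for time-based windows, which gives the centred behaviour asked for above.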

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3216/reactions",
    "total_count": 15,
    "+1": 15,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
issue 4180: to_netcdf very slow for some single character data types
id: 645443880 · node_id: MDU6SXNzdWU2NDU0NDM4ODA= · opened by: snbentley (7360639) · state: closed · comments: 2 · created_at: 2020-06-25T10:18:53Z · updated_at: 2020-06-26T09:45:39Z · closed_at: 2020-06-26T09:45:39Z · author_association: NONE

Hi,

I'm not entirely sure if this is a bug, my misuse, or a feature request. However, I don't think that this is desired behaviour: saving a dataset with single characters in the initial data type is very slow compared to a less desirable alternative.

Example: make a small fake dataset, save it as-is (tester has dtype '<U1'), then change the data type and save again:

```python
import xarray as xr
import numpy as np
import cProfile

# Fake dataset: one million single-character strings (dtype '<U1')
ds = xr.Dataset(
    {'tester': ('index', np.full((1000000), '.'))},
    coords={'index': np.arange(0, 1000000)},
)

# Save with the original '<U1' dtype (slow)...
cProfile.run("""ds.to_netcdf('somefilename')""")

# ...then convert to 1-byte strings and save again (fast)
ds.tester.values = ds.tester.values.astype('|S1')
cProfile.run("""ds.to_netcdf('somefilename')""")
```

I find that the first option takes around 8 s and the second around 0.076 s. This is a massive difference - my own dataset is much larger than this, so I am obliged to save it using the |S1 dtype. However, that dtype is much more difficult to work with: I am using this field for quality control flags, and comparisons like `== '.'` no longer work, so I have to wrap saving and loading these datasets with a function that changes the datatype.

So I have a workaround, but I don't think it makes sense for saving single-character strings to be this much slower. I also tried pickling the dataset with the '<U1' datatype - that is very fast, but netCDF would be better.
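The dtype round-trip in the workaround described above can be sketched with plain NumPy (variable names are illustrative; the to_netcdf save/load calls are omitted):

```python
import numpy as np

# Quality-control flags stored as single unicode characters ('<U1')
flags = np.full(5, '.', dtype='<U1')      # 4 bytes per element

# Converting to fixed-width bytes shrinks each element to one byte -
# this is the form that saves quickly
flags_bytes = flags.astype('|S1')         # 1 byte per element

# On load, convert back to unicode so element-wise comparisons work again
restored = flags_bytes.astype('<U1')
print((restored == '.').all())  # True
```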

Output of <tt>xr.show_versions()</tt>

```
python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.27.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.7.4

xarray: 0.15.1
pandas: 1.0.4
numpy: 1.18.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.18.1
distributed: 2.18.0
matplotlib: 3.2.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 47.1.1.post20200529
pip: 20.1.1
conda: None
pytest: None
IPython: None
sphinx: None
```

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4180/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 103.347ms · About: xarray-datasette