home / github

Menu
  • GraphQL API
  • Search all tables

issues

Table actions
  • GraphQL API for issues

1 row where state = "closed" and user = 7360639 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date), closed_at (date)

type 1

  • issue 1

state 1

  • closed · 1 ✖

repo 1

  • xarray 1
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
645443880 MDU6SXNzdWU2NDU0NDM4ODA= 4180 to_netcdf very slow for some single character data types snbentley 7360639 closed 0     2 2020-06-25T10:18:53Z 2020-06-26T09:45:39Z 2020-06-26T09:45:39Z NONE      

Hi,

I'm not entirely sure if this is a bug, my misuse, or a feature request. However, I don't think that this is desired behaviour: saving a dataset with single characters in the initial data type is very slow compared to a less desirable alternative.

Example: make a smaller fake dataset, save it as it is (tester has dtype '<U1'), then change data type and do it again

``` python import xarray as xr import numpy as np import cProfile

ds = xr.Dataset({'tester':('index',np.full((1000000),'.'))},coords = {'index':np.arange(0,1000000)})

cProfile.run("""ds.to_netcdf('somefilename')""")

ds.tester.values = ds.tester.values.astype('|S1') cProfile.run("""ds.to_netcdf('somefilename')""") ```

I find that the first option takes around 8s and the second around 0.076s. This is a massive difference - my own dataset is much larger than this so I am obliged to save it using the |S1 dtype. However, this is much more difficult to use. I am using this field for quality control flags, and now == '.' doesn't work, so I have to wrap saving and loading these datasets with a function changing the datatype.

So I have a workaround but I don't think it makes sense to be this much slower to save single character strings. I also tried pickling the dataset with '<U1' datatype - that is very fast, but netCDF would be better.

Output of <tt>xr.show_versions()</tt> python: 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 3.10.0-957.27.2.el7.x86_64 machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_GB.UTF-8 LOCALE: en_GB.UTF-8 libhdf5: 1.10.5 libnetcdf: 4.7.4

xarray: 0.15.1 pandas: 1.0.4 numpy: 1.18.5 scipy: 1.4.1 netCDF4: 1.5.3 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: None cftime: 1.1.3 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.3.2 dask: 2.18.1 distributed: 2.18.0 matplotlib: 3.2.1 cartopy: None seaborn: None numbagg: None setuptools: 47.1.1.post20200529 pip: 20.1.1 conda: None pytest: None IPython: None sphinx: None

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4180/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 25.825ms · About: xarray-datasette