issue_comments

5 rows where user = 5442433 sorted by updated_at descending

Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
648721465 https://github.com/pydata/xarray/issues/2459#issuecomment-648721465 https://api.github.com/repos/pydata/xarray/issues/2459 MDEyOklzc3VlQ29tbWVudDY0ODcyMTQ2NQ== brey 5442433 2020-06-24T09:55:00Z 2020-06-24T09:55:00Z NONE

Hi All. I stumbled across the same issue trying to convert a 5000-column dataframe to xarray (it was never going to happen...). I found a workaround and am posting the test below. Hope it helps.

```python
import xarray as xr
import pandas as pd
import numpy as np

xr.__version__
# '0.15.1'

pd.__version__
# '1.0.5'

df = pd.DataFrame(np.random.randn(200, 500))

%%time
one = df.to_xarray()
# CPU times: user 29.6 s, sys: 60.4 ms, total: 29.6 s
# Wall time: 29.7 s

%%time
dic = {}
for name in df.columns:
    dic.update({name: (['index'], df[name].values)})
two = xr.Dataset(dic, coords={'index': ('index', df.index.values)})
# CPU times: user 17.6 ms, sys: 158 µs, total: 17.8 ms
# Wall time: 17.8 ms

one.equals(two)
# True
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stack + to_array before to_xarray is much faster that a simple to_xarray 365973662
613525795 https://github.com/pydata/xarray/issues/2108#issuecomment-613525795 https://api.github.com/repos/pydata/xarray/issues/2108 MDEyOklzc3VlQ29tbWVudDYxMzUyNTc5NQ== brey 5442433 2020-04-14T15:55:05Z 2020-04-14T15:55:05Z NONE

I am adding a comment here to keep this alive. In fact, it is more complicated than it seems, because when combining files with duplicate times one has to choose how to merge, i.e. keep the first, keep the last, or even a combination of the two.
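To illustrate the "keep first / keep last" choice, here is a minimal sketch; the two toy "files" and the `t2m` variable are hypothetical, standing in for the result of combining real files:

```python
import pandas as pd
import xarray as xr

# two toy "files" whose time ranges overlap by one day
a = xr.Dataset({"t2m": ("time", [1.0, 2.0])},
               coords={"time": pd.date_range("2018-05-01", periods=2)})
b = xr.Dataset({"t2m": ("time", [20.0, 30.0])},
               coords={"time": pd.date_range("2018-05-02", periods=2)})

ds = xr.concat([a, b], dim="time")  # 'time' now contains 2018-05-02 twice

time_index = ds.get_index("time")

# keep the first occurrence of each duplicated stamp (values from 'a') ...
ds_first = ds.sel(time=~time_index.duplicated(keep="first"))

# ... or keep the last occurrence instead (values from 'b')
ds_last = ds.sel(time=~time_index.duplicated(keep="last"))
```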

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoiding duplicate time coordinates when opening multiple files 320838184
389196623 https://github.com/pydata/xarray/issues/2108#issuecomment-389196623 https://api.github.com/repos/pydata/xarray/issues/2108 MDEyOklzc3VlQ29tbWVudDM4OTE5NjYyMw== brey 5442433 2018-05-15T14:53:38Z 2018-05-15T14:53:38Z NONE

Thanks @shoyer. Your approach works better (one line) and is consistent with the xarray-pandas shared paradigm. Unfortunately, I can't spare the time to do the PR right now; I haven't contributed to xarray before and it would require some overhead to get up to speed. Maybe someone with more experience can oblige.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoiding duplicate time coordinates when opening multiple files 320838184
387343836 https://github.com/pydata/xarray/issues/2108#issuecomment-387343836 https://api.github.com/repos/pydata/xarray/issues/2108 MDEyOklzc3VlQ29tbWVudDM4NzM0MzgzNg== brey 5442433 2018-05-08T09:33:14Z 2018-05-08T09:33:14Z NONE

To partially answer my own issue, I came up with the following post-processing option (a consolidated sketch follows below):

  1. Get the indices of the first occurrence of each unique coordinate value: `val, idx = np.unique(arr.time, return_index=True)`

  2. Trim the dataset to those indices: `arr = arr.isel(time=idx)`

Maybe this can be integrated somehow...
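Put together as a self-contained snippet (the toy dataset is hypothetical, just to make the two steps runnable):

```python
import numpy as np
import xarray as xr

# toy dataset with a duplicated 'time' stamp, standing in for the result of open_mfdataset
arr = xr.Dataset(
    {"t2m": ("time", [1.0, 2.0, 3.0])},
    coords={"time": np.array(["2018-05-08", "2018-05-09", "2018-05-09"],
                             dtype="datetime64[ns]")},
)

# 1. indices of the first occurrence of each unique time value
val, idx = np.unique(arr["time"], return_index=True)

# 2. trim the dataset to those positions
arr = arr.isel(time=idx)
```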

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Avoiding duplicate time coordinates when opening multiple files 320838184
362624658 https://github.com/pydata/xarray/issues/1614#issuecomment-362624658 https://api.github.com/repos/pydata/xarray/issues/1614 MDEyOklzc3VlQ29tbWVudDM2MjYyNDY1OA== brey 5442433 2018-02-02T15:54:41Z 2018-02-02T15:54:41Z NONE

I am also interested. In terms of the table from @jhamman, I am in principle OK with it. However, there could be an option to refer to the original attrs in order to provide provenance even on operations like reduce and arithmetic. The idea here is reproducibility and traceability. Maybe an 'origin' attribute?
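A rough sketch of what such an 'origin' attribute could look like in user code today; this only illustrates the idea and is not an existing xarray feature, and the DataArray and attribute names are made up:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="x", attrs={"units": "m", "source": "sensor A"})

# reductions and arithmetic drop attrs by default, so provenance is lost
reduced = da.mean()

# manually carry the original attrs forward and record where the result came from
reduced.attrs.update(da.attrs)
reduced.attrs["origin"] = "mean over 'x' of DataArray with source='sensor A'"
```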

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Rules for propagating attrs and encoding 264049503

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette