issues

5 rows where state = "closed" and user = 1322974 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
117039129 MDU6SXNzdWUxMTcwMzkxMjk= 659 groupby very slow compared to pandas anntzer 1322974 closed 0     9 2015-11-16T02:43:57Z 2022-05-15T02:38:30Z 2022-05-15T02:38:30Z CONTRIBUTOR      

```
import timeit
import numpy as np
from pandas import DataFrame
from xray import Dataset, DataArray

df = DataFrame({"a": np.r_[np.arange(500.), np.arange(500.)],
                "b": np.arange(1000.)})
print(timeit.repeat('df.groupby("a").agg("mean")', globals={"df": df}, number=10))
print(timeit.repeat('df.groupby("a").agg(np.mean)', globals={"df": df, "np": np}, number=10))

ds = Dataset({"a": DataArray(np.r_[np.arange(500.), np.arange(500.)]),
              "b": DataArray(np.arange(1000.))})
print(timeit.repeat('ds.groupby("a").mean()', globals={"ds": ds}, number=10))
```

This outputs

    [0.010462284000823274, 0.009770361997652799, 0.01081446700845845]
    [0.02622630601399578, 0.024328112005605362, 0.018717073995503597]
    [2.2804569930012804, 2.1666158599982737, 2.2688316510029836]

That is, xray's groupby is ~100 times slower than pandas' groupby with agg(np.mean), and ~200 times slower than passing "mean" to pandas' groupby (which I assume involves some specialization).

(This is the actual order of magnitude of the data size and redundancy I want to handle, i.e. thousands of points with very limited duplication.)
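The kind of specialization guessed at above can be illustrated with plain NumPy. This is only a minimal sketch, not pandas' or xray's actual implementation: it assumes one-dimensional numeric arrays, and the function name `grouped_mean` is hypothetical. It computes the per-group mean with `np.unique` and `np.bincount`, so there is no Python-level loop over the groups.

```python
import numpy as np


def grouped_mean(keys, values):
    """Return (unique_keys, means) with one mean of `values` per distinct key.

    Hypothetical helper for illustration; assumes 1-D numeric inputs.
    """
    # `inverse` maps every element to the index of its key in `unique_keys`.
    unique_keys, inverse = np.unique(keys, return_inverse=True)
    # Sum the values per group, then divide by the group sizes.
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return unique_keys, sums / counts


# Same data as in the benchmark above: 500 distinct keys, each appearing twice.
a = np.r_[np.arange(500.), np.arange(500.)]
b = np.arange(1000.)
unique_keys, means = grouped_mean(a, b)
print(means[:3])
```

Each key `k` groups the values `k` and `k + 500`, so the mean for key `k` is `k + 250`.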

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/659/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
111795064 MDU6SXNzdWUxMTE3OTUwNjQ= 627 string coordinate gets converted to object coordinate upon addition of variable to dataset anntzer 1322974 closed 0     10 2015-10-16T09:29:58Z 2021-03-27T21:19:33Z 2021-03-27T21:19:33Z CONTRIBUTOR      

With the current HEAD, consider

```
import numpy as np
from xray import *

ds = Dataset({"1": DataArray(np.zeros(3), dims=["a"], coords={"a": list("xyz")})})
print(ds)
ds["2"] = DataArray(np.zeros(2), dims=["a"], coords={"a": list("xy")})
print(ds)
```

This outputs

    <xray.Dataset>
    Dimensions:  (a: 3)
    Coordinates:
      * a        (a) <U1 'x' 'y' 'z'
    Data variables:
        1        (a) float64 0.0 0.0 0.0
    <xray.Dataset>
    Dimensions:  (a: 3)
    Coordinates:
      * a        (a) object 'x' 'y' 'z'
    Data variables:
        1        (a) float64 0.0 0.0 0.0
        2        (a) float64 0.0 0.0 nan

Note that the dtype of the a coordinate got changed after the assignment.

Python3.5, numpy 1.10.1, xray master (6ea7eb2b388075cc838c5ddf0ddaa47020cfcb89)

With 0.6.0 the coordinate is of object dtype both before and after. I forgot why I tried master but I must have had a good reason...
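For readers unfamiliar with the two dtypes in the output above, the difference can be shown with NumPy alone. This is a small sketch under stated assumptions: it does not reproduce the xray behaviour or say where the conversion happens, and the helper `is_fixed_width_str` is hypothetical. It only shows how one might check which of the two representations an array uses.

```python
import numpy as np

# Fixed-width unicode array, as in the first printed Dataset (<U1).
fixed_width = np.array(list("xyz"))
# The same values stored as generic Python objects, as in the second one.
as_object = fixed_width.astype(object)


def is_fixed_width_str(arr):
    """True for NumPy unicode ('U') or bytes ('S') dtypes, False otherwise."""
    return arr.dtype.kind in ("U", "S")


print(fixed_width.dtype, is_fixed_width_str(fixed_width))
print(as_object.dtype, is_fixed_width_str(as_object))
```

A check like this could serve as a regression assertion on the coordinate's dtype before and after the assignment.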

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/627/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
114732169 MDU6SXNzdWUxMTQ3MzIxNjk= 643 "naive" iteration is very slow anntzer 1322974 closed 0     2 2015-11-03T02:53:04Z 2019-01-15T21:09:07Z 2019-01-15T21:09:07Z CONTRIBUTOR      

```
$ ipython
Python 3.5.0 (default, Sep 20 2015, 11:28:25)
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: Qt4Agg

In [1]: from xray import DataArray

Iteration over a Python list

In [2]: %%timeit t = list(range(10000))
for _ in t: pass
   ...:
10000 loops, best of 3: 87.3 µs per loop

Iteration over a ndarray

In [3]: %%timeit t = np.arange(10000)
for _ in t: pass
   ...:
1000 loops, best of 3: 472 µs per loop

Iteration over a DataArray

In [4]: %%timeit t = DataArray(np.arange(10000))
for _ in t: pass
   ...:
1 loops, best of 3: 818 ms per loop
```

I'm not sure how much can be done about this, as iterating over a DataArray needs to create a bunch of temporary objects (and I understand the emphasis is, as usual, on vectorized operations, etc.), but a >1500-fold slowdown relative to iterating over an ndarray certainly doesn't look good.
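When element-wise iteration cannot be avoided, one possible workaround is to drop down to the underlying data before looping. The sketch below uses NumPy only and compares looping over an ndarray with looping over its `tolist()` conversion. It is an assumption, not something measured here, that the same idea carries over to a DataArray through its `.values` attribute. The function names are hypothetical.

```python
import timeit

import numpy as np

arr = np.arange(10000)


def loop_over_array(a):
    """Iterate over the ndarray directly; each step yields a NumPy scalar."""
    total = 0
    for x in a:
        total += int(x)
    return total


def loop_over_list(a):
    """Convert once in bulk, then iterate over plain Python ints."""
    total = 0
    for x in a.tolist():
        total += x
    return total


t_array = timeit.timeit(lambda: loop_over_array(arr), number=10)
t_list = timeit.timeit(lambda: loop_over_list(arr), number=10)
print(f"ndarray: {t_array:.4f} s, list: {t_list:.4f} s")
```

The timings depend on the machine and library versions, so no expected numbers are given.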

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/643/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
170458908 MDU6SXNzdWUxNzA0NTg5MDg= 958 Test failure with matplotlib 2.0b3 anntzer 1322974 closed 0     1 2016-08-10T16:21:16Z 2018-10-26T23:12:28Z 2018-10-26T23:12:28Z CONTRIBUTOR      

mpl 2.0b3 / xarray HEAD Arch Linux, Python 3.5.2

```
============================================================================================= FAILURES =============================================================================================
____________ TestPlot.test_subplot_kws _____________

self = <xarray.test.test_plot.TestPlot testMethod=test_subplot_kws>

    def test_subplot_kws(self):
        a = easy_array((10, 15, 4))
        d = DataArray(a, dims=['y', 'x', 'z'])
        d.coords['z'] = list('abcd')
        g = d.plot(x='x', y='y', col='z', col_wrap=2, cmap='cool',
                   subplot_kws=dict(axisbg='r'))
        for ax in g.axes.flat:
            self.assertEqual(ax.get_axis_bgcolor(), 'r')

xarray/test/test_plot.py:148:

self = <xarray.test.test_plot.TestPlot testMethod=test_subplot_kws>, a1 = (1.0, 0.0, 0.0, 1), a2 = 'r'

    def assertEqual(self, a1, a2):
        assert a1 == a2 or (a1 != a1 and a2 != a2)
E       AssertionError: assert ((1.0, 0.0, 0.0, 1) == 'r' or ((1.0, 0.0, 0.0, 1) != (1.0, 0.0, 0.0, 1)))

xarray/test/__init__.py:164: AssertionError
--------------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------------
/usr/lib/python3.5/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
/home/antony/src/extern/xarray/xarray/test/test_plot.py:148: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  self.assertEqual(ax.get_axis_bgcolor(), 'r')
```
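The failure comes from comparing an RGBA tuple with the color name 'r'. One way such a check could be written against matplotlib 2.0 or later is sketched below. This is an assumption about a possible fix, not necessarily the change made in xarray: it uses the `facecolor` keyword and `get_facecolor()` named in the deprecation warnings, and normalizes both sides with `matplotlib.colors.to_rgba` before comparing.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors

# `facecolor` replaces the deprecated `axisbg` keyword.
fig, ax = plt.subplots(subplot_kw=dict(facecolor="r"))

# get_facecolor() returns an RGBA tuple, so convert both sides to RGBA
# instead of comparing the tuple with the color name directly.
actual = mcolors.to_rgba(ax.get_facecolor())
expected = mcolors.to_rgba("r")
print(actual == expected)

plt.close(fig)
```

Comparing normalized RGBA values keeps the assertion independent of how the color was originally specified.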

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/958/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
117297089 MDExOlB1bGxSZXF1ZXN0NTA5MTEzMzQ= 661 Document pandas' better groupby performance. anntzer 1322974 closed 0     1 2015-11-17T07:04:50Z 2015-11-17T09:10:04Z 2015-11-17T08:54:31Z CONTRIBUTOR   0 pydata/xarray/pulls/661

cf. #659.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/661/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
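The filter described at the top of this page (state = "closed", user = 1322974, sorted by updated_at descending) can be reproduced against this schema with Python's built-in sqlite3 module. The sketch below is illustrative only: it uses an in-memory database and a reduced set of columns (an assumption made for brevity), populated with the five rows listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced version of the schema above: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        number INTEGER,
        title TEXT,
        user INTEGER,
        state TEXT,
        updated_at TEXT,
        type TEXT
    )
    """
)
conn.execute("CREATE INDEX idx_issues_user ON issues (user)")

# The five rows from this page, inserted in arbitrary order.
rows = [
    (117297089, 661, "Document pandas' better groupby performance.",
     1322974, "closed", "2015-11-17T09:10:04Z", "pull"),
    (170458908, 958, "Test failure with matplotlib 2.0b3",
     1322974, "closed", "2018-10-26T23:12:28Z", "issue"),
    (117039129, 659, "groupby very slow compared to pandas",
     1322974, "closed", "2022-05-15T02:38:30Z", "issue"),
    (114732169, 643, '"naive" iteration is very slow',
     1322974, "closed", "2019-01-15T21:09:07Z", "issue"),
    (111795064, 627, "string coordinate gets converted to object coordinate "
     "upon addition of variable to dataset",
     1322974, "closed", "2021-03-27T21:19:33Z", "issue"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as strings.
query = """
    SELECT number, title, updated_at
    FROM issues
    WHERE state = :state AND user = :user
    ORDER BY updated_at DESC
"""
result = conn.execute(query, {"state": "closed", "user": 1322974}).fetchall()
for number, title, updated_at in result:
    print(number, updated_at, title)
```

Named parameters keep the filter values out of the SQL string, which is the usual way to pass user-supplied values to sqlite3.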