issues


8 rows where user = 1322974 sorted by updated_at descending


type 2

  • issue 7
  • pull 1

state 2

  • closed 5
  • open 3

repo 1

  • xarray 8
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
117039129 MDU6SXNzdWUxMTcwMzkxMjk= 659 groupby very slow compared to pandas anntzer 1322974 closed 0     9 2015-11-16T02:43:57Z 2022-05-15T02:38:30Z 2022-05-15T02:38:30Z CONTRIBUTOR      

```
import timeit
import numpy as np
from pandas import DataFrame
from xray import Dataset, DataArray

df = DataFrame({"a": np.r_[np.arange(500.), np.arange(500.)], "b": np.arange(1000.)})
print(timeit.repeat('df.groupby("a").agg("mean")', globals={"df": df}, number=10))
print(timeit.repeat('df.groupby("a").agg(np.mean)', globals={"df": df, "np": np}, number=10))

ds = Dataset({"a": DataArray(np.r_[np.arange(500.), np.arange(500.)]),
              "b": DataArray(np.arange(1000.))})
print(timeit.repeat('ds.groupby("a").mean()', globals={"ds": ds}, number=10))
```

This outputs

[0.010462284000823274, 0.009770361997652799, 0.01081446700845845]
[0.02622630601399578, 0.024328112005605362, 0.018717073995503597]
[2.2804569930012804, 2.1666158599982737, 2.2688316510029836]

i.e. xray's groupby is ~100 times slower than pandas' (and ~200 times slower than passing "mean" to pandas' groupby, which I assume triggers a specialized code path).

(This is the actual order of magnitude of the data size and redundancy I want to handle, i.e. thousands of points with very limited duplication.)
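For comparison, a grouped mean over this kind of data can be computed without any Python-level loop over groups. The NumPy-only sketch below uses np.unique(..., return_inverse=True) to give each element a group index and np.bincount to accumulate per-group sums and counts. It illustrates the general technique only; it is not how xray or pandas implement groupby, and grouped_mean is a hypothetical helper.

```python
import numpy as np


def grouped_mean(keys, values):
    """Mean of `values` for each distinct value in `keys` (1-D inputs).

    np.unique assigns every element the index of its group, and
    np.bincount sums values (and counts elements) per group in a single
    vectorized pass, so there is no Python loop over the groups.
    """
    keys = np.asarray(keys)
    values = np.asarray(values, dtype=float)
    uniques, inverse = np.unique(keys, return_inverse=True)
    sums = np.bincount(inverse, weights=values)
    counts = np.bincount(inverse)
    return uniques, sums / counts


# Same data layout as in the report: 500 distinct keys, each appearing twice.
a = np.r_[np.arange(500.), np.arange(500.)]
b = np.arange(1000.)
labels, means = grouped_mean(a, b)
print(labels[:3], means[:3])
```

With 500 keys each appearing twice (at positions k and k + 500), every group mean here is simply the key plus 250.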

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/659/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
111795064 MDU6SXNzdWUxMTE3OTUwNjQ= 627 string coordinate gets converted to object coordinate upon addition of variable to dataset anntzer 1322974 closed 0     10 2015-10-16T09:29:58Z 2021-03-27T21:19:33Z 2021-03-27T21:19:33Z CONTRIBUTOR      

With the current HEAD, consider

```
import numpy as np
from xray import *

ds = Dataset({"1": DataArray(np.zeros(3), dims=["a"], coords={"a": list("xyz")})})
print(ds)
ds["2"] = DataArray(np.zeros(2), dims=["a"], coords={"a": list("xy")})
print(ds)
```

This outputs

<xray.Dataset>
Dimensions:  (a: 3)
Coordinates:
  * a        (a) <U1 'x' 'y' 'z'
Data variables:
    1        (a) float64 0.0 0.0 0.0
<xray.Dataset>
Dimensions:  (a: 3)
Coordinates:
  * a        (a) object 'x' 'y' 'z'
Data variables:
    1        (a) float64 0.0 0.0 0.0
    2        (a) float64 0.0 0.0 nan

Note that the dtype of the a coordinate changed (from <U1 to object) after the assignment.

Python 3.5, numpy 1.10.1, xray master (6ea7eb2b388075cc838c5ddf0ddaa47020cfcb89)

With xray 0.6.0, the coordinate has object dtype both before and after the assignment. I forget why I tried master, but I must have had a good reason...
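Until this is fixed, one possible user-side workaround is to cast an object-dtype coordinate back to a fixed-width unicode dtype. This is a minimal sketch under the assumptions that the coordinate values are available as a 1-D NumPy array and that every label is a Python str (no missing labels); restore_str_dtype is a hypothetical helper, not part of xray/xarray.

```python
import numpy as np


def restore_str_dtype(labels):
    """Cast a 1-D object-dtype label array back to a unicode ('<U…') dtype.

    Only applies when every element is a Python str; anything else
    (for example NaN fill values introduced by alignment) is returned unchanged.
    """
    labels = np.asarray(labels)
    if labels.dtype == object and all(isinstance(x, str) for x in labels):
        # astype(str) lets NumPy choose the width from the longest label.
        return labels.astype(str)
    return labels


coord = np.array(list("xyz"), dtype=object)  # what the coordinate turned into
print(restore_str_dtype(coord).dtype)
```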

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/627/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
125708367 MDU6SXNzdWUxMjU3MDgzNjc= 712 DataArrays should display their coordinates in the natural order anntzer 1322974 open 0     13 2016-01-08T22:33:05Z 2020-11-06T18:48:54Z   CONTRIBUTOR      

Consider

```
from collections import *
import numpy as np
from xray import *

d1 = DataArray(np.empty((2, 2)), coords=OrderedDict([("foo", [0, 1]), ("bar", [0, 1])]))
d2 = DataArray(np.empty((2, 2)), coords=OrderedDict([("bar", [0, 1]), ("foo", [0, 1])]))

ds = Dataset({"d1": d1, "d2": d2})

print(ds.d1)
print(ds.d2)
```

This outputs

<xray.DataArray 'd1' (foo: 2, bar: 2)>
array([[  6.91516848e-310,   1.64244654e-316],
       [  6.91516881e-310,   6.91516881e-310]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1
<xray.DataArray 'd2' (bar: 2, foo: 2)>
array([[  1.59987863e-316,   6.91516883e-310],
       [  6.91515690e-310,   2.12670320e-316]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1

I understand that internally both DataArrays use the same coords object, and thus the same coordinate order, but it would be helpful if, when d2 is printed by itself, its coordinates were listed in its natural order ("bar", "foo"). In particular, when working interactively, the list of coordinates at the end of the repr is the easiest thing to spot, and thus the most helpful guide to how to format a call to array.loc[...].
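The requested behavior amounts to sorting coordinate names by their position in the array's own dims before printing. A minimal sketch of that ordering rule in plain Python; display_order is a hypothetical helper, and this is not how xray's repr is actually implemented.

```python
def display_order(coord_names, dims):
    """Order coordinate names for display: dimension coordinates first,
    in the array's own dimension order, then any other coordinates in
    their original order."""
    position = {dim: i for i, dim in enumerate(dims)}
    # sorted() is stable, so non-dimension coordinates keep their relative order.
    return sorted(coord_names, key=lambda name: position.get(name, len(dims)))


shared_coords = ["foo", "bar"]  # order stored in the shared Dataset coords
print(display_order(shared_coords, dims=("foo", "bar")))  # d1: ['foo', 'bar']
print(display_order(shared_coords, dims=("bar", "foo")))  # d2: ['bar', 'foo']
```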

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/712/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
112254767 MDU6SXNzdWUxMTIyNTQ3Njc= 631 Confusing error (or lack thereof) when coordinate and variable share the same name anntzer 1322974 open 0     5 2015-10-19T23:39:22Z 2019-04-19T15:39:55Z   CONTRIBUTOR      

It probably makes sense to prevent a dataset from having variables that share their names with coordinates (what would dataset.varname return?), but currently

Dataset({"a": DataArray(np.zeros((3, 4)), dims=["a", "b"], coords={"a": list("xyz"), "b": list("xyzt")})})

fails with ValueError: an index variable must be defined with 1-dimensional data, and

Dataset({"a": DataArray(np.zeros(3), coords={"a": list("xyz")})})

actually creates a dataset with no data variables, using the array's values [0, 0, 0] for the a coordinate instead of x y z:

<xray.Dataset>
Dimensions:  (a: 3)
Coordinates:
  * a        (a) float64 0.0 0.0 0.0
Data variables:
    *empty*
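A clearer behavior would be to reject such input up front with an explicit message. The sketch below shows the kind of check this issue asks for, under a simplifying assumption: each data variable is represented only by its name and the names of its coordinates (a hypothetical stand-in for real DataArray objects). check_no_name_clash is a hypothetical helper, not an xray API.

```python
def check_no_name_clash(data_vars):
    """Raise a clear error if a data variable shares its name with one of
    its own coordinates.

    data_vars maps each variable name to the names of its coordinates.
    """
    for name, coord_names in data_vars.items():
        if name in coord_names:
            raise ValueError(
                f"data variable {name!r} has the same name as one of its "
                "coordinates; rename the variable or the coordinate"
            )


# Mirrors the second example above: variable "a" with a coordinate "a".
try:
    check_no_name_clash({"a": ["a"]})
except ValueError as err:
    print(err)
```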

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/631/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
112253425 MDU6SXNzdWUxMTIyNTM0MjU= 630 Whether a DataArray is copied when inserted into a Dataset depends on whether coordinates match exactly anntzer 1322974 open 0     16 2015-10-19T23:27:15Z 2019-01-31T18:40:58Z   CONTRIBUTOR      

Consider

```
import numpy as np
from xray import *

ds = Dataset({"a": DataArray(np.zeros((3, 4)))})
ds["b"] = b = DataArray(np.zeros((3, 4)))
b[0, 0] = 1
print(ds["b"][0, 0])  # ==> prints 1

ds = Dataset({"a": DataArray(np.zeros((3, 4)))})
ds["b"] = b = DataArray(np.zeros((3, 3)))  # !!! we implicitly fill the last column with nans.
b[0, 0] = 1
print(ds["b"][0, 0])  # ==> prints 0
```

In the first case, modifying the DataArray also modified the dataset; in the second case, where the coordinates did not match exactly and the array was implicitly padded with NaNs, it did not.
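The difference is consistent with the general NumPy distinction between views and copies: when shapes match, an array can be stored by reference, whereas padding it to a larger shape requires allocating a new buffer. The NumPy-only sketch below illustrates that distinction; it does not use xray, and it is an assumption that this is the mechanism behind the behavior reported here.

```python
import numpy as np

original = np.zeros((3, 3))

# Exact match: the stored array can simply reference the same buffer.
stored_same = original[...]  # a view; no data is copied
print(np.shares_memory(original, stored_same))  # True

# Mismatch: aligning to a (3, 4) shape needs a new, NaN-padded buffer.
stored_padded = np.full((3, 4), np.nan)
stored_padded[:, :3] = original  # values are copied into the new buffer
print(np.shares_memory(original, stored_padded))  # False

original[0, 0] = 1
print(stored_same[0, 0], stored_padded[0, 0])  # 1.0 0.0
```

Calling .copy() explicitly before inserting an array is one way to get the same (copying) behavior in both cases.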

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/630/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
114732169 MDU6SXNzdWUxMTQ3MzIxNjk= 643 "naive" iteration is very slow anntzer 1322974 closed 0     2 2015-11-03T02:53:04Z 2019-01-15T21:09:07Z 2019-01-15T21:09:07Z CONTRIBUTOR      

```
$ ipython
Python 3.5.0 (default, Sep 20 2015, 11:28:25)
Type "copyright", "credits" or "license" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: Qt4Agg

In [1]: from xray import DataArray

# Iteration over a Python list
In [2]: %%timeit t = list(range(10000))
   ...: for _ in t: pass
   ...:
10000 loops, best of 3: 87.3 µs per loop

# Iteration over an ndarray
In [3]: %%timeit t = np.arange(10000)
   ...: for _ in t: pass
   ...:
1000 loops, best of 3: 472 µs per loop

# Iteration over a DataArray
In [4]: %%timeit t = DataArray(np.arange(10000))
   ...: for _ in t: pass
   ...:
1 loops, best of 3: 818 ms per loop
```

I'm not sure how much can be done about this, since iterating over a DataArray needs to create a bunch of temporary objects (and I understand that the emphasis is, as usual, on vectorized operations), but a >1500-fold slowdown compared with iterating over the underlying ndarray certainly doesn't look good.
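When only the values matter (not the per-element labels), a common workaround is to drop down to NumPy before looping. A minimal sketch, assuming the caller is happy with plain Python scalars: np.asarray() unwraps the data (DataArrays support the NumPy array protocol) and tolist() converts all elements in one pass. iter_values is a hypothetical helper, and actual timings will vary.

```python
import numpy as np


def iter_values(arr):
    """Iterate over a 1-D array-like as plain Python scalars.

    np.asarray() strips any wrapper object and tolist() converts every
    element at once, so the loop body sees ordinary Python ints/floats
    instead of freshly created per-element wrapper objects.
    Coordinates and labels are lost along the way.
    """
    return iter(np.asarray(arr).tolist())


total = 0
for x in iter_values(np.arange(10000)):
    total += x
print(total)  # 49995000
```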

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/643/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
170458908 MDU6SXNzdWUxNzA0NTg5MDg= 958 Test failure with matplotlib 2.0b3 anntzer 1322974 closed 0     1 2016-08-10T16:21:16Z 2018-10-26T23:12:28Z 2018-10-26T23:12:28Z CONTRIBUTOR      

matplotlib 2.0b3, xarray HEAD, Arch Linux, Python 3.5.2

```
=================================== FAILURES ===================================
__________________________ TestPlot.test_subplot_kws ___________________________

self = <xarray.test.test_plot.TestPlot testMethod=test_subplot_kws>

    def test_subplot_kws(self):
        a = easy_array((10, 15, 4))
        d = DataArray(a, dims=['y', 'x', 'z'])
        d.coords['z'] = list('abcd')
        g = d.plot(x='x', y='y', col='z', col_wrap=2, cmap='cool',
                   subplot_kws=dict(axisbg='r'))
        for ax in g.axes.flat:
            self.assertEqual(ax.get_axis_bgcolor(), 'r')

xarray/test/test_plot.py:148:

self = <xarray.test.test_plot.TestPlot testMethod=test_subplot_kws>, a1 = (1.0, 0.0, 0.0, 1), a2 = 'r'

    def assertEqual(self, a1, a2):
        assert a1 == a2 or (a1 != a1 and a2 != a2)
E       AssertionError: assert ((1.0, 0.0, 0.0, 1) == 'r' or ((1.0, 0.0, 0.0, 1) != (1.0, 0.0, 0.0, 1)))

xarray/test/__init__.py:164: AssertionError
----------------------------- Captured stderr call -----------------------------
/usr/lib/python3.5/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
/home/antony/src/extern/xarray/xarray/test/test_plot.py:148: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  self.assertEqual(ax.get_axis_bgcolor(), 'r')
```
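The warnings point toward a fix: use facecolor / get_facecolor instead of axisbg / get_axis_bgcolor, and compare colors only after normalizing both sides to RGBA, since the getter returns a tuple while the test passes 'r'. A sketch of that comparison with plain matplotlib (assuming matplotlib 2.0 or later); it is not xarray's actual test and does not go through DataArray.plot.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

# 'facecolor' replaces the deprecated 'axisbg' keyword in matplotlib >= 2.0.
fig, axes = plt.subplots(2, 2, subplot_kw=dict(facecolor="r"))
for ax in axes.flat:
    # get_facecolor() returns an RGBA tuple, so normalize 'r' before comparing.
    assert tuple(ax.get_facecolor()) == to_rgba("r")
plt.close(fig)
```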

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/958/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
117297089 MDExOlB1bGxSZXF1ZXN0NTA5MTEzMzQ= 661 Document pandas' better groupby performance. anntzer 1322974 closed 0     1 2015-11-17T07:04:50Z 2015-11-17T09:10:04Z 2015-11-17T08:54:31Z CONTRIBUTOR   0 pydata/xarray/pulls/661

cf. #659.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/661/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
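The listing at the top of this page ("8 rows where user = 1322974 sorted by updated_at descending") corresponds to a simple query against this schema. A minimal sketch with Python's built-in sqlite3 module, assuming an in-memory database, a trimmed-down [issues] table (a subset of the columns above, without the foreign-key references), and three rows copied from the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the [issues] columns defined above.
conn.execute(
    """CREATE TABLE [issues] (
           [id] INTEGER PRIMARY KEY,
           [number] INTEGER,
           [title] TEXT,
           [user] INTEGER,
           [updated_at] TEXT
       )"""
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")
rows = [
    (125708367, 712, "DataArrays should display their coordinates in the natural order",
     1322974, "2020-11-06T18:48:54Z"),
    (117039129, 659, "groupby very slow compared to pandas",
     1322974, "2022-05-15T02:38:30Z"),
    (111795064, 627, "string coordinate gets converted to object coordinate upon "
     "addition of variable to dataset", 1322974, "2021-03-27T21:19:33Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps sort chronologically even when stored as plain TEXT.
result = conn.execute(
    "SELECT [number] FROM [issues] WHERE [user] = ? ORDER BY [updated_at] DESC",
    (1322974,),
).fetchall()
print(result)  # [(659,), (627,), (712,)]
```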
Powered by Datasette · Queries took 23.868ms · About: xarray-datasette