id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
117039129,MDU6SXNzdWUxMTcwMzkxMjk=,659,groupby very slow compared to pandas,1322974,closed,0,,,9,2015-11-16T02:43:57Z,2022-05-15T02:38:30Z,2022-05-15T02:38:30Z,CONTRIBUTOR,,,,"```
import timeit
import numpy as np
from pandas import DataFrame
from xray import Dataset, DataArray
df = DataFrame({""a"": np.r_[np.arange(500.), np.arange(500.)], ""b"": np.arange(1000.)})
print(timeit.repeat('df.groupby(""a"").agg(""mean"")', globals={""df"": df}, number=10))
print(timeit.repeat('df.groupby(""a"").agg(np.mean)', globals={""df"": df, ""np"": np}, number=10))
ds = Dataset({""a"": DataArray(np.r_[np.arange(500.), np.arange(500.)]), ""b"": DataArray(np.arange(1000.))})
print(timeit.repeat('ds.groupby(""a"").mean()', globals={""ds"": ds}, number=10))
```

This outputs

```
[0.010462284000823274, 0.009770361997652799, 0.01081446700845845]
[0.02622630601399578, 0.024328112005605362, 0.018717073995503597]
[2.2804569930012804, 2.1666158599982737, 2.2688316510029836]
```

i.e. xray's groupby is ~100 times slower than pandas' one (and 200 times slower than passing `""mean""` to pandas' groupby, which I assume involves some specialization).

(This is the actual order of magnitude of the data size and redundancy I want to handle, i.e. thousands of points with very limited duplication.)
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/659/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
111795064,MDU6SXNzdWUxMTE3OTUwNjQ=,627,string coordinate gets converted to object coordinate upon addition of variable to dataset,1322974,closed,0,,,10,2015-10-16T09:29:58Z,2021-03-27T21:19:33Z,2021-03-27T21:19:33Z,CONTRIBUTOR,,,,"With the current HEAD, consider

```
import numpy as np
from xray import *
ds = Dataset({""1"": DataArray(np.zeros(3), dims=[""a""], coords={""a"": list(""xyz"")})})
print(ds)
ds[""2""] = DataArray(np.zeros(2), dims=[""a""], coords={""a"": list(""xy"")})
print(ds)
```

This outputs

```
Dimensions: (a: 3)
Coordinates:
* a (a)
Dimensions: (a: 3)
Coordinates:
* a (a) object 'x' 'y' 'z'
Data variables:
1 (a) float64 0.0 0.0 0.0
2 (a) float64 0.0 0.0 nan
```

Note that the dtype of the `a` coordinate got changed after the assignment.

Python3.5, numpy 1.10.1, xray master (6ea7eb2b388075cc838c5ddf0ddaa47020cfcb89)

With 0.6.0 the coordinate is of object dtype both before and after. I forgot why I tried master but I must have had a good reason...
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/627/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
114732169,MDU6SXNzdWUxMTQ3MzIxNjk=,643,"""naive"" iteration is very slow",1322974,closed,0,,,2,2015-11-03T02:53:04Z,2019-01-15T21:09:07Z,2019-01-15T21:09:07Z,CONTRIBUTOR,,,,"```
$ ipython
Python 3.5.0 (default, Sep 20 2015, 11:28:25)
Type ""copyright"", ""credits"" or ""license"" for more information.

IPython 4.0.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: Qt4Agg

In [1]: from xray import DataArray

# Iteration over a Python list
In [2]: %%timeit t = list(range(10000))
for _ in t: pass
   ...:
10000 loops, best of 3: 87.3 µs per loop

# Iteration over a ndarray
In [3]: %%timeit t = np.arange(10000)
for _ in t: pass
   ...:
1000 loops, best of 3: 472 µs per loop

# Iteration over a DataArray
In [4]: %%timeit t = DataArray(np.arange(10000))
for _ in t: pass
   ...:
1 loops, best of 3: 818 ms per loop
```

I'm not sure how much can be done about this as iterating over a DataArray needs to create a bunch of temporary objects (and I understand the emphasis is as usual on vectorized operations, etc.) but a >1500 fold difference certainly doesn't look good.
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/643/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
170458908,MDU6SXNzdWUxNzA0NTg5MDg=,958,Test failure with matplotlib 2.0b3,1322974,closed,0,,,1,2016-08-10T16:21:16Z,2018-10-26T23:12:28Z,2018-10-26T23:12:28Z,CONTRIBUTOR,,,,"mpl 2.0b3 / xarray HEAD Arch Linux, Python 3.5.2

```
============================================================================================= FAILURES =============================================================================================
____________________________________________________________________________________ TestPlot.test_subplot_kws _____________________________________________________________________________________

self =

    def test_subplot_kws(self):
        a = easy_array((10, 15, 4))
        d = DataArray(a, dims=['y', 'x', 'z'])
        d.coords['z'] = list('abcd')
        g = d.plot(x='x', y='y', col='z', col_wrap=2, cmap='cool', subplot_kws=dict(axisbg='r'))
        for ax in g.axes.flat:
>           self.assertEqual(ax.get_axis_bgcolor(), 'r')

xarray/test/test_plot.py:148:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , a1 = (1.0, 0.0, 0.0, 1), a2 = 'r'

    def assertEqual(self, a1, a2):
>       assert a1 == a2 or (a1 != a1 and a2 != a2)
E       AssertionError: assert ((1.0, 0.0, 0.0, 1) == 'r' or ((1.0, 0.0, 0.0, 1) != (1.0, 0.0, 0.0, 1)))

xarray/test/__init__.py:164: AssertionError
--------------------------------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------------------------------
/usr/lib/python3.5/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead.
  warnings.warn(message, mplDeprecation, stacklevel=1)
/home/antony/src/extern/xarray/xarray/test/test_plot.py:148: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead.
  self.assertEqual(ax.get_axis_bgcolor(), 'r')
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/958/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue