id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
117039129,MDU6SXNzdWUxMTcwMzkxMjk=,659,groupby very slow compared to pandas,1322974,closed,0,,,9,2015-11-16T02:43:57Z,2022-05-15T02:38:30Z,2022-05-15T02:38:30Z,CONTRIBUTOR,,,,"``` import timeit import numpy as np from pandas import DataFrame from xray import Dataset, DataArray df = DataFrame({""a"": np.r_[np.arange(500.), np.arange(500.)], ""b"": np.arange(1000.)}) print(timeit.repeat('df.groupby(""a"").agg(""mean"")', globals={""df"": df}, number=10)) print(timeit.repeat('df.groupby(""a"").agg(np.mean)', globals={""df"": df, ""np"": np}, number=10)) ds = Dataset({""a"": DataArray(np.r_[np.arange(500.), np.arange(500.)]), ""b"": DataArray(np.arange(1000.))}) print(timeit.repeat('ds.groupby(""a"").mean()', globals={""ds"": ds}, number=10)) ``` This outputs ``` [0.010462284000823274, 0.009770361997652799, 0.01081446700845845] [0.02622630601399578, 0.024328112005605362, 0.018717073995503597] [2.2804569930012804, 2.1666158599982737, 2.2688316510029836] ``` i.e. xray's groupby is ~100 times slower than pandas' one (and 200 times slower than passing `""mean""` to pandas' groupby, which I assume involves some specialization). (This is the actual order of magnitude of the data size and redundancy I want to handle, i.e. thousands of points with very limited duplication.) 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/659/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 111795064,MDU6SXNzdWUxMTE3OTUwNjQ=,627,string coordinate gets converted to object coordinate upon addition of variable to dataset,1322974,closed,0,,,10,2015-10-16T09:29:58Z,2021-03-27T21:19:33Z,2021-03-27T21:19:33Z,CONTRIBUTOR,,,,"With the current HEAD, consider ``` import numpy as np from xray import * ds = Dataset({""1"": DataArray(np.zeros(3), dims=[""a""], coords={""a"": list(""xyz"")})}) print(ds) ds[""2""] = DataArray(np.zeros(2), dims=[""a""], coords={""a"": list(""xy"")}) print(ds) ``` This outputs ``` Dimensions: (a: 3) Coordinates: * a (a) Dimensions: (a: 3) Coordinates: * a (a) object 'x' 'y' 'z' Data variables: 1 (a) float64 0.0 0.0 0.0 2 (a) float64 0.0 0.0 nan ``` Note that the dtype of the `a` coordinate got changed after the assignment. Python3.5, numpy 1.10.1, xray master (6ea7eb2b388075cc838c5ddf0ddaa47020cfcb89) With 0.6.0 the coordinate is of object dtype both before and after. I forgot why I tried master but I must have had a good reason... 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/627/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 125708367,MDU6SXNzdWUxMjU3MDgzNjc=,712,DataArrays should display their coordinates in the natural order,1322974,open,0,,,13,2016-01-08T22:33:05Z,2020-11-06T18:48:54Z,,CONTRIBUTOR,,,,"Consider ``` from collections import * import numpy as np from xray import * d1 = DataArray(np.empty((2, 2)), coords=OrderedDict([(""foo"", [0, 1]), (""bar"", [0, 1])])) d2 = DataArray(np.empty((2, 2)), coords=OrderedDict([(""bar"", [0, 1]), (""foo"", [0, 1])])) ds = Dataset({""d1"": d1, ""d2"": d2}) print(ds.d1) print(ds.d2) ``` This outputs ``` array([[ 6.91516848e-310, 1.64244654e-316], [ 6.91516881e-310, 6.91516881e-310]]) Coordinates: * foo (foo) int64 0 1 * bar (bar) int64 0 1 array([[ 1.59987863e-316, 6.91516883e-310], [ 6.91515690e-310, 2.12670320e-316]]) Coordinates: * foo (foo) int64 0 1 * bar (bar) int64 0 1 ``` I understand that internally both DataArrays use the same coords object and thus the same coords order, but it would be helpful if, when printing d2 by itself, the coordinates were printed in the natural order (""bar"", ""foo""). In particular, when working interactively, the list of coordinates at the end of the repr is the most easy thing to spot, and thus most helpful to know how to format the call to `array.loc[...]`. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/712/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 112254767,MDU6SXNzdWUxMTIyNTQ3Njc=,631,Confusing error (or lack thereof) when coordinate and variable share the same name,1322974,open,0,,,5,2015-10-19T23:39:22Z,2019-04-19T15:39:55Z,,CONTRIBUTOR,,,,"It probably makes sense to prevent dataset to have variables sharing the names of coordinates (what would `dataset.varname` return?) but currently ``` Dataset({""a"": DataArray(np.zeros((3, 4)), dims=[""a"", ""b""], coords={""a"": list(""xyz""), ""b"": list(""xyzt"")})}) ``` fails with `ValueError: an index variable must be defined with 1-dimensional data`, and ``` Dataset({""a"": DataArray(np.zeros(3), coords={""a"": list(""xyz"")})}) ``` actually creates an empty dataset using `[0, 0, 0]` as values for the `a` coordinate instead of `x y z`: ``` Dimensions: (a: 3) Coordinates: * a (a) float64 0.0 0.0 0.0 Data variables: *empty* ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/631/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 112253425,MDU6SXNzdWUxMTIyNTM0MjU=,630,Whether a DataArray is copied when inserted into a Dataset depends on whether coordinates match exactly,1322974,open,0,,,16,2015-10-19T23:27:15Z,2019-01-31T18:40:58Z,,CONTRIBUTOR,,,,"Consider ``` import numpy as np from xray import * ds = Dataset({""a"": DataArray(np.zeros((3, 4)))}) ds[""b""] = b = DataArray(np.zeros((3, 4))) b[0, 0] = 1 print(ds[""b""][0, 0]) # ==> prints 1 ds = Dataset({""a"": DataArray(np.zeros((3, 4)))}) ds[""b""] = b = DataArray(np.zeros((3, 3))) # !!! we implicitly fill the last column with nans. 
b[0, 0] = 1 print(ds[""b""][0, 0]) # ==> prints 0 ``` In the first case, the dataset was modified when the dataarray was modified, but not in the second case. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/630/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
114732169,MDU6SXNzdWUxMTQ3MzIxNjk=,643,"""naive"" iteration is very slow",1322974,closed,0,,,2,2015-11-03T02:53:04Z,2019-01-15T21:09:07Z,2019-01-15T21:09:07Z,CONTRIBUTOR,,,,"``` $ ipython Python 3.5.0 (default, Sep 20 2015, 11:28:25) Type ""copyright"", ""credits"" or ""license"" for more information. IPython 4.0.0 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. Using matplotlib backend: Qt4Agg In [1]: from xray import DataArray # Iteration over a Python list In [2]: %%timeit t = list(range(10000)) for _ in t: pass ...: 10000 loops, best of 3: 87.3 µs per loop # Iteration over a ndarray In [3]: %%timeit t = np.arange(10000) for _ in t: pass ...: 1000 loops, best of 3: 472 µs per loop # Iteration over a DataArray In [4]: %%timeit t = DataArray(np.arange(10000)) for _ in t: pass ...: 1 loops, best of 3: 818 ms per loop ``` I'm not sure how much can be done about this as iterating over a DataArray needs to create a bunch of temporary objects (and I understand the emphasis is as usual on vectorized operations, etc.) but a >1500 fold difference certainly doesn't look good. 
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/643/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 170458908,MDU6SXNzdWUxNzA0NTg5MDg=,958,Test failure with matplotlib 2.0b3,1322974,closed,0,,,1,2016-08-10T16:21:16Z,2018-10-26T23:12:28Z,2018-10-26T23:12:28Z,CONTRIBUTOR,,,,"mpl 2.0b3 / xarray HEAD Arch Linux, Python 3.5.2 ``` ============================================================================================= FAILURES ============================================================================================= ____________________________________________________________________________________ TestPlot.test_subplot_kws _____________________________________________________________________________________ self = def test_subplot_kws(self): a = easy_array((10, 15, 4)) d = DataArray(a, dims=['y', 'x', 'z']) d.coords['z'] = list('abcd') g = d.plot(x='x', y='y', col='z', col_wrap=2, cmap='cool', subplot_kws=dict(axisbg='r')) for ax in g.axes.flat: > self.assertEqual(ax.get_axis_bgcolor(), 'r') xarray/test/test_plot.py:148: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = , a1 = (1.0, 0.0, 0.0, 1), a2 = 'r' def assertEqual(self, a1, a2): > assert a1 == a2 or (a1 != a1 and a2 != a2) E AssertionError: assert ((1.0, 0.0, 0.0, 1) == 'r' or ((1.0, 0.0, 0.0, 1) != (1.0, 0.0, 0.0, 1))) xarray/test/__init__.py:164: AssertionError --------------------------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------------------------- /usr/lib/python3.5/site-packages/matplotlib/cbook.py:137: MatplotlibDeprecationWarning: The axisbg attribute was deprecated in version 2.0. Use facecolor instead. 
warnings.warn(message, mplDeprecation, stacklevel=1) /home/antony/src/extern/xarray/xarray/test/test_plot.py:148: MatplotlibDeprecationWarning: The get_axis_bgcolor function was deprecated in version 2.0. Use get_facecolor instead. self.assertEqual(ax.get_axis_bgcolor(), 'r') ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/958/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
117297089,MDExOlB1bGxSZXF1ZXN0NTA5MTEzMzQ=,661,Document pandas' better groupby performance.,1322974,closed,0,,,1,2015-11-17T07:04:50Z,2015-11-17T09:10:04Z,2015-11-17T08:54:31Z,CONTRIBUTOR,,0,pydata/xarray/pulls/661,"cf. #659. ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/661/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull