id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
502130982,MDU6SXNzdWU1MDIxMzA5ODI=,3370,Hundreds of Sphinx errors,6213168,closed,0,,,14,2019-10-03T15:17:09Z,2022-04-17T20:33:05Z,2022-04-17T20:33:05Z,MEMBER,,,,"sphinx-build emits a ton of errors that need to be polished out: https://readthedocs.org/projects/xray/builds/ -> latest -> open last step

Options for the long term:

- Change the ""Docs"" azure pipelines job to crash if there are new failures. From past experience though, this should come together with a sensible way to whitelist errors that can't be fixed. This will severely slow down development as PRs will systematically fail on such a check.
- Add a task in the release process where, immediately before closing a release, the maintainer needs to manually go through the sphinx-build log and fix any new issues. This would be a major extra piece of work for the maintainer.

I am honestly not excited by either of the above. Alternative suggestions are welcome.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3370/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
466750687,MDU6SXNzdWU0NjY3NTA2ODc=,3092,black formatting,6213168,closed,0,,,14,2019-07-11T08:43:55Z,2019-08-08T22:34:53Z,2019-08-08T22:34:53Z,MEMBER,,,,"I, like many others, have irreversibly fallen in love with black. Can we apply it to the existing codebase and as an enforced CI test?

The only (big) problem is that developers will need to manually apply it to any open branches and then merge from master - and even then, merging likely won't be trivial.

How did the dask project tackle the issue?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/3092/reactions"", ""total_count"": 3, ""+1"": 3, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
166439490,MDU6SXNzdWUxNjY0Mzk0OTA=,906,unstack() sorts data alphabetically,6213168,closed,0,,,14,2016-07-19T21:25:26Z,2019-02-23T12:47:00Z,2019-02-23T12:47:00Z,MEMBER,,,,"DataArray.unstack() sorts the data alphabetically by label. Besides being poor for performance, this is very problematic whenever the order matters, and the labels are not in alphabetical order to begin with.

``` python
import xarray
import pandas

index = [
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x1', 'third' ],
    ['x1', 'fourth'],
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x0', 'fourth'],
]
index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
s = pandas.Series(list(range(8)), index)
a = xarray.DataArray(s)
a
```
```
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)
Coordinates:
  * dim_0    (dim_0) object ('x1', 'first') ('x1', 'second') ('x1', 'third') ...
```
``` python
a.unstack('dim_0')
```
```
array([[4, 7, 5, 6],
       [0, 3, 1, 2]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'fourth' 'second' 'third'
```
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/906/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
252541496,MDU6SXNzdWUyNTI1NDE0OTY=,1521,open_mfdataset reads coords from disk multiple times,6213168,closed,0,,,14,2017-08-24T09:29:57Z,2017-10-09T21:15:31Z,2017-10-09T21:15:31Z,MEMBER,,,,"I have 200x of the below dataset, split on the 'scenario' axis:

```
Dimensions:      (fx_id: 39, instr_id: 16095, scenario: 2501)
Coordinates:
    currency     (instr_id) object 'GBP' 'USD' 'GBP' 'GBP' 'GBP' 'EUR' 'CHF' ...
  * fx_id        (fx_id) object 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' 'CAD' ...
  * instr_id     (instr_id) object 'property_standard_gbp' ...
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
    type         (instr_id) object 'Common Stock' 'Fixed Amortizing Bond' ...
Data variables:
    fx_rates     (fx_id, scenario) float64 1.236 1.191 1.481 1.12 1.264 ...
    instruments  (instr_id, scenario) float64 1.0 1.143 0.9443 1.013 1.176 ...
Attributes:
    base_currency: GBP
```

I individually dump them to disk with Dataset.to_netcdf(fname, engine='h5netcdf'). Then I try loading them back up with open_mfdataset, but it's mortally slow:

```
%%time
xarray.open_mfdataset('*.nc', engine='h5netcdf')

Wall time: 30.3 s
```

The problem is caused by the coords being read from disk multiple times. Workaround:

```
%%time
def load_coords(ds):
    for coord in ds.coords.values():
        coord.load()
    return ds

xarray.open_mfdataset('*.nc', engine='h5netcdf', preprocess=load_coords)

Wall time: 12.3 s
```

Proposed solutions:

1. Implement the above workaround directly inside open_mfdataset()
2. change open_dataset() to always eagerly load the coords to memory, regardless of the chunks parameter. Is there any valid use case where lazy coords are actually desirable?

An additional, more radical observation is that, very frequently, a user knows in advance that all coords are aligned. In this use case, the user could explicitly request xarray to blindly trust this assumption, and thus skip loading the coords not based on concat_dim in all datasets beyond the first.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1521/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue