id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1288323549,I_kwDOAMm_X85MykHd,6736,better handling of invalid files in open_mfdataset,731499,open,0,,,2,2022-06-29T08:00:18Z,2023-07-09T23:49:36Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? Suppose I'm trying to read a large number of netCDF files with ```open_mfdataset```. Now suppose that one of those files is for some reason incorrect -- for instance there was a problem during the creation of that particular file, and its file size is zero, or it is not valid netCDF. The file exists, but it is invalid. Currently ```open_mfdataset``` will raise an exception with the message ```ValueError: did not find a match in any of xarray's currently installed IO backends``` As far as I can tell, there is currently no way to identify which one(s) of the files being read is the source of the problem. If there are several hundreds of those, finding the problematic files is a task by itself, even though xarray probably knows them. ### Describe the solution you'd like It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception. Apart from better reporting, I would find it very useful if I could pass to ```open_mfdataset``` some kind of argument that would make it ignore invalid files altogether (```ignore_invalid=False``` comes to mind). 
### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6736/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1295939038,I_kwDOAMm_X85NPnXe,6758,simple groupby_bins 10x slower than numpy,731499,closed,0,,,8,2022-07-06T14:36:26Z,2022-07-07T08:26:26Z,2022-07-06T17:24:27Z,CONTRIBUTOR,,,,"I am finding that groupby_bins is 10x slower than numpy in what I consider to be a simple implementation. In the screenshot below, you can see me opening a netCDF file containing two variables with the same single dimension. One variable is the latitude. I want to aggregate (sum) the other variable in bins of latitude. The xarray approach using groupby_bins takes ~314ms per loop, the numpy approach less than 30ms per loop. I need to do this kind of computation on many more variables, on data spanning several years, and following the xarray approach leads to many more hours of processing :-/ Am I doing something wrong here? ![Capture d’écran 2022-07-06 à 16 28 23](https://user-images.githubusercontent.com/731499/177574951-6fe8a4c5-e6a8-4231-ad7b-40157dd2eb6b.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6758/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 205414496,MDU6SXNzdWUyMDU0MTQ0OTY=,1249,confusing dataset creation process,731499,open,0,,,6,2017-02-05T09:52:44Z,2022-06-26T15:07:59Z,,CONTRIBUTOR,,,,"In another issue I create a simple dataset like so: ```python lat = np.random.rand(50000) * 180 - 90 lon = np.random.rand(50000) * 360 - 180 d = xr.Dataset({'latitude':lat, 'longitude':lon}) ``` I expected `d` to contain two variables (`latitude` and `longitude`) with no coordinates. 
Instead `d` appears to contain two coordinates and no variables: ``` In [5]: d Out[5]: Dimensions: (latitude: 50000, longitude: 50000) Coordinates: * latitude (latitude) float64 -76.0 -84.36 26.69 66.44 -37.85 50.13 ... * longitude (longitude) float64 -148.7 -74.82 18.37 117.7 80.63 12.25 ... Data variables: *empty* ``` Is this desired behavior?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1249/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 200364693,MDU6SXNzdWUyMDAzNjQ2OTM=,1201,pass projection argument to plt.subplot when faceting with cartopy transform,731499,closed,0,,,10,2017-01-12T13:18:52Z,2020-03-29T16:30:29Z,2020-03-29T16:30:29Z,CONTRIBUTOR,,,,"I have a `data` 3D DataArray with `Time`, `Latitude` and `Longitude` coordinates. I want to plot maps of this dataset, faceted by Time. The following code ``` import cartopy.crs as ccrs proj = ccrs.PlateCarree() data.plot(transform=proj, col='Time', col_wrap=3, robust=True) ``` fails with ``` ValueError: Axes should be an instance of GeoAxes, got ``` this is because to plot with a transform, the axes must be a GeoAxes, which is done with something like `plt.subplot(111, projection=proj)`. The implicit subplotting done when faceting does not do that. To make the faceting works, I had to do ``` import cartopy.crs as ccrs proj = ccrs.PlateCarree() data.plot(transform=proj, col='Time', col_wrap=3, robust=True, subplot_kws={'projection':proj}) ``` I propose that, when plot faceting is requested with a `transform` kw, the content of that keyword should be passed to the subplot function as a `projection` argument automatically by default. 
If a projection is provided explicitly, as in the call above, use that one.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1201/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 274298111,MDU6SXNzdWUyNzQyOTgxMTE=,1719,open_mfdataset crashes when files are present with datasets of dimension = 0,731499,closed,0,,,2,2017-11-15T20:44:17Z,2020-03-09T00:50:17Z,2020-03-09T00:50:17Z,CONTRIBUTOR,,,,"I have a bunch of netCDF files that I want to read through open_mfdataset. Each file was created with xarray through to_netcdf() and contains several variables with a single `time` dimension. Most files look like this: ``` netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-07-01T01-54-46ZD.hdf_extract { dimensions: time = 112 ; variables: float lat(time) ; lat:_FillValue = NaNf ; float lon(time) ; lon:_FillValue = NaNf ; float elev(time) ; elev:_FillValue = NaNf ; double daynight(time) ; daynight:_FillValue = NaN ; double surf(time) ; surf:_FillValue = NaN ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"" ; } ``` If one of the files is empty, i.e.
the length of the 'time' dimension is zero: ``` netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-01-01T00-37-48ZD.hdf_extract { dimensions: time = UNLIMITED ; // (0 currently) variables: float lat(time) ; lat:_FillValue = NaNf ; float lon(time) ; lon:_FillValue = NaNf ; float elev(time) ; elev:_FillValue = NaNf ; double daynight(time) ; daynight:_FillValue = NaN ; double surf(time) ; surf:_FillValue = NaN ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"" ; } ``` then open_mfdataset crashes with ```python File ""./test_map_elev_month_gl2.py"", line 22, in main data = xr.open_mfdataset(files, concat_dim='time', autoclose=True) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 505, in open_mfdataset **kwargs) for p in paths] File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 505, in **kwargs) for p in paths] File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 301, in open_dataset return maybe_decode_store(store, lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 243, in maybe_decode_store lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1094, in chunk for k, v in self.variables.items()]) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1094, in for k, v in self.variables.items()]) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1089, in maybe_chunk return var.chunk(chunks, name=name2, lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/variable.py"", line 540, in chunk data = da.from_array(data, chunks, name=name, lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1798, in from_array chunks = 
normalize_chunks(chunks, x.shape) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1758, in normalize_chunks for s, c in zip(shape, chunks)), ()) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1758, in for s, c in zip(shape, chunks)), ()) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 881, in blockdims_from_blockshape for d, bd in zip(shape, chunks)) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 881, in for d, bd in zip(shape, chunks)) ZeroDivisionError: integer division or modulo by zero ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1719/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 200593854,MDExOlB1bGxSZXF1ZXN0MTAxNDIwNzky,1205,transfer projection to implied subplots when faceting,731499,closed,0,,,8,2017-01-13T10:16:30Z,2019-07-13T20:54:14Z,2019-07-13T20:54:14Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1205,"this catches `transform` kw passed to `plot()` when faceting, and pass the associated projection to the subplots. This does what was suggested in issue #1201. 
(sorry for the irrelevant change to reshaping.rst, it seems I've stuck myself in a git hole and can't get out)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1205/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 200376941,MDExOlB1bGxSZXF1ZXN0MTAxMjY2MTM5,1203,add info in doc on how to facet with cartopy,731499,closed,0,,,7,2017-01-12T14:13:32Z,2019-03-28T12:48:09Z,2017-01-13T16:29:14Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1203,"This change explains - how to pass projection arguments to faceted subplots - how to access the Axes of the created subplots (not relevant only to cartopy), relevant to issue #1202 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1203/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 205215815,MDU6SXNzdWUyMDUyMTU4MTU=,1247,numpy function very slow on DataArray compared to DataArray.values,731499,closed,0,,,5,2017-02-03T17:12:08Z,2019-01-23T17:34:22Z,2019-01-23T17:34:22Z,CONTRIBUTOR,,,,"First I create some fake latitude and longitude points. I stash them in a dataset, and compute a 2d histogram on those. ```python #!/usr/bin/env python import xarray as xr import numpy as np lat = np.random.rand(50000) * 180 - 90 lon = np.random.rand(50000) * 360 - 180 d = xr.Dataset({'latitude':lat, 'longitude':lon}) latbins = np.r_[-90:90:2.] lonbins = np.r_[-180:180:2.] h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins)) ``` When I run this I get some underwhelming performance: ``` > time ./test_with_xarray.py real 0m28.152s user 0m27.201s sys 0m0.630s ``` If I change the last line to ```python h, xx, yy = np.histogram2d(d['longitude'].values, d['latitude'].values, bins=(lonbins, latbins)) ``` (i.e. 
I pass the numpy arrays directly to the histogram2d function), things are very different: ``` > time ./test_with_xarray.py real 0m0.996s user 0m0.569s sys 0m0.253s ``` It's ~28 times slower to call histogram2d on the DataArrays, compared to calling it on the underlying numpy arrays. I ran into this issue while histogramming quite large lon/lat vectors from multiple netCDF files. I got tired waiting for the computation to end, added the `.values` to the call and went through very quickly. It seems problematic that using xarray can slow down your code by 28 times with no real way for you to know about it...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1247/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 329483009,MDU6SXNzdWUzMjk0ODMwMDk=,2216,let the user specify figure dpi when plotting,731499,closed,0,,,3,2018-06-05T14:28:52Z,2019-01-13T01:40:11Z,2019-01-13T01:40:11Z,CONTRIBUTOR,,,,"when using a DataArray `plot` function, it is already possible to specify the figure size and aspect ratio. I think it would make sense to also be able to specify the dpi, e.g. `x.plot(dpi=109)`. It would save one call to `plt.figure()`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2216/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 208903781,MDU6SXNzdWUyMDg5MDM3ODE=,1279,Rolling window operation does not work with dask arrays,731499,closed,0,,,12,2017-02-20T14:59:59Z,2017-09-14T17:19:51Z,2017-09-14T17:19:51Z,CONTRIBUTOR,,,,"As the title says :-) This would be very useful to downsample long time series read from multiple consecutive netcdf files. Note that I was able to apply the rolling window by converting my variable to a pandas series with `to_series()`. 
I could then use pandas' own rolling window methods. I guess that when converting to a pandas series the dask array is read in memory?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1279/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 211391408,MDExOlB1bGxSZXF1ZXN0MTA4NzU5MzIw,1291,Guess the complementary dimension when only one is passed to pcolormesh,731499,closed,0,,,10,2017-03-02T13:31:16Z,2017-03-07T15:46:58Z,2017-03-07T14:56:13Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1291, - [x] closes #1290 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1291/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 211380196,MDU6SXNzdWUyMTEzODAxOTY=,1290,guess the second coordinate when only one is passed to pcolormesh(),731499,closed,0,,,0,2017-03-02T12:42:37Z,2017-03-07T14:56:13Z,2017-03-07T14:56:13Z,CONTRIBUTOR,,,,"Say I have a DataArray `z` with dimensions `('x', 'y')`, in that order. If I call `z.plot()`, it will create a pcolormesh with `x` the vertical dimension and `y` the horizontal one. If I want to invert the axes, I need to call `z.plot(x='x', y='y')`. If I supply only one of the dimensions, i.e. `z.plot(x='x')`, I get the error message ```python ValueError: cannot supply only one of x and y ``` I think that when calling the plot function on a 2d DataArray and passing only one coordinate, as in `z.plot(x='x')`, xarray could guess we want the unpassed coordinate to be the other one.
I don't see why this would be unsafe.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 200369077,MDU6SXNzdWUyMDAzNjkwNzc=,1202,way to apply functions to subplots when faceting,731499,closed,0,,,2,2017-01-12T13:39:07Z,2017-01-13T18:48:32Z,2017-01-13T18:48:32Z,CONTRIBUTOR,,,,"When plotting maps with cartopy, it is common to request plotting additional information over the map, e.g. coastlines using `ax.coastlines()`. When faceting maps (as in issue #1201), AFAICS there is no way to add coastlines to each of the faceted subplots. I do not know if or how this can be done, but in my view not being able to do it severely limits the interest of faceting when dealing with maps.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 199809433,MDExOlB1bGxSZXF1ZXN0MTAwODY0NzY0,1196,small typo,731499,closed,0,,,1,2017-01-10T12:32:51Z,2017-01-10T18:10:21Z,2017-01-10T18:10:19Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1196,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1196/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 187661575,MDExOlB1bGxSZXF1ZXN0OTI1NDc5ODg=,1088,missing return value in sample function calls (I think),731499,closed,0,,,3,2016-11-07T09:25:40Z,2016-11-16T02:14:33Z,2016-11-16T02:14:28Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1088,"sorry if I messed up, I'm not a github master","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1088/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, 
""eyes"": 0}",,,13221727,pull 187990259,MDExOlB1bGxSZXF1ZXN0OTI3NzY0NjI=,1096,fix typo in doc,731499,closed,0,,,1,2016-11-08T13:22:31Z,2016-11-08T15:55:32Z,2016-11-08T15:55:28Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1096,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1096/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 187661822,MDExOlB1bGxSZXF1ZXN0OTI1NDgxNTI=,1089,fix typo,731499,closed,0,,,1,2016-11-07T09:26:56Z,2016-11-07T13:59:10Z,2016-11-07T13:59:10Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1089,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1089/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull