id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 1288323549,I_kwDOAMm_X85MykHd,6736,better handling of invalid files in open_mfdataset,731499,open,0,,,2,2022-06-29T08:00:18Z,2023-07-09T23:49:36Z,,CONTRIBUTOR,,,,"### Is your feature request related to a problem? Suppose I'm trying to read a large number of netCDF files with ```open_mfdataset```. Now suppose that one of those files is for some reason incorrect -- for instance there was a problem during the creation of that particular file, and its file size is zero, or it is not valid netCDF. The file exists, but it is invalid. Currently ```open_mfdataset``` will raise an exception with the message ```ValueError: did not find a match in any of xarray's currently installed IO backends``` As far as I can tell, there is currently no way to identify which one(s) of the files being read is the source of the problem. If there are several hundreds of those, finding the problematic files is a task by itself, even though xarray probably knows them. ### Describe the solution you'd like It would be most useful to this particular user if the error message could somehow identify the file(s) responsible for the exception. Apart from better reporting, I would find it very useful if I could pass to ```open_mfdataset``` some kind of argument that would make it ignore invalid files altogether (```ignore_invalid=False``` comes to mind). 
### Describe alternatives you've considered _No response_ ### Additional context _No response_","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6736/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 1295939038,I_kwDOAMm_X85NPnXe,6758,simple groupby_bins 10x slower than numpy,731499,closed,0,,,8,2022-07-06T14:36:26Z,2022-07-07T08:26:26Z,2022-07-06T17:24:27Z,CONTRIBUTOR,,,,"I am finding that groupby_bins is 10x slower than numpy in what I consider to be a simple implementation. In the screenshot below, you can see me opening a netCDF file containing two variables with the same single dimension. One variable is the latitude. I want to aggregate (sum) the other variable in bins of latitude. The xarray approach using groupby_bins takes ~314ms per loop, the numpy approach less than 30ms per loop. I need to do this kind of computation on many more variables, on data spanning several years, and following the xarray approach leads to many more hours of processing :-/ Am I doing something wrong here? ![Capture d’écran 2022-07-06 à 16 28 23](https://user-images.githubusercontent.com/731499/177574951-6fe8a4c5-e6a8-4231-ad7b-40157dd2eb6b.png) ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/6758/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 205414496,MDU6SXNzdWUyMDU0MTQ0OTY=,1249,confusing dataset creation process,731499,open,0,,,6,2017-02-05T09:52:44Z,2022-06-26T15:07:59Z,,CONTRIBUTOR,,,,"In another issue I create a simple dataset like so: ```python lat = np.random.rand(50000) * 180 - 90 lon = np.random.rand(50000) * 360 - 180 d = xr.Dataset({'latitude':lat, 'longitude':lon}) ``` I expected `d` to contain two variables (`latitude` and `longitude`) with no coordinates. 
Instead `d` appears to contain two coordinates and no variables: ``` In [5]: d Out[5]: Dimensions: (latitude: 50000, longitude: 50000) Coordinates: * latitude (latitude) float64 -76.0 -84.36 26.69 66.44 -37.85 50.13 ... * longitude (longitude) float64 -148.7 -74.82 18.37 117.7 80.63 12.25 ... Data variables: *empty* ``` Is this desired behavior?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1249/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue 200364693,MDU6SXNzdWUyMDAzNjQ2OTM=,1201,pass projection argument to plt.subplot when faceting with cartopy transform,731499,closed,0,,,10,2017-01-12T13:18:52Z,2020-03-29T16:30:29Z,2020-03-29T16:30:29Z,CONTRIBUTOR,,,,"I have a `data` 3D DataArray with `Time`, `Latitude` and `Longitude` coordinates. I want to plot maps of this dataset, faceted by Time. The following code ``` import cartopy.crs as ccrs proj = ccrs.PlateCarree() data.plot(transform=proj, col='Time', col_wrap=3, robust=True) ``` fails with ``` ValueError: Axes should be an instance of GeoAxes, got ``` this is because to plot with a transform, the axes must be a GeoAxes, which is done with something like `plt.subplot(111, projection=proj)`. The implicit subplotting done when faceting does not do that. To make the faceting works, I had to do ``` import cartopy.crs as ccrs proj = ccrs.PlateCarree() data.plot(transform=proj, col='Time', col_wrap=3, robust=True, subplot_kws={'projection':proj}) ``` I propose that, when plot faceting is requested with a `transform` kw, the content of that keyword should be passed to the subplot function as a `projection` argument automatically by default. 
If a projection is provided explicitly, as in the call above, use that one.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1201/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 274298111,MDU6SXNzdWUyNzQyOTgxMTE=,1719,open_mfdataset crashes when files are present with datasets of dimension = 0,731499,closed,0,,,2,2017-11-15T20:44:17Z,2020-03-09T00:50:17Z,2020-03-09T00:50:17Z,CONTRIBUTOR,,,,"I have a bunch of netCDF files that I want to read through open_mfdataset. Each file was created with xarray through to_netcdf() and contains several variables with a single `time` dimension. Most files look like this: ``` netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-07-01T01-54-46ZD.hdf_extract { dimensions: time = 112 ; variables: float lat(time) ; lat:_FillValue = NaNf ; float lon(time) ; lon:_FillValue = NaNf ; float elev(time) ; elev:_FillValue = NaNf ; double daynight(time) ; daynight:_FillValue = NaN ; double surf(time) ; surf:_FillValue = NaN ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"" ; } ``` If one of the files is empty, i.e.
the length of the 'time' dimension is zero: ``` netcdf CAL_LID_L2_05kmCLay-Standard-V4-10.2008-01-01T00-37-48ZD.hdf_extract { dimensions: time = UNLIMITED ; // (0 currently) variables: float lat(time) ; lat:_FillValue = NaNf ; float lon(time) ; lon:_FillValue = NaNf ; float elev(time) ; elev:_FillValue = NaNf ; double daynight(time) ; daynight:_FillValue = NaN ; double surf(time) ; surf:_FillValue = NaN ; // global attributes: :_NCProperties = ""version=1|netcdflibversion=4.4.1|hdf5libversion=1.8.17"" ; } ``` then open_mfdataset crashes with ```python File ""./test_map_elev_month_gl2.py"", line 22, in main data = xr.open_mfdataset(files, concat_dim='time', autoclose=True) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 505, in open_mfdataset **kwargs) for p in paths] File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 505, in **kwargs) for p in paths] File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 301, in open_dataset return maybe_decode_store(store, lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/backends/api.py"", line 243, in maybe_decode_store lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1094, in chunk for k, v in self.variables.items()]) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1094, in for k, v in self.variables.items()]) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/dataset.py"", line 1089, in maybe_chunk return var.chunk(chunks, name=name2, lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/xarray/core/variable.py"", line 540, in chunk data = da.from_array(data, chunks, name=name, lock=lock) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1798, in from_array chunks = 
normalize_chunks(chunks, x.shape) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1758, in normalize_chunks for s, c in zip(shape, chunks)), ()) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 1758, in for s, c in zip(shape, chunks)), ()) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 881, in blockdims_from_blockshape for d, bd in zip(shape, chunks)) File ""/home/noel/.conda/envs/python3/lib/python3.6/site-packages/dask/array/core.py"", line 881, in for d, bd in zip(shape, chunks)) ZeroDivisionError: integer division or modulo by zero ``` ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1719/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 200593854,MDExOlB1bGxSZXF1ZXN0MTAxNDIwNzky,1205,transfer projection to implied subplots when faceting,731499,closed,0,,,8,2017-01-13T10:16:30Z,2019-07-13T20:54:14Z,2019-07-13T20:54:14Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1205,"this catches `transform` kw passed to `plot()` when faceting, and pass the associated projection to the subplots. This does what was suggested in issue #1201. 
(sorry for the irrelevant change to reshaping.rst, it seems I've stuck myself in a git hole and can't get out)","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1205/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 200376941,MDExOlB1bGxSZXF1ZXN0MTAxMjY2MTM5,1203,add info in doc on how to facet with cartopy,731499,closed,0,,,7,2017-01-12T14:13:32Z,2019-03-28T12:48:09Z,2017-01-13T16:29:14Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1203,"This change explains - how to pass projection arguments to faceted subplots - how to access the Axes of the created subplots (not relevant only to cartopy), relevant to issue #1202 ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1203/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 205215815,MDU6SXNzdWUyMDUyMTU4MTU=,1247,numpy function very slow on DataArray compared to DataArray.values,731499,closed,0,,,5,2017-02-03T17:12:08Z,2019-01-23T17:34:22Z,2019-01-23T17:34:22Z,CONTRIBUTOR,,,,"First I create some fake latitude and longitude points. I stash them in a dataset, and compute a 2d histogram on those. ```python #!/usr/bin/env python import xarray as xr import numpy as np lat = np.random.rand(50000) * 180 - 90 lon = np.random.rand(50000) * 360 - 180 d = xr.Dataset({'latitude':lat, 'longitude':lon}) latbins = np.r_[-90:90:2.] lonbins = np.r_[-180:180:2.] h, xx, yy = np.histogram2d(d['longitude'], d['latitude'], bins=(lonbins, latbins)) ``` When I run this I get some underwhelming performance: ``` > time ./test_with_xarray.py real 0m28.152s user 0m27.201s sys 0m0.630s ``` If I change the last line to ```python h, xx, yy = np.histogram2d(d['longitude'].values, d['latitude'].values, bins=(lonbins, latbins)) ``` (i.e. 
I pass the numpy arrays directly to the histogram2d function), things are very different: ``` > time ./test_with_xarray.py real 0m0.996s user 0m0.569s sys 0m0.253s ``` It's ~28 times slower to call histogram2d on the DataArrays, compared to calling it on the underlying numpy arrays. I ran into this issue while histogramming quite large lon/lat vectors from multiple netCDF files. I got tired waiting for the computation to end, added the `.values` to the call and went through very quickly. It seems problematic that using xarray can slow down your code by 28 times with no real way for you to know about it...","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1247/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 329483009,MDU6SXNzdWUzMjk0ODMwMDk=,2216,let the user specify figure dpi when plotting,731499,closed,0,,,3,2018-06-05T14:28:52Z,2019-01-13T01:40:11Z,2019-01-13T01:40:11Z,CONTRIBUTOR,,,,"when using a DataArray `plot` function, it is already possible to specify the figure size and aspect ratio. I think it would make sense to also be able to specify the dpi, e.g. `x.plot(dpi=109)`. It would save one call to `plt.figure()`.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2216/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 208903781,MDU6SXNzdWUyMDg5MDM3ODE=,1279,Rolling window operation does not work with dask arrays,731499,closed,0,,,12,2017-02-20T14:59:59Z,2017-09-14T17:19:51Z,2017-09-14T17:19:51Z,CONTRIBUTOR,,,,"As the title says :-) This would be very useful to downsample long time series read from multiple consecutive netcdf files. Note that I was able to apply the rolling window by converting my variable to a pandas series with `to_series()`. 
I could then use pandas' own rolling window methods. I guess that when converting to a pandas series the dask array is read in memory?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1279/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 211391408,MDExOlB1bGxSZXF1ZXN0MTA4NzU5MzIw,1291,Guess the complementary dimension when only one is passed to pcolormesh,731499,closed,0,,,10,2017-03-02T13:31:16Z,2017-03-07T15:46:58Z,2017-03-07T14:56:13Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1291, - [x] closes #1290 ,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1291/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 211380196,MDU6SXNzdWUyMTEzODAxOTY=,1290,guess the second coordinate when only one is passed to pcolormesh(),731499,closed,0,,,0,2017-03-02T12:42:37Z,2017-03-07T14:56:13Z,2017-03-07T14:56:13Z,CONTRIBUTOR,,,,"Say I have a DataArray `z` with dimensions `('x', 'y')`, in that order. If I call `z.plot()`, it will create a pcolormesh with `x` the vertical dimension and `y` the horizontal one. If I want to invert the axes, I need to call `z.plot(x='x', y='y')`. If I supply only one of the dimensions, i.e. `z.plot(x='x')`, I get the error message ```python ValueError: cannot supply only one of x and y ``` I think that when calling the plot function on a 2d DataArray and passing only one coordinate, as in `z.plot(x='x')`, xarray could guess we want the unpassed coordinate to be the other one.
I don't see why this would be unsafe.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1290/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 200369077,MDU6SXNzdWUyMDAzNjkwNzc=,1202,way to apply functions to subplots when faceting,731499,closed,0,,,2,2017-01-12T13:39:07Z,2017-01-13T18:48:32Z,2017-01-13T18:48:32Z,CONTRIBUTOR,,,,"When plotting maps with cartopy, it is common to request plotting additional information over the map, e.g. coastlines using `ax.coastlines()`. When faceting maps (as in issue #1201), AFAICS there is no way to add coastlines to each of the faceted subplots. I do not know if or how this can be done, but in my view not being able to do it severely limits the interest of faceting when dealing with maps.","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1202/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 199809433,MDExOlB1bGxSZXF1ZXN0MTAwODY0NzY0,1196,small typo,731499,closed,0,,,1,2017-01-10T12:32:51Z,2017-01-10T18:10:21Z,2017-01-10T18:10:19Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1196,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1196/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 187661575,MDExOlB1bGxSZXF1ZXN0OTI1NDc5ODg=,1088,missing return value in sample function calls (I think),731499,closed,0,,,3,2016-11-07T09:25:40Z,2016-11-16T02:14:33Z,2016-11-16T02:14:28Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1088,"sorry if I messed up, I'm not a github master","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1088/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, 
""eyes"": 0}",,,13221727,pull 187990259,MDExOlB1bGxSZXF1ZXN0OTI3NzY0NjI=,1096,fix typo in doc,731499,closed,0,,,1,2016-11-08T13:22:31Z,2016-11-08T15:55:32Z,2016-11-08T15:55:28Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1096,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1096/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull 187661822,MDExOlB1bGxSZXF1ZXN0OTI1NDgxNTI=,1089,fix typo,731499,closed,0,,,1,2016-11-07T09:26:56Z,2016-11-07T13:59:10Z,2016-11-07T13:59:10Z,CONTRIBUTOR,,0,pydata/xarray/pulls/1089,,"{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1089/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull