id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1391738128,I_kwDOAMm_X85S9D0Q,7109,Multiprocessing unable to pickle Dataset opened with open_mfdataset,18426352,closed,0,,,4,2022-09-30T02:43:43Z,2022-10-11T16:44:36Z,2022-10-11T16:44:35Z,CONTRIBUTOR,,,,"### What happened?

When passing a Dataset object opened using `open_mfdataset` to a function via Python's multiprocessing.Pool module, I received the following error:

`AttributeError: Can't pickle local object 'open_mfdataset.<locals>.multi_file_closer'`

### What did you expect to happen?

I expected the Dataset to be handed off to the function via multiprocessing without error. I can remove the error by using variable subsetting or another reduction, e.g. via `where`, so I don't understand why the original Dataset object returned from `open_mfdataset` cannot be used.

### Minimal Complete Verifiable Example

```Python
#!/usr/bin/env python

import xarray as xr
import numpy as np
import glob
import multiprocessing

# Create toy DataArrays
temperature = np.array([[273.15, 220.2, 255.5], [221.1, 260.1, 270.5]])
humidity = np.array([[70.2, 85.4, 29.6], [30.3, 55.4, 100.0]])
da1 = xr.DataArray(temperature, dims=['y0', 'x0'], coords={'y0': np.array([0, 1]), 'x0': np.array([0, 1, 2])})
da2 = xr.DataArray(humidity, dims=['y0', 'x0'], coords={'y0': np.array([0, 1]), 'x0': np.array([0, 1, 2])})

# Create a toy Dataset
ds = xr.Dataset({'TEMP_K': da1, 'RELHUM': da2})

# Write the toy Dataset to disk
ds.to_netcdf('xarray_pickle_dataset.nc')

# Function to use with open_mfdataset
def preprocess(ds):
    ds = ds.rename({'TEMP_K': 'temp_k'})
    return ds

# Function for use with multiprocessing
def calc_stats(ds, stat_name):
    if stat_name == 'mean':
        return ds.mean(dim=['y0']).to_dataframe()

# Get a pool of workers
mp = multiprocessing.Pool(5)

# Glob for the file
ncfiles = glob.glob('xarray*.nc')

# Can we call open_mfdataset() on a ds in memory?
#datasets = [xr.open_dataset(x) for x in ncfiles]
datasets = [xr.open_mfdataset([x], preprocess=preprocess) for x in ncfiles]

# TEST 1: ERROR
results = mp.starmap(calc_stats, [(ds, 'mean') for ds in datasets])
print(results)

# TEST 2: PASS
#results = mp.starmap(calc_stats, [(ds[['temp_k', 'RELHUM']], 'mean') for ds in datasets])
#print(results)

# TEST 3: ERROR
#results = mp.starmap(calc_stats, [(ds.isel(x0=0), 'mean') for ds in datasets])
#print(results)

# TEST 4: PASS
#results = mp.starmap(calc_stats, [(ds.where(ds.RELHUM > 80.0), 'mean') for ds in datasets])
#print(results)

# TEST 5: ERROR
#results = mp.starmap(calc_stats, [(ds.sel(x0=slice(0, 1, 1)), 'mean') for ds in datasets])
#print(results)
```

### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
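For context on the error itself: the Dataset returned by `open_mfdataset` holds a reference to a closer function defined inside the `open_mfdataset` call (that is what `open_mfdataset.<locals>.multi_file_closer` in the message refers to), and Python's pickle cannot serialize locally defined functions. A stripped-down illustration of that general limitation, with hypothetical names that only stand in for the xarray internals:

```Python
import pickle

def outer():
    def inner():  # a local function, analogous to multi_file_closer
        pass
    return inner

# Raises: AttributeError: Can't pickle local object 'outer.<locals>.inner'
pickle.dumps(outer())
```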
### Relevant log output

```Python
Traceback (most recent call last):
  File ""/d1/git/xarray_pickle_dataset.py"", line 35, in <module>
    results = mp.starmap(calc_stats,[(ds,'mean') for ds in datasets])
  File ""/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py"", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File ""/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py"", line 771, in get
    raise self._value
  File ""/home/.conda/envs/icing/lib/python3.9/multiprocessing/pool.py"", line 537, in _handle_tasks
    put(task)
  File ""/home/.conda/envs/icing/lib/python3.9/multiprocessing/connection.py"", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File ""/home/.conda/envs/icing/lib/python3.9/multiprocessing/reduction.py"", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'open_mfdataset.<locals>.multi_file_closer'
```

### Anything else we need to know?

Not shown in the verifiable example was another way I was able to get it to work, which looked like this:

```
results = mp.starmap(calc_stats,[(ds.sel(x0=ds.xvalues,y0=ds.yvalues),'mean') for ds in datasets])
print(results)
```

I can only assume that, under the hood, passing `ds.xvalues` (a 1D DataArray within the Dataset) to `sel` transforms the Dataset enough to avoid the pickling error.

The error does NOT occur when using `open_dataset`; e.g., `datasets = [xr.open_dataset(x) for x in ncfiles]` works. However, in my workflow I would prefer to use `open_mfdataset` so that I can perform some preprocessing via its `preprocess` argument, even though I am only opening one Dataset at a time.
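A workaround I would expect to sidestep the problem entirely (a sketch, untested): send file paths to the workers and open the files inside each worker, so no Dataset object ever has to be pickled. Names are reused from the example above, and the helper must be defined before the pool is created:

```Python
# Sketch: open inside each worker so the Dataset never crosses the
# process boundary via pickle. Assumes preprocess and calc_stats are
# defined at module level as in the example above.
def calc_stats_from_path(path, stat_name):
    ds = xr.open_mfdataset([path], preprocess=preprocess)
    return calc_stats(ds, stat_name)

with multiprocessing.Pool(5) as pool:
    results = pool.starmap(calc_stats_from_path, [(x, 'mean') for x in ncfiles])
print(results)
```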
### Environment

xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.12 | packaged by conda-forge | (main, Mar 24 2022, 23:25:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.19.0-21-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.3.0
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.05.0
distributed: 2022.5.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: 0.19.2
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: None
IPython: None
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7109/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 774553196,MDU6SXNzdWU3NzQ1NTMxOTY=,4733,Xarray with cfgrib backend errors with .where() when drop=True,18426352,closed,0,,,4,2020-12-24T20:26:58Z,2021-01-02T08:17:37Z,2021-01-02T08:17:37Z,CONTRIBUTOR,,,," **What happened**: When loading a HRRR GRIBv2 file in this manner: `ds = xr.open_dataset(gribfile_path,engine='cfgrib',backend_kwargs={'filter_by_keys':{'typeOfLevel':'hybrid'}})` I have trouble using the `.where()` method when `drop=True`. If I set `drop=False`, it works fine. I am attempting to subset via latitude and longitude like this: `ds_sub = ds.where((ds.latitude>=40.0)&(ds.latitude<=50.0)&(ds.longitude>=252.0)&(ds.longitude<=280.0),drop=True)` However I receive the following errors: ``` Traceback (most recent call last): File ""read_interp_format_grib.py"", line 77, in test = ds.where(mask_lat & mask_lon,drop=True) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/common.py"", line 1268, in where return ops.where_method(self, cond, other) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/ops.py"", line 193, in where_method return apply_ufunc( File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 1092, in apply_ufunc return apply_dataset_vfunc( File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 410, in apply_dataset_vfunc result_vars = apply_dict_of_variables_vfunc( File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 356, in apply_dict_of_variables_vfunc result_vars[name] = func(*variable_args) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 606, in apply_variable_ufunc input_data = [ File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 607, in broadcast_compat_data(arg, broadcast_dims, core_dims) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/computation.py"", line 522, in broadcast_compat_data data = variable.data File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py"", line 359, in data return self.values File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py"", line 510, in values return _as_array_or_item(self._data) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/variable.py"", line 272, in _as_array_or_item data = np.asarray(data) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 83, in asarray return array(a, dtype, copy=False, order=order) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 685, in __array__ self._ensure_cached() File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 682, in _ensure_cached self.array = NumpyIndexingAdapter(np.asarray(self.array)) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 83, in asarray return array(a, dtype, copy=False, order=order) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 655, in __array__ return np.asarray(self.array, dtype=dtype) File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/numpy/core/_asarray.py"", line 83, in asarray return 
    return array(a, dtype, copy=False, order=order)
  File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 560, in __array__
    return np.asarray(array[self.key], dtype=None)
  File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/backends/cfgrib_.py"", line 23, in __getitem__
    return indexing.explicit_indexing_adapter(
  File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 848, in explicit_indexing_adapter
    result = NumpyIndexingAdapter(np.asarray(result))[numpy_indices]
  File ""/d1/anaconda3/envs/era5/lib/python3.8/site-packages/xarray/core/indexing.py"", line 1294, in __getitem__
    return array[key]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
```

**What you expected to happen**:

I expect the dataset to be reduced in the x and y (latitude/longitude) dimensions, dropping points where the `.where()` condition is False, when `drop=True`.

**Minimal Complete Verifiable Example**:

```python
# Put your MCVE code here
```

**Anything else we need to know?**:

I was able to confirm cfgrib is where the issue lies by doing the following:

```
ds = xr.open_dataset(gribfile_path,engine='cfgrib',backend_kwargs={'filter_by_keys':{'typeOfLevel':'hybrid'}})
ds.to_netcdf('test.nc')
dsnc = xr.open_dataset('test.nc')
ds_sub = dsnc.where((dsnc.latitude>=40.0)&(dsnc.latitude<=50.0)&(dsnc.longitude>=252.0)&(dsnc.longitude<=280.0),drop=True)
```

That correctly gives me:

`Dimensions: (hybrid: 50, x: 792, y: 414)`

Originally x=1799 and y=1059.
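A possible workaround on the reading side (a sketch, untested here): force the cfgrib-backed variables into memory with `.load()` before masking, which should bypass the lazy backend indexing path where the failure occurs:

```python
ds = xr.open_dataset(gribfile_path, engine='cfgrib',
                     backend_kwargs={'filter_by_keys': {'typeOfLevel': 'hybrid'}})
ds = ds.load()  # realize all variables as in-memory numpy arrays
ds_sub = ds.where((ds.latitude >= 40.0) & (ds.latitude <= 50.0) &
                  (ds.longitude >= 252.0) & (ds.longitude <= 280.0), drop=True)
```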
**Environment**:

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 19:08:05) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.9.0-14-amd64
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.5.1
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.5
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.17.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: None
sphinx: None
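One more alternative that avoids `.where(drop=True)` altogether (a sketch, assuming the dimensions are named `y` and `x` and the mask's dims come out in `(y, x)` order, consistent with the `Dimensions:` output above): compute the bounding box of the mask explicitly and slice with `isel`:

```python
import numpy as np

mask = ((ds.latitude >= 40.0) & (ds.latitude <= 50.0) &
        (ds.longitude >= 252.0) & (ds.longitude <= 280.0))
yidx, xidx = np.nonzero(mask.values)  # row/column indices of the True cells
ds_sub = ds.isel(y=slice(yidx.min(), yidx.max() + 1),
                 x=slice(xidx.min(), xidx.max() + 1))
```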
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4733/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue