id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
2250654663,I_kwDOAMm_X86GJkPH,8957,netCDF encoding and decoding issues.,1492047,open,0,,,6,2024-04-18T13:06:49Z,2024-04-19T13:12:04Z,,CONTRIBUTOR,,,,"### What happened?
Reading or writing netCDF variables containing scale_factor and/or fill_value might raise the following error:
```python
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
```
This problem might be related to the following changes: #7654.
### What did you expect to happen?
I'm expecting it to work like it did before xarray 2024.03.0!
### Minimal Complete Verifiable Example
```Python
# Example 1, decoding problem.
import netCDF4 as nc
import numpy as np
import xarray as xr
with nc.Dataset(""test1.nc"", mode=""w"") as ncds:
ncds.createDimension(dimname=""d"")
ncx = ncds.createVariable(
varname=""x"",
datatype=np.int64,
dimensions=(""d"",),
fill_value=-1,
)
ncx.scale_factor = 1e-3
ncx.units = ""seconds""
ncx[:] = np.array([0.001, 0.002, 0.003])
# This will raise the error
xr.load_dataset(""test1.nc"")
# Example 2, encoding problem.
import netCDF4 as nc
import numpy as np
import xarray as xr
with nc.Dataset(""test2.nc"", mode=""w"") as ncds:
ncds.createDimension(dimname=""d"")
ncx = ncds.createVariable(varname=""x"", datatype=np.int8, dimensions=(""d"",))
ncx.scale_factor = 1000
ncx[:] = np.array([1000, 2000, 3000])
# Reading it does work
data = xr.load_dataset(""test2.nc"")
# Writing the data we just read does not work
data.to_netcdf(""test2x.nc"")
```
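As a possible workaround until this is fixed (a sketch only, reusing the files from the MVCE above; the float output filename is arbitrary): skipping xarray's scale/offset decoding on read and casting to float before writing seems to avoid the failing in-place casts.
```python
# Workaround sketch only; assumes the test1.nc/test2.nc files created above.
import xarray as xr

# Read without scale/offset decoding, then decode manually after casting.
raw = xr.open_dataset(""test1.nc"", mask_and_scale=False)
x = raw[""x""].astype(""float64"") * raw[""x""].attrs.get(""scale_factor"", 1)

# For the write case, a float copy avoids the failing in-place division.
data = xr.load_dataset(""test2.nc"")
data[""x""] = data[""x""].astype(""float64"")
data.to_netcdf(""test2x_float.nc"")
```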
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [X] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
### Relevant log output
```Python
# Example 1 error
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
Cell In[38], line 1
----> 1 xr.load_dataset(""test2.nc"")
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:280, in load_dataset(filename_or_obj, **kwargs)
277 raise TypeError(""cache has no effect in this context"")
279 with open_dataset(filename_or_obj, **kwargs) as ds:
--> 280 return ds.load()
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/dataset.py:855, in Dataset.load(self, **kwargs)
853 for k, v in self.variables.items():
854 if k not in lazy_data:
--> 855 v.load()
857 return self
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/variable.py:961, in Variable.load(self, **kwargs)
944 def load(self, **kwargs):
945 """"""Manually trigger loading of this variable's data from disk or a
946 remote source into memory and return this variable.
947
(...)
959 dask.array.compute
960 """"""
--> 961 self._data = to_duck_array(self._data, **kwargs)
962 return self
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/namedarray/pycompat.py:134, in to_duck_array(data, **kwargs)
131 return loaded_data
133 if isinstance(data, ExplicitlyIndexed):
--> 134 return data.get_duck_array() # type: ignore[no-untyped-call, no-any-return]
135 elif is_duck_array(data):
136 return data
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self)
808 def get_duck_array(self):
--> 809 self._ensure_cached()
810 return self.array.get_duck_array()
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self)
802 def _ensure_cached(self):
--> 803 self.array = as_indexable(self.array.get_duck_array())
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self)
759 def get_duck_array(self):
--> 760 return self.array.get_duck_array()
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/indexing.py:630, in LazilyIndexedArray.get_duck_array(self)
625 # self.array[self.key] is now a numpy array when
626 # self.array is a BackendArray subclass
627 # and self.key is BasicIndexer((slice(None, None, None),))
628 # so we need the explicit check for ExplicitlyIndexed
629 if isinstance(array, ExplicitlyIndexed):
--> 630 array = array.get_duck_array()
631 return _wrap_numpy_scalars(array)
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self)
80 def get_duck_array(self):
---> 81 return self.func(self.array.get_duck_array())
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:81, in _ElementwiseFunctionArray.get_duck_array(self)
80 def get_duck_array(self):
---> 81 return self.func(self.array.get_duck_array())
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:399, in _scale_offset_decoding(data, scale_factor, add_offset, dtype)
397 data = data.astype(dtype=dtype, copy=True)
398 if scale_factor is not None:
--> 399 data *= scale_factor
400 if add_offset is not None:
401 data += add_offset
UFuncTypeError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
# Example 2 error
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
Cell In[42], line 1
----> 1 data.to_netcdf(""text1x.nc"")
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/core/dataset.py:2298, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
2295 encoding = {}
2296 from xarray.backends.api import to_netcdf
-> 2298 return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
2299 self,
2300 path,
2301 mode=mode,
2302 format=format,
2303 group=group,
2304 engine=engine,
2305 encoding=encoding,
2306 unlimited_dims=unlimited_dims,
2307 compute=compute,
2308 multifile=False,
2309 invalid_netcdf=invalid_netcdf,
2310 )
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:1339, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1334 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
1335 # to avoid this mess of conditionals
1336 try:
1337 # TODO: allow this work (setting up the file for writing array data)
1338 # to be parallelized with dask
-> 1339 dump_to_store(
1340 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1341 )
1342 if autoclose:
1343 store.close()
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/api.py:1386, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1383 if encoder:
1384 variables, attrs = encoder(variables, attrs)
-> 1386 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/common.py:393, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
390 if writer is None:
391 writer = ArrayWriter()
--> 393 variables, attributes = self.encode(variables, attributes)
395 self.set_attributes(attributes)
396 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/backends/common.py:482, in WritableCFDataStore.encode(self, variables, attributes)
479 def encode(self, variables, attributes):
480 # All NetCDF files get CF encoded by default, without this attempting
481 # to write times, for example, would fail.
--> 482 variables, attributes = cf_encoder(variables, attributes)
483 variables = {k: self.encode_variable(v) for k, v in variables.items()}
484 attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/conventions.py:795, in cf_encoder(variables, attributes)
792 # add encoding for time bounds variables if present.
793 _update_bounds_encoding(variables)
--> 795 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
797 # Remove attrs from bounds variables (issue #2921)
798 for var in new_vars.values():
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/conventions.py:196, in encode_cf_variable(var, needs_copy, name)
183 ensure_not_multiindex(var, name=name)
185 for coder in [
186 times.CFDatetimeCoder(),
187 times.CFTimedeltaCoder(),
(...)
194 variables.BooleanCoder(),
195 ]:
--> 196 var = coder.encode(var, name=name)
198 # TODO(kmuehlbauer): check if ensure_dtype_not_object can be moved to backends:
199 var = ensure_dtype_not_object(var, name=name)
File .../conda/envs/ong312_local/lib/python3.12/site-packages/xarray/coding/variables.py:476, in CFScaleOffsetCoder.encode(self, variable, name)
474 data -= pop_to(encoding, attrs, ""add_offset"", name=name)
475 if ""scale_factor"" in encoding:
--> 476 data /= pop_to(encoding, attrs, ""scale_factor"", name=name)
478 return Variable(dims, data, attrs, encoding, fastpath=True)
UFuncTypeError: Cannot cast ufunc 'divide' output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
```
### Anything else we need to know?
_No response_
### Environment
INSTALLED VERSIONS
------------------
commit: None
python: 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-92-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2
xarray: 2024.3.0
pandas: 2.2.2
numpy: 1.26.4
scipy: 1.13.0
netCDF4: 1.6.5
pydap: None
h5netcdf: 1.3.0
h5py: 3.11.0
Nio: None
zarr: 2.17.2
cftime: 1.6.3
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: 3.8.4
cartopy: 0.23.0
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: 0.23
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.5.1
pip: 24.0
conda: 24.3.0
pytest: 8.1.1
mypy: 1.9.0
IPython: 8.22.2
sphinx: 7.3.5
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/8957/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue
1575938277,I_kwDOAMm_X85d7ujl,7516,Dataset.where performances regression.,1492047,open,0,,,10,2023-02-08T11:19:34Z,2023-05-03T12:58:14Z,,CONTRIBUTOR,,,,"### What happened?
Hello,
I'm using the **Dataset.where** function to select data based on some field values and it takes way too much time!
The dask dashboard seems to show some tasks repeating themselves many times.
The provided example uses a 1D array, for which the selection could be done with **Dataset.sel**, but in our real use case we make selections on 2D variables.
This problem seems to have appeared with the **2022.6.0** xarray release; the **2022.3.0 release works as expected**.
### What did you expect to happen?
Using the 2022.3.0 release, this selection takes 1.37 seconds.
Using releases from 2022.6.0 up to 2023.2.0 (the one from yesterday), this selection takes 8.47 seconds.
This example is a very small and simple one; with our real data and use case we simply cannot use this function anymore.
### Minimal Complete Verifiable Example
```Python
import dask.array as da
import distributed as dist
import xarray as xr
client = dist.Client()
# Using small chunks emphasizes the problem
ds = xr.Dataset(
    {""field"": xr.DataArray(data=da.empty(shape=10000, chunks=10), dims=(""x"",))}
)
sel = ds[""field""] > 0
ds.where(sel, drop=True)
```
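For reference, a sketch of how the timings above can be measured (exact numbers will vary with the machine and scheduler setup; this just wraps the MVCE in a timer):
```python
import time

import dask.array as da
import distributed as dist
import xarray as xr

client = dist.Client()
# Same setup as the MVCE above; small chunks emphasize the problem.
ds = xr.Dataset(
    {""field"": xr.DataArray(data=da.empty(shape=10000, chunks=10), dims=(""x"",))}
)
sel = ds[""field""] > 0

start = time.perf_counter()
ds.where(sel, drop=True)
print(f""where(drop=True) took {time.perf_counter() - start:.2f} s"")
```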
### MVCE confirmation
- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [X] New issue — a search of GitHub Issues suggests this is not a duplicate.
### Relevant log output
_No response_
### Anything else we need to know?
_No response_
### Environment
Problematic version
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.6
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.4
cfgrib: 0.9.10.3
iris: None
bottleneck: None
dask: 2023.1.1
distributed: 2023.1.1
matplotlib: 3.6.3
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: 0.20.1
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.1.0
pip: 23.0
conda: 22.11.1
pytest: 7.2.1
mypy: None
IPython: 8.7.0
sphinx: 5.3.0
Working version
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.8.1
xarray: 2022.3.0
pandas: 1.5.3
numpy: 1.23.5
scipy: 1.8.1
netCDF4: 1.6.2
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.13.6
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.4
cfgrib: 0.9.10.3
iris: None
bottleneck: None
dask: 2023.1.1
distributed: 2023.1.1
matplotlib: 3.6.3
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2023.1.0
cupy: None
pint: 0.20.1
sparse: None
setuptools: 67.1.0
pip: 23.0
conda: 22.11.1
pytest: 7.2.1
IPython: 8.7.0
sphinx: 5.3.0
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7516/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,reopened,13221727,issue
372848074,MDU6SXNzdWUzNzI4NDgwNzQ=,2501,open_mfdataset usage and limitations.,1492047,closed,0,,,22,2018-10-23T07:31:42Z,2021-01-27T18:06:16Z,2021-01-27T18:06:16Z,CONTRIBUTOR,,,,"I'm trying to understand and use the open_mfdataset function to open a huge number of files.
I thought this function would be quite similar to dask.dataframe.from_delayed and would let me ""load"" and work on an amount of data limited only by the number of Dask workers (or ""unlimited"", considering it could be ""lazily loaded"").
But my tests showed something quite different.
It seems xarray requires the index to be copied back to the Dask client in order to ""auto_combine"" the data.
Doing some tests on a small portion of my data, I get something like this.
Each file has these dimensions: time: ~2871, xx_ind: 40, yy_ind: 128.
The concatenation of these files is done along the time dimension, and my understanding is that only the time coordinate is loaded and brought back to the client (the other dimensions are constant).
Parallel tests were run with 200 Dask workers.
```python
=================== Loading 1002 files ===================
xr.open_mfdataset('*1002*.nc')
peak memory: 1660.59 MiB, increment: 1536.25 MiB
Wall time: 1min 29s
xr.open_mfdataset('*1002*.nc', parallel=True)
peak memory: 1745.14 MiB, increment: 1602.43 MiB
Wall time: 53 s
=================== Loading 5010 files ===================
xr.open_mfdataset('*5010*.nc')
peak memory: 7419.99 MiB, increment: 7315.36 MiB
Wall time: 8min 33s
xr.open_mfdataset('*5010*.nc', parallel=True)
peak memory: 8249.75 MiB, increment: 8112.07 MiB
Wall time: 4min 48s
```
As you can see, the amount of memory used for this operation is significant, and I won't be able to do this on many more files.
When using the parallel option, the loading of the files takes a few seconds (judging from what the Dask dashboard shows) and I'm guessing the rest of the time is spent in the ""auto_combine"".
So I'm wondering whether I'm doing something wrong, whether there is another way to load the data, or whether I cannot use xarray for this quantity of data and have to use Dask directly.
Thanks in advance.
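For completeness, a hedged sketch of the kind of call I could try to reduce the combine cost; whether it actually helps is an open question, and some related keywords (combine='nested', compat='override') only exist in newer xarray releases than the one below:
```python
# Sketch: being explicit about how the files line up may reduce the
# coordinate comparison work done on the client during the combine step.
import xarray as xr

ds = xr.open_mfdataset(
    '*5010*.nc',
    concat_dim='time',
    data_vars='minimal',
    coords='minimal',
    parallel=True,
)
```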
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-34-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
xarray: 0.10.9+32.g9f4474d.dirty
pandas: 0.23.4
numpy: 1.15.2
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: 2.2.0
cftime: 1.0.1
PseudonetCDF: None
rasterio: None
iris: None
bottleneck: None
cyordereddict: None
dask: 0.19.4
distributed: 1.23.3
matplotlib: 3.0.0
cartopy: None
seaborn: None
setuptools: 40.4.3
pip: 18.1
conda: None
pytest: 3.9.1
IPython: 7.0.1
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2501/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
676696822,MDExOlB1bGxSZXF1ZXN0NDY1OTYxOTIw,4333,Support explicitly setting a dimension order with to_dataframe(),1492047,closed,0,,,11,2020-08-11T08:46:45Z,2020-08-19T20:37:38Z,2020-08-14T18:28:26Z,CONTRIBUTOR,,0,pydata/xarray/pulls/4333,"
- [x] Closes #4331
- [x] Tests added
- [x] Passes `isort . && black . && mypy . && flake8`
- [x] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4333/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,pull
347895055,MDU6SXNzdWUzNDc4OTUwNTU=,2346,Dataset/DataArray to_dataframe() dimensions order mismatch.,1492047,closed,0,,,4,2018-08-06T12:03:00Z,2020-08-10T17:45:43Z,2020-08-08T07:10:28Z,CONTRIBUTOR,,,,"#### Code Sample
```python
import xarray as xr
import numpy as np
data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('y', 'x'))
ds = xr.Dataset({'foo': data})
# Applied on the Dataset
ds.to_dataframe()
# foo
#x y
#a 0 0.348519
# 1 -0.322634
# 2 -0.683181
#b 0 0.197501
# 1 0.504810
# 2 -1.871626
# Applied to the DataArray
ds['foo'].to_dataframe()
# foo
#y x
#0 a 0.348519
# b 0.197501
#1 a -0.322634
# b 0.504810
#2 a -0.683181
# b -1.871626
```
#### Problem description
The **to_dataframe** method applied to a DataArray respects the dimension order, whereas the same method applied to a Dataset uses an alphabetically sorted order.
In both situations **to_dataframe** calls **_to_dataframe()** with a dimension mapping as argument.
The DataArray passes an **OrderedDict**, but the Dataset passes **self.dims** (which is a **SortedKeysDict**).
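A possible workaround sketch on the pandas side, reordering the MultiIndex of the Dataset result to match the DataArray's dimension order (newer xarray releases also accept an explicit `dim_order` argument to `to_dataframe`, added in #4333):
```python
import numpy as np
import xarray as xr

data = xr.DataArray(np.random.randn(2, 3), coords={'x': ['a', 'b']}, dims=('y', 'x'))
ds = xr.Dataset({'foo': data})
# Reorder the index levels to the DataArray's ('y', 'x') order and sort.
df = ds.to_dataframe().reorder_levels(['y', 'x']).sort_index()
```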
#### Output of ``xr.show_versions()``
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-23-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
xarray: 0.10.8
pandas: 0.23.4
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.0
h5netcdf: None
h5py: 2.8.0
Nio: None
zarr: 2.2.0
bottleneck: None
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: None
setuptools: 40.0.0
pip: 18.0
conda: None
pytest: 3.7.1
IPython: 6.5.0
sphinx: None
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2346/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue