id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
1722417436,I_kwDOAMm_X85mqgEc,7868,"`open_dataset` with `chunks=""auto""` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING`",19285200,closed,0,,,8,2023-05-23T16:23:07Z,2023-11-17T15:26:01Z,2023-11-17T15:26:01Z,NONE,,,,"### What is your issue?

I noticed that `open_dataset` with `chunks=""auto""` fails when netCDF4 variables/coordinates are encoded as `NC_STRING`.
The reason is that xarray reads netCDF4 `NC_STRING` as `object` type, and `dask` cannot estimate the size of a `object` dtype. 

As a workaround, the user must currently rewrite the netCDF4 and specify the string DataArray(s) `encoding`(s) as a fixed-length string type (i.e `""S2""` if max string length is 2) so that the data are written as `NC_CHAR` and xarray read it back as byte-encoded fixed-length string type.

**Here below I provide a reproducible example** 

```
import xarray as xr
import numpy as np

# Define string datarray
arr = np.array([""M6"", ""M3""], dtype=str)
print(arr.dtype)  # <U2
da = xr.DataArray(data=arr, dims=(""time""))
data_vars = {""str_arr"": da}

# Create dataset
ds_nc_string = xr.Dataset(data_vars=data_vars)

# Set chunking to see behaviour at read-time
ds_nc_string[""str_arr""] = ds_nc_string[""str_arr""].chunk(1)  # chunks ((1,1),)
 
# Write dataset with NC_STRING
ds_nc_string[""str_arr""].encoding[""dtype""] = str
ds_nc_string.to_netcdf(""/tmp/nc_string.nc"")

# Write dataset with NC_CHAR
ds_nc_char = xr.Dataset(data_vars=data_vars)
ds_nc_char[""str_arr""].encoding[""dtype""] = ""S2""
ds_nc_char.to_netcdf(""/tmp/nc_char.nc"")

# When NC_STRING, chunks=""auto"" does not work when string are saved as 
# --> NC STRING is read as object, and dask can not estimate chunk size !
# If chunks={} it reads the NC_STRING array in a single dask chunk !!!
ds_nc_string = xr.open_dataset(""/tmp/nc_string.nc"", chunks=""auto"") # NotImplementedError
ds_nc_string = xr.open_dataset(""/tmp/nc_string.nc"", chunks={})     # Works
ds_nc_string.chunks  # chunks (2,)

# With NC_CHAR, chunks={} and chunks=""auto"" works and returns the same result!
ds_nc_char = xr.open_dataset(""/tmp/nc_char.nc"", chunks={})   
ds_nc_char.chunks # chunks (2,)
ds_nc_char = xr.open_dataset(""/tmp/nc_char.nc"", chunks=""auto"")
ds_nc_char.chunks # chunks (2,)

# NC_STRING is read back as object 
ds_nc_string = xr.open_dataset(""/tmp/nc_string.nc"", chunks=None)
ds_nc_string[""str_arr""].dtype  #  object 

# NC_CHAR is read back as fixed length byte-string representation (S2) 
ds_nc_char = xr.open_dataset(""/tmp/nc_char.nc"", chunks=None)
ds_nc_char[""str_arr""].dtype            #  S2 
ds_nc_char[""str_arr""].data.astype(str) #  U2 
```

Questions: 
- `open_dataset` should not take care of automatically deserializing the `NC_CHAR` fixed-length byte-string representation into a `Unicode string`? 
- `open_dataset` should not take care of automatically reading `NC_STRING` as `Unicode string` (converting `object` to `str`)? 

Related issues are: 
- https://github.com/pydata/xarray/issues/7652
- https://github.com/pydata/xarray/issues/2059
- https://github.com/pydata/xarray/pull/7654
- https://github.com/pydata/xarray/issues/2040
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7868/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
1368027148,I_kwDOAMm_X85RinAM,7014,xarray imshow and pcolormesh behave badly when the array does not contain values larger the BoundaryNorm vmax,19285200,closed,0,,,10,2022-09-09T15:59:31Z,2023-03-28T09:18:02Z,2023-03-28T09:18:02Z,NONE,,,,"### What happened?

If `cmap.set_over` is specified, the array color mapping and the colorbar behave badly if the array does not contain values above the `norm.vmax`. 
 
Let's take an array and apply a  colormap and norm (see code below)
![image](https://user-images.githubusercontent.com/19285200/189390679-fc5203c8-c921-419f-8262-0fce2896d993.png)
Now, if in the array I change the array values larger than the `norm.vmax` (the 2 bottom right pixels) with other values inside the norm: 
- Using matplotlib I get the expected results
![image](https://user-images.githubusercontent.com/19285200/189390698-857670c5-b92a-44ac-8651-5e09f403cfc0.png)
- Using xarray I get this weird behavior.
![image](https://user-images.githubusercontent.com/19285200/189390708-8f12ceb9-fc9f-4b01-8536-39100b39bc07.png)


### What did you expect to happen?

The colorbar should not ""shift"" and the array should be colormapped correctly 
This is possibily related also to https://github.com/pydata/xarray/issues/4061

### Minimal Complete Verifiable Example

```Python
import matplotlib.colors
import numpy as np
import xarray as xr 
import matplotlib as mpl 
import matplotlib.pyplot as plt

# Define DataArray 
arr = np.array([[0, 10, 15, 20],
                [ np.nan, 40, 50, 100], 
                [150, 158, 160, 161],
               ])
lon = np.arange(arr.shape[1])
lat = np.arange(arr.shape[0])[::-1]
lons, lats = np.meshgrid(lon, lat)
da = xr.DataArray(arr,
                  dims=[""y"", ""x""],
                  coords={""lon"": ((""y"",""x""), lons),
                          ""lat"": ((""y"",""x""), lats),
                          }
                  )
da

# Define colormap  
color_list = [""#9c7e94"", ""#640064"",   ""#009696"", ""#C8FF00"",  ""#FF7D00""]
levels =  [0.05, 1, 10, 20, 150, 160]
cmap = mpl.colors.LinearSegmentedColormap.from_list(""cmap"", color_list, len(levels) - 1)
norm = mpl.colors.BoundaryNorm(levels, cmap.N)
cmap.set_over(""darkred"")   # color for above 160
cmap.set_under(""none"")     # color for below 0.05
cmap.set_bad(""gray"", 0.2)  # color for nan 

# Define colorbar settings 
ticks = levels 
cbar_kwargs = {     
    'extend': ""max"",  
     
}    

# Correct plot 
p = da.plot.pcolormesh(x=""lon"", y=""lat"", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# Remove values larger than the norm.vmax level
da1 = da.copy() 
da1.data[da1.data>=norm.vmax] = norm.vmax - 1 # could be replaced with any value inside the norm 

# With matplotlib.pcolormesh [OK]
p = plt.pcolormesh(da1[""lon""].data, 
                   da1[""lat""],
                   da1.data, 
                   cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With matplotlib.imshow [OK]
p = plt.imshow(da1.data, 
               cmap=cmap, norm=norm)
plt.colorbar(p, **cbar_kwargs)
plt.show()

# With xarray.pcolormesh [BUG]
# --> The colorbar shift !!!  
da1.plot.pcolormesh(x=""lon"", y=""lat"", cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs)
plt.show()

# With xarray.imshow [BUG]
# --> The colorbar shift !!!
da1.plot.imshow(cmap=cmap, norm=norm, cbar_kwargs=cbar_kwargs, origin=""upper"")
```


### MVCE confirmation

- [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [X] Complete example — the example is self-contained, including all data and the text of any traceback.
- [X] Verifiable example — the example copy & pastes into an IPython prompt or [Binder notebook](https://mybinder.org/v2/gh/pydata/xarray/main?urlpath=lab/tree/doc/examples/blank_template.ipynb), returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.

### Relevant log output

_No response_

### Anything else we need to know?

_No response_

### Environment

<details>

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) 
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-124-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: 1.0.2
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.0
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.1
distributed: 2022.7.1
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: 0.11.2
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: None
flox: None
numpy_groupies: None
setuptools: 63.3.0
pip: 22.2.2
conda: None
pytest: None
IPython: 7.33.0
sphinx: 5.1.1
/home/ghiggi/anaconda3/envs/gpm_geo/lib/python3.9/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn(""Setuptools is replacing distutils."")

</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/7014/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
749924639,MDU6SXNzdWU3NDk5MjQ2Mzk=,4607,"set_index(..., append=True) act as with append=False with 'Dimensions without coordinates'",19285200,open,0,,,0,2020-11-24T17:59:49Z,2020-11-24T19:37:04Z,,NONE,,,,"**What happened**:

I get into this strange behaviour when trying to recreate a stacked (MultiIndex) coordinate using `set_index(...,append=True)`. 

Since it is not possible to save Dataset to netCDF or Zarr containing stacked / MultiIndex coordinates, before writing to disk I used `reset_index(<stacked_coordinate>)`. When reading such data, I need to use `set_index(.., append=True)` to recreate such stacked coordinate.

**What you expected to happen**:

I would expect that `set_index(..., append=True)` would recreate the MultiIndex stacked coordinate. However, this does not occur if the dimension coordinate specified within set_index() is a 'dimension without coordinate'.
In such situation, `set_index(..., append=True)` behaves as `set_index(, append=False)`.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr 
import numpy as np

### Create Datasets 
arr1 = np.random.rand(4, 5).reshape(4,5)
arr2 = np.random.rand(4, 5).reshape(4,5)
da1 = xr.DataArray(arr1, 
                   dims=['nodes','time'],
                   coords={""time"": [1,2,3,4,5],
                           ""nodes"": [1,2,3,4]},
                   name='var1')
da2 = xr.DataArray(arr2, 
                   dims=['nodes','time'],
                   coords={""time"": [1,2,3,4,5],
                           ""nodes"": [1,2,3,4]},
                   name='var2')
ds_unstacked = xr.Dataset({'var1':da1,'var2':da2})
print(ds_unstacked) 

# - Stack variables across a new dimension 
da_stacked = ds_unstacked.to_stacked_array(new_dim=""variables"", variable_dim='variable', 
                                           sample_dims=['nodes','time'], name=""Stacked_Variables"") 
ds_stacked = da_stacked.to_dataset()

# - Look at the stacked MultiIndex coordinate 'variables'
print(ds_stacked) 
print(da_stacked.variables.indexes)

### Remove MultiIndex (to save Dataset to netCDF/Zarr, ...)
ds_stacked_disk = ds_stacked.reset_index('variables')
print(ds_stacked_disk)
 
### Try to recreate MultiIndex  
print(ds_stacked_disk.set_index(variables=['variable'], append=False)) # GOOD ! Replace 'variable'  coordinate with 'variables'
print(ds_stacked_disk.set_index(variables=['variable'], append=True)) # BUG ! Do not create the expected MultiIndex !  

### Current workaround to obtain a MultiIndex stacked coordinate
tmp_ds = ds_stacked_disk.assign_coords(variables=(np.arange(0,2)))
ds_stacked1 = tmp_ds.set_index(variables=['variable'], append=True)   
print(ds_stacked1)  # But with level 0 - 'variables_level_0'

### Unstack back  
# - If the BUG is solved, no need to specify the level argument  
ds_stacked1['Stacked_Variables'].to_unstacked_dataset(dim='variables', level='variable')

```

**Environment**:

<details><summary>Output of <tt>xr.show_versions()</tt></summary>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52) 
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: None
IPython: 7.18.1
sphinx: 3.2.1


</details>
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4607/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,,13221727,issue