id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type
755105132,MDU6SXNzdWU3NTUxMDUxMzI=,4641,Wrong hue assignment in scatter plot,42246615,closed,0,,,7,2020-12-02T09:34:40Z,2021-01-13T23:02:33Z,2021-01-13T23:02:33Z,NONE,,,,"**What happened**:

When using the hue keyword in a scatter plot to color the points based on a string variable, the color assignment in the plot is wrong (whereas the legend is correct).

**What you expected to happen**:

In the example, data of category ""A"" ranges between 0 and 2 in u-direction and 0 and 0.5 in v-direction. Points in that square should be orange (the color for ""A"") but currently are blue.

**Minimal Complete Verifiable Example**:

```python
import xarray as xr
import numpy as np

u = np.random.rand(50, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])

ds = xr.Dataset(
    {
        ""u"": ((""x"", ""category""), u),
        ""v"": ((""x"", ""category""), v),
    },
    coords={""category"": [""B"", ""A""]},
)

g = ds.plot.scatter(y=""u"", x=""v"", hue=""category"")
```

**Anything else we need to know?**:

I think that this might be related to sorting at some point. If the variable by which I color is sorted alphabetically (`[""A"", ""B""]` instead of `[""B"", ""A""]`), the color assignment is correct. Not sure if this issue is related to https://github.com/pydata/xarray/issues/4126, but it looks different to me (the problem is not the legend, but the colors in the plot itself).

**Environment**:
Output of xr.show_versions() INSTALLED VERSIONS ------------------ commit: None python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0] python-bits: 64 OS: Linux OS-release: 4.15.0-122-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.16.0 pandas: 1.1.2 numpy: 1.17.5 scipy: 1.5.2 netCDF4: 1.5.4 pydap: None h5netcdf: None h5py: 2.10.0 Nio: None zarr: 2.4.0 cftime: 1.2.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.26.0 distributed: 2.26.0 matplotlib: 3.3.2 cartopy: None seaborn: 0.11.0 numbagg: None pint: None setuptools: 49.6.0.post20200814 pip: 20.2.3 conda: 4.8.3 pytest: 6.0.1 IPython: 7.18.1 sphinx: None
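A possible workaround sketch, hypothetical and based only on the observation above that an alphabetically sorted hue coordinate gets correct colors: sort the dataset by the hue coordinate with `Dataset.sortby` before plotting (untested against this xarray version):

```python
# Hypothetical workaround sketch: sort the hue coordinate alphabetically
# before plotting, since colors are reported to be assigned correctly then.
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import numpy as np
import xarray as xr

u = np.random.rand(50, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])
ds = xr.Dataset(
    {'u': (('x', 'category'), u), 'v': (('x', 'category'), v)},
    coords={'category': ['B', 'A']},
)

ds_sorted = ds.sortby('category')  # coordinate order becomes ['A', 'B']
g = ds_sorted.plot.scatter(y='u', x='v', hue='category')
```

This only reorders the coordinate; the data values stay attached to their categories, so the plot content is unchanged apart from the color assignment.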
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/4641/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue
352909556,MDU6SXNzdWUzNTI5MDk1NTY=,2376,File written by to_netcdf() not closed when Dataset is generated from dask delayed object using a dask Client(),42246615,closed,0,,,2,2018-08-22T11:21:05Z,2018-10-09T04:13:41Z,2018-10-09T04:13:41Z,NONE,,,,"#### Code Sample

```python
import numpy as np
import xarray as xr
import dask.array as da
import dask
from dask.distributed import Client

@dask.delayed
def run_sim(n_time):
    result = np.array([np.random.randn(n_time)])
    return result

client = Client()

# Parameters
n_sims = 5
n_time = 100
output_file = 'out.nc'

# If I use this as output, computing the data after reopening the file
# produces an error
out = da.stack([da.from_delayed(run_sim(n_time), (1, n_time), np.float64) for i in range(n_sims)])

# If I use this as output, reopening the netcdf file is no problem
# out = np.random.randn(n_sims, 2, n_time)

ds = xr.Dataset({'var1': (['realization', 'time'], out[:, 0, :])},
                coords={'realization': np.arange(n_sims),
                        'time': np.arange(n_time) * .1})

# Save to a netcdf file -> at this point, computations will be carried out
ds.to_netcdf(output_file, engine='netcdf4')

# Reopen the file
with xr.open_dataset(output_file, chunks={'realization': 2}, engine='netcdf4') as ds:
    # Now access the data
    ds.compute()
```

#### Problem description

When I generate a Dataset using a dask delayed object and save the Dataset to a netcdf file, it seems that the file is not properly closed. When trying to reopen it, I get an error (see below). Also, `ncdump -h` fails on the output file after it has been written. However, after the first unsuccessful attempt to open the file, the file seems to be closed. I can run `ncdump -h` on it and a second attempt to open it works.
Note that the problem _only_ arises if I

- store output from a dask delayed object in the Dataset (not if I store a simple numpy array of random numbers)
- start a dask.distributed.Client()

This issue is related to my question on [stackoverflow](https://stackoverflow.com/questions/51930488/problems-reopening-netcdf-file-written-with-xarray-dask/51959512#51959512).

Traceback of the python code:

```python-traceback
--------------------------------------------------------------------------- OSError Traceback (most recent call last) in () 36 with xr.open_dataset(output_file, chunks={'realization': 2}, engine='netcdf4')as ds: 37 # Now acces the data ---> 38 ds.compute() ~/miniconda3/lib/python3.6/site-packages/xarray/core/dataset.py in compute(self, **kwargs) 592 """""" 593 new = self.copy(deep=False) --> 594 return new.load(**kwargs) 595 596 def _persist_inplace(self, **kwargs): ~/miniconda3/lib/python3.6/site-packages/xarray/core/dataset.py in load(self, **kwargs) 489 490 # evaluate all the dask arrays simultaneously --> 491 evaluated_data = da.compute(*lazy_data.values(), **kwargs) 492 493 for k, data in zip(lazy_data, evaluated_data): ~/miniconda3/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs) 400 keys = [x.__dask_keys__() for x in collections] 401 postcomputes = [x.__dask_postcompute__() for x in collections] --> 402 results = schedule(dsk, keys, **kwargs) 403 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)]) 404 ~/miniconda3/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, **kwargs) 2191 try: 2192 results = self.gather(packed, asynchronous=asynchronous, -> 2193 direct=direct) 2194 finally: 2195 for f in futures.values(): ~/miniconda3/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous) 1566 return self.sync(self._gather, futures, 
errors=errors, 1567 direct=direct, local_worker=local_worker, -> 1568 asynchronous=asynchronous) 1569 1570 @gen.coroutine ~/miniconda3/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs) 651 return future 652 else: --> 653 return sync(self.loop, func, *args, **kwargs) 654 655 def __repr__(self): ~/miniconda3/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs) 275 e.wait(10) 276 if error[0]: --> 277 six.reraise(*error[0]) 278 else: 279 return result[0] ~/miniconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb) 691 if value.__traceback__ is not tb: 692 raise value.with_traceback(tb) --> 693 raise value 694 finally: 695 value = None ~/miniconda3/lib/python3.6/site-packages/distributed/utils.py in f() 260 if timeout is not None: 261 future = gen.with_timeout(timedelta(seconds=timeout), future) --> 262 result[0] = yield future 263 except Exception as exc: 264 error[0] = sys.exc_info() ~/miniconda3/lib/python3.6/site-packages/tornado/gen.py in run(self) 1131 1132 try: -> 1133 value = future.result() 1134 except Exception: 1135 self.had_exception = True ~/miniconda3/lib/python3.6/site-packages/tornado/gen.py in run(self) 1139 if exc_info is not None: 1140 try: -> 1141 yielded = self.gen.throw(*exc_info) 1142 finally: 1143 # Break up a reference to itself ~/miniconda3/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker) 1445 six.reraise(type(exception), 1446 exception, -> 1447 traceback) 1448 if errors == 'skip': 1449 bad_keys.add(key) ~/miniconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb) 690 value = tp() 691 if value.__traceback__ is not tb: --> 692 raise value.with_traceback(tb) 693 raise value 694 finally: ~/miniconda3/lib/python3.6/site-packages/dask/array/core.py in getter() 87 c = a[b] 88 if asarray: ---> 89 c = np.asarray(c) 90 finally: 91 if lock: ~/miniconda3/lib/python3.6/site-packages/numpy/core/numeric.py 
in asarray() 490 491 """""" --> 492 return array(a, dtype, copy=False, order=order) 493 494 ~/miniconda3/lib/python3.6/site-packages/xarray/core/indexing.py in __array__() 600 601 def __array__(self, dtype=None): --> 602 return np.asarray(self.array, dtype=dtype) 603 604 def __getitem__(self, key): ~/miniconda3/lib/python3.6/site-packages/numpy/core/numeric.py in asarray() 490 491 """""" --> 492 return array(a, dtype, copy=False, order=order) 493 494 ~/miniconda3/lib/python3.6/site-packages/xarray/core/indexing.py in __array__() 506 def __array__(self, dtype=None): 507 array = as_indexable(self.array) --> 508 return np.asarray(array[self.key], dtype=None) 509 510 def transpose(self, order): ~/miniconda3/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __getitem__() 62 getitem = operator.getitem 63 ---> 64 with self.datastore.ensure_open(autoclose=True): 65 try: 66 array = getitem(self.get_array(), key.tuple) ~/miniconda3/lib/python3.6/contextlib.py in __enter__() 79 def __enter__(self): 80 try: ---> 81 return next(self.gen) 82 except StopIteration: 83 raise RuntimeError(""generator didn't yield"") from None ~/miniconda3/lib/python3.6/site-packages/xarray/backends/common.py in ensure_open() 502 if not self._isopen: 503 try: --> 504 self._ds = self._opener() 505 self._isopen = True 506 yield ~/miniconda3/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in _open_netcdf4_group() 229 import netCDF4 as nc4 230 --> 231 ds = nc4.Dataset(filename, mode=mode, **kwargs) 232 233 with close_on_error(ds): netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__() netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success() OSError: [Errno -101] NetCDF: HDF error: b'/home/user/code/test/out.nc' ``` Output of `ncdump -h` after writing the file (before reopening): ``` HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 139952254916352: #000: H5F.c line 511 in H5Fopen(): unable to open file major: File accessibilty minor: Unable to open file #001: H5Fint.c line 1519 
in H5F_open(): unable to lock the file major: File accessibilty minor: Unable to open file #002: H5FD.c line 1650 in H5FD_lock(): driver lock request failed major: Virtual File Layer minor: Can't update object #003: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable' major: File accessibilty minor: Bad file ID accessed ncdump: out.nc: NetCDF: HDF error
```

#### Expected Output

The netcdf-file is closed after writing it with to_netcdf().

#### Output of ``xr.show_versions()``
INSTALLED VERSIONS ------------------ commit: None python: 3.6.5.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-133-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 xarray: 0.10.8 pandas: 0.23.3 numpy: 1.14.5 scipy: 1.1.0 netCDF4: 1.4.1 h5netcdf: 0.6.2 h5py: 2.8.0 Nio: None zarr: None bottleneck: 1.2.1 cyordereddict: None dask: 0.18.2 distributed: 1.22.1 matplotlib: 2.2.2 cartopy: None seaborn: 0.9.0 setuptools: 40.0.0 pip: 18.0 conda: 4.5.10 pytest: 3.6.4 IPython: 6.5.0 sphinx: 1.7.5
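A possible workaround sketch, hypothetical and based only on the observation above that a plain numpy array writes a readable file: load the dask-backed data into memory before calling to_netcdf(), so the write path sees numpy arrays rather than lazy dask arrays (shown here with plain dask, without a distributed Client):

```python
# Hypothetical workaround sketch: force computation into memory first,
# so to_netcdf() writes plain numpy arrays (the case that works above).
import os
import tempfile

import dask.array as da
import numpy as np
import xarray as xr

out = da.random.random((5, 100), chunks=(1, 100))
ds = xr.Dataset(
    {'var1': (['realization', 'time'], out)},
    coords={'realization': np.arange(5), 'time': np.arange(100) * 0.1},
)

ds = ds.load()  # gather the dask results into memory before writing
path = os.path.join(tempfile.mkdtemp(), 'out.nc')
ds.to_netcdf(path)

with xr.open_dataset(path) as reopened:
    reopened.compute()
```

The trade-off is that the full result must fit in memory on the writing process; this sidesteps the lazy write path rather than fixing it.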
","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/2376/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue