
issues


2 rows where type = "issue" and user = 42246615 sorted by updated_at descending


755105132 (MDU6SXNzdWU3NTUxMDUxMzI=) · issue 4641: Wrong hue assignment in scatter plot
opened by astoeriko (42246615) · state: closed · comments: 7 · author_association: NONE
created: 2020-12-02T09:34:40Z · updated: 2021-01-13T23:02:33Z · closed: 2021-01-13T23:02:33Z

What happened: When using the hue keyword in a scatter plot to color the points based on a string variable, the color assignment in the plot is wrong (whereas the legend is correct).

What you expected to happen: In the example, data of category "A" ranges between 0 and 2 in u-direction and 0 and 0.5 in v-direction. Points in that square should be orange (the color for "A") but currently are blue.

Minimal Complete Verifiable Example:

```python
import xarray as xr
import numpy as np

u = np.random.rand(50, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])

ds = xr.Dataset(
    {
        "u": (("x", "category"), u),
        "v": (("x", "category"), v),
    },
    coords={"category": ["B", "A"]},
)

g = ds.plot.scatter(
    y="u",
    x="v",
    hue="category",
)
```

Anything else we need to know?: I think that this might be related to sorting at some point: if the variable by which I color is sorted alphabetically (["A", "B"] instead of ["B", "A"]), the color assignment is correct.
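For illustration, the alphabetical-sorting observation above suggests a possible workaround (a sketch, not a confirmed fix; `ds_sorted` is an illustrative name): sort the hue coordinate with `sortby` before plotting, which reproduces the case where the colors are reported to be correct.

```python
import numpy as np
import xarray as xr

# Same setup as the MCVE above, with the unsorted coordinate ["B", "A"].
u = np.random.rand(50, 2) * np.array([1, 2])
v = np.random.rand(50, 2) * np.array([1, 0.5])
ds = xr.Dataset(
    {"u": (("x", "category"), u), "v": (("x", "category"), v)},
    coords={"category": ["B", "A"]},
)

# Sorting along the coordinate yields the alphabetical order ["A", "B"],
# i.e. the case where the color assignment is reported to be correct.
ds_sorted = ds.sortby("category")
assert list(ds_sorted["category"].values) == ["A", "B"]

# ds_sorted.plot.scatter(y="u", x="v", hue="category")
```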

Not sure if this issue is related to https://github.com/pydata/xarray/issues/4126, but it looks different to me (the problem is not the legend, but the colors in the plot itself).

Environment:

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-122-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.0
pandas: 1.1.2
numpy: 1.17.5
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.26.0
distributed: 2.26.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.3
conda: 4.8.3
pytest: 6.0.1
IPython: 7.18.1
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4641/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
352909556 (MDU6SXNzdWUzNTI5MDk1NTY=) · issue 2376: File written by to_netcdf() not closed when Dataset is generated from dask delayed object using a dask Client()
opened by astoeriko (42246615) · state: closed · comments: 2 · author_association: NONE
created: 2018-08-22T11:21:05Z · updated: 2018-10-09T04:13:41Z · closed: 2018-10-09T04:13:41Z

Code Sample

```python
import numpy as np
import xarray as xr
import dask.array as da
import dask
from dask.distributed import Client

@dask.delayed
def run_sim(n_time):
    result = np.array([np.random.randn(n_time)])
    return result

client = Client()

# Parameters
n_sims = 5
n_time = 100
output_file = 'out.nc'

# If I use this as output, computing the data after reopening the file
# produces an error
out = da.stack([da.from_delayed(run_sim(n_time), (1, n_time,), np.float64)
                for i in range(n_sims)])

# If I use this as output, reopening the netcdf file is no problem
# out = np.random.randn(n_sims, 2, n_time)

ds = xr.Dataset({'var1': (['realization', 'time'], out[:, 0, :])},
                coords={'realization': np.arange(n_sims),
                        'time': np.arange(n_time) * .1})

# Save to a netcdf file -> at this point, computations will be carried out
ds.to_netcdf(output_file, engine='netcdf4')

# Reopen the file
with xr.open_dataset(output_file, chunks={'realization': 2}, engine='netcdf4') as ds:
    # Now access the data
    ds.compute()
```

Problem description

When I generate a Dataset using a dask delayed object and save the Dataset to a netcdf file, it seems that the file is not properly closed. When trying to reopen it, I get an error (see below). `ncdump -h` also fails on the output file after it has been written. However, after the first unsuccessful attempt to open the file, the file seems to be closed: I can then run `ncdump -h` on it, and a second attempt to open it works.

Note that the problem only arises if I

  • store output from a dask delayed object in the Dataset (not if I store a simple numpy array of random numbers)
  • start a dask.distributed.Client()

This issue is related to my question on stackoverflow.
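A minimal sketch of the distinction in the first bullet (based only on the description above; `ds_loaded` is an illustrative name, and the distributed `Client()` is omitted for brevity): with a delayed object the variable stays dask-backed until computed, so loading it into memory before `to_netcdf()` would take the plain-numpy write path that reportedly reopens without error.

```python
import numpy as np
import dask
import dask.array as da
import xarray as xr

@dask.delayed
def run_sim(n_time):
    return np.array([np.random.randn(n_time)])

n_sims, n_time = 5, 100
out = da.stack([da.from_delayed(run_sim(n_time), (1, n_time), np.float64)
                for _ in range(n_sims)])
ds = xr.Dataset({'var1': (['realization', 'time'], out[:, 0, :])},
                coords={'realization': np.arange(n_sims),
                        'time': np.arange(n_time) * .1})

# var1 is dask-backed at this point, so to_netcdf() would trigger the
# computation during the write (the problematic case above). Loading
# first turns it into a plain numpy-backed variable, matching the case
# that reopens without error.
ds_loaded = ds.load()
```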

Traceback of the python code:

```python-traceback


OSError                                   Traceback (most recent call last)
<ipython-input-2-83478559c186> in <module>()
     36 with xr.open_dataset(output_file, chunks={'realization': 2}, engine='netcdf4')as ds:
     37     # Now acces the data
---> 38     ds.compute()

~/miniconda3/lib/python3.6/site-packages/xarray/core/dataset.py in compute(self, **kwargs)
    592         """
    593         new = self.copy(deep=False)
--> 594         return new.load(**kwargs)
    595
    596     def _persist_inplace(self, **kwargs):

~/miniconda3/lib/python3.6/site-packages/xarray/core/dataset.py in load(self, **kwargs)
    489
    490             # evaluate all the dask arrays simultaneously
--> 491             evaluated_data = da.compute(*lazy_data.values(), **kwargs)
    492
    493             for k, data in zip(lazy_data, evaluated_data):

~/miniconda3/lib/python3.6/site-packages/dask/base.py in compute(*args, **kwargs)
    400     keys = [x.__dask_keys__() for x in collections]
    401     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 402     results = schedule(dsk, keys, **kwargs)
    403     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    404

~/miniconda3/lib/python3.6/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, **kwargs)
   2191         try:
   2192             results = self.gather(packed, asynchronous=asynchronous,
-> 2193                                   direct=direct)
   2194         finally:
   2195             for f in futures.values():

~/miniconda3/lib/python3.6/site-packages/distributed/client.py in gather(self, futures, errors, maxsize, direct, asynchronous)
   1566             return self.sync(self._gather, futures, errors=errors,
   1567                              direct=direct, local_worker=local_worker,
-> 1568                              asynchronous=asynchronous)
   1569
   1570     @gen.coroutine

~/miniconda3/lib/python3.6/site-packages/distributed/client.py in sync(self, func, *args, **kwargs)
    651             return future
    652         else:
--> 653             return sync(self.loop, func, *args, **kwargs)
    654
    655     def __repr__(self):

~/miniconda3/lib/python3.6/site-packages/distributed/utils.py in sync(loop, func, *args, **kwargs)
    275             e.wait(10)
    276     if error[0]:
--> 277         six.reraise(*error[0])
    278     else:
    279         return result[0]

~/miniconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

~/miniconda3/lib/python3.6/site-packages/distributed/utils.py in f()
    260                 if timeout is not None:
    261                     future = gen.with_timeout(timedelta(seconds=timeout), future)
--> 262                 result[0] = yield future
    263             except Exception as exc:
    264                 error[0] = sys.exc_info()

~/miniconda3/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1131
   1132                     try:
-> 1133                         value = future.result()
   1134                     except Exception:
   1135                         self.had_exception = True

~/miniconda3/lib/python3.6/site-packages/tornado/gen.py in run(self)
   1139                     if exc_info is not None:
   1140                         try:
-> 1141                             yielded = self.gen.throw(*exc_info)
   1142                         finally:
   1143                             # Break up a reference to itself

~/miniconda3/lib/python3.6/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1445                             six.reraise(type(exception),
   1446                                         exception,
-> 1447                                         traceback)
   1448                         if errors == 'skip':
   1449                             bad_keys.add(key)

~/miniconda3/lib/python3.6/site-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

~/miniconda3/lib/python3.6/site-packages/dask/array/core.py in getter()
     87             c = a[b]
     88             if asarray:
---> 89                 c = np.asarray(c)
     90         finally:
     91             if lock:

~/miniconda3/lib/python3.6/site-packages/numpy/core/numeric.py in asarray()
    490
    491     """
--> 492     return array(a, dtype, copy=False, order=order)
    493
    494

~/miniconda3/lib/python3.6/site-packages/xarray/core/indexing.py in __array__()
    600
    601     def __array__(self, dtype=None):
--> 602         return np.asarray(self.array, dtype=dtype)
    603
    604     def __getitem__(self, key):

~/miniconda3/lib/python3.6/site-packages/numpy/core/numeric.py in asarray()
    490
    491     """
--> 492     return array(a, dtype, copy=False, order=order)
    493
    494

~/miniconda3/lib/python3.6/site-packages/xarray/core/indexing.py in __array__()
    506     def __array__(self, dtype=None):
    507         array = as_indexable(self.array)
--> 508         return np.asarray(array[self.key], dtype=None)
    509
    510     def transpose(self, order):

~/miniconda3/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in __getitem__()
     62             getitem = operator.getitem
     63
---> 64         with self.datastore.ensure_open(autoclose=True):
     65             try:
     66                 array = getitem(self.get_array(), key.tuple)

~/miniconda3/lib/python3.6/contextlib.py in __enter__()
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

~/miniconda3/lib/python3.6/site-packages/xarray/backends/common.py in ensure_open()
    502         if not self._isopen:
    503             try:
--> 504                 self._ds = self._opener()
    505                 self._isopen = True
    506                 yield

~/miniconda3/lib/python3.6/site-packages/xarray/backends/netCDF4_.py in _open_netcdf4_group()
    229     import netCDF4 as nc4
    230
--> 231     ds = nc4.Dataset(filename, mode=mode, **kwargs)
    232
    233     with close_on_error(ds):

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -101] NetCDF: HDF error: b'/home/user/code/test/out.nc'

```

Output of `ncdump -h` after writing the file (before reopening):

```
HDF5-DIAG: Error detected in HDF5 (1.10.2) thread 139952254916352:
  #000: H5F.c line 511 in H5Fopen(): unable to open file
    major: File accessibilty
    minor: Unable to open file
  #001: H5Fint.c line 1519 in H5F_open(): unable to lock the file
    major: File accessibilty
    minor: Unable to open file
  #002: H5FD.c line 1650 in H5FD_lock(): driver lock request failed
    major: Virtual File Layer
    minor: Can't update object
  #003: H5FDsec2.c line 941 in H5FD_sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
    major: File accessibilty
    minor: Bad file ID accessed
ncdump: out.nc: NetCDF: HDF error

```

Expected Output

The netcdf-file is closed after writing it with to_netcdf().

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-133-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
xarray: 0.10.8
pandas: 0.23.3
numpy: 1.14.5
scipy: 1.1.0
netCDF4: 1.4.1
h5netcdf: 0.6.2
h5py: 2.8.0
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.18.2
distributed: 1.22.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.9.0
setuptools: 40.0.0
pip: 18.0
conda: 4.5.10
pytest: 3.6.4
IPython: 6.5.0
sphinx: 1.7.5
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2376/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
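For illustration, the `[issues]` schema above can be exercised with Python's built-in `sqlite3` (a sketch: the table is trimmed to a few of the columns, and the two inserted rows mirror the issues on this page). The query reproduces this page's filter, "rows where user = 42246615 sorted by updated_at descending".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed version of the [issues] schema above.
conn.execute("""
CREATE TABLE issues (
   id INTEGER PRIMARY KEY,
   number INTEGER,
   title TEXT,
   user INTEGER,
   state TEXT,
   updated_at TEXT
)""")
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?)",
    [
        (755105132, 4641, "Wrong hue assignment in scatter plot",
         42246615, "closed", "2021-01-13T23:02:33Z"),
        (352909556, 2376, "File written by to_netcdf() not closed",
         42246615, "closed", "2018-10-09T04:13:41Z"),
    ],
)
# The query behind this page: the reporter's issues, newest update first.
rows = conn.execute(
    "SELECT number, title FROM issues "
    "WHERE user = 42246615 ORDER BY updated_at DESC"
).fetchall()
```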
Powered by Datasette · About: xarray-datasette