issue_comments


3 rows where issue = 357729048 sorted by updated_at descending


Issue: Volatile error: `unsupported dtype for netCDF4 variable: object` (pydata/xarray#2404)

granders19 commented on 2019-12-27T20:58:53Z (updated 2019-12-27T21:01:21Z) · NONE
https://github.com/pydata/xarray/issues/2404#issuecomment-569344638

Hi, I have hit this (or a similar issue) in my xarray work, and I think I was able to find at least one MWE. In this case the coordinates of a dataset have been built with underlying type `numpy.str_`. In my example below, I construct a pandas DataFrame, then convert it to a Dataset. The dtype of the offending coordinate is displayed as object, but on writing to netcdf I get a ValueError.

```python
import pandas as pd
import numpy as np
import xarray as xr  # xr.__version__ --> 0.13.0
import os
import itertools

# make a multi index (where one level is np.str_ type)
x = list(np.array([np.str_('idx_%i') % i for i in range(1, 11)], dtype=np.str_))
y = list(np.arange(10))
combo = list(itertools.product(x, y))
x, y = zip(*combo)

# the below is an odd way to construct a DataFrame, but the
# np.str_ type is preserved if done this way
data_df = np.random.randn(len(x))
df = pd.DataFrame(data=data_df, columns=['test'])
df['x'] = x
df['y'] = y
df = df.set_index(['x', 'y'])
ds = xr.Dataset.from_dataframe(df)
```

You can see from the below that the underlying type is `np.str_`:

```python
type(ds.coords['x'].values[0])
```

Writing to netcdf gives the below error:

```
ds.to_netcdf('/tmp/netcdf_repro.cdf', engine='h5netcdf')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-bca194d02da3> in <module>
----> 1 ds.to_netcdf('/tmp/netcdf_repro.cdf', engine='h5netcdf')

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1538             unlimited_dims=unlimited_dims,
   1539             compute=compute,
-> 1540             invalid_netcdf=invalid_netcdf,
   1541         )
   1542

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1072         # to be parallelized with dask
   1073         dump_to_store(
-> 1074             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1075         )
   1076         if autoclose:

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1118         variables, attrs = encoder(variables, attrs)
   1119
-> 1120     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1121
   1122

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    301         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    302         self.set_variables(
--> 303             variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    304         )
    305

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    339             check = vn in check_encoding_set
    340             target, source = self.prepare_variable(
--> 341                 name, v, check, unlimited_dims=unlimited_dims
    342             )
    343

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    190
    191         attrs = variable.attrs.copy()
--> 192         dtype = _get_datatype(variable, raise_on_invalid_encoding=check_encoding)
    193
    194         fillvalue = attrs.pop("_FillValue", None)

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _get_datatype(var, nc_format, raise_on_invalid_encoding)
    118 def _get_datatype(var, nc_format="NETCDF4", raise_on_invalid_encoding=False):
    119     if nc_format == "NETCDF4":
--> 120         datatype = _nc4_dtype(var)
    121     else:
    122         if "dtype" in var.encoding:

~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _nc4_dtype(var)
    141         dtype = var.dtype
    142     else:
--> 143         raise ValueError("unsupported dtype for netCDF4 variable: {}".format(var.dtype))
    144     return dtype
    145

ValueError: unsupported dtype for netCDF4 variable: object
```

I believe the issue is in:

```python
# xarray/conventions.py

def _infer_dtype(array, name=None):
    """Given an object array with no missing values, infer its dtype from its
    first element
    """
    if array.dtype.kind != "O":
        raise TypeError("infer_type must be called on a dtype=object array")

    if array.size == 0:
        return np.dtype(float)

    element = array[(0,) * array.ndim]
    if isinstance(element, (bytes, str)):
        return strings.create_vlen_dtype(type(element))

    dtype = np.array(element).dtype
    if dtype.kind != "O":
        return dtype

    raise ValueError(
        "unable to infer dtype on variable {!r}; xarray "
        "cannot serialize arbitrary Python objects".format(name)
    )
```

Because we have a `np.str_` which fails the comparison to `(bytes, str)`. It seems that perhaps the failure is in letting coordinates have a `np.str_` type in the first place without being coerced to `str`.
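That coercion idea can be sketched outside of xarray (a hypothetical helper, not the library's fix): an object-dtype array whose elements are all strings can be cast to a plain fixed-width unicode array, so the backend no longer sees `dtype=object`:

```python
import numpy as np

def coerce_object_strings(arr):
    """If arr is an object-dtype array holding only strings (including
    numpy string scalars such as np.str_), cast it to a fixed-width
    unicode array ('<U...'); otherwise return it unchanged."""
    if arr.dtype.kind == "O" and all(isinstance(v, str) for v in arr.flat):
        return arr.astype(str)
    return arr

# Object-dtype coordinate values like those in the MWE above:
coords = np.array([np.str_('idx_%i') % i for i in range(1, 4)], dtype=object)
fixed = coerce_object_strings(coords)
```

After the cast, `fixed.dtype.kind` is `"U"` instead of `"O"`, which sidesteps the `_infer_dtype` path entirely.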

I have not contributed to the xarray library yet, but I use it frequently and really love it. If, with a bit of guidance, this is something you think I could help fix, I am happy to give it a go.

brynpickering commented on 2018-09-07T10:12:09Z · NONE
https://github.com/pydata/xarray/issues/2404#issuecomment-419392403

I'll work on an MWE, but I need to strip a bunch of data from it before I can share it. We encode using `{'zlib': True, 'complevel': 4}` for every data variable in the Dataset; removing that doesn't change the error occurrence.
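For reference, that per-variable encoding is built as a dict keyed by variable name (the variable names here are placeholders, not from the real Dataset):

```python
# Hypothetical data variable names standing in for the real Dataset's:
data_vars = ['var_a', 'var_b']

# The same zlib/complevel settings applied to every data variable,
# as described above; this dict is passed as to_netcdf(..., encoding=...).
encoding = {name: {'zlib': True, 'complevel': 4} for name in data_vars}
# ds.to_netcdf('out.nc', encoding=encoding)
```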

shoyer commented on 2018-09-06T19:04:46Z · MEMBER
https://github.com/pydata/xarray/issues/2404#issuecomment-419206989

Unfortunately, I can't seem to reproduce your example:

```python
import xarray
import numpy as np

name = 'loc_techs_export'
data = np.array(['foo::bar'], dtype=object)
data_array = xarray.DataArray(data, dims=[name], name=name, coords={name: data})
print(data_array)  # looks identical to your example
data_array.to_netcdf('test.nc', engine='netcdf4')  # works
```

Can you share how you save the file to disk? A reproducible example would help greatly here.

Is there anything in the `encoding` attribute for this variable, or in the `encoding` keyword argument to `to_netcdf`?
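One quick way to check that (reusing the example data from above): a freshly constructed variable carries an empty `encoding` dict, so anything non-empty would have been set explicitly or picked up from a file on disk.

```python
import numpy as np
import xarray

# Rebuild the example array and inspect its on-disk encoding settings.
data = np.array(['foo::bar'], dtype=object)
da = xarray.DataArray(data, dims=['loc_techs_export'], name='loc_techs_export')
print(da.encoding)  # empty dict for a freshly constructed DataArray
```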

