html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2404#issuecomment-569344638,https://api.github.com/repos/pydata/xarray/issues/2404,569344638,MDEyOklzc3VlQ29tbWVudDU2OTM0NDYzOA==,7776656,2019-12-27T20:58:53Z,2019-12-27T21:01:21Z,NONE,"hi,
I have hit this (or a similar issue) in my xarray work, and I think I was able to find at least one MWE. In this case, the coordinates of a Dataset have been built with the underlying type numpy.str_. In my example below, I construct a pandas DataFrame and then convert it to a Dataset. The dtype of the offending coordinate is displayed as object, but on writing to netCDF I get a ValueError.
```python
import pandas as pd
import numpy as np
import xarray as xr  # xr.__version__ --> 0.13.0
import os
import itertools
# make a multi index (where one level is np.str_ type)
x = list(np.array([np.str_('idx_%i') % i for i in range(1, 11)], dtype=np.str_))
y = list(np.arange(10))
combo = list(itertools.product(x, y))
x,y = zip(*combo)
# the below is an odd way to construct a DataFrame, but the np.str_ type is preserved if done this way
data_df = np.random.randn(len(x))
df = pd.DataFrame(data=data_df, columns=['test'])
df['x'] = x
df['y'] = y
df = df.set_index(['x', 'y'])
ds = xr.Dataset.from_dataframe(df)
```
You can see from the check below that the underlying type is np.str_:
```python
type(ds.coords['x'].values[0])
```
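The coordinate itself is stored with dtype object (a quick additional check, consistent with the description above):
```python
# the coordinate carries dtype object, even though each element is a numpy.str_
print(ds['x'].dtype)  # object
```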
Writing to netCDF then raises the error below:
```
ds.to_netcdf('/tmp/netcdf_repro.cdf', engine='h5netcdf')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 ds.to_netcdf('/tmp/netcdf_repro.cdf', engine='h5netcdf')
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
1538 unlimited_dims=unlimited_dims,
1539 compute=compute,
-> 1540 invalid_netcdf=invalid_netcdf,
1541 )
1542
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
1072 # to be parallelized with dask
1073 dump_to_store(
-> 1074 dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
1075 )
1076 if autoclose:
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
1118 variables, attrs = encoder(variables, attrs)
1119
-> 1120 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
1121
1122
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
301 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
302 self.set_variables(
--> 303 variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
304 )
305
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/common.py in set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
339 check = vn in check_encoding_set
340 target, source = self.prepare_variable(
--> 341 name, v, check, unlimited_dims=unlimited_dims
342 )
343
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/h5netcdf_.py in prepare_variable(self, name, variable, check_encoding, unlimited_dims)
190
191 attrs = variable.attrs.copy()
--> 192 dtype = _get_datatype(variable, raise_on_invalid_encoding=check_encoding)
193
194 fillvalue = attrs.pop(""_FillValue"", None)
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _get_datatype(var, nc_format, raise_on_invalid_encoding)
118 def _get_datatype(var, nc_format=""NETCDF4"", raise_on_invalid_encoding=False):
119 if nc_format == ""NETCDF4"":
--> 120 datatype = _nc4_dtype(var)
121 else:
122 if ""dtype"" in var.encoding:
~/.conda/envs/apac/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _nc4_dtype(var)
141 dtype = var.dtype
142 else:
--> 143 raise ValueError(""unsupported dtype for netCDF4 variable: {}"".format(var.dtype))
144 return dtype
145
ValueError: unsupported dtype for netCDF4 variable: object
```
I believe the issue is in:
```python
# xarray/conventions.py
def _infer_dtype(array, name=None):
""""""Given an object array with no missing values, infer its dtype from its
first element
""""""
if array.dtype.kind != ""O"":
raise TypeError(""infer_type must be called on a dtype=object array"")
if array.size == 0:
return np.dtype(float)
element = array[(0,) * array.ndim]
if isinstance(element, (bytes, str)):
return strings.create_vlen_dtype(type(element))
dtype = np.array(element).dtype
if dtype.kind != ""O"":
return dtype
raise ValueError(
""unable to infer dtype on variable {!r}; xarray ""
""cannot serialize arbitrary Python objects"".format(name)
)
```
We end up here with an np.str_, which I suspect is not handled by the isinstance check against (bytes, str) as intended. It seems that perhaps the real failure is in letting coordinates carry np.str_ values in the first place, without coercing them to plain str.
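As a stopgap on my end (just a sketch, not a proposed fix, and assuming assign_coords is acceptable here), coercing the coordinate values to plain str before writing seems to sidestep the object dtype:
```python
# hedged workaround: replace the numpy.str_ values with plain Python str so the
# coordinate no longer carries dtype object when it reaches the backend
ds = ds.assign_coords(x=[str(v) for v in ds['x'].values])
ds.to_netcdf('/tmp/netcdf_repro.cdf', engine='h5netcdf')
```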
I have not contributed to xarray yet, but I use it frequently and really love it. If this is something you think I could help fix with a bit of guidance, I am happy to give it a go.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357729048
https://github.com/pydata/xarray/issues/2404#issuecomment-419392403,https://api.github.com/repos/pydata/xarray/issues/2404,419392403,MDEyOklzc3VlQ29tbWVudDQxOTM5MjQwMw==,17178478,2018-09-07T10:12:09Z,2018-09-07T10:12:09Z,NONE,"I'll work on an MWE, but need to strip a bunch of data from it before I can share it. We encode using `{'zlib': True, 'complevel': 4}` for every data variable in the Dataset. Removing that doesn't change the error occurrence. ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357729048
https://github.com/pydata/xarray/issues/2404#issuecomment-419206989,https://api.github.com/repos/pydata/xarray/issues/2404,419206989,MDEyOklzc3VlQ29tbWVudDQxOTIwNjk4OQ==,1217238,2018-09-06T19:04:46Z,2018-09-06T19:04:46Z,MEMBER,"Unfortunately, I can't seem to reproduce your example:
```python
import xarray
import numpy as np
name = 'loc_techs_export'
data = np.array(['foo::bar'], dtype=object)
data_array = xarray.DataArray(data, dims=[name], name=name, coords={name: data})
print(data_array) # looks identical to your example
data_array.to_netcdf('test.nc', engine='netcdf4') # works
```
Can you share how you save the file to disk? A reproducible example would help greatly here.
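To make the encoding question below concrete: `encoding` is a plain dict that can be printed directly (a sketch using the minimal example above; in the failing case it may contain dtype or compression settings):
```python
print(data_array.encoding)  # empty here, but worth checking on the failing dataset
```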
Is there anything in the `encoding` attribute for this variable, or in the `encoding` keyword argument to `to_netcdf`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,357729048