issues: 433703713
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
433703713 | MDU6SXNzdWU0MzM3MDM3MTM= | 2899 | Explicit fixed width 'S1' arrays re-encoded creating extra dimension | 5793360 | open | 0 | 2 | 2019-04-16T10:27:57Z | 2019-04-17T16:15:29Z | NONE |
```python
from collections import OrderedDict

import numpy as np
import xarray as xr

# Fixed-width char array: 12 sensors, 100 single-byte characters each
sensor_string_np = np.zeros([12, 100], dtype='|S1')

data_vars = {}
data_vars['sensorName'] = xr.DataArray(
    data=sensor_string_np.copy(),
    attrs=OrderedDict([('_FillValue', ' ')]),
    name="sensorName",
    dims=("sensor", "string"),
)

scanfile = xr.Dataset(data_vars=data_vars)
# Write "test" into the first sensor name, one byte per element
scanfile.sensorName[0, :len("test")] = np.frombuffer("test".encode(), dtype='|S1')
scanfile.to_netcdf('test.nc')
```
Problem description

I'm not entirely sure whether this is a bug or user error. The code above is a minimal example of an issue we've been having in the latest version of xarray (or since about version 0.11). We are trying to preserve the old fixed-width char array style of strings for backwards compatibility. However, the code above adds an extra 'string1' dimension when saving to NetCDF.

From what I can understand, this is a feature of the encoding described at http://xarray.pydata.org/en/stable/io.html#string-encoding. I think xarray is treating each byte of the S1 array as a 'string', which it then encodes again by splitting each character into byte arrays of one byte each. Since I'm explicitly working with char arrays rather than strings, I would expect the array to be written to disk as is.

I can work around this by setting the encoding for the variable to 'str' and removing the _FillValue.
However, this seems like a painful workaround. Is there another way I should be doing this?
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2899/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
13221727 | issue |
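
The issue body suspects that each element of the 'S1' array is treated as a string and split again into single bytes. The following is a minimal NumPy-only sketch of that suspected mechanism, not xarray's actual implementation. The helper name `split_bytes_to_chars` is hypothetical and exists only for illustration. It shows how splitting fixed-width byte strings into characters appends a trailing axis whose length equals the item width, which is 1 for an 'S1' array.

```python
import numpy as np


def split_bytes_to_chars(arr):
    """Split each fixed-width bytes element into single bytes.

    Hypothetical helper: appends a trailing axis whose length is the
    item width of the input dtype.
    """
    arr = np.ascontiguousarray(arr)
    width = arr.dtype.itemsize
    return arr.view('S1').reshape(arr.shape + (width,))


# Same layout as in the issue: 12 sensors, 100 one-byte characters each
names = np.zeros([12, 100], dtype='|S1')
names[0, :4] = np.frombuffer(b"test", dtype='|S1')

# Splitting an array that is already 'S1' adds a dimension of length 1
encoded = split_bytes_to_chars(names)
print(names.shape, encoded.shape)  # (12, 100) (12, 100, 1)

# For comparison, true fixed-width strings of width 4 gain an axis of length 4
words = np.array([b"test", b"abcd"], dtype='S4')
print(split_bytes_to_chars(words).shape)  # (2, 4)
```

If this is what happens on write, the length-1 trailing axis would correspond to the extra 'string1' dimension reported above, while the data itself is unchanged.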