issue_comments
7 rows where author_association = "MEMBER", issue = 1632718954 and user = 5821660 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1481120920 | https://github.com/pydata/xarray/issues/7652#issuecomment-1481120920 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YSByY | kmuehlbauer 5821660 | 2023-03-23T12:36:17Z | 2023-03-23T12:36:17Z | MEMBER |
I've added a bit to this over at #7654. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1479080036 | https://github.com/pydata/xarray/issues/7652#issuecomment-1479080036 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YKPhk | kmuehlbauer 5821660 | 2023-03-22T08:12:28Z | 2023-03-22T08:58:47Z | MEMBER | OK, I've finally gotten to the bottom of this, so I'm writing my findings here:
This works with
My suggestion would be, just use |
{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1479002634 | https://github.com/pydata/xarray/issues/7652#issuecomment-1479002634 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YJ8oK | kmuehlbauer 5821660 | 2023-03-22T06:49:42Z | 2023-03-22T06:49:42Z | MEMBER | Great, much appreciated, thanks! Let's iterate over there then. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1477606313 | https://github.com/pydata/xarray/issues/7652#issuecomment-1477606313 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YEnup | kmuehlbauer 5821660 | 2023-03-21T10:36:43Z | 2023-03-21T13:43:04Z | MEMBER |
@basnijholt I'd appreciate it if you could test #7654 for that particular case. Update: I added another commit which handles the VLEN string case. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1477467808 | https://github.com/pydata/xarray/issues/7652#issuecomment-1477467808 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YEF6g | kmuehlbauer 5821660 | 2023-03-21T08:55:37Z | 2023-03-21T10:09:12Z | MEMBER |
That's an issue with the netCDF file format, too: it has no bool dtype. XRef: https://github.com/pydata/xarray/issues/1500
```
TypeError                                 Traceback (most recent call last)
Cell In[42], line 4
      2 with nc.Dataset("test-bool-netcdf4.nc", mode="w") as ds:
      3     ds.createDimension("x", size=1)
----> 4     var = ds.createVariable("da", data.dtype.str, dimensions=("x"))
      5     var[:] = data

File src/netCDF4/_netCDF4.pyx:2945, in netCDF4._netCDF4.Dataset.createVariable()

File src/netCDF4/_netCDF4.pyx:4121, in netCDF4._netCDF4.Variable.__init__()

TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got bool
```
Update:
Xarray forwards this information to the file by adding a `dtype` attribute. It looks like this information is not correctly distributed back to
Reason:
While decoding the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1477473412 | https://github.com/pydata/xarray/issues/7652#issuecomment-1477473412 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YEHSE | kmuehlbauer 5821660 | 2023-03-21T08:58:51Z | 2023-03-21T08:58:51Z | MEMBER |
I can't reproduce this one in my environment. See above for details. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 | |
1477447875 | https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875 | https://api.github.com/repos/pydata/xarray/issues/7652 | IC_kwDOAMm_X85YEBDD | kmuehlbauer 5821660 | 2023-03-21T08:37:48Z | 2023-03-21T08:37:48Z | MEMBER | @basnijholt For the string issue this is somehwat kind of netcdf/numpy based issue with VLEN types. XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings
And if no dtype is given, like in your case, numpy creates a fixed-width unicode array (`<U1`), which netCDF4 then writes as a VLEN string. At least the netCDF4 and h5netcdf backends are consistent in their writing (creating similar HDF5 files) and reading back (object dtype):

plain netCDF4

```python
import netCDF4 as nc
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n")
auto = False

with nc.Dataset("test-plain-netcdf4.nc", mode="w") as ds:
    print("Write NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    ds.createDimension("x", size=2)
    ds.createDimension("y", size=2)
    var = ds.createVariable("da", data.dtype.str, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with nc.Dataset("test-plain-netcdf4.nc") as ds:
    print("\nRead NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```
source dtype: <U1

Write NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-netcdf4 {
dimensions:
	x = 2 ;
	y = 2 ;
variables:
	string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-netcdf4.nc" {
DATASET "da" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
}
}
```

plain h5netcdf

```python
import h5netcdf.legacyapi as h5nc
import h5py
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n")

with h5nc.Dataset("test-plain-h5netcdf.nc", mode="w") as ds:
    print("Write NC-File")
    ds.createDimension("x", 2)
    ds.createDimension("y", 2)
    dtype = h5py.string_dtype()
    print("Source dtype:", dtype)
    var = ds.createVariable("da", dtype, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with h5nc.Dataset("test-plain-h5netcdf.nc") as ds:
    print("\nRead NC-File")
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```
source dtype: <U1

Write NC-File
Source dtype: object
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-h5netcdf {
dimensions:
	x = 2 ;
	y = 2 ;
variables:
	string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-h5netcdf.nc" {
DATASET "da" {
   DATATYPE  H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE  H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SCALAR
      DATA {
      (0): 0
      }
   }
}
}
```

Both get written out as:
If you use fixed-length strings (eg.

```python
import numpy as np
import xarray as xr

# Make an xarray with an array of fixed-length strings
data = np.array([["a", "b"], ["c", "d"]], dtype="|S1")
da = xr.DataArray(
    data=data,
    dims=["x", "y"],
    coords={"x": [0, 1], "y": [0, 1]},
)
da.to_netcdf("test.nc", mode="w")

# Load the xarray back in
da_loaded = xr.load_dataarray("test.nc")
assert da.dtype == da_loaded.dtype, "Dtypes don't match"
```

Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Saving and loading an array of strings changes datatype to object 1632718954 |
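The netCDF4 and h5netcdf experiments in the comments above show that a `<U1` array comes back from a VLEN string variable as an `object` array of Python `str`. One possible workaround, sketched here under the assumption that the caller wants a fixed-width NumPy unicode dtype back, is to cast after loading. `restore_fixed_width` is a hypothetical helper, not part of xarray, netCDF4 or h5netcdf; with xarray, the same cast could presumably be applied to a loaded DataArray through its `astype` method.

```python
import numpy as np


def restore_fixed_width(arr):
    """Hypothetical helper: cast an object array of Python str (as returned
    when reading a VLEN string variable) back to a fixed-width unicode array.
    Arrays of any other dtype are returned unchanged."""
    arr = np.asarray(arr)
    if arr.dtype == object and all(isinstance(v, str) for v in arr.ravel()):
        # astype(str) lets NumPy choose the width from the longest element
        return arr.astype(str)
    return arr


# Stand-in for what the readers hand back in the examples above (object dtype)
loaded = np.array([["a", "b"], ["c", "d"]], dtype=object)
restored = restore_fixed_width(loaded)
print(loaded.dtype, restored.dtype.str)  # object <U1
```

The check on element types keeps the cast from silently stringifying object arrays that hold something other than text.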
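The comment on booleans notes that netCDF has no bool type (the `TypeError` in that comment lists the accepted primitive types) and that xarray records the original dtype in a `dtype` attribute when writing. The round trip this relies on can be sketched in plain NumPy. `encode_bool` and `decode_bool` are hypothetical helpers illustrating the idea, not xarray's actual coder; the point is that decoding only restores `bool` if the attribute survives and is consulted on read.

```python
import numpy as np


def encode_bool(values, attrs=None):
    """Hypothetical encoder: netCDF has no bool type, so store bools as int8
    and remember the original dtype in a 'dtype' attribute."""
    values = np.asarray(values)
    attrs = dict(attrs or {})
    if values.dtype == np.bool_:
        attrs["dtype"] = "bool"
        values = values.astype("i1")
    return values, attrs


def decode_bool(values, attrs):
    """Hypothetical decoder: undo encode_bool when the marker attribute is present."""
    attrs = dict(attrs)
    if attrs.pop("dtype", None) == "bool":
        values = np.asarray(values).astype(np.bool_)
    return values, attrs


data = np.array([True])
stored, stored_attrs = encode_bool(data)
print(stored.dtype, stored_attrs)  # int8 {'dtype': 'bool'}
decoded, decoded_attrs = decode_bool(stored, stored_attrs)
print(decoded.dtype)  # bool
```

If a reader drops or ignores the `dtype` attribute, `decode_bool` leaves the values as `int8`, which is the kind of mismatch the comment describes.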
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
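The page header describes a filter over the `issue_comments` table defined by the schema above. As a sketch, assuming the filter corresponds to a plain `WHERE ... ORDER BY updated_at DESC` (the SQL Datasette actually generates may differ), the same selection can be reproduced with Python's built-in `sqlite3` module on an in-memory copy of the schema. Only two of the listed rows are inserted, with most columns left `NULL`; `filtered_comments` is a hypothetical helper.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above. The referenced [users]
# and [issues] tables are not created; SQLite does not enforce REFERENCES
# unless PRAGMA foreign_keys is switched on.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""

QUERY = """
SELECT [id], [updated_at]
FROM [issue_comments]
WHERE [author_association] = :assoc AND [issue] = :issue AND [user] = :user
ORDER BY [updated_at] DESC
"""


def filtered_comments(conn, assoc="MEMBER", issue=1632718954, user=5821660):
    """Hypothetical helper: run the filter described in the page header."""
    params = {"assoc": assoc, "issue": issue, "user": user}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two rows from the table above: (id, user, updated_at, author_association, issue)
conn.executemany(
    "INSERT INTO [issue_comments] ([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1477447875, 5821660, "2023-03-21T08:37:48Z", "MEMBER", 1632718954),
        (1481120920, 5821660, "2023-03-23T12:36:17Z", "MEMBER", 1632718954),
    ],
)
rows = filtered_comments(conn)
print(rows)
```

Because `updated_at` is stored as ISO 8601 text, sorting the strings lexicographically also sorts them chronologically.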