issue_comments


7 rows where author_association = "MEMBER", issue = 1632718954 and user = 5821660 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1481120920 https://github.com/pydata/xarray/issues/7652#issuecomment-1481120920 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YSByY kmuehlbauer 5821660 2023-03-23T12:36:17Z 2023-03-23T12:36:17Z MEMBER

> @kmuehlbauer this is amazing!
>
> It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf

I've added a bit to this over at #7654.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1479080036 https://github.com/pydata/xarray/issues/7652#issuecomment-1479080036 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YKPhk kmuehlbauer 5821660 2023-03-22T08:12:28Z 2023-03-22T08:58:47Z MEMBER

OK, I've finally gotten to the bottom of this, so I'm writing my findings here:

int64 -> int32

This roundtrip works with the h5netcdf/netcdf4 backends in any case. The cast is a feature of the NETCDF3 format, which is used if scipy is installed and netCDF4 is not installed (in the case of engine=None). NETCDF3 has no notion of int64, so the data is silently cast to int32 on write.
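
Because the cast to int32 happens silently, it can help to check beforehand whether the values would survive it. The following is a minimal sketch using only numpy; `fits_in_int32` is a hypothetical helper introduced here for illustration, not part of xarray or any backend.

```python
import numpy as np


def fits_in_int32(values: np.ndarray) -> bool:
    """Return True if every value lies within the int32 range.

    Hypothetical helper: it only inspects the data and does not
    write or read any file.
    """
    if values.size == 0:
        return True
    info = np.iinfo(np.int32)
    return bool(values.min() >= info.min and values.max() <= info.max)


small = np.array([1, 2, 3], dtype=np.int64)
large = np.array([2**40], dtype=np.int64)

print(fits_in_int32(small))
print(fits_in_int32(large))
```

If the check fails, writing with the NETCDF4 format and one of the capable backends avoids the narrowing altogether.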

<U1 -> object

This works with the changes applied in #7654 for the h5netcdf/netcdf4 backends in normal cases (writing out as VLEN string). Again, the NETCDF3 format does not have a notion of string, so all strings have to be converted to an internal NC_CHAR representation. I'm not sure it will do any good to try to make this one work.
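
To make the idea of a character representation concrete, here is a rough numpy-only sketch of splitting fixed-width byte strings into single characters along an extra trailing axis and joining them again. It only illustrates the concept; it is not the code path any backend or xarray itself uses, and the variable names are made up for this example.

```python
import numpy as np

# Fixed-width byte strings, two bytes each.
strings = np.array(["ab", "cd"], dtype="S2")
width = strings.dtype.itemsize

# View every byte as its own 1-byte string and add a trailing "char" axis.
chars = strings.view("S1").reshape(strings.shape + (width,))

# Join the characters back into fixed-width strings.
restored = np.ascontiguousarray(chars).view(f"S{width}").reshape(strings.shape)

print(chars.shape)
print(restored.dtype)
```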

My suggestion would be, just use NETCDF4-format and one of the capable backends.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 1,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1479002634 https://github.com/pydata/xarray/issues/7652#issuecomment-1479002634 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YJ8oK kmuehlbauer 5821660 2023-03-22T06:49:42Z 2023-03-22T06:49:42Z MEMBER

Great, much appreciated, thanks! Let's iterate over there then.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1477606313 https://github.com/pydata/xarray/issues/7652#issuecomment-1477606313 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YEnup kmuehlbauer 5821660 2023-03-21T10:36:43Z 2023-03-21T13:43:04Z MEMBER

> A similar problem where saving, loading, saving, loading changes the dtype bool -> int8:

@basnijholt I'd appreciate it if you could test #7654 for that particular case.

Update: added another commit which handles the vlen string case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1477467808 https://github.com/pydata/xarray/issues/7652#issuecomment-1477467808 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YEF6g kmuehlbauer 5821660 2023-03-21T08:55:37Z 2023-03-21T10:09:12Z MEMBER

> A similar problem where saving, loading, saving, loading changes the dtype bool -> int8:

That's an issue with netcdf file format, too, it has no bool-dtype. XRef: https://github.com/pydata/xarray/issues/1500

```python
data = np.array([True], dtype=bool)
with nc.Dataset("test-bool-netcdf4.nc", mode="w") as ds:
    ds.createDimension("x", size=1)
    var = ds.createVariable("da", data.dtype.str, dimensions=("x"))
    var[:] = data
```

```python
TypeError                                 Traceback (most recent call last)
Cell In[42], line 4
      2 with nc.Dataset("test-bool-netcdf4.nc", mode="w") as ds:
      3     ds.createDimension("x", size=1)
----> 4     var = ds.createVariable("da", data.dtype.str, dimensions=("x"))
      5     var[:] = data

File src/netCDF4/_netCDF4.pyx:2945, in netCDF4._netCDF4.Dataset.createVariable()

File src/netCDF4/_netCDF4.pyx:4121, in netCDF4._netCDF4.Variable.__init__()

TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got bool
```

Update: Xarray is forwarding the information to the file, by adding a dtype-attribute. It looks like this information is not correctly distributed back to .encoding in the case of saving/loading/saving/loading. I'd consider that one a bug.

Reason: While decoding, the .encoding-dtype is set as original_dtype (int8 in our case), but it should either be removed or explicitly set as bool.

https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/conventions.py#L400-L404
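
The mechanism described here, storing bool data as int8 and remembering the original type in a dtype attribute, can be sketched in a few lines. This is a simplified illustration under my own assumptions: `encode_bool` and `decode_bool` are hypothetical names, plain dicts stand in for variable attributes, and none of this is xarray's actual implementation.

```python
import numpy as np


def encode_bool(data, attrs):
    """Store bool data as int8 and record the original dtype in an attribute."""
    if data.dtype == bool:
        return data.astype("i1"), {**attrs, "dtype": "bool"}
    return data, dict(attrs)


def decode_bool(data, attrs):
    """Restore bool data when the 'dtype' attribute marks it as such."""
    attrs = dict(attrs)
    if attrs.pop("dtype", None) == "bool":
        return data.astype(bool), attrs
    return data, attrs


original = np.array([True, False])
stored, stored_attrs = encode_bool(original, {})
restored, restored_attrs = decode_bool(stored, stored_attrs)

print(stored.dtype, stored_attrs)
print(restored.dtype, restored_attrs)
```

In this sketch a second encode/decode cycle only stays stable because nothing remembers int8 as the "original" dtype after decoding, which is the point made in the comment above.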

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1477473412 https://github.com/pydata/xarray/issues/7652#issuecomment-1477473412 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YEHSE kmuehlbauer 5821660 2023-03-21T08:58:51Z 2023-03-21T08:58:51Z MEMBER

> Another fun one where int64 -> int32:

Can't reproduce this one with my environment. See above for details.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954
1477447875 https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875 https://api.github.com/repos/pydata/xarray/issues/7652 IC_kwDOAMm_X85YEBDD kmuehlbauer 5821660 2023-03-21T08:37:48Z 2023-03-21T08:37:48Z MEMBER

@basnijholt The string issue is somewhat of a netcdf/numpy-based issue with VLEN types.

XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings

> The most flexible way to store arrays of strings is with the Variable-length (vlen) string data type. However, this requires the use of the NETCDF4 data model, and the vlen type does not map very well numpy arrays (you have to use numpy arrays of dtype=object, which are arrays of arbitrary python objects).

And numpy will create a VLEN string array if no dtype is given, like in your case.

At least netCDF4 and h5netcdf backends are consistent in their writing (creating similar hdf5-files) and reading back (object-dtype):

plain netCDF4

```python
import netCDF4 as nc
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n", )
auto = False
with nc.Dataset("test-plain-netcdf4.nc", mode="w") as ds:
    print("Write NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    ds.createDimension("x", size=2)
    ds.createDimension("y", size=2)
    var = ds.createVariable("da", data.dtype.str, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with nc.Dataset("test-plain-netcdf4.nc") as ds:
    print("\nRead NC-File")
    ds.set_auto_maskandscale(auto)
    ds.set_auto_chartostring(auto)
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```python
source dtype: <U1

Write NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<class 'netCDF4._netCDF4.Variable'>
vlen da(x, y)
vlen data type: <class 'str'>
unlimited dimensions:
current shape = (2, 2)
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-netcdf4 {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-netcdf4.nc" {
DATASET "da" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
}
}
```
plain h5netcdf

```python
import h5netcdf.legacyapi as h5nc
import h5py
import numpy as np

data = np.array([["a", "b"], ["c", "d"]], dtype="<U1")
print(f"source dtype: {data.dtype.str}\n", )
with h5nc.Dataset("test-plain-h5netcdf.nc", mode="w") as ds:
    print("Write NC-File")
    ds.createDimension("x", 2)
    ds.createDimension("y", 2)
    dtype = h5py.string_dtype()
    print("Source dtype:", dtype)
    var = ds.createVariable("da", dtype, dimensions=("x", "y"))
    var[:] = data
    print("Variable\n")
    print(var)
    print(var.dtype)
    print("\nContents\n")
    print(var[:])
    print(var[:].dtype)

with h5nc.Dataset("test-plain-h5netcdf.nc") as ds:
    print("\nRead NC-File")
    da = ds["da"]
    print("Variable\n")
    print(da)
    print(da.dtype)
    da = ds["da"][:]
    print("\nContents\n")
    print(da)
    print(da.dtype)
```

```python
source dtype: <U1

Write NC-File
Source dtype: object
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object

Read NC-File
Variable

<h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>>
Attributes:
<class 'str'>

Contents

[['a' 'b']
 ['c' 'd']]
object
```

```bash
netcdf test-plain-h5netcdf {
dimensions:
    x = 2 ;
    y = 2 ;
variables:
    string da(x, y) ;
data:

 da =
  "a", "b",
  "c", "d" ;
}
HDF5 "test-plain-h5netcdf.nc" {
DATASET "da" {
   DATATYPE H5T_STRING {
      STRSIZE H5T_VARIABLE;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) }
   DATA {
   (0,0): "a", "b",
   (1,0): "c", "d"
   }
   ATTRIBUTE "DIMENSION_LIST" {
      DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }}
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): (), ()
      }
   }
   ATTRIBUTE "_Netcdf4Coordinates" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 2 ) / ( 2 ) }
      DATA {
      (0): 0, 1
      }
   }
   ATTRIBUTE "_Netcdf4Dimid" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SCALAR
      DATA {
      (0): 0
      }
   }
}
}
```

Both get written out as:

DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; }

If you use fixed-length strings (e.g. |S1), the dtype is preserved during the roundtrip:

```python
import numpy as np
import xarray as xr

# Make an xarray with an array of fixed-length strings
data = np.array([["a", "b"], ["c", "d"]], dtype="|S1")
da = xr.DataArray(
    data=data,
    dims=["x", "y"],
    coords={"x": [0, 1], "y": [0, 1]},
)
da.to_netcdf("test.nc", mode='w')

# Load the xarray back in
da_loaded = xr.load_dataarray("test.nc")
assert da.dtype == da_loaded.dtype, "Dtypes don't match"
```
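
For data that does come back as object dtype after reading a VLEN string variable, a plain numpy cast can restore a fixed-width unicode dtype in memory. This is a minimal sketch with numpy only; the array is built by hand here rather than loaded from a file, and the variable names are illustrative.

```python
import numpy as np

# Object-dtype array, standing in for what is read back from a VLEN string variable.
loaded = np.array([["a", "b"], ["c", "d"]], dtype=object)

# Casting to str lets numpy pick the smallest fixed-width unicode dtype that fits.
restored = loaded.astype(str)

print(loaded.dtype)
print(restored.dtype.kind)
```

The cast only changes the in-memory representation; what gets written on the next save still depends on the backend and file format.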

Versions

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.14.21-150400.24.46-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1

xarray: 2023.2.0
pandas: 1.5.3
numpy: 1.24.2
scipy: 1.10.1
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.3.6
cfgrib: None
iris: None
bottleneck: None
dask: 2023.3.1
distributed: 2023.3.1
matplotlib: 3.7.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.0
pip: 23.0.1
conda: None
pytest: None
mypy: None
IPython: 8.11.0
sphinx: None
```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Saving and loading an array of strings changes datatype to object 1632718954


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);