github: issue_comments: 7 rows where author_association = "MEMBER", issue = 1632718954 and user = 5821660 sorted by updated

7 rows where author_association = "MEMBER", issue = 1632718954 and user = 5821660 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1481120920	https://github.com/pydata/xarray/issues/7652#issuecomment-1481120920	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YSByY	kmuehlbauer 5821660	2023-03-23T12:36:17Z	2023-03-23T12:36:17Z	MEMBER	@kmuehlbauer this is amazing! It would be very valuable to add this list of limitations to the documentation: https://docs.xarray.dev/en/stable/user-guide/io.html#netcdf I've added a bit to this over at #7654.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1479080036	https://github.com/pydata/xarray/issues/7652#issuecomment-1479080036	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YKPhk	kmuehlbauer 5821660	2023-03-22T08:12:28Z	2023-03-22T08:58:47Z	MEMBER	OK, I've finally gotten to the bottom of this, so I'm writing my findings here: `int64` -> `int32` This works with `h5netcdf`/`netcdf4`-backends in any case. That's a feature^1 of the `NETCDF3` format, which is used if `scipy` is installed and `netCDF4` is not installed (in the case of engine=None). It has no notion of `int64`, so this is silently cast to `int32` on write. `<U1` -> object This works with the changes applied in #7654 for `h5netcdf`/`netcdf4`-backends in normal cases (writing out as VLEN string). Again, the `NETCDF3` format does not have a notion of string, so all strings have to be converted to an internal NC_CHAR representation. I'm not sure it will do any good to try to make this one work. My suggestion would be, just use `NETCDF4`-format and one of the capable backends.	{ "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1479002634	https://github.com/pydata/xarray/issues/7652#issuecomment-1479002634	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YJ8oK	kmuehlbauer 5821660	2023-03-22T06:49:42Z	2023-03-22T06:49:42Z	MEMBER	Great, much appreciated, thanks! Let's iterate over there then.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1477606313	https://github.com/pydata/xarray/issues/7652#issuecomment-1477606313	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YEnup	kmuehlbauer 5821660	2023-03-21T10:36:43Z	2023-03-21T13:43:04Z	MEMBER	A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`: @basnijholt I'd appreciate if you could test #7654 for that particular case. Update: added another commit which handles the vlen string case.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1477467808	https://github.com/pydata/xarray/issues/7652#issuecomment-1477467808	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YEF6g	kmuehlbauer 5821660	2023-03-21T08:55:37Z	2023-03-21T10:09:12Z	MEMBER	A similar problem where saving, loading, saving, loading changes the dtype `bool` -> `int8`: That's an issue with the netCDF file format, too: it has no bool dtype. XRef: https://github.com/pydata/xarray/issues/1500 ```python data = np.array([True], dtype=bool) with nc.Dataset("test-bool-netcdf4.nc", mode="w") as ds: ds.createDimension("x", size=1) var = ds.createVariable("da", data.dtype.str, dimensions=("x")) var[:] = data ``` ``` TypeError Traceback (most recent call last) Cell In[42], line 4 2 with nc.Dataset("test-bool-netcdf4.nc", mode="w") as ds: 3 ds.createDimension("x", size=1) ----> 4 var = ds.createVariable("da", data.dtype.str, dimensions=("x")) 5 var[:] = data File src/netCDF4/_netCDF4.pyx:2945, in netCDF4._netCDF4.Dataset.createVariable() File src/netCDF4/_netCDF4.pyx:4121, in netCDF4._netCDF4.Variable.init() TypeError: illegal primitive data type, must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got bool ``` Update: Xarray is forwarding the information to the file by adding a dtype-attribute. It looks like this information is not correctly distributed back to `.encoding` in the case of saving/loading/saving/loading. I'd consider that one a bug. Reason: While decoding, the `.encoding` dtype is set to `original_dtype` (`int8` in our case), but it should either be removed or explicitly set to `bool`. https://github.com/pydata/xarray/blob/f1ff956ff67f3c053a2514d93d35929059e17b07/xarray/conventions.py#L400-L404	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1477473412	https://github.com/pydata/xarray/issues/7652#issuecomment-1477473412	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YEHSE	kmuehlbauer 5821660	2023-03-21T08:58:51Z	2023-03-21T08:58:51Z	MEMBER	Another fun one where `int64` -> `int32`: Can't reproduce this one with my environment. See above for details.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
1477447875	https://github.com/pydata/xarray/issues/7652#issuecomment-1477447875	https://api.github.com/repos/pydata/xarray/issues/7652	IC_kwDOAMm_X85YEBDD	kmuehlbauer 5821660	2023-03-21T08:37:48Z	2023-03-21T08:37:48Z	MEMBER	@basnijholt For the string issue this is somewhat of a netcdf/numpy-based issue with VLEN types. XRef: https://unidata.github.io/netcdf4-python/#dealing-with-strings The most flexible way to store arrays of strings is with the Variable-length (vlen) string data type. However, this requires the use of the NETCDF4 data model, and the vlen type does not map very well numpy arrays (you have to use numpy arrays of dtype=object, which are arrays of arbitrary python objects). And numpy will create a VLEN string array if no dtype is given, like in your case. At least netCDF4 and h5netcdf backends are consistent in their writing (creating similar hdf5-files) and reading back (object-dtype): plain netCDF4 ```python import netCDF4 as nc import numpy as np data = np.array([["a", "b"], ["c", "d"]], dtype="<U1") print(f"source dtype: {data.dtype.str}\n", ) auto = False with nc.Dataset("test-plain-netcdf4.nc", mode="w") as ds: print("Write NC-File") ds.set_auto_maskandscale(auto) ds.set_auto_chartostring(auto) ds.createDimension("x", size=2) ds.createDimension("y", size=2) var = ds.createVariable("da", data.dtype.str, dimensions=("x", "y")) var[:] = data print("Variable\n") print(var) print(var.dtype) print("\nContents\n") print(var[:]) print(var[:].dtype) with nc.Dataset("test-plain-netcdf4.nc") as ds: print("\nRead NC-File") ds.set_auto_maskandscale(auto) ds.set_auto_chartostring(auto) da = ds["da"] print("Variable\n") print(da) print(da.dtype) da = ds["da"][:] print("\nContents\n") print(da) print(da.dtype) ``` ```python source dtype: <U1 Write NC-File Variable <class 'netCDF4._netCDF4.Variable'> vlen da(x, y) vlen data type: <class 'str'> unlimited dimensions: current shape = (2, 2) <class 'str'> Contents [['a' 'b'] ['c' 'd']] object Read NC-File
Variable <class 'netCDF4._netCDF4.Variable'> vlen da(x, y) vlen data type: <class 'str'> unlimited dimensions: current shape = (2, 2) <class 'str'> Contents [['a' 'b'] ['c' 'd']] object ``` ```bash netcdf test-plain-netcdf4 { dimensions: x = 2 ; y = 2 ; variables: string da(x, y) ; data: da = "a", "b", "c", "d" ; } HDF5 "test-plain-netcdf4.nc" { DATASET "da" { DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; } DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) } DATA { (0,0): "a", "b", (1,0): "c", "d" } ATTRIBUTE "DIMENSION_LIST" { DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }} DATASPACE SIMPLE { ( 2 ) / ( 2 ) } DATA { (0): (), () } } ATTRIBUTE "_Netcdf4Coordinates" { DATATYPE H5T_STD_I32LE DATASPACE SIMPLE { ( 2 ) / ( 2 ) } DATA { (0): 0, 1 } } } } ``` plain h5netcdf ```python import h5netcdf.legacyapi as h5nc import h5py data = np.array([["a", "b"], ["c", "d"]], dtype="<U1") print(f"source dtype: {data.dtype.str}\n", ) with h5nc.Dataset("test-plain-h5netcdf.nc", mode="w") as ds: print("Write NC-File") ds.createDimension("x", 2) ds.createDimension("y", 2) dtype = h5py.string_dtype() print("Source dtype:", dtype) var = ds.createVariable("da", dtype, dimensions=("x", "y")) var[:] = data print("Variable\n") print(var) print(var.dtype) print("\nContents\n") print(var[:]) print(var[:].dtype) with h5nc.Dataset("test-plain-h5netcdf.nc") as ds: print("\nRead NC-File") da = ds["da"] print("Variable\n") print(da) print(da.dtype) da = ds["da"][:] print("\nContents\n") print(da) print(da.dtype) ``` ```python source dtype: <U1 Write NC-File Source dtype: object Variable <h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>> Attributes: <class 'str'> Contents [['a' 'b'] ['c' 'd']] object Read NC-File Variable <h5netcdf.legacyapi.Variable '/da': dimensions ('x', 'y'), shape (2, 2), dtype <class 'str'>> Attributes: <class 'str'> Contents [['a' 'b'] ['c' 'd']] object ``` ```bash netcdf 
test-plain-h5netcdf { dimensions: x = 2 ; y = 2 ; variables: string da(x, y) ; data: da = "a", "b", "c", "d" ; } HDF5 "test-plain-h5netcdf.nc" { DATASET "da" { DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; } DATASPACE SIMPLE { ( 2, 2 ) / ( 2, 2 ) } DATA { (0,0): "a", "b", (1,0): "c", "d" } ATTRIBUTE "DIMENSION_LIST" { DATATYPE H5T_VLEN { H5T_REFERENCE { H5T_STD_REF_OBJECT }} DATASPACE SIMPLE { ( 2 ) / ( 2 ) } DATA { (0): (), () } } ATTRIBUTE "_Netcdf4Coordinates" { DATATYPE H5T_STD_I32LE DATASPACE SIMPLE { ( 2 ) / ( 2 ) } DATA { (0): 0, 1 } } ATTRIBUTE "_Netcdf4Dimid" { DATATYPE H5T_STD_I32LE DATASPACE SCALAR DATA { (0): 0 } } } } ``` Both get written out as: `DATATYPE H5T_STRING { STRSIZE H5T_VARIABLE; STRPAD H5T_STR_NULLTERM; CSET H5T_CSET_UTF8; CTYPE H5T_C_S1; }` If you use fixed-length strings (e.g. `|S1`), the dtype is preserved during roundtrip: ```python import xarray as xr # Make an xarray with an array of fixed-length strings data = np.array([["a", "b"], ["c", "d"]], dtype="|S1") da = xr.DataArray( data=data, dims=["x", "y"], coords={"x": [0, 1], "y": [0, 1]}, ) da.to_netcdf("test.nc", mode='w') # Load the xarray back in da_loaded = xr.load_dataarray("test.nc") assert da.dtype == da_loaded.dtype, "Dtypes don't match" ``` Versions ``` INSTALLED VERSIONS ------------------ commit: None python: 3.11.0 | packaged by conda-forge | (main, Jan 14 2023, 12:27:40) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.14.21-150400.24.46-default machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: ('de_DE', 'UTF-8') libhdf5: 1.12.2 libnetcdf: 4.9.1 xarray: 2023.2.0 pandas: 1.5.3 numpy: 1.24.2 scipy: 1.10.1 netCDF4: 1.6.3 pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: None cftime: 1.6.2 nc_time_axis: None PseudoNetCDF: None rasterio: 1.3.6 cfgrib: None iris: None bottleneck: None dask: 2023.3.1 distributed: 2023.3.1 matplotlib: 3.7.1 cartopy: None seaborn: None
numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.6.0 pip: 23.0.1 conda: None pytest: None mypy: None IPython: 8.11.0 sphinx: None ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Saving and loading an array of strings changes datatype to object 1632718954
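
The `bool` -> `int8` comment in the table above describes a roundtrip problem: the dtype remembered in `.encoding` after decoding is the on-disk `int8`, so a second save no longer records that the data was boolean. The following is a minimal sketch of that mechanism in plain Python. It is a toy model, not xarray's code: `encode`, `decode`, `roundtrip_twice` and the `keep_bool` flag are hypothetical names, and dtypes are represented as plain strings.

```python
def encode(values, dtype, encoding):
    """Toy stand-in for writing one variable; returns (values, dtype, attrs)."""
    attrs = {}
    # Assumption: a dtype remembered in `encoding` is applied before anything else.
    target = encoding.get("dtype", dtype)
    if target != dtype:
        values = [int(v) for v in values]
        dtype = target
    # netCDF has no bool type: store int8 and record the original dtype.
    if dtype == "bool":
        values = [int(v) for v in values]
        dtype = "int8"
        attrs["dtype"] = "bool"
    return values, dtype, attrs


def decode(values, dtype, attrs, keep_bool=False):
    """Toy stand-in for reading one variable; returns (values, dtype, encoding)."""
    encoding = {"dtype": dtype}  # on-disk dtype, "int8" for stored booleans
    if attrs.get("dtype") == "bool":
        values = [bool(v) for v in values]
        dtype = "bool"
        if keep_bool:
            # Behaviour suggested in the comment: remember bool instead of int8.
            encoding["dtype"] = "bool"
    return values, dtype, encoding


def roundtrip_twice(keep_bool):
    """Save, load, save, load; return the dtype seen after the second load."""
    values, dtype, encoding = [True, False], "bool", {}
    for _ in range(2):
        stored, stored_dtype, attrs = encode(values, dtype, encoding)
        values, dtype, encoding = decode(stored, stored_dtype, attrs, keep_bool)
    return dtype


print(roundtrip_twice(keep_bool=False))
print(roundtrip_twice(keep_bool=True))
```

In this model the first roundtrip always restores `bool`; only the second one differs, which matches the "saving, loading, saving, loading" wording of the report.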

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
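
The filter in the page heading can be reproduced against this schema with Python's standard `sqlite3` module. The sketch below is only an illustration: it uses an in-memory database, the helper name `member_comments` is hypothetical, three rows reuse ids and `updated_at` values from the table above, and one extra row is invented to show that the filter excludes it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""


def member_comments(conn, issue, user):
    """Return (id, updated_at) of MEMBER comments, newest update first."""
    query = """
        SELECT [id], [updated_at]
        FROM [issue_comments]
        WHERE [author_association] = 'MEMBER' AND [issue] = ? AND [user] = ?
        ORDER BY [updated_at] DESC
    """
    return conn.execute(query, (issue, user)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = [
    # (id, user, updated_at, author_association, issue)
    (1477606313, 5821660, "2023-03-21T13:43:04Z", "MEMBER", 1632718954),
    (1481120920, 5821660, "2023-03-23T12:36:17Z", "MEMBER", 1632718954),
    (1479080036, 5821660, "2023-03-22T08:58:47Z", "MEMBER", 1632718954),
    # Hypothetical row from another user; it must not appear in the result.
    (1, 42, "2023-03-24T00:00:00Z", "NONE", 1632718954),
]
conn.executemany(
    "INSERT INTO [issue_comments] "
    "([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

result = member_comments(conn, issue=1632718954, user=5821660)
for comment_id, updated_at in result:
    print(comment_id, updated_at)
```

Sorting works on the text column because the timestamps are ISO 8601 strings in a single time zone, so lexicographic order equals chronological order.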
