issue_comments
12 rows where author_association = "MEMBER", issue = 1685803922 and user = 5821660 sorted by updated_at descending
issue 1
- Fill values in time arrays (numpy.datetime64) are lost in zarr · 12 ✖
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
1532441433 | https://github.com/pydata/xarray/issues/7790#issuecomment-1532441433 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bVzNZ | kmuehlbauer 5821660 | 2023-05-03T04:25:50Z | 2023-05-03T04:25:50Z | MEMBER | @christine-e-smit Great that this works on your side with the proposed patch in #7098. Nevertheless, we've identified three more issues here in the debugging process, which can now be handled one by one. So again, thanks for your contribution here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1531050846 | https://github.com/pydata/xarray/issues/7790#issuecomment-1531050846 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bQfte | kmuehlbauer 5821660 | 2023-05-02T08:04:45Z | 2023-05-03T04:20:11Z | MEMBER | As in #7098, citing @dcherian:
There are three more issues revealed here when using datetime64:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1530991257 | https://github.com/pydata/xarray/issues/7790#issuecomment-1530991257 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bQRKZ | kmuehlbauer 5821660 | 2023-05-02T07:09:38Z | 2023-05-02T08:14:36Z | MEMBER | @christine-e-smit I've created a fresh environment with only xarray and zarr and it still works on my machine. I've then followed the Darwin idea and dug up #6191 (I've got those casting warnings from exactly the line you were referring to). Comment https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966 should explain what happens here. tl;dr citing @DocOtak
There is also an open PR #7098. Thanks @christine-e-smit for sticking with me to find the root cause here by providing detailed information and code examples. :+1: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1530141083 | https://github.com/pydata/xarray/issues/7790#issuecomment-1530141083 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bNBmb | kmuehlbauer 5821660 | 2023-05-01T20:01:50Z | 2023-05-01T20:01:50Z | MEMBER | @christine-e-smit One more idea, you might delete the zarr folder before re-creating (if you are not doing that already). I've removed the complete folder before any new write (by putting eg. It would also be great if you could run the code from https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939 and post the output here, just for the sake of comparison (please delete the zarr-folder before if it exists). Thanks! |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1530131533 | https://github.com/pydata/xarray/issues/7790#issuecomment-1530131533 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bM_RN | kmuehlbauer 5821660 | 2023-05-01T19:53:53Z | 2023-05-01T19:53:53Z | MEMBER | @christine-e-smit I've plugged your code into a fresh notebook, here is my output: ```python xarray created with NaT fill value<xarray.DataArray 'time' (time: 2)> array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] NaT 2023-01-02 xarray created read with NaT fill value<xarray.DataArray 'time' (time: 2)> array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] NaT 2023-01-02 {} {'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9223372036854775808, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')} ``` The output seems OK on my side. I've no idea why the data isn't correctly decoded as NaT on your side. I've checked that my environment is comparable to yours. The only difference remaining is you are on Darwin arm64 whereas I'm on Linux. 
``` INSTALLED VERSIONS commit: None python: 3.11.2 | packaged by conda-forge | (main, Mar 31 2023, 17:51:05) [GCC 11.3.0] python-bits: 64 OS: Linux OS-release: 5.4.0-144-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: None LANG: de_DE.UTF-8 LOCALE: ('de_DE', 'UTF-8') libhdf5: 1.14.0 libnetcdf: None xarray: 2023.4.2 pandas: 2.0.1 numpy: 1.24.3 scipy: 1.10.1 netCDF4: None pydap: None h5netcdf: 1.1.0 h5py: 3.8.0 Nio: None zarr: 2.14.2 cftime: None nc_time_axis: None PseudoNetCDF: None iris: None bottleneck: None dask: 2023.3.2 distributed: 2023.3.2 matplotlib: None cartopy: None seaborn: None numbagg: None fsspec: 2023.3.0 cupy: None pint: None sparse: None flox: None numpy_groupies: None setuptools: 67.6.1 pip: 23.0.1 conda: None pytest: 7.2.2 mypy: 0.982 IPython: 8.12.0 sphinx: None ``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1530111912 | https://github.com/pydata/xarray/issues/7790#issuecomment-1530111912 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bM6eo | kmuehlbauer 5821660 | 2023-05-01T19:30:22Z | 2023-05-01T19:30:22Z | MEMBER |
Yes, I use NaT because I want to check if the encoder does correctly translate NaT to the provided _FillValue on write. So from your last example I'm assuming you would like to have the int64 representation of NaT as _FillValue, right? I'll try to adapt this, and see what I get |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1529894939 | https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bMFgb | kmuehlbauer 5821660 | 2023-05-01T16:05:19Z | 2023-05-01T16:05:19Z | MEMBER | So, after some debugging I think I've found two issues here with the current code. First, we need to give the fill value with a fitting resolution. Second, we have an issue with inferring the units from the data (if not given). Here is some workaround code which (finally, :crossed_fingers:) should at least write and read correct data (added comments below):
```python
import numpy as np
import xarray as xr
import zarr

# Create a numpy array of type np.datetime64 with one fill value and one date
# FIRST ISSUE WITH _FillValue
# we need to provide ns resolution here too, otherwise we get wrong fillvalues (day-reference)
time_fill_value = np.datetime64("1900-01-01 00:00:00.00000000", "ns")
time = np.array([np.datetime64("NaT", "ns"), '2023-01-02 00:00:00.00000000'], dtype='M8[ns]')

# Create a dataset with this one array
xr_time_array = xr.DataArray(data=time, dims=['time'], name='time')
xr_ds = xr.Dataset(dict(time=xr_time_array))

print("******")
print("Created with fill value 1900-01-01")
print(xr_ds["time"])

# Save the dataset to zarr
location_new_fill = "from_xarray_new_fill.zarr"
# SECOND ISSUE with inferring units from data
# We need to specify "dtype" and "units" which fit our data
# Note: as we provide a _FillValue with a reference to unix-epoch
# we need to provide a fitting units too
encoding = {
    "time": {"_FillValue": time_fill_value, "dtype": np.int64, "units": "nanoseconds since 1970-01-01"}
}
xr_ds.to_zarr(location_new_fill, mode="w", encoding=encoding)

xr_read = xr.open_zarr(location_new_fill)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].attrs)
print(xr_read["time"].encoding)

z_new_fill = zarr.open('from_xarray_new_fill.zarr', 'r')
print("******")
print("Read back out of the zarr store with zarr")
print(z_new_fill["time"])
print(z_new_fill["time"].attrs)
print(z_new_fill["time"][:])
```
```python
Created with fill value 1900-01-01
<xarray.DataArray 'time' (time: 2)>
array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time (time) datetime64[ns] NaT 2023-01-02
Read back out of the zarr store with xarray
<xarray.DataArray 'time' (time: 2)>
array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time (time) datetime64[ns] NaT 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -2208988800000000000, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
Read back out of the zarr store with zarr
<zarr.core.Array '/time' (2,) int64 read-only>
<zarr.attrs.Attributes object at 0x7f086ab8e710>
[-2208988800000000000 1672617600000000000]
```
@christine-e-smit Please let me know if the above workaround gives you correct results in your workflow. If so, then we can think about how to automatically align fill value resolution with data resolution and what needs to be done to correctly deduce the units. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1529076482 | https://github.com/pydata/xarray/issues/7790#issuecomment-1529076482 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bI9sC | kmuehlbauer 5821660 | 2023-04-30T16:52:25Z | 2023-04-30T16:52:25Z | MEMBER |
@christine-e-smit Is this just a remnant of copy&paste? The above code writes to Here is my code and output for comparison (using latest zarr/xarray):
```python
import numpy as np
import xarray as xr

# Create a numpy array of type np.datetime64 with one fill value and one date
time_fill_value = np.datetime64("1900-01-01")
time = np.array([np.datetime64("NaT"), '2023-01-02'], dtype='M8[ns]')

# Create a dataset with this one array
xr_time_array = xr.DataArray(data=time, dims=['time'], name='time')
xr_ds = xr.Dataset(dict(time=xr_time_array))

print("******")
print("Created with fill value 1900-01-01")
print(xr_ds["time"])

# Save the dataset to zarr
location_new_fill = "from_xarray_new_fill.zarr"
encoding = {
    "time": {"_FillValue": time_fill_value, "dtype": np.int64}
}
xr_ds.to_zarr(location_new_fill, encoding=encoding)

xr_read = xr.open_zarr(location_new_fill)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].encoding)
```
```python
Created with fill value 1900-01-01
<xarray.DataArray 'time' (time: 2)>
array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time (time) datetime64[ns] NaT 2023-01-02
Read back out of the zarr store with xarray
<xarray.DataArray 'time' (time: 2)>
array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time (time) datetime64[ns] NaT 2023-01-02
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -25567, 'units': 'days since 2023-01-02 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```
This doesn't look correct either. At least the decoded
I totally agree with @christine-e-smit, this is all very confusing. As said at the beginning, I have little knowledge of zarr. I'm currently digging into CF encoding/decoding, which made me jump in here. AFAICT, it looks like the encoding already has a problem; at least the data on disk is already not what we expect. It seems that somehow the xarray cf_encoding/decoding is not well aligned with the zarr writing/reading of datetimes. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1527050493 | https://github.com/pydata/xarray/issues/7790#issuecomment-1527050493 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85bBPD9 | kmuehlbauer 5821660 | 2023-04-28T06:21:38Z | 2023-04-28T06:21:38Z | MEMBER | Thanks @dcherian for filling in the details. I've dug up some more related issues: #2265, #3942, #4045. IIUC, #4684 did a great job ironing out many of these issues, but as it looks like only in the case when no In the presence of
One note to this: Xarray is deducing the
@christine-e-smit It would be great if you could confirm that from your side (some sanity check needed on my side). |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1525790614 | https://github.com/pydata/xarray/issues/7790#issuecomment-1525790614 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85a8beW | kmuehlbauer 5821660 | 2023-04-27T14:23:16Z | 2023-04-27T14:23:16Z | MEMBER | @christine-e-smit I see, thanks for the details. AFAICT from the code it looks like |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1525524428 | https://github.com/pydata/xarray/issues/7790#issuecomment-1525524428 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85a7afM | kmuehlbauer 5821660 | 2023-04-27T11:26:15Z | 2023-04-27T11:26:15Z | MEMBER | Xref: discussion #7776, which got no attention up to now. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 | |
1525513525 | https://github.com/pydata/xarray/issues/7790#issuecomment-1525513525 | https://api.github.com/repos/pydata/xarray/issues/7790 | IC_kwDOAMm_X85a7X01 | kmuehlbauer 5821660 | 2023-04-27T11:19:24Z | 2023-04-27T11:19:24Z | MEMBER | @christine-e-smit So, I'm no expert for The
No fill value <xarray.DataArray 'time' (time: 2)> array([ 'NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]') Coordinates: * time (time) datetime64[ns] NaT 2023-01-02 {'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9.223372036854776e+18, 'units': 'days since 2023-01-02 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('float64')} ``` You might also check this without decoding (
No fill value <xarray.DataArray 'time' (time: 2)> array([-9.223372e+18, 0.000000e+00]) Coordinates: * time (time) float64 -9.223e+18 0.0 Attributes: calendar: proleptic_gregorian units: days since 2023-01-02 00:00:00 _FillValue: -9.223372036854776e+18 {'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('float64')} ``` Maybe a zarr-expert can chime in here, what's the best practice for time-fill_values. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922 |
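The resolution mismatch discussed in the comments above (a `_FillValue` of `-25567` stored next to nanosecond data, versus `-2208988800000000000` in the workaround) can be reproduced as plain arithmetic. The following is a minimal sketch using only the Python standard library, not xarray or zarr; the helper names `days_since_epoch` and `ns_since_epoch` are hypothetical and exist only for this illustration. It assumes the Unix epoch as the reference, as in the workaround's `"nanoseconds since 1970-01-01"` units.

```python
from datetime import datetime, timezone

# Reference point used by the workaround's "nanoseconds since 1970-01-01" units.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# NumPy represents NaT as the minimum int64 value; this matches the
# '_FillValue': -9223372036854775808 shown in one of the outputs above.
NAT_AS_INT64 = -(2**63)


def days_since_epoch(dt: datetime) -> int:
    """Hypothetical helper: whole days between the epoch and dt."""
    return (dt - EPOCH).days


def ns_since_epoch(dt: datetime) -> int:
    """Hypothetical helper: nanoseconds between the epoch and dt."""
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 10**9 + delta.microseconds * 1_000


FILL = datetime(1900, 1, 1, tzinfo=timezone.utc)    # the fill value from the thread
SAMPLE = datetime(2023, 1, 2, tzinfo=timezone.utc)  # the one valid date in the array

# Same date, two very different integers depending on the unit.
fill_in_days = days_since_epoch(FILL)
fill_in_ns = ns_since_epoch(FILL)
sample_in_ns = ns_since_epoch(SAMPLE)

print("fill value in days:", fill_in_days)
print("fill value in ns:  ", fill_in_ns)
print("sample in ns:      ", sample_in_ns)
print("NaT as int64:      ", NAT_AS_INT64)
```

If a day-resolution integer is written as the fill value while the data are encoded in nanoseconds, a reader comparing stored values against that fill value has no way to match the two, which is consistent with the first issue named in the workaround comment.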
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
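The filter described at the top of this page (`author_association = "MEMBER"`, `issue = 1685803922`, `user = 5821660`, sorted by `updated_at` descending) can be sketched against the schema above with Python's built-in `sqlite3` module. This is an illustration under assumptions: the in-memory database, the function names `build_demo_db` and `member_comments`, and the one non-matching row are hypothetical; the three matching rows reuse ids and timestamps listed in the table, with most columns left empty.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT,
   [author_association] TEXT, [body] TEXT, [reactions] TEXT,
   [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
"""


def build_demo_db() -> sqlite3.Connection:
    """Hypothetical helper: in-memory database with a few rows from this page."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        # (id, user, created_at, updated_at, author_association, issue)
        (1525513525, 5821660, "2023-04-27T11:19:24Z", "2023-04-27T11:19:24Z", "MEMBER", 1685803922),
        (1531050846, 5821660, "2023-05-02T08:04:45Z", "2023-05-03T04:20:11Z", "MEMBER", 1685803922),
        (1532441433, 5821660, "2023-05-03T04:25:50Z", "2023-05-03T04:25:50Z", "MEMBER", 1685803922),
        # Hypothetical row that the filter should exclude (not from the page).
        (1, 42, "2023-05-01T00:00:00Z", "2023-05-01T00:00:00Z", "NONE", 1685803922),
    ]
    conn.executemany(
        "INSERT INTO issue_comments "
        "(id, user, created_at, updated_at, author_association, issue) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def member_comments(conn: sqlite3.Connection, issue_id: int, user_id: int):
    """Return (id, updated_at) of MEMBER comments, newest update first."""
    query = (
        "SELECT id, updated_at FROM issue_comments "
        "WHERE author_association = 'MEMBER' AND issue = ? AND user = ? "
        "ORDER BY updated_at DESC"
    )
    return conn.execute(query, (issue_id, user_id)).fetchall()


if __name__ == "__main__":
    for comment_id, updated_at in member_comments(build_demo_db(), 1685803922, 5821660):
        print(comment_id, updated_at)
```

Because the timestamps are ISO 8601 strings in UTC, sorting them as text gives chronological order, so no date parsing is needed in the query.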