
issue_comments


457 rows where author_association = "MEMBER" and user = 5821660 sorted by updated_at descending


issue >30

  • ENH: use `dask.array.apply_gufunc` in `xr.apply_ufunc` 37
  • Fill missing data_vars during concat by reindexing 18
  • cf-coding 18
  • FIX: correct dask array handling in _calc_idxminmax 17
  • concat changes variable order 12
  • Fill values in time arrays (numpy.datetime64) are lost in zarr 12
  • Quadratic slowdown when saving multiple datasets to the same h5 file (h5netcdf) 10
  • Fix module name retrieval in `backend.plugins.remove_duplicates()`, plugin tests 9
  • Add defaults during concat 508 8
  • Preserve nanosecond resolution when encoding/decoding times 8
  • ENH: enable `H5NetCDFStore` to work with already open h5netcdf.File a… 7
  • Times not decoding due to time_reference being in another variable 7
  • Saving and loading an array of strings changes datatype to object 7
  • CF encoding should preserve vlen dtype for empty arrays 7
  • Set `allow_rechunk=True` in `apply_ufunc` 6
  • fix compatibility with h5py version 3 and unpin tests 6
  • h5netcdf fails to decode attribute coordinates. 6
  • FIX: h5py>=3 string decoding 6
  • xr.open_dataset() reading ubyte variables as float32 from DAP server 6
  • Loading datasets of numpy string arrays leads to error and/or segfault 5
  • Backend / plugin system `remove_duplicates` raises AttributeError on discovering duplicates 5
  • nightly failure with h5netcdf indexing 5
  • expand dimension by re-allocating larger arrays with more space "at the end of the corresponding dimension", block copying previously existing data, and autofill newly created entry by a default value (note: alternative to reindex, but much faster for extending large arrays along, for example, the time dimension) 5
  • Y-axis flipped when reading data with Xarray 5
  • `nan` values appearing when saving and loading from `netCDF` due to encoding 5
  • Fix as_compatible_data for read-only np.ma.MaskedArray 5
  • `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 5
  • broken output of `find_root_and_group` for h5netcdf 4
  • Keep the original ordering of the coordinates 4
  • Alternative way to deal scale_factor and add_offset for opening datasets. 4
  • …

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1578775636 https://github.com/pydata/xarray/pull/7862#issuecomment-1578775636 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85eGjRU kmuehlbauer 5821660 2023-06-06T13:30:15Z 2023-06-06T13:30:15Z MEMBER

Might be worth an issue over at numpy with the example from the test.

numpy/numpy#23886

The issue is already resolved over at numpy, which is really great! It was also marked for backport. @headtr1ck How are these issues resolved currently, or how do we track removing the ignore?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1578248748 https://github.com/pydata/xarray/pull/7862#issuecomment-1578248748 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85eEios kmuehlbauer 5821660 2023-06-06T09:04:39Z 2023-06-06T09:04:39Z MEMBER

Might be worth an issue over at numpy with the example from the test.

https://github.com/numpy/numpy/issues/23886

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1576080083 https://github.com/pydata/xarray/issues/7866#issuecomment-1576080083 https://api.github.com/repos/pydata/xarray/issues/7866 IC_kwDOAMm_X85d8RLT kmuehlbauer 5821660 2023-06-05T05:45:30Z 2023-06-05T05:45:30Z MEMBER

@vrishk Sorry for the delay here and thanks for bringing this to our attention. We now have at least two requests which might move this forward (moving ensure_dtype_not_object into the backends). But this would need some discussion first on how to do it.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Enable object_codec in zarr backend 1720924071
1576074048 https://github.com/pydata/xarray/issues/7892#issuecomment-1576074048 https://api.github.com/repos/pydata/xarray/issues/7892 IC_kwDOAMm_X85d8PtA kmuehlbauer 5821660 2023-06-05T05:37:32Z 2023-06-05T05:37:32Z MEMBER

@mktippett Thanks for raising this. The issue should be cleared after #7888 is merged.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  GRIB Data Example is broken 1740685974
1572021301 https://github.com/pydata/xarray/pull/7862#issuecomment-1572021301 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dsyQ1 kmuehlbauer 5821660 2023-06-01T13:06:32Z 2023-06-01T13:06:32Z MEMBER

@tomwhite I've added tests to check the backend code for vlen string dtype metadata. I also had to add a specific check for the h5py vlen string metadata. I think we've covered everything for the proposed change to allow empty vlen string dtype metadata.

I'm looking at the mypy error and do not have the slightest clue what and where to change. Any help appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561584592 https://github.com/pydata/xarray/issues/7868#issuecomment-1561584592 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dE-PQ kmuehlbauer 5821660 2023-05-24T16:50:34Z 2023-05-24T16:50:34Z MEMBER

Thanks @ghiggi for your comment.

The problem is we have at least two contradicting user requests here, see #7328 and #7862.

I'm sure there is a solution to accommodate both sides.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1561285499 https://github.com/pydata/xarray/pull/7862#issuecomment-1561285499 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dD1N7 kmuehlbauer 5821660 2023-05-24T14:37:58Z 2023-05-24T14:37:58Z MEMBER

Thanks for trying. I can't think of any downsides for the netcdf4-fix, as it just adds the needed metadata to the object-dtype. But you never know, so it would be good to get another set of eyes on it.

So it looks like the changes here with the fix in my branch will get your issue resolved @tomwhite, right?

I'm a bit worried that this might break other users' workflows if they depend on the current conversion to floating point for some reason. Other backends might also rely on this behaviour, especially since it has been there since the early days when xarray was known as xray.

@dcherian What would be the way to go here?

There is also a somehow contradicting issue in #7868.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561214028 https://github.com/pydata/xarray/issues/7868#issuecomment-1561214028 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dDjxM kmuehlbauer 5821660 2023-05-24T13:58:16Z 2023-05-24T13:58:16Z MEMBER

My main question here is: why is dask not trying to retrieve the object types from dtype.metadata? Or does it try and fail for some reason?
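For reference, a minimal sketch of the metadata in question (using xarray's internal create_vlen_dtype helper, as in the related PR examples; illustrative only):

```python
# Minimal sketch: the vlen string information lives on the numpy dtype itself,
# so in principle it is retrievable from dtype.metadata.
import xarray as xr

dt = xr.coding.strings.create_vlen_dtype(str)
print(dt)           # object
print(dt.metadata)  # {'element_type': <class 'str'>}
```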

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1561195832 https://github.com/pydata/xarray/pull/7862#issuecomment-1561195832 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDfU4 kmuehlbauer 5821660 2023-05-24T13:52:04Z 2023-05-24T13:52:04Z MEMBER

@tomwhite I've put a commit with changes to zarr/netcdf4-backends which should preserve the dtype metadata here: https://github.com/kmuehlbauer/xarray/tree/preserve-vlen-string-dtype.

I'm not really sure if that is the right location, but as it was already present at that location in the netcdf4 backend, I think it will do.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1561162311 https://github.com/pydata/xarray/pull/7862#issuecomment-1561162311 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dDXJH kmuehlbauer 5821660 2023-05-24T13:32:26Z 2023-05-24T13:32:57Z MEMBER

@tomwhite Special casing on netcdf4 backend should be possible, too.

But it might need fixing at zarr backend, too:

```python
ds = xr.Dataset({"a": np.array([], dtype=xr.coding.strings.create_vlen_dtype(str))})
print(f"dtype: {ds['a'].dtype}")
print(f"metadata: {ds['a'].dtype.metadata}")
ds.to_zarr("a.zarr")
print("\n### Loading ###")
with xr.open_dataset("a.zarr", engine="zarr") as ds:
    print(f"dtype: {ds['a'].dtype}")
    print(f"metadata: {ds['a'].dtype.metadata}")
```

```python
dtype: object
metadata: {'element_type': <class 'str'>}

### Loading ###

dtype: object
metadata: None
```

Could you verify the above example, please? I'm relatively new to zarr :grimacing:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1560674198 https://github.com/pydata/xarray/issues/7868#issuecomment-1560674198 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85dBf-W kmuehlbauer 5821660 2023-05-24T08:27:11Z 2023-05-24T08:27:11Z MEMBER

@ghiggi Glad it works, but we still have to check if that is the correct location for the fix, as it's not CF specific.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1560559426 https://github.com/pydata/xarray/pull/7862#issuecomment-1560559426 https://api.github.com/repos/pydata/xarray/issues/7862 IC_kwDOAMm_X85dBD9C kmuehlbauer 5821660 2023-05-24T07:01:44Z 2023-05-24T07:01:44Z MEMBER

Thanks @tomwhite for the PR. I've only quickly checked the approach, which looks reasonable. But those changes have implications for several places in the backend code, which we would have to sort out.

Considering this example:

```python
import numpy as np
import xarray as xr

print(f"creating dataset with empty string array")
print("-----------------------------------------")
dtype = xr.coding.strings.create_vlen_dtype(str)
ds = xr.Dataset({"a": np.array([], dtype=dtype)})
print(f"dtype: {ds['a'].dtype}")
print(f"metadata: {ds['a'].dtype.metadata}")
ds.to_netcdf("a.nc", engine="netcdf4")

print("\nncdump")
print("-------")
!ncdump a.nc

engines = ["netcdf4", "h5netcdf"]
for engine in engines:
    with xr.open_dataset("a.nc", engine=engine) as ds:
        print(f"\nloading with {engine}")
        print("-------------------")
        print(f"dtype: {ds['a'].dtype}")
        print(f"metadata: {ds['a'].dtype.metadata}")
```

```python
creating dataset with empty string array

dtype: object
metadata: {'element_type': <class 'str'>}

ncdump

netcdf a {
dimensions:
    a = UNLIMITED ; // (0 currently)
variables:
    string a(a) ;
data:
}

loading with netcdf4

dtype: object
metadata: None

loading with h5netcdf

dtype: object
metadata: {'vlen': <class 'str'>}
```

The netcdf4 engine does not roundtrip here, losing the dtype metadata information. There is special casing for the h5netcdf backend, though.

The source is actually located in open_store_variable of the netcdf4 backend, where the underlying data is converted to Variable (which does some object dtype twiddling).

Unfortunately I do not have an immediate solution here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  CF encoding should preserve vlen dtype for empty arrays 1720045908
1560534067 https://github.com/pydata/xarray/issues/7328#issuecomment-1560534067 https://api.github.com/repos/pydata/xarray/issues/7328 IC_kwDOAMm_X85dA9wz kmuehlbauer 5821660 2023-05-24T06:37:39Z 2023-05-24T06:37:39Z MEMBER

@tomwhite Sorry for the delay here. I'll respond shortly on your PR #7862, but we might have to reiterate here later.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Zarr store array dtype changes for empty object string 1466586967
1559959581 https://github.com/pydata/xarray/issues/7868#issuecomment-1559959581 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85c-xgd kmuehlbauer 5821660 2023-05-23T18:42:55Z 2023-05-23T19:01:00Z MEMBER

@ghiggi Thanks for getting this back into action. I got dragged away from the one string object issue in #7654. I'll split this out and add a PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1559973194 https://github.com/pydata/xarray/issues/7868#issuecomment-1559973194 https://api.github.com/repos/pydata/xarray/issues/7868 IC_kwDOAMm_X85c-01K kmuehlbauer 5821660 2023-05-23T18:55:46Z 2023-05-23T18:55:46Z MEMBER

@ghiggi I'd appreciate it if you could test your workflows against #7869. Your example and the one over in #7652 are working AFAICT.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `open_dataset` with `chunks="auto"` fails when a netCDF4 variables/coordinates is encoded as `NC_STRING` 1722417436
1556891860 https://github.com/pydata/xarray/pull/7827#issuecomment-1556891860 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85czEjU kmuehlbauer 5821660 2023-05-22T09:40:04Z 2023-05-22T09:40:04Z MEMBER

The example below is only based on Variable and the cf encode/decode variable functions.

```python
import xarray as xr
import numpy as np

# create DataArray
times = [np.datetime64("2000-01-01", "ns"), np.datetime64("NaT")]
da = xr.DataArray(times, dims=["time"], name="foo")
da.encoding["dtype"] = np.float64
da.encoding["_FillValue"] = 20.0

# extract Variable
source_var = da.variable
print("---------- source_var ------------------")
print(source_var)
print(source_var.encoding)

# encode Variable
encoded_var = xr.conventions.encode_cf_variable(source_var)
print("\n---------- encoded_var ------------------")
print(encoded_var)

# decode Variable
decoded_var = xr.conventions.decode_cf_variable("foo", encoded_var)
print("\n---------- decoded_var ------------------")
print(decoded_var.load())
```

```python
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
  int_num = np.asarray(num, dtype=np.int64)
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(

---------- source_var ------------------
<xarray.Variable (time: 2)>
array(['2000-01-01T00:00:00.000000000', 'NaT'], dtype='datetime64[ns]')
{'dtype': <class 'numpy.float64'>, '_FillValue': 20.0}
dtype num float64

---------- encoded_var ------------------
<xarray.Variable (time: 2)>
array([ 0., 20.])
Attributes:
    units:       days since 2000-01-01 00:00:00
    calendar:    proleptic_gregorian
    _FillValue:  20.0

---------- decoded_var ------------------
<xarray.Variable (time: 2)>
array(['2000-01-01T00:00:00.000000000', 'NaT'], dtype='datetime64[ns]')
{'_FillValue': 20.0, 'units': 'days since 2000-01-01 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('float64')}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1556869361 https://github.com/pydata/xarray/pull/7827#issuecomment-1556869361 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85cy_Dx kmuehlbauer 5821660 2023-05-22T09:24:47Z 2023-05-22T09:24:47Z MEMBER

@spencerkclark With current master I get the following RuntimeWarning running your code example:

  • on encoding (calling to_netcdf()):

```python
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
  int_num = np.asarray(num, dtype=np.int64)
```

  • on decoding (calling open_dataset()):

```python
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
```

The latter was discussed in #7098 (casting float64 to int64), the former was aimed to be resolved with this PR.

I'll try to create a test case using Variable and the respective encoding/decoding functions without involving IO (per your suggestion @spencerkclark).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1554532844 https://github.com/pydata/xarray/pull/7827#issuecomment-1554532844 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85cqEns kmuehlbauer 5821660 2023-05-19T12:57:31Z 2023-05-19T12:57:31Z MEMBER

Thanks @spencerkclark for taking the time. NaN has been written to disk (as you assumed). Let's have another try next week.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1545446155 https://github.com/pydata/xarray/pull/7788#issuecomment-1545446155 https://api.github.com/repos/pydata/xarray/issues/7788 IC_kwDOAMm_X85cHaML kmuehlbauer 5821660 2023-05-12T09:23:13Z 2023-05-12T09:23:13Z MEMBER

@maxhollmann I'm sorry, I'm still finding my way into Xarray. I've taken a closer look at #2377, especially https://github.com/pydata/xarray/issues/2377#issuecomment-415074188.

There @shoyer suggested to just use:

```python
data = duck_array_ops.where_method(data, ~mask, fill_value)
```

instead of

```python
data[mask] = fill_value
```

I've checked, and it works nicely with your test. That way we would get away without the flags test, and the special handling would take place in duck_array_ops. It would be great if someone could double-check.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix as_compatible_data for read-only np.ma.MaskedArray 1685422501
1545408039 https://github.com/pydata/xarray/issues/4220#issuecomment-1545408039 https://api.github.com/repos/pydata/xarray/issues/4220 IC_kwDOAMm_X85cHQ4n kmuehlbauer 5821660 2023-05-12T08:55:09Z 2023-05-12T08:55:09Z MEMBER

combine_first uses fillna under the hood -> #3570

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  combine_first of Datasets changes dtype of variable present only in one Dataset 656089264
1545346823 https://github.com/pydata/xarray/issues/5706#issuecomment-1545346823 https://api.github.com/repos/pydata/xarray/issues/5706 IC_kwDOAMm_X85cHB8H kmuehlbauer 5821660 2023-05-12T08:06:06Z 2023-05-12T08:06:06Z MEMBER

This is resolved in recent netcdf-c/netcdf4-python and works with recent Xarray.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Loading datasets of numpy string arrays leads to error and/or segfault 970619131
1545337724 https://github.com/pydata/xarray/pull/7788#issuecomment-1545337724 https://api.github.com/repos/pydata/xarray/issues/7788 IC_kwDOAMm_X85cG_t8 kmuehlbauer 5821660 2023-05-12T07:59:19Z 2023-05-12T07:59:19Z MEMBER

@maxhollmann We might get at least some more views on this. There have been discussions on handling masked arrays and we should make sure this is exactly the solution we want to have.

@dcherian This changes as_compatible_data. Could you please have another look here? I'm a bit unclear about the implications.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix as_compatible_data for read-only np.ma.MaskedArray 1685422501
1543526954 https://github.com/pydata/xarray/pull/7834#issuecomment-1543526954 https://api.github.com/repos/pydata/xarray/issues/7834 IC_kwDOAMm_X85cAFoq kmuehlbauer 5821660 2023-05-11T08:03:01Z 2023-05-11T08:03:01Z MEMBER

@mx-moth Yes, this casting should be fixed.

I'm adding a bit of context here, as this might need to be solved in combination with #7098 and #7827. #7098 removes undefined casting for decoding. In #7827 there are efforts to do this for encoding, too.

As cast_to_int_if_safe is called for encoding as well as decoding, I'm not sure if all cases have been caught by these two PRs.

One issue on decoding is that, at least for datetime64-based times, the calculated time_deltas are currently converted to float64 in the presence of NaT (although NaT can perfectly well be expressed as int64). It would be great if you could try your PR on top of #7827 (which includes #7098) to see if that fixes the errors in this PR.
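As a side note, a hedged illustration (not from either PR) of the remark that NaT has an exact int64 representation:

```python
# Hedged illustration: NaT is representable exactly as int64 (numpy's iNaT sentinel),
# so a round-trip through float64/NaN is not strictly required for missing times.
import numpy as np

times = np.array(["2000-01-01", "NaT"], dtype="datetime64[ns]")
deltas = times - np.datetime64("1970-01-01", "ns")
print(deltas.view("int64"))  # [ 946684800000000000 -9223372036854775808]
```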

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Use `numpy.can_cast` instead of casting and checking 1705163672
1543285629 https://github.com/pydata/xarray/issues/7833#issuecomment-1543285629 https://api.github.com/repos/pydata/xarray/issues/7833 IC_kwDOAMm_X85b_Kt9 kmuehlbauer 5821660 2023-05-11T03:39:29Z 2023-05-11T03:39:29Z MEMBER

@alimanfoo The slow code stems from my changes in #7400. Obviously the performance drop did not manifest in the tests/benchmarks.

In #7824 @Illviljan is tackling concat performance.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of concat() 1704950804
1542767369 https://github.com/pydata/xarray/pull/7827#issuecomment-1542767369 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85b9MMJ kmuehlbauer 5821660 2023-05-10T20:27:08Z 2023-05-10T20:27:08Z MEMBER

@dcherian You were right from the beginning: changing the order for decoding and handling _FillValue in CFDatetimeCoder seems to be one working solution with minimal code changes.

If the CI is happy I'll add tests to cover the nanosecond issues in #7817.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1541410601 https://github.com/pydata/xarray/issues/7831#issuecomment-1541410601 https://api.github.com/repos/pydata/xarray/issues/7831 IC_kwDOAMm_X85b4A8p kmuehlbauer 5821660 2023-05-10T06:13:20Z 2023-05-10T06:13:39Z MEMBER

Yet another idea would be to add an Engines heading on https://docs.xarray.dev/en/stable/ecosystem.html where engines/backends and their respective packages can be listed. The error could include a link to that page.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't open datasets with the `rasterio` engine. 1702025553
1540845511 https://github.com/pydata/xarray/issues/7831#issuecomment-1540845511 https://api.github.com/repos/pydata/xarray/issues/7831 IC_kwDOAMm_X85b12_H kmuehlbauer 5821660 2023-05-09T20:26:32Z 2023-05-09T20:26:32Z MEMBER

Maybe it would also help to rephrase the error, something along the lines

"Engine rasterio is not available. Please install the needed package. Engines [xxx, yyy, zzz] are available."

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Can't open datasets with the `rasterio` engine. 1702025553
1539356386 https://github.com/pydata/xarray/pull/7827#issuecomment-1539356386 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85bwLbi kmuehlbauer 5821660 2023-05-09T03:51:39Z 2023-05-09T03:51:39Z MEMBER

Thanks for the heads-up, @spencerkclark. No worries, I need to apply some changes anyway as it turns out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1538998850 https://github.com/pydata/xarray/pull/7827#issuecomment-1538998850 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85bu0JC kmuehlbauer 5821660 2023-05-08T20:22:28Z 2023-05-08T20:22:28Z MEMBER

All tests have passed. Rebased now on latest main. The issue described in #7817 is resolved. Ready for first reviews.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1538966366 https://github.com/pydata/xarray/pull/7827#issuecomment-1538966366 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85busNe kmuehlbauer 5821660 2023-05-08T20:01:17Z 2023-05-08T20:01:17Z MEMBER

I've reset the order of coders to the initial behaviour. Instead the times are special cased in the CFMaskCoder. Locally it works, but I'll only trust the CI.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1538819904 https://github.com/pydata/xarray/pull/7771#issuecomment-1538819904 https://api.github.com/repos/pydata/xarray/issues/7771 IC_kwDOAMm_X85buIdA kmuehlbauer 5821660 2023-05-08T18:11:00Z 2023-05-08T18:11:00Z MEMBER

Setting status back to draft for now, still evaluating solutions for the CF encoding/decoding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement scale_factor/add_offset CF conformance test, add and align tests 1676309093
1538818465 https://github.com/pydata/xarray/pull/7654#issuecomment-1538818465 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85buIGh kmuehlbauer 5821660 2023-05-08T18:09:59Z 2023-05-08T18:09:59Z MEMBER

I've converted to draft for now, as I'm still evaluating solutions for the CF encoding/decoding.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1538364933 https://github.com/pydata/xarray/pull/7827#issuecomment-1538364933 https://api.github.com/repos/pydata/xarray/issues/7827 IC_kwDOAMm_X85bsZYF kmuehlbauer 5821660 2023-05-08T13:29:07Z 2023-05-08T13:29:07Z MEMBER

@spencerkclark I'd appreciate it if you could have a look here. All but one test pass, but I can't immediately see what that test is doing. It looks like mismatched dtypes on the attributes. If you have any suggestions on how to improve this, please let me know. I've not added tests here yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Preserve nanosecond resolution when encoding/decoding times 1700227455
1538354499 https://github.com/pydata/xarray/issues/7817#issuecomment-1538354499 https://api.github.com/repos/pydata/xarray/issues/7817 IC_kwDOAMm_X85bsW1D kmuehlbauer 5821660 2023-05-08T13:22:22Z 2023-05-08T13:22:52Z MEMBER

@dcherian Yes, I've set up a prototype in #7827. But the overall solution doesn't look that nice. The handling of fill_value still has to be done in CFMaskCoder.

Also #7098 is needed for this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  nanosecond precision lost when reading time data 1696097756
1535941525 https://github.com/pydata/xarray/issues/7816#issuecomment-1535941525 https://api.github.com/repos/pydata/xarray/issues/7816 IC_kwDOAMm_X85bjJuV kmuehlbauer 5821660 2023-05-05T08:55:42Z 2023-05-05T08:55:42Z MEMBER

@gauteh No worries, glad it works now!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml 1695809136
1535776861 https://github.com/pydata/xarray/issues/7814#issuecomment-1535776861 https://api.github.com/repos/pydata/xarray/issues/7814 IC_kwDOAMm_X85bihhd kmuehlbauer 5821660 2023-05-05T06:31:20Z 2023-05-05T06:31:20Z MEMBER

@paul0207 Thanks for providing the datafiles. I can't reproduce on my machine. Please provide more information: the output of xr.show_versions() and a complete traceback of the error you are experiencing would help.

A complete list of installed Python packages (e.g. via pip list) would be nice, too.

Another couple of questions to get some more insight:

  • Does this happen only with these specific files, or do you experience it every time?
  • Does the problem persist when specifying engine="netcdf4" or engine="h5netcdf" in the call to open_mfdataset?
  • Does this also happen if you open the files one by one (with xr.open_dataset) and combine the Datasets with xr.concat?
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  TypeError: 'NoneType' object is not callable when joining netCDF files. Works when ran interactively. 1695028906
1535724636 https://github.com/pydata/xarray/issues/7816#issuecomment-1535724636 https://api.github.com/repos/pydata/xarray/issues/7816 IC_kwDOAMm_X85biUxc kmuehlbauer 5821660 2023-05-05T05:46:46Z 2023-05-05T05:46:46Z MEMBER

@gauteh Yes, please provide as much information as possible. It is also of interest how you installed the package and what Python environment you are using (e.g. system Python, conda, venv, etc.).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml 1695809136
1535596259 https://github.com/pydata/xarray/issues/7816#issuecomment-1535596259 https://api.github.com/repos/pydata/xarray/issues/7816 IC_kwDOAMm_X85bh1bj kmuehlbauer 5821660 2023-05-05T01:46:12Z 2023-05-05T01:46:12Z MEMBER

@gauteh You would probably have to delete this line:

https://github.com/gauteh/hidefix/blob/main/python/hidefix/xarray.py#L192

As @headtr1ck already explained, it is all handled via the plugin system, so that duplicate engine names can be handled on discovery through the Python package metadata.
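Purely for illustration, a minimal sketch (assuming the standard importlib.metadata API and xarray's "xarray.backends" entry-point group) of how engines are discovered, which is why no manual registration call is needed:

```python
# Hedged sketch: backend engines are discovered from the "xarray.backends"
# entry-point group declared in each package's metadata (Python 3.10+ API shown).
from importlib.metadata import entry_points

for ep in entry_points(group="xarray.backends"):
    print(ep.name, "->", ep.value)
```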

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backend registration does not match docs, and is no longer specifiable in maturin pyproject toml 1695809136
1534855008 https://github.com/pydata/xarray/issues/7817#issuecomment-1534855008 https://api.github.com/repos/pydata/xarray/issues/7817 IC_kwDOAMm_X85bfAdg kmuehlbauer 5821660 2023-05-04T14:11:26Z 2023-05-04T14:11:26Z MEMBER

cc @spencerkclark @DocOtak I've tried to find at least one example which manifests as a bug. Nevertheless, the transformation from int to float in CFMaskCoder should be avoided.

We might think about special casing time data in CFMaskCoder, or handle masking of time data in CFDatetimeCoder/CFTimedeltaCoder.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  nanosecond precision lost when reading time data 1696097756
1532441433 https://github.com/pydata/xarray/issues/7790#issuecomment-1532441433 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bVzNZ kmuehlbauer 5821660 2023-05-03T04:25:50Z 2023-05-03T04:25:50Z MEMBER

@christine-e-smit Great that this works on your side with the proposed patch in #7098.

Nevertheless, we've identified three more issues in the debugging process which can now be handled one by one. So again, thanks for your contribution here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1531050846 https://github.com/pydata/xarray/issues/7790#issuecomment-1531050846 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bQfte kmuehlbauer 5821660 2023-05-02T08:04:45Z 2023-05-03T04:20:11Z MEMBER

As in #7098, citing @dcherian:

> I think the real solution here is to explicitly handle NaNs during the decoding step. We do want these to be NaT in the output.

There are three more issues revealed here when using datetime64 (see the sketch after this list):

  • if _FillValue is set in encoding, it has to be of the same type/resolution as the times in the array
  • if _FillValue is provided, we need to provide dtype and units which fit our data, e.g. if the _FillValue is referenced to the unix epoch, the units should be equivalent
  • when encoding in the presence of NaT, the data array is converted to floating point with NaN, which is problematic for the subsequent conversion to int64
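A minimal sketch of an encoding that satisfies the first two points (values are illustrative only, mirroring the workaround discussed further down in this issue):

```python
# Hedged sketch: a nanosecond-resolution _FillValue plus an explicit dtype and
# units that match it (both referenced to the unix epoch). Illustrative only.
import numpy as np
import xarray as xr

time = np.array(["NaT", "2023-01-02"], dtype="datetime64[ns]")
ds = xr.Dataset({"time": xr.DataArray(data=time, dims=["time"], name="time")})
encoding = {
    "time": {
        "_FillValue": np.datetime64("1900-01-01", "ns"),
        "dtype": np.int64,
        "units": "nanoseconds since 1970-01-01",
    }
}
ds.to_zarr("example.zarr", mode="w", encoding=encoding)
```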
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1531496369 https://github.com/pydata/xarray/issues/5490#issuecomment-1531496369 https://api.github.com/repos/pydata/xarray/issues/5490 IC_kwDOAMm_X85bSMex kmuehlbauer 5821660 2023-05-02T13:38:49Z 2023-05-02T13:38:49Z MEMBER

This is indeed an issue with scale_factor and add_offset, as @d70-t has already mentioned.

That is not a problem per se, but those attributes are obviously different for different files. When concatenating, only the first file's attributes survive. That might already be the source of the above problem, as it might slightly change values.

An even bigger problem arises when the dynamic ranges of the decoded data (min/max) don't overlap. Then the data might be folded from the lower border to the upper border, or vice versa.

I've put an example into #5739. The suggestion for now is, as @keewis commented, to drop encoding in such cases and use floating point values for writing. You might use the available compression options for floating point data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Nan/ changed values in output when only reading data, saving and reading again 924676925
1531465011 https://github.com/pydata/xarray/issues/5490#issuecomment-1531465011 https://api.github.com/repos/pydata/xarray/issues/5490 IC_kwDOAMm_X85bSE0z kmuehlbauer 5821660 2023-05-02T13:20:46Z 2023-05-02T13:20:46Z MEMBER

Xref: #5739

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Nan/ changed values in output when only reading data, saving and reading again 924676925
1530991257 https://github.com/pydata/xarray/issues/7790#issuecomment-1530991257 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bQRKZ kmuehlbauer 5821660 2023-05-02T07:09:38Z 2023-05-02T08:14:36Z MEMBER

@christine-e-smit I've created a fresh environment with only xarray and zarr and it still works on my machine. I've then followed the Darwin idea and dug up #6191 (I've got those casting warnings from exactly the line you were referring to). Comment https://github.com/pydata/xarray/issues/6191#issuecomment-1209567966 should explain what happens here.

tl;dr citing @DocOtak:

> The short explanation is that the time conversion functions do an astype(np.int64) or equivalent cast on arrays that contain nans. This is undefined behavior and very soon, doing this will start to emit RuntimeWarnings.
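A hedged one-liner reproducing the cast in question (not from the linked PR; recent NumPy emits the same RuntimeWarning):

```python
# Hedged illustration: casting an array containing NaN to int64 is undefined
# behaviour and warns on recent NumPy ("invalid value encountered in cast").
import numpy as np

print(np.array([np.nan, 1.0]).astype(np.int64))
```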

There is also an open PR #7098.

Thanks @christine-e-smit for sticking with me to find the root-cause here by providing detailed information and code examples. :+1:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1530141083 https://github.com/pydata/xarray/issues/7790#issuecomment-1530141083 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bNBmb kmuehlbauer 5821660 2023-05-01T20:01:50Z 2023-05-01T20:01:50Z MEMBER

@christine-e-smit One more idea: you might delete the zarr folder before re-creating it (if you are not doing that already). I removed the complete folder before any new write (by putting e.g. !rm -rf xarray_and_units.zarr at the beginning of the notebook cell).

It would also be great if you could run the code from https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939 and post the output here, just for the sake of comparison (please delete the zarr folder beforehand if it exists). Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1530131533 https://github.com/pydata/xarray/issues/7790#issuecomment-1530131533 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bM_RN kmuehlbauer 5821660 2023-05-01T19:53:53Z 2023-05-01T19:53:53Z MEMBER

@christine-e-smit I've plugged your code into a fresh notebook, here is my output:

```python
xarray created with NaT fill value
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02

xarray created read with NaT fill value
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9223372036854775808, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```

The output seems OK on my side. I have no idea why the data isn't correctly decoded as NaT on your side. I've checked that my environment is comparable to yours. The only remaining difference is that you are on Darwin arm64 whereas I'm on Linux.

```
INSTALLED VERSIONS

commit: None
python: 3.11.2 | packaged by conda-forge | (main, Mar 31 2023, 17:51:05) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-144-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: de_DE.UTF-8
LOCALE: ('de_DE', 'UTF-8')
libhdf5: 1.14.0
libnetcdf: None

xarray: 2023.4.2
pandas: 2.0.1
numpy: 1.24.3
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.3.2
distributed: 2023.3.2
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.3.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.6.1
pip: 23.0.1
conda: None
pytest: 7.2.2
mypy: 0.982
IPython: 8.12.0
sphinx: None
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1530111912 https://github.com/pydata/xarray/issues/7790#issuecomment-1530111912 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bM6eo kmuehlbauer 5821660 2023-05-01T19:30:22Z 2023-05-01T19:30:22Z MEMBER

> Unfortunately, I think you may have also gotten some wires crossed? You set the time fill value to 1900-01-01, but then use NaT in the actual array?

Yes, I use NaT because I want to check whether the encoder correctly translates NaT to the provided _FillValue on write.

So from your last example I'm assuming you would like to have the int64 representation of NaT as _FillValue, right? I'll try to adapt this and see what I get.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1529894939 https://github.com/pydata/xarray/issues/7790#issuecomment-1529894939 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bMFgb kmuehlbauer 5821660 2023-05-01T16:05:19Z 2023-05-01T16:05:19Z MEMBER

So, after some debugging I think I've found two issues here with the current code.

First, we need to give the fill value a fitting resolution. Second, we have an issue with inferring the units from the data (if not given).

Here is some workaround code which (finally, :crossed_fingers:) should at least write and read correct data (comments added in the code below):

```python
# Create a numpy array of type np.datetime64 with one fill value and one date
# FIRST ISSUE WITH _FillValue
# we need to provide ns resolution here too, otherwise we get wrong fillvalues (day-reference)
time_fill_value = np.datetime64("1900-01-01 00:00:00.00000000", "ns")
time = np.array([np.datetime64("NaT", "ns"), '2023-01-02 00:00:00.00000000'], dtype='M8[ns]')

# Create a dataset with this one array
xr_time_array = xr.DataArray(data=time, dims=['time'], name='time')
xr_ds = xr.Dataset(dict(time=xr_time_array))

print("******")
print("Created with fill value 1900-01-01")
print(xr_ds["time"])

# Save the dataset to zarr
location_new_fill = "from_xarray_new_fill.zarr"

# SECOND ISSUE with inferring units from data
# We need to specify "dtype" and "units" which fit our data
# Note: as we provide a _FillValue with a reference to unix-epoch
# we need to provide a fitting units too
encoding = {
    "time": {"_FillValue": time_fill_value, "dtype": np.int64, "units": "nanoseconds since 1970-01-01"}
}
xr_ds.to_zarr(location_new_fill, mode="w", encoding=encoding)

xr_read = xr.open_zarr(location_new_fill)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].attrs)
print(xr_read["time"].encoding)

z_new_fill = zarr.open('from_xarray_new_fill.zarr', 'r')
print("******")
print("Read back out of the zarr store with zarr")
print(z_new_fill["time"])
print(z_new_fill["time"].attrs)
print(z_new_fill["time"][:])
```

```python
Created with fill value 1900-01-01
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02

Read back out of the zarr store with xarray
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
{}
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -2208988800000000000, 'units': 'nanoseconds since 1970-01-01', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}

Read back out of the zarr store with zarr
<zarr.core.Array '/time' (2,) int64 read-only>
<zarr.attrs.Attributes object at 0x7f086ab8e710>
[-2208988800000000000  1672617600000000000]
```

@christine-e-smit Please let me know if the above workaround gives you correct results in your workflow. If so, we can think about how to automatically align the fill-value resolution with the data resolution and what needs to be done to correctly deduce the units.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1529076482 https://github.com/pydata/xarray/issues/7790#issuecomment-1529076482 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bI9sC kmuehlbauer 5821660 2023-04-30T16:52:25Z 2023-04-30T16:52:25Z MEMBER

```python
xr_ds.to_zarr(location_new_fill, encoding=encoding)

xr_read = xr.open_zarr(location)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].encoding)
```

@christine-e-smit Is this just a remnant of copy&paste? The above code writes to location_new_fill, but reads from location.

Here is my code and output for comparison (using latest zarr/xarray):

```python
# Create a numpy array of type np.datetime64 with one fill value and one date
time_fill_value = np.datetime64("1900-01-01")
time = np.array([np.datetime64("NaT"), '2023-01-02'], dtype='M8[ns]')

# Create a dataset with this one array
xr_time_array = xr.DataArray(data=time, dims=['time'], name='time')
xr_ds = xr.Dataset(dict(time=xr_time_array))

print("******")
print("Created with fill value 1900-01-01")
print(xr_ds["time"])

# Save the dataset to zarr
location_new_fill = "from_xarray_new_fill.zarr"
encoding = {
    "time": {"_FillValue": time_fill_value, "dtype": np.int64}
}
xr_ds.to_zarr(location_new_fill, encoding=encoding)

xr_read = xr.open_zarr(location_new_fill)
print("******")
print("Read back out of the zarr store with xarray")
print(xr_read["time"])
print(xr_read["time"].encoding)
```

```python
Created with fill value 1900-01-01
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02

Read back out of the zarr store with xarray
<xarray.DataArray 'time' (time: 2)>
array(['NaT', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -25567, 'units': 'days since 2023-01-02 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('int64')}
```

This doesn't look correct either. At least the decoded _FillValue or the units are wrong. -25567 is 1900-01-01 when referenced to the unix epoch (question: is zarr time based on the unix epoch?). When read back via zarr only, this would decode into:

```python
<xarray.DataArray 'time' (time: 2)>
array(['1953-01-02T00:00:00.000000000', '2023-01-02T00:00:00.000000000'], dtype='datetime64[ns]')
```
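A quick, hedged arithmetic check of the two numbers involved:

```python
# -25567 is the day offset of 1900-01-01 relative to the unix epoch, while
# 2023-01-02 minus 25567 days lands in early 1953, hence the mismatch above.
import numpy as np

print(np.datetime64("1900-01-01") - np.datetime64("1970-01-01"))  # -25567 days
print(np.datetime64("2023-01-02") - np.timedelta64(25567, "D"))   # 1953-01-02
```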

I totally agree with @christine-e-smit, this is all very confusing. As said at the beginning, I have little knowledge of zarr. I'm currently digging into CF encoding/decoding, which is what made me jump in here.

AFAICT, the encoding already has a problem; at least, the data on disk is not what we expect. It seems that the xarray CF encoding/decoding is not well aligned with the zarr writing/reading of datetimes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1527527029 https://github.com/pydata/xarray/issues/2478#issuecomment-1527527029 https://api.github.com/repos/pydata/xarray/issues/2478 IC_kwDOAMm_X85bDDZ1 kmuehlbauer 5821660 2023-04-28T12:59:04Z 2023-04-28T15:46:09Z MEMBER

@sbiner Sorry for the massive delay here. Not much has changed since the creation of your issue. Xarray doesn't take the netcdf default fill values into account (there are reasons, which @shoyer has explained in https://github.com/pydata/xarray/pull/5680#issuecomment-895455163 and https://github.com/pydata/xarray/pull/5680#issuecomment-895508489).

On write it just uses NaN as _FillValue (in case no specific encoding is given).
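For completeness, a hedged sketch (dataset and variable name made up) of overriding that default with an explicit fill value via encoding:

```python
# Hedged sketch: pass an explicit _FillValue through the encoding argument on write.
import numpy as np
import xarray as xr

ds = xr.Dataset({"var": ("x", np.array([1.0, np.nan, 3.0]))})
ds.to_netcdf("out.nc", encoding={"var": {"_FillValue": -9999.0}})
```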

Xref: #2374, #7723, #5680

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  masked_array write/read differences between xarray and netCDF4 368833116
1527605739 https://github.com/pydata/xarray/issues/7713#issuecomment-1527605739 https://api.github.com/repos/pydata/xarray/issues/7713 IC_kwDOAMm_X85bDWnr kmuehlbauer 5821660 2023-04-28T13:55:17Z 2023-04-28T13:55:17Z MEMBER

The code has been there since #867 by @shoyer, which was committed almost 7 years ago.

I have no idea what the purpose of packing tuples into 0d arrays is, but as there are also tests for it in that PR, I'm assuming there is a real reason. Maybe @shoyer can chime in here to shed some light?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `Variable/IndexVariable` do not accept a tuple for data. 1652227927
1527544656 https://github.com/pydata/xarray/issues/7647#issuecomment-1527544656 https://api.github.com/repos/pydata/xarray/issues/7647 IC_kwDOAMm_X85bDHtQ kmuehlbauer 5821660 2023-04-28T13:12:08Z 2023-04-28T13:12:08Z MEMBER

@wangshuaicumt Did you get anywhere with this issue? If it is still unresolved, it would be great if you could provide the data or an MCVE.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  merge 1631491844
1527541305 https://github.com/pydata/xarray/issues/7630#issuecomment-1527541305 https://api.github.com/repos/pydata/xarray/issues/7630 IC_kwDOAMm_X85bDG45 kmuehlbauer 5821660 2023-04-28T13:09:22Z 2023-04-28T13:09:22Z MEMBER

@AlxndrLhr I suppose your original issue is resolved. Please reopen or create a new issue if you still have problems with this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  .loc[] cannot find a value that .sel() can find without problem 1624560934
1527537064 https://github.com/pydata/xarray/issues/6429#issuecomment-1527537064 https://api.github.com/repos/pydata/xarray/issues/6429 IC_kwDOAMm_X85bDF2o kmuehlbauer 5821660 2023-04-28T13:06:14Z 2023-04-28T13:06:14Z MEMBER

It looks like this is no longer an issue with recent versions of the stack. At least I can't reproduce it. @mjwillson Please reopen if you still encounter problems while plotting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  FacetGrid padding goes very bad when cartopy projection specified 1188262115
1527498384 https://github.com/pydata/xarray/issues/7092#issuecomment-1527498384 https://api.github.com/repos/pydata/xarray/issues/7092 IC_kwDOAMm_X85bC8aQ kmuehlbauer 5821660 2023-04-28T12:34:03Z 2023-04-28T12:34:03Z MEMBER

@leicunxing-rs Sorry for the delay here. Your issue might be connected with concatenation/merge of several files containing packed data with different scale_factor/add_offset. See issue #5739 for more details (there they also merge different ERA5 datasets, hence the idea).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Save an nc file and open it again, the content of the data inside has changed 1387341095
1527461082 https://github.com/pydata/xarray/issues/5739#issuecomment-1527461082 https://api.github.com/repos/pydata/xarray/issues/5739 IC_kwDOAMm_X85bCzTa kmuehlbauer 5821660 2023-04-28T12:00:15Z 2023-04-28T12:00:15Z MEMBER

@dougrichardson Sorry for the delay. If you are still interested in the source of this issue, here is what I found:

The root cause is different scale_factor and add_offset values in the source files.

When merging, only the .encoding of the first dataset survives. This leads to a wrongly encoded file for the May dates. But why is this so?

The issue is with the packed dtype ("int16") and the particular values of scale_factor/add_offset.

For February the dynamic range is (228.96394336525748, 309.9690856933594) K, whereas for May it is (205.7644192729947, 311.7797088623047) K.

Now we can clearly see that all values above 309.969 K will be folded to the lower end (> 229 K).

To circumvent that you have at least two options:

  • change scale_factor and add_offset values in the variables .encoding before writing to appropriate values which cover your whole dynamic range
  • drop scale_factor/add_offset (and other CF related attributes) from .encoding to write floating point values

It might be nice to have checks for that in the encoding steps, to prevent writing erroneous values. So this is not really a bug, but might be less impactful when encoding is dropped on operations (see discussion in #6323).
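A hedged sketch of the second option above (helper name is made up; assumes the merged dataset was opened with decoding enabled):

```python
import xarray as xr

def drop_packing_encoding(ds: xr.Dataset) -> xr.Dataset:
    """Drop CF packing keys from .encoding so variables are written as floating point."""
    for var in ds.data_vars.values():
        for key in ("scale_factor", "add_offset", "dtype"):
            var.encoding.pop(key, None)
    return ds

# hypothetical usage: ds = drop_packing_encoding(ds); ds.to_netcdf("merged_unpacked.nc")
```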

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Writing and reopening introduces bad values 979916914
1527376059 https://github.com/pydata/xarray/issues/5170#issuecomment-1527376059 https://api.github.com/repos/pydata/xarray/issues/5170 IC_kwDOAMm_X85bCei7 kmuehlbauer 5821660 2023-04-28T10:47:38Z 2023-04-28T10:47:38Z MEMBER

@floriankrb Sorry for the long delay. If you are still interested in the source of the issue, here is what I found:

By default Xarray will promote any data variable which shares its name with a dimension to a coordinate. That accounts for ['number', 'time', 'step', 'heightAboveGround', 'latitude', 'longitude']. valid_time is a two-dimensional coordinate (by the CF standard) and is a coordinate here because the t2m data variable has a corresponding coordinates attribute containing valid_time. In the decoding step valid_time gets added to .coords. The attribute is removed from t2m's attrs and kept in t2m.encoding. So far so good.

When renaming number to n, that coordinates attribute (in encoding) does not change as well. So when the data is written, t2m will still hold number in its coordinates attribute (on disk).

The issue manifests on the subsequent read, as the decoding step now tries to align the found coordinates with the available data variables. Since number is not available, no coordinate from that string will be taken into account as a coordinate (note the all on line 444):

https://github.com/pydata/xarray/blob/0f4e99d036b0d6d76a3271e6191eacbc9922662f/xarray/conventions.py#L439-L447

This can easily be observed by looking into t2m.attrs where the coordinates remains instead of being preserved in .encoding.

So the source of all problems here is that the renaming number -> n was missed for the coordinates-attribute in t2m's .encoding.
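
A small sketch of the workaround this implies (untested; ds stands for the dataset from the report): after renaming, drop the stale coordinates entry from .encoding so Xarray regenerates the attribute on write.

```python
ds = ds.rename({"number": "n"})
# the on-disk "coordinates" attribute is kept in encoding and is not renamed,
# so remove it and let Xarray rebuild it from the actual coordinates
ds["t2m"].encoding.pop("coordinates", None)
ds.to_netcdf("out.nc")
```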

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf is not idempotent when stacking rename and set_coords 859772411
1527234694 https://github.com/pydata/xarray/issues/2192#issuecomment-1527234694 https://api.github.com/repos/pydata/xarray/issues/2192 IC_kwDOAMm_X85bB8CG kmuehlbauer 5821660 2023-04-28T09:06:22Z 2023-04-28T09:06:22Z MEMBER

Can't reproduce with recent xarray/matplotlib/cartopy. Looks like this has been resolved.

```python
import xarray as xr
import cartopy.crs as ccrs

ds = xr.tutorial.load_dataset('air_temperature')
ds = ds.sel(lon=slice(250, 300))
air = ds['air']
transform = ccrs.PlateCarree()
projection = ccrs.Mercator(air.lon.values.mean(), air.lat.values.min(), air.lat.values.max())
p = air.isel(time=[0, 1]).plot(
    transform=transform,
    aspect=ds.dims['lon'] / ds.dims['lat'],
    col='time',
    col_wrap=1,
    subplot_kws={'projection': projection},
)
for ax in p.axs.flat:
    ax.set_extent(
        (air.lon.values.min(), air.lon.values.max(), air.lat.values.min(), air.lat.values.max()),
        crs=transform,
    )
    ax.set_aspect('equal', 'box')
```

Please reopen, if this is still an issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Subplots overlap each other using plot() and cartopy 327101646
1527050493 https://github.com/pydata/xarray/issues/7790#issuecomment-1527050493 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85bBPD9 kmuehlbauer 5821660 2023-04-28T06:21:38Z 2023-04-28T06:21:38Z MEMBER

Thanks @dcherian for filling in the details.

I've dug up some more related issues: #2265, #3942, #4045

IIUC, #4684 did a great job of ironing out most of these issues, but it looks like only in the case when no NaT is within the time array (cc @spencerkclark). @christine-e-smit If you have no NaT in your time array then you can just omit encoding completely; Xarray will use int64 by default and your data should be fine on disk.

In the presence of NaT it looks like one workaround to circumvent that issue for the time being is to add the dtype in addition to _FillValue when writing out to zarr:

```python
encoding = {"time": {"_FillValue": time_fill_value, "dtype": np.int64}}
xr_ds.to_zarr(location, encoding=encoding)
```

One note to this: Xarray is deducing the units from the current time data. So for the above example it will result in 'days since 2023-01-02 00:00:00', where days would now be the resolution in the file. If you want the resolution to be nanoseconds on disk, units would need to be added to the encoding.

```python
encoding = {
    "time": {"_FillValue": time_fill_value, "dtype": np.int64, "units": "nanoseconds since 2023-01-02"}
}
xr_ds.to_zarr(location, encoding=encoding)
```

@christine-e-smit It would be great if you could confirm that from your side (some sanity check needed on my side).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1525790614 https://github.com/pydata/xarray/issues/7790#issuecomment-1525790614 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85a8beW kmuehlbauer 5821660 2023-04-27T14:23:16Z 2023-04-27T14:23:16Z MEMBER

@christine-e-smit I see, thanks for the details. AFAICT from the code it looks like zarr is special-cased in some ways compared to other backends. I'd really rely on some zarr-expert shedding light here and over at #7776.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1525780533 https://github.com/pydata/xarray/issues/7713#issuecomment-1525780533 https://api.github.com/repos/pydata/xarray/issues/7713 IC_kwDOAMm_X85a8ZA1 kmuehlbauer 5821660 2023-04-27T14:17:26Z 2023-04-27T14:17:26Z MEMBER

@zoj613 Thanks for raising this.

The root cause is that the tuple is returned from as_compatible_data as a single-element object array:

```python
import xarray as xr

print(xr.core.variable.as_compatible_data((2, 3, 4)))
```

```python
array((2, 3, 4), dtype=object)
```

This then breaks with the error you are seeing. I'm not quite sure if this is a bug in the code, a bug in the doc or no bug at all. But as a tuple is easily wrapped by np.array there should be a reason why Xarray is currently not able to digest tuples.
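
A possible workaround until this is settled, as a small sketch (simply wrapping the tuple yourself):

```python
import numpy as np
import xarray as xr

# wrapping the tuple in an explicit array yields a proper 1-d integer array
v = xr.Variable(dims="x", data=np.asarray((2, 3, 4)))
print(v)
```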

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `Variable/IndexVariable` do not accept a tuple for data. 1652227927
1525705799 https://github.com/pydata/xarray/issues/7782#issuecomment-1525705799 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85a8GxH kmuehlbauer 5821660 2023-04-27T13:33:50Z 2023-04-27T13:33:50Z MEMBER

As we can see from the above output, in netCDF4-python scaling is adapting the dtype to unsigned, not masking. This is also reflected in the docs unidata.github.io/netcdf4-python/#Variable.

Do we know why this is so?

TL;DR: a NETCDF3-era detail to signal unsigned integers, still used in recent formats

  • more discussion details on this over at https://github.com/Unidata/netcdf4-python/issues/656
  • at NetCDF Users Guide on packed data:

A conventional way to indicate whether a byte, short, or int variable is meant to be interpreted as unsigned, even for the netCDF-3 classic model that has no external unsigned integer type, is by providing the special variable attribute _Unsigned with value "true". However, most existing data for which packed values are intended to be interpreted as unsigned are stored without this attribute, so readers must be aware of packing assumptions in this case. In the enhanced netCDF-4 data model, packed integers may be declared to be of the appropriate unsigned type.

My suggestion would be to nudge the user by issuing warnings and linking to new, to-be-added documentation on the topic. This could be in line with the cf-coding conformance checks which were discussed yesterday in the dev-meeting.
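
For illustration, a small sketch of what the _Unsigned="true" reinterpretation does to the packed bytes (the values -41 and -1 are the flag and fill values of the scfv variable discussed elsewhere in this thread):

```python
import numpy as np

packed = np.array([-41, -1], dtype="int8")   # signed bytes as stored on disk
print(packed.view("uint8"))                  # [215 255] -> unsigned interpretation
```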

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1525524428 https://github.com/pydata/xarray/issues/7790#issuecomment-1525524428 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85a7afM kmuehlbauer 5821660 2023-04-27T11:26:15Z 2023-04-27T11:26:15Z MEMBER

Xref: discussion #7776, which got no attention up to now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1525513525 https://github.com/pydata/xarray/issues/7790#issuecomment-1525513525 https://api.github.com/repos/pydata/xarray/issues/7790 IC_kwDOAMm_X85a7X01 kmuehlbauer 5821660 2023-04-27T11:19:24Z 2023-04-27T11:19:24Z MEMBER

@christine-e-smit

So, I'm no zarr expert, but it turns out that your NaT was converted to -9.223372036854776e+18 in the encoding step. I'm assuming that zarr converts NaT because the format doesn't allow using NaT directly, so it chooses a (default) value.

The _FillValue is not lost, but it will be preserved in the .encoding-dict of the underlying Variable:

```python
xr_read = xr.open_zarr(location)
print("******************")
print("No fill value")
print(xr_read["time"])
print(xr_read["time"].encoding)
```

```python
******************
No fill value
<xarray.DataArray 'time' (time: 2)>
array([                          'NaT', '2023-01-02T00:00:00.000000000'],
      dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] NaT 2023-01-02
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, '_FillValue': -9.223372036854776e+18, 'units': 'days since 2023-01-02 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('float64')}
```

You might also check this without decoding (decode_cf=False):

```python
with xr.open_zarr(location, decode_cf=False) as xr_read:
    print("******************")
    print("No fill value")
    print(xr_read["time"])
    print(xr_read["time"].encoding)
```

```python
******************
No fill value
<xarray.DataArray 'time' (time: 2)>
array([-9.223372e+18,  0.000000e+00])
Coordinates:
  * time     (time) float64 -9.223e+18 0.0
Attributes:
    calendar:    proleptic_gregorian
    units:       days since 2023-01-02 00:00:00
    _FillValue:  -9.223372036854776e+18
{'chunks': (2,), 'preferred_chunks': {'time': 2}, 'compressor': Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0), 'filters': None, 'dtype': dtype('float64')}
```

Maybe a zarr expert can chime in here on what the best practice for time fill values is.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fill values in time arrays (numpy.datetime64) are lost in zarr 1685803922
1524805132 https://github.com/pydata/xarray/pull/7788#issuecomment-1524805132 https://api.github.com/repos/pydata/xarray/issues/7788 IC_kwDOAMm_X85a4q4M kmuehlbauer 5821660 2023-04-27T06:13:23Z 2023-04-27T07:19:47Z MEMBER

@maxhollmann I've checked and memory served well, the following issue might be related: #2377. It looks like your use-case is at least connected to @gerritholl's. It would be great if you could add your original use case (as MCVE, if possible) to get more details.

A special case (masked integer arrays) is discussed in #3955. While this might give additional information, it might not exactly fit your problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix as_compatible_data for read-only np.ma.MaskedArray 1685422501
1523829332 https://github.com/pydata/xarray/pull/7788#issuecomment-1523829332 https://api.github.com/repos/pydata/xarray/issues/7788 IC_kwDOAMm_X85a08pU kmuehlbauer 5821660 2023-04-26T17:55:13Z 2023-04-26T17:55:13Z MEMBER

@maxhollmann I'll have a look into this, I think I've seen something like this some time ago.

Maybe you can add the tests to the PR or as comment? This might get more attention and will really help to debug.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix as_compatible_data for read-only np.ma.MaskedArray 1685422501
1523786065 https://github.com/pydata/xarray/pull/7788#issuecomment-1523786065 https://api.github.com/repos/pydata/xarray/issues/7788 IC_kwDOAMm_X85a0yFR kmuehlbauer 5821660 2023-04-26T17:18:44Z 2023-04-26T17:18:44Z MEMBER

I've marked this by accident, sorry @maxhollmann. Let us know when you feel this is ready.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix as_compatible_data for read-only np.ma.MaskedArray 1685422501
1522997083 https://github.com/pydata/xarray/issues/7782#issuecomment-1522997083 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85axxdb kmuehlbauer 5821660 2023-04-26T08:28:39Z 2023-04-26T08:28:39Z MEMBER

This is how netCDF4-python handles this data with different parameters:

```python
import netCDF4 as nc

with nc.Dataset(
    "http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"
) as ds_dap:
    v = ds_dap["scfv"]
    print(v)

    print("\n- default")
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale False")
    ds_dap.set_auto_maskandscale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask/scale False")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale False")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(False)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask False / scale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- mask True / scale True")
    ds_dap.set_auto_mask(True)
    ds_dap.set_auto_scale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")

    print("\n- maskandscale True")
    ds_dap.set_auto_mask(False)
    ds_dap.set_auto_scale(False)
    ds_dap.set_auto_maskandscale(True)
    v = ds_dap["scfv"]
    print(f"variable dtype: {v.dtype}")
    print(f"first 2 elements: {v[0, 0, :2].dtype} {v[0, 0, :2]}")
    print(f"last 2 elements: {v[0, 0, -2:].dtype} {v[0, 0, -2:]}")
```

```
<class 'netCDF4._netCDF4.Variable'>
int8 scfv(time, lat, lon)
    _Unsigned: true
    _FillValue: -1
    standard_name: snow_area_fraction_viewable_from_above
    long_name: Snow Cover Fraction Viewable
    units: percent
    valid_range: [ 0 -2]
    actual_range: [ 0 100]
    flag_values: [-51 -50 -46 -41 -4 -3 -2]
    flag_meanings: Cloud Polar_Night_or_Night Water Permanent_Snow_and_Ice Classification_failed Input_Data_Error No_Satellite_Acquisition
    missing_value: -1
    ancillary_variables: scfv_unc
    grid_mapping: spatial_ref
    _ChunkSizes: [ 1 1385 2770]
unlimited dimensions: time
current shape = (1, 18000, 36000)
filling off

- default
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask/scale False
variable dtype: int8
first 2 elements: int8 [-41 -41]
last 2 elements: int8 [-41 -41]

- mask True / scale False
variable dtype: int8
first 2 elements: int8 [-- --]
last 2 elements: int8 [-- --]

- mask False / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- mask True / scale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]

- maskandscale True
variable dtype: int8
first 2 elements: uint8 [215 215]
last 2 elements: uint8 [215 215]
```

First, the dataset was created with filling off (read more about that in the netcdf file format specs https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html). This should not be a problem for the analysis, but it tells us that all data points should have been written somehow.

As we can see from the above output, in netCDF4-python scaling is adapting the dtype to unsigned, not masking. This is also reflected in the docs https://unidata.github.io/netcdf4-python/#Variable.

If Xarray is trying to align with netCDF4-python it should separate mask and scale as netCDF4-python is doing. It already does this by using different coders, but it doesn't separate them API-wise.

We would need a similar approach here for Xarray with additional kwargs scale and mask in addition to mask_and_scale. We cannot just move the UnsignedCoder out of mask_and_scale and apply it unconditionally.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1520804745 https://github.com/pydata/xarray/issues/7782#issuecomment-1520804745 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85apaOJ kmuehlbauer 5821660 2023-04-24T20:47:43Z 2023-04-24T20:47:43Z MEMBER

@dcherian The main issue here is that we have two different CF things which are applied, Unsigned and _FillValue/missing_value.

For netcdf4-python the values would just be masked and the dtype would be preserved. For xarray it will be cast to float32 because of the _FillValue/missing_value.

I agree, moving the Unsigned Coder out of mask_and_scale should help in that particular case.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1520514792 https://github.com/pydata/xarray/issues/7782#issuecomment-1520514792 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85aoTbo kmuehlbauer 5821660 2023-04-24T16:52:30Z 2023-04-24T16:52:30Z MEMBER

@dcherian Yes, that would work.

We would want to check the different attributes and apply the coders only as needed. That might need some refactoring. I've been wrapping my head around this for several weeks now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1520363622 https://github.com/pydata/xarray/issues/7782#issuecomment-1520363622 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85anuhm kmuehlbauer 5821660 2023-04-24T15:10:24Z 2023-04-24T15:11:00Z MEMBER

Then you are somewhat deadlocked. mask_and_scale=False will also deactivate the Unsigned decoding.

You might be able to achieve what you want by using decode_cf=False (completely deactivating CF decoding). Then you would have to remove the _FillValue attribute as well as the missing_value attribute from the variables. Finally, you can run xr.decode_cf(ds) to correctly decode your data.

I'll add a code example tomorrow if no one beats me to it.
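
Until then, a rough sketch of that workaround (untested; the URL is the one from the report, and attribute handling is applied to all data variables):

```python
import xarray as xr

url = (
    "http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/"
    "v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"
)
ds = xr.open_dataset(url, decode_cf=False)

# drop the attributes that would trigger masking (and with it the float32 cast)
for var in ds.data_vars.values():
    var.attrs.pop("_FillValue", None)
    var.attrs.pop("missing_value", None)

# _Unsigned is still handled, so the data comes back as uint8 instead of float32
ds = xr.decode_cf(ds)
```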

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1520277594 https://github.com/pydata/xarray/issues/7782#issuecomment-1520277594 https://api.github.com/repos/pydata/xarray/issues/7782 IC_kwDOAMm_X85anZha kmuehlbauer 5821660 2023-04-24T14:31:00Z 2023-04-24T14:31:00Z MEMBER

@Articoking

As both variables have a _FillValue attached, xarray converts these values to NaN, effectively casting to float32 in this case.

You might inspect the .encoding-property of the respective variables to get information of the source dtype.

You can deactivate the automatic conversion by adding kwarg mask_and_scale=False.

There is more information in the docs https://docs.xarray.dev/en/stable/user-guide/io.html
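
For example, a small sketch using the DAP URL from the report:

```python
import xarray as xr

url = "http://dap.ceda.ac.uk/thredds/dodsC/neodc/esacci/snow/data/scfv/MODIS/v2.0/2010/01/20100101-ESACCI-L3C_SNOW-SCFV-MODIS_TERRA-fv2.0.nc"

ds = xr.open_dataset(url)                              # default: _FillValue -> NaN, float32
print(ds["scfv"].dtype, ds["scfv"].encoding["dtype"])  # decoded dtype vs. on-disk dtype

ds_raw = xr.open_dataset(url, mask_and_scale=False)    # keep the packed integers
print(ds_raw["scfv"].dtype)
```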

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.open_dataset() reading ubyte variables as float32 from DAP server 1681353195
1516573065 https://github.com/pydata/xarray/pull/7771#issuecomment-1516573065 https://api.github.com/repos/pydata/xarray/issues/7771 IC_kwDOAMm_X85aZRGJ kmuehlbauer 5821660 2023-04-20T15:53:58Z 2023-04-20T15:53:58Z MEMBER

OK it seems this is ready for a first round of reviews.

A bit of added context:

Currently there is no dedicated function for checking CF standard conformance. The idea is to read as much as possible, including non-standard-conforming data files, but to restrict writing non-conforming files.

The implemented function ensure_scale_offset_conformance takes a strict keyword argument, which is True when encoding and False when decoding. If strict=True it will raise errors if there is a mismatch with the standard and when strict=False it will issue warnings.

I've only had to adapt a few tests which were not conforming to the standard on encoding to align with that. I've observed some warnings in the test suite which we might want to have a look into.

One idea would be to fix erroneous scale_factor/add_offset with our best fitting estimate. This is already done for list-type scale_factor/add_offset.

I will follow-up with checks for CFMaskCoder.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  implement scale_factor/add_offset CF conformance test, add and align tests 1676309093
1515146820 https://github.com/pydata/xarray/issues/7770#issuecomment-1515146820 https://api.github.com/repos/pydata/xarray/issues/7770 IC_kwDOAMm_X85aT05E kmuehlbauer 5821660 2023-04-19T17:59:00Z 2023-04-19T17:59:00Z MEMBER

It's also possible to use the custom BackendEntrypoint-class directly in the call to xr.open_dataset with the engine keyword.
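
A minimal sketch of that usage (the backend here is hypothetical and ignores the file contents):

```python
import xarray as xr


class MyBackendEntrypoint(xr.backends.BackendEntrypoint):
    def open_dataset(self, filename_or_obj, *, drop_variables=None):
        # a real backend would parse filename_or_obj here
        return xr.Dataset({"x": ("t", [1, 2, 3])})


# no entry-point registration needed: pass the class itself as engine
ds = xr.open_dataset("ignored-file", engine=MyBackendEntrypoint)
```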

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Provide a public API for adding new backends 1675299031
1514437541 https://github.com/pydata/xarray/issues/7767#issuecomment-1514437541 https://api.github.com/repos/pydata/xarray/issues/7767 IC_kwDOAMm_X85aRHul kmuehlbauer 5821660 2023-04-19T09:42:29Z 2023-04-19T09:42:29Z MEMBER

I think the equivalent incantation would be (note the different order of arguments in xr.where):

```python
da = xr.DataArray(np.arange(10))
print(xr.where(da < 5, da, 0).values)
print(da.where(da < 5, 0).values)
```

```
[0 1 2 3 4 0 0 0 0 0]
[0 1 2 3 4 0 0 0 0 0]
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Inconsistency between xr.where() and da.where() 1674532233
1501070685 https://github.com/pydata/xarray/issues/7742#issuecomment-1501070685 https://api.github.com/repos/pydata/xarray/issues/7742 IC_kwDOAMm_X85ZeIVd kmuehlbauer 5821660 2023-04-09T08:03:18Z 2023-04-09T08:03:18Z MEMBER

@ChristmasZCY Please have a look at the documentation about string encoding

https://docs.xarray.dev/en/stable/user-guide/io.html#string-encoding

Good chance that this gives you the needed information.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  About save char into netcdf  1659786592
1500333865 https://github.com/pydata/xarray/pull/7720#issuecomment-1500333865 https://api.github.com/repos/pydata/xarray/issues/7720 IC_kwDOAMm_X85ZbUcp kmuehlbauer 5821660 2023-04-07T14:21:02Z 2023-04-07T14:21:21Z MEMBER

Rebased on top of main after merge of #7719. This is ready for review. It's a one-liner actually :grin:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  preserve boolean dtype in encoding 1655000231
1498799474 https://github.com/pydata/xarray/issues/4826#issuecomment-1498799474 https://api.github.com/repos/pydata/xarray/issues/4826 IC_kwDOAMm_X85ZVd1y kmuehlbauer 5821660 2023-04-06T09:59:42Z 2023-04-06T09:59:42Z MEMBER

@JoerivanEngelen Thanks for taking the time. Much appreciated.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
1498794212 https://github.com/pydata/xarray/pull/7719#issuecomment-1498794212 https://api.github.com/repos/pydata/xarray/issues/7719 IC_kwDOAMm_X85ZVcjk kmuehlbauer 5821660 2023-04-06T09:55:25Z 2023-04-06T09:55:25Z MEMBER

This looks like it is ready to go. This will surely help further refactoring of encode_cf_variable/decode_cf_variable. At least while working on it I spotted several locations where inconsistencies can be ironed out. A neat, mostly flaw-free encoding/decoding is needed, especially with regard to #6323.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement more Variable Coders 1654988876
1498647087 https://github.com/pydata/xarray/issues/7723#issuecomment-1498647087 https://api.github.com/repos/pydata/xarray/issues/7723 IC_kwDOAMm_X85ZU4ov kmuehlbauer 5821660 2023-04-06T08:00:09Z 2023-04-06T08:00:09Z MEMBER

I'm still convinced this could be fixed for floating point data.

Generally its worse if we obey some default fill values but not others, because it becomes quite confusing to a user.

I think this depends on which side you look at it :-) My point here is, we do not have to submissively obey default fill values, but just use them when decoding. This only needs to happen if no _FillValue is attached to the variable. By doing this we ensure that these missing values are mapped to np.nan (as expected by users).

Later on we can just apply the xarray standard np.nan when writing out. We need to document that in that case an exact roundtrip isn't possible (it also isn't currently possible, as this example shows).

Consider this example:

```python
import numpy as np
import netCDF4 as nc
import xarray as xr

dtype = "f4"
with nc.Dataset("test-fillvalues-01.nc", mode="w") as ds:
    x = ds.createDimension("x", 10)
    test_fillval_fillon = ds.createVariable(
        "test_fillval_fillon", dtype, ("x",), fill_value=nc.default_fillvals[dtype]
    )
    test_fillval_fillon[:5] = np.array(
        [0.0, nc.default_fillvals[dtype], np.nan, 1.0, 8.0], dtype=dtype
    )
    test_nofillval_fillon = ds.createVariable(
        "test_nofillval_fillon", dtype, ("x",), fill_value=None
    )
    test_nofillval_fillon[:5] = np.array(
        [0.0, nc.default_fillvals[dtype], np.nan, 1.0, 8.0], dtype=dtype
    )

with nc.Dataset("test-fillvalues-01.nc") as ds:
    print("\n read with netCDF4-python")
    print("---------------------------")
    print(ds["test_fillval_fillon"])
    print(ds["test_fillval_fillon"][:])
    print(ds["test_nofillval_fillon"])
    print(ds["test_nofillval_fillon"][:])

with xr.open_dataset("test-fillvalues-01.nc").load() as ds:
    print("\n read with xarray")
    print("---------------------------")
    print(ds["test_fillval_fillon"])
    print(ds["test_fillval_fillon"][:])
    print(ds["test_nofillval_fillon"])
    print(ds["test_nofillval_fillon"][:])
```

```
 read with netCDF4-python
---------------------------
<class 'netCDF4._netCDF4.Variable'>
float32 test_fillval_fillon(x)
    _FillValue: 9.96921e+36
unlimited dimensions:
current shape = (10,)
filling on
[0.0 -- nan 1.0 8.0 -- -- -- -- --]
<class 'netCDF4._netCDF4.Variable'>
float32 test_nofillval_fillon(x)
unlimited dimensions:
current shape = (10,)
filling on, default _FillValue of 9.969209968386869e+36 used
[0.0 -- nan 1.0 8.0 -- -- -- -- --]

 read with xarray
---------------------------
<xarray.DataArray 'test_fillval_fillon' (x: 10)>
array([ 0., nan, nan,  1.,  8., nan, nan, nan, nan, nan], dtype=float32)
Dimensions without coordinates: x
<xarray.DataArray 'test_nofillval_fillon' (x: 10)>
array([0.00000e+00, 9.96921e+36,         nan, 1.00000e+00, 8.00000e+00,
       9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36, 9.96921e+36],
      dtype=float32)
Dimensions without coordinates: x
```

The only difference between these two variables is that for the first the _FillValue is declared, and for the other the default _FillValue is used. So if xarray obeys (per CF standard) the first, it should also obey the second.

This might just work if in these cases the default fillvalue is decoded to np.nan, and np.nan is declared as the new _FillValue. Does that make sense?
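
Expressed as a manual post-read step, the proposal would roughly amount to this sketch (using the file and variable names from the example above):

```python
import netCDF4 as nc
import numpy as np
import xarray as xr

with xr.open_dataset("test-fillvalues-01.nc") as ds:
    var = ds["test_nofillval_fillon"]
    # decode the undeclared default fill value to NaN, as proposed above
    fixed = var.where(var != np.float32(nc.default_fillvals["f4"]))
    print(fixed.values)
```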

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  default fill_value not masked when read from file 1655569401
1498540636 https://github.com/pydata/xarray/pull/7719#issuecomment-1498540636 https://api.github.com/repos/pydata/xarray/issues/7719 IC_kwDOAMm_X85ZUepc kmuehlbauer 5821660 2023-04-06T06:07:50Z 2023-04-06T06:23:40Z MEMBER

Now, this is interesting! It looks like those _FillValue issues are following me. What changed so that this now materializes here, all of a sudden?

Update: Small change - big issue. Checked for fv_exists instead of not fv_exists :grimacing:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement more Variable Coders 1654988876
1498490570 https://github.com/pydata/xarray/issues/7722#issuecomment-1498490570 https://api.github.com/repos/pydata/xarray/issues/7722 IC_kwDOAMm_X85ZUSbK kmuehlbauer 5821660 2023-04-06T04:55:02Z 2023-04-06T04:55:02Z MEMBER

The recommendation is to use _FillValue if there is only one value describing missing/fillvalue.

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html#missing-data

It's also written that missing_value is

This attribute is not treated in any special way by the library or conforming generic applications, but is often useful documentation and may be used by specific applications.

https://docs.unidata.ucar.edu/netcdf-c/current/attribute_conventions.html

Not sure if xarray is a conforming generic application or a specific application.
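
In practice, a small sketch of that recommendation when writing from xarray (dataset and variable name "var" are placeholders):

```python
import numpy as np
import xarray as xr

# hypothetical dataset standing in for the one with conflicting attributes
ds = xr.Dataset({"var": ("x", np.array([1.0, np.nan, 3.0]))})

# keep a single fill value definition: declare _FillValue in encoding and make
# sure no conflicting missing_value is written
ds["var"].encoding["_FillValue"] = -9999.0
ds["var"].attrs.pop("missing_value", None)
ds["var"].encoding.pop("missing_value", None)
ds.to_netcdf("single-fillvalue.nc")
```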

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Conflicting _FillValue and missing_value on write 1655483374
1498464352 https://github.com/pydata/xarray/issues/7723#issuecomment-1498464352 https://api.github.com/repos/pydata/xarray/issues/7723 IC_kwDOAMm_X85ZUMBg kmuehlbauer 5821660 2023-04-06T04:09:11Z 2023-04-06T04:09:11Z MEMBER

@dcherian Great, a duplicate. :-( Sorry I must have overlooked that one.

It's somewhat counter-intuitive to get differing results when using netcdf4-python and xarray. Would be a good idea to document this behaviour.

It looks like it might at least be resolved for floating point source data:

Let's take the above simple example. We have np.nan written to the file, but the netcdf representation on disk uses a default (undeclared by attribute) _FillValue for unwritten parts.

For the netcdf4-python user the np.nan will not be masked, but the unfilled parts will be masked.

For xarray the default fillvalue won't be masked, appearing as valid data, which it is not. On subsequent writes np.nan will be introduced as the new fillvalue (by attribute), effectively changing the meaning of the default fillvalues.

Wouldn't it make sense then to transform these default fill values to np.nan on read too, instead of giving them a seemingly meaningful value? Maybe yet another keyword switch, use_default_fillvalues?

There should be at least a warning on read, in these situations, that there are undefined values in the dataset which were never written and which will not be masked.

If the dataset contains unwritten parts and a default fillvalue is used, in turn meaning the data creator did this on purpose (by not setting a _FillValue), it can mean several things:

  • The creator's data does not actually have missing values which need declaring, but it means that their data will get masked for default fillvalue entries (maybe they don't know about this, but that might be unlikely).
  • The creator doesn't care at all, with same conclusion as above.
  • The creator purposefully uses the default fillvalue as missing value, as a means of saving disk space. But this could also be done by just defining that as the _FillValue attribute at creation time, if I'm not mistaken.

I'm still convinced this could be fixed for floating point data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  default fill_value not masked when read from file 1655569401
1497971459 https://github.com/pydata/xarray/issues/4826#issuecomment-1497971459 https://api.github.com/repos/pydata/xarray/issues/4826 IC_kwDOAMm_X85ZSTsD kmuehlbauer 5821660 2023-04-05T18:56:23Z 2023-04-05T18:56:23Z MEMBER

Please check #7720 if that fixes the conversion problems. Thanks.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reading and writing a zarr dataset multiple times casts bools to int8 789410367
1497866542 https://github.com/pydata/xarray/issues/7573#issuecomment-1497866542 https://api.github.com/repos/pydata/xarray/issues/7573 IC_kwDOAMm_X85ZR6Eu kmuehlbauer 5821660 2023-04-05T17:31:05Z 2023-04-05T17:31:05Z MEMBER

If it helps to minimize interoperability issues I'm all in for the change. One thing I would maybe do is wait for the next version. With the current PR we would end up with two different build numbers with differing behaviour, which might confuse folks.

But I'd rely on @ocefpaf's expertise.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add optional min versions to conda-forge recipe (`run_constrained`) 1603957501
1496973403 https://github.com/pydata/xarray/pull/7654#issuecomment-1496973403 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85ZOgBb kmuehlbauer 5821660 2023-04-05T06:15:58Z 2023-04-05T06:15:58Z MEMBER

As explained, I've created two PRs (#7719 and #7720) for the "easy" changes from this PR. Would be great if those could go in fast. Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1496950962 https://github.com/pydata/xarray/pull/7654#issuecomment-1496950962 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85ZOaiy kmuehlbauer 5821660 2023-04-05T05:46:15Z 2023-04-05T05:46:15Z MEMBER

@dcherian Just a heads-up: I find this PR is getting more and more involved, touching different parts of the machinery, and hard to follow for reviewers. I'll split this up and start with the more or less undisputed changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1496044623 https://github.com/pydata/xarray/pull/7654#issuecomment-1496044623 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85ZK9RP kmuehlbauer 5821660 2023-04-04T14:10:33Z 2023-04-04T14:10:33Z MEMBER

Still hunting for corner cases and issues inside encode_cf_variable/decode_cf_variable.

It looks like I can already see some light again. Not sure if this is the last iteration, but the test suite is still running green with added and enhanced tests, which is not that bad.

Unfortunately https://github.com/pydata/xarray/issues/2304 is still an issue for now. I'll clarify that later with an added test.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1493930592 https://github.com/pydata/xarray/pull/7654#issuecomment-1493930592 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85ZC5Jg kmuehlbauer 5821660 2023-04-03T08:53:17Z 2023-04-03T08:53:17Z MEMBER

While trying to create a test which specifically tests _choose_float_dtype I've found some issues with checking for availability of scale_factor/add_offset. Now testing for None.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1493296175 https://github.com/pydata/xarray/pull/7654#issuecomment-1493296175 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85ZAeQv kmuehlbauer 5821660 2023-04-02T10:47:21Z 2023-04-02T10:47:21Z MEMBER

This is now ready for another round of reviews, @dcherian, @Illviljan and @mankoff.

As @mankoff already pointed out, xarray is very generous in trying to encode/decode non-CF-conforming data. This makes things a bit complicated, as some issues only surface in rare corner cases.

I've tried to be as explicit as possible in _choose_float_dtype, and also added comments/tests where needed.

I'm finding the typing a bit hard. It seems that mypy can't derive the correct types from return types in certain cases.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1493127898 https://github.com/pydata/xarray/pull/7654#issuecomment-1493127898 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85Y_1La kmuehlbauer 5821660 2023-04-01T21:23:40Z 2023-04-01T21:23:40Z MEMBER

If at first you don't succeed... It looks like we have something working here.

Some more typing and maybe some more tests covering the cases where scale_factor/add_offset/_FillValue do not conform to CF, and we should be good to go. Or am I missing something?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1493084805 https://github.com/pydata/xarray/pull/7654#issuecomment-1493084805 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85Y_qqF kmuehlbauer 5821660 2023-04-01T19:34:18Z 2023-04-01T19:34:18Z MEMBER

The latest changes break #1840 again. We have two contradicting forces here which need to be aligned.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1492937244 https://github.com/pydata/xarray/issues/5597#issuecomment-1492937244 https://api.github.com/repos/pydata/xarray/issues/5597 IC_kwDOAMm_X85Y_Goc kmuehlbauer 5821660 2023-04-01T11:03:02Z 2023-04-01T11:03:02Z MEMBER

To fix this, I think logic in _choose_float_dtype should be updated to look at encoding['dtype'] (if available) instead of dtype, in order to understand how the data was originally stored.

This is being addressed in #7654

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decoding netCDF is giving incorrect values for a large file 942738904
1492895855 https://github.com/pydata/xarray/pull/7654#issuecomment-1492895855 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85Y-8hv kmuehlbauer 5821660 2023-04-01T09:48:57Z 2023-04-01T09:48:57Z MEMBER

@Illviljan I'm not able to figure out the typing if I want to use data types as functions to convert Python numbers to array scalars. If you have any suggestions on how to fix this, please let me know.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1492880874 https://github.com/pydata/xarray/pull/7654#issuecomment-1492880874 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85Y-43q kmuehlbauer 5821660 2023-04-01T08:46:49Z 2023-04-01T09:28:16Z MEMBER

@dcherian @Illviljan Thanks for the first round of review. I've rebased everything on latest main. Now the code moved from conventions.py to coding/variables.py is correct. I've also removed the functions which have been converted to VariableCoders and adapted the tests.

To sum up this PR, it does:

  • convert functions to VariableCoders along @shoyer's TODO: https://github.com/pydata/xarray/blob/1c81162755457b3f4dc1f551f0321c75ec9daf6c/xarray/conventions.py#L298-L302 https://github.com/pydata/xarray/blob/1c81162755457b3f4dc1f551f0321c75ec9daf6c/xarray/conventions.py#L393-L405
  • preserve boolean dtype within encoding: https://github.com/pydata/xarray/issues/7652#issuecomment-1476956975
  • determine CF packed dtype from scale_factor/add_offset

#7691, #2304

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1492078304 https://github.com/pydata/xarray/issues/7691#issuecomment-1492078304 https://api.github.com/repos/pydata/xarray/issues/7691 IC_kwDOAMm_X85Y707g kmuehlbauer 5821660 2023-03-31T15:05:17Z 2023-03-31T15:05:17Z MEMBER

The PR seems to solve my specific issue without changing the encoding

Great, thanks for testing.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `nan` values appearing when saving and loading from `netCDF` due to encoding 1643408278
1491915288 https://github.com/pydata/xarray/issues/7691#issuecomment-1491915288 https://api.github.com/repos/pydata/xarray/issues/7691 IC_kwDOAMm_X85Y7NIY kmuehlbauer 5821660 2023-03-31T13:19:01Z 2023-03-31T13:19:01Z MEMBER

@euronion There is a potential fix for your issue in #7654. It would be great, if you could have a closer look and test against that PR.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `nan` values appearing when saving and loading from `netCDF` due to encoding 1643408278
1491760266 https://github.com/pydata/xarray/pull/7654#issuecomment-1491760266 https://api.github.com/repos/pydata/xarray/issues/7654 IC_kwDOAMm_X85Y6nSK kmuehlbauer 5821660 2023-03-31T11:13:49Z 2023-03-31T11:13:49Z MEMBER

@dcherian @basnijholt

After the dev-meeting I've taken a step back and first implemented the coders as mentioned in @shoyer's ToDo.

I've fixed the one bool->int issue and it now derives the dtype for ScaleOffset coding from scale_factor/add_offset.

I've improved some tests with regard to the scale/offset issue.

I'll concentrate on the string fillvalue issues in a follow up PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  cf-coding 1633623916
1486870845 https://github.com/pydata/xarray/issues/7691#issuecomment-1486870845 https://api.github.com/repos/pydata/xarray/issues/7691 IC_kwDOAMm_X85Yn9k9 kmuehlbauer 5821660 2023-03-28T13:16:31Z 2023-03-28T13:31:46Z MEMBER

MCVE:

```python
import numpy as np
import netCDF4 as nc
import xarray as xr

fname = "test-7691.nc"
with nc.Dataset(fname, "w") as ds0:
    ds0.createDimension("t", 5)
    ds0.createVariable("x", "int16", ("t",), fill_value=-32767)
    v = ds0.variables["x"]
    v.set_auto_maskandscale(False)
    v.add_offset = 278.297319296597
    v.scale_factor = 1.16753614203674e-05
    v[:] = np.array([-32768, -32767, -32766, 32767, 0])

with nc.Dataset(fname) as ds1:
    x1 = ds1["x"][:]
    print("netCDF4-python:", x1.dtype, x1)

with xr.open_dataset(fname) as ds2:
    x2 = ds2["x"].values
    ds2.to_netcdf("test-7691-01.nc")
    print("xarray first read:", x2.dtype, x2)

with xr.open_dataset("test-7691-01.nc") as ds3:
    x3 = ds3["x"].values
    print("xarray roundtrip:", x3.dtype, x3)
```

```
netCDF4-python: float64 [277.9147410535744 -- 277.9147644042972 278.67988586425815 278.297319296597]
xarray first read: float32 [277.91476       nan 277.91476 278.6799  278.29733]
xarray roundtrip: float32 [      nan       nan       nan 278.6799  278.29733]
```

I've confirmed that correctly promoting to float64 in CFMaskCoder solves this issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `nan` values appearing when saving and loading from `netCDF` due to encoding 1643408278
1486817329 https://github.com/pydata/xarray/issues/7691#issuecomment-1486817329 https://api.github.com/repos/pydata/xarray/issues/7691 IC_kwDOAMm_X85Ynwgx kmuehlbauer 5821660 2023-03-28T12:41:43Z 2023-03-28T12:41:43Z MEMBER

As this doesn't surface that often, it might just happen here by accident. If the _FillValue/missing_value were -32768, the issue would not manifest.

So for NetCDF the default fillvalue for NC_SHORT (int16) is -32767. That means the promotion to float32 instead of the needed float64 is the problem here (floating point precision).
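
A small illustration of that precision argument (scale_factor/add_offset taken from the MCVE further down in this thread):

```python
import numpy as np

packed = np.array([-32768, -32767, -32766], dtype="int16")
scale, offset = 1.16753614203674e-05, 278.297319296597

f32 = packed.astype("float32") * np.float32(scale) + np.float32(offset)
f64 = packed.astype("float64") * scale + offset

print(f32)  # neighbouring packed values collapse onto the same float32 (ulp ~3e-05 near 278)
print(f64)  # float64 keeps them distinct
```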

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `nan` values appearing when saving and loading from `netCDF` due to encoding 1643408278

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);