
issue_comments


19 rows where issue = 868907284 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
831966730 https://github.com/pydata/xarray/issues/5223#issuecomment-831966730 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgzMTk2NjczMA== jvail 6503378 2021-05-04T14:02:32Z 2021-05-04T14:02:32Z NONE

I think that we can then either close this issue or move it to Xarray-simlab.

Ok. I'll prepare a better example, will try to come up with a suggestion and create an issue over there.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
831298605 https://github.com/pydata/xarray/issues/5223#issuecomment-831298605 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgzMTI5ODYwNQ== benbovy 4160723 2021-05-03T14:31:14Z 2021-05-03T14:32:07Z MEMBER

I guess your last example works because Xarray's Dataset.to_zarr() does some operations (e.g., encoding datetime values as floats + adding encoding attributes like calendar and units) that Xarray-simlab doesn't (Xarray-simlab creates the zarr datasets by directly using the zarr-python API).

The key thing is adding units in variable encoding, e.g., from a Xarray-simlab output dataset:

ds_out.to_netcdf('test.nc', engine='netcdf4', encoding={'p__var': {'units': 'days since 2010-01-01 00:00:00'}})

Maybe we could borrow some logic from Dataset.to_zarr() in Xarray-simlab to encode datetime values when it is saved to zarr.

I think that we can then either close this issue or move it to Xarray-simlab.
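To illustrate what "encoding datetime values as floats" with a units attribute amounts to, here is a minimal NumPy-only sketch. It is a simplification and not Xarray's actual implementation (which also handles calendars and other units); the helper names `encode_days_since` and `decode_days_since` are hypothetical.

```python
import numpy as np

def encode_days_since(times, reference):
    """Return float days elapsed since `reference`; NaT becomes NaN."""
    return (times - reference) / np.timedelta64(1, "D")

def decode_days_since(days, reference):
    """Inverse of encode_days_since; NaN becomes NaT."""
    missing = np.isnan(days)
    # Replace NaN by 0 before the integer cast, then restore NaT afterwards.
    nanoseconds = np.round(np.where(missing, 0.0, days) * 86_400 * 1e9).astype("int64")
    times = reference + nanoseconds.astype("timedelta64[ns]")
    times[missing] = np.datetime64("NaT")
    return times

# Reference date taken from the units string 'days since 2010-01-01 00:00:00'.
reference = np.datetime64("2010-01-01T00:00:00", "ns")
times = np.array(["2010-01-01", "2010-01-11", "NaT"], dtype="datetime64[ns]")

days = encode_days_since(times, reference)     # plain floats, writable to netCDF
restored = decode_days_since(days, reference)  # back to datetime64[ns]
```

Without the units string, a reader of the file would have no way to turn the stored numbers back into dates, which is presumably why supplying it in the encoding makes the export work.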

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
831263359 https://github.com/pydata/xarray/issues/5223#issuecomment-831263359 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgzMTI2MzM1OQ== jvail 6503378 2021-05-03T13:35:09Z 2021-05-03T13:36:32Z NONE

If you reset the encoding in your notebook example, e.g., ds_out_no_scale.p__var.encoding = {} you will be able to save the Dataset to a netcdf4 file. Not sure why...

Yes, thank you @benbovy , that works for me as well as a work-around.

However .. (excuse my stubbornness):

I tried writing a simple array from ds -> zarr -> nc -> ds. Surprisingly, everything works fine. Even the fill value is there (with and without mask_and_scale). It might be an Xarray-simlab issue where something goes wrong when writing to or reading from zarr. Should I create an issue over there and close this one?

https://github.com/jvail/xarray-simlab/blob/test_encoding_netcdf/notebooks/ds_to_zarr_to_nc_to_ds.ipynb

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
831130073 https://github.com/pydata/xarray/issues/5223#issuecomment-831130073 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgzMTEzMDA3Mw== benbovy 4160723 2021-05-03T09:09:25Z 2021-05-03T09:09:25Z MEMBER

I checked with a very basic example:

```python
import numpy as np
import xarray as xr

p_var = np.full((2, 2), np.datetime64('2000-01-01'), dtype='datetime64[ns]')
ds = xr.Dataset({'p__var': (('main', 'idx'), p_var)})
ds.to_netcdf('test.nc', engine='netcdf4')  # works!
```

The only difference with the example in your notebook is that in the example above ds.p__var.encoding returns an empty dictionary. If you reset the encoding in your notebook example, e.g., ds_out_no_scale.p__var.encoding = {} you will be able to save the Dataset to a netcdf4 file. Not sure why...

(side note: with mask_and_scale=True, masking missing values with nan causes the dtype to change to float, because type(np.nan) is float).
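The side note can be checked with plain NumPy. This is only an illustrative sketch with made-up values (the array `stored` and the fill value -1 are assumptions, not data from the notebook):

```python
import numpy as np

stored = np.array([0, -1, 2], dtype="int16")  # -1 plays the role of _FillValue

# An integer array cannot hold NaN, so masking in place is not possible.
attempt = stored.copy()
try:
    attempt[1] = np.nan
    int_accepts_nan = True
except ValueError:
    int_accepts_nan = False

# Masking therefore goes through a float array.
masked = stored.astype("float64")
masked[stored == -1] = np.nan

# datetime64 has its own missing-value marker (NaT), so it needs no dtype change.
times = np.array(["2000-01-01", "NaT"], dtype="datetime64[ns]")
```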

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
829023263 https://github.com/pydata/xarray/issues/5223#issuecomment-829023263 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyOTAyMzI2Mw== jvail 6503378 2021-04-29T07:59:55Z 2021-05-02T09:03:40Z NONE

@jvail could you provide a small reproducible example?

Sure, thank you. I'll put a notebook together

I hope this helps: https://github.com/jvail/xarray-simlab/blob/test_encoding_netcdf/notebooks/test_encoding_netcdf.ipynb

Edit: if I serialize to a dict after popping the _FillValue and then convert back, it seems to work (both the datetime and NaT encoding), unless the datetime64 variable has at least one non-'NaT' value.

xr.Dataset.from_dict(ds.to_dict()).to_netcdf('nc', engine='netcdf4')

Removing the entire encoding obj from the variable with dtype datetime64 seems to work as well. However, no idea, if that is a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
829016596 https://github.com/pydata/xarray/issues/5223#issuecomment-829016596 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyOTAxNjU5Ng== benbovy 4160723 2021-04-29T07:50:08Z 2021-04-29T07:50:08Z MEMBER

@jvail could you provide a small reproducible example?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
829013469 https://github.com/pydata/xarray/issues/5223#issuecomment-829013469 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyOTAxMzQ2OQ== jvail 6503378 2021-04-29T07:44:58Z 2021-04-29T07:44:58Z NONE

I guess if I manually drop the attrs._FillValue writing netcdf should work.

Hm, now I get ValueError: unsupported dtype for netCDF4 variable: datetime64[ns] :) I'll see if I can figure out where the problem is. Maybe in zarr since I can write an identical Dataset to netcdf without error.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828410285 https://github.com/pydata/xarray/issues/5223#issuecomment-828410285 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODQxMDI4NQ== jvail 6503378 2021-04-28T12:23:02Z 2021-04-28T12:23:02Z NONE

Opened #5226

Thank you @benbovy! I could not have pinned down the issue myself. I guess if I manually drop the attrs._FillValue writing netcdf should work.

Not sure if this should be closed or maybe moved to xarray-simlab?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828292425 https://github.com/pydata/xarray/issues/5223#issuecomment-828292425 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODI5MjQyNQ== benbovy 4160723 2021-04-28T09:11:34Z 2021-04-28T09:11:34Z MEMBER

Opened #5226

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828242346 https://github.com/pydata/xarray/issues/5223#issuecomment-828242346 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODI0MjM0Ng== benbovy 4160723 2021-04-28T08:02:54Z 2021-04-28T08:03:38Z MEMBER

So maybe the Zarr backend should pop _FillValue from Variable's attrs to encoding even for mask_and_scale=False?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828231169 https://github.com/pydata/xarray/issues/5223#issuecomment-828231169 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODIzMTE2OQ== benbovy 4160723 2021-04-28T07:47:44Z 2021-04-28T07:47:44Z MEMBER

For more context, xarray-simlab doesn't set the _FillValue attribute directly. Instead it uses Xarray's zarr backend, which leaves the _FillValue item as-is in the variable attributes when it is not picked up in decode_cf (i.e., when setting mask_and_scale=False):

https://github.com/pydata/xarray/blob/0021cdab91f7466f4be0fb32dae92bf3f8290e19/xarray/backends/zarr.py#L369-L372

https://github.com/pydata/xarray/blob/ab4e94ec4f6933476ee0d21c937d8f0f8d39ed82/xarray/coding/variables.py#L183-L186

So maybe the original issue should be solved there?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828230957 https://github.com/pydata/xarray/issues/5223#issuecomment-828230957 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODIzMDk1Nw== max-sixty 5635139 2021-04-28T07:47:25Z 2021-04-28T07:47:25Z MEMBER

OK great! Does that mean this is solved? Or you need it on attrs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828171777 https://github.com/pydata/xarray/issues/5223#issuecomment-828171777 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODE3MTc3Nw== jvail 6503378 2021-04-28T06:07:33Z 2021-04-28T06:07:56Z NONE

Does putting it on .encoding solve the immediate issue?

The issue is gone if I have neither attrs._FillValue nor encoding._FillValue, or if I have only encoding._FillValue. The issue appears only with attrs._FillValue.
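A possible workaround along these lines, sketched under the assumption that the only problem is `_FillValue` sitting in `attrs`: move it into `encoding` before exporting. The helper name `move_fill_value_to_encoding` is hypothetical and not part of xarray or xarray-simlab; the sketch works in memory only and does not write a file.

```python
import numpy as np
import xarray as xr

def move_fill_value_to_encoding(ds: xr.Dataset) -> xr.Dataset:
    """Return a copy of `ds` where any attrs['_FillValue'] lives in encoding instead."""
    out = ds.copy()  # copies the dataset object; array data is not duplicated
    for name in out.variables:
        attrs = out[name].attrs
        if "_FillValue" in attrs:
            out[name].encoding["_FillValue"] = attrs.pop("_FillValue")
    return out

# Made-up dataset mimicking the situation in the thread.
times = np.array(["2000-01-01", "NaT"], dtype="datetime64[ns]")
ds = xr.Dataset({"p__var": ("idx", times)})
ds["p__var"].attrs["_FillValue"] = np.datetime64("NaT")

fixed = move_fill_value_to_encoding(ds)
```

Whether `fixed.to_netcdf(...)` then succeeds still depends on the rest of the encoding (for example the units discussed elsewhere in the thread), so this should be read as a starting point rather than a verified fix.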

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828162977 https://github.com/pydata/xarray/issues/5223#issuecomment-828162977 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODE2Mjk3Nw== max-sixty 5635139 2021-04-28T05:48:25Z 2021-04-28T05:48:25Z MEMBER

Ah so it is a special attrs...

Does putting it on .encoding solve the immediate issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828159821 https://github.com/pydata/xarray/issues/5223#issuecomment-828159821 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODE1OTgyMQ== jvail 6503378 2021-04-28T05:40:46Z 2021-04-28T05:40:46Z NONE

Please could I ask once more — forgive me if I'm missing something but I did ask this a week ago and still don't understand #5200 (comment):

Is there a specific reason _FillValue needs to be in the attrs? (I'm not a big netcdf user so there may be)

Sorry, overlooked that one. I'll dig in the simlab code.

Why is xarray not ignoring the _FillValue in attrs?

It's trying to serialize it, as it would any other attrs.

Ok, that would make sense if '_FillValue' would just be an ordinary attr. But it seems it is not. If I set it to a string 'NaT' netcdf complains.

```
lib/python3.8/site-packages/scipy/io/netcdf.py in _get_encoded_fill_value(self)
   1030         """
   1031         if '_FillValue' in self._attributes:
-> 1032             fill_value = np.array(self._attributes['_FillValue'],
   1033                                   dtype=self.data.dtype).tobytes()
   1034             if len(fill_value) == self.itemsize():

ValueError: could not convert string to float: b'NaT'
```

If I set it to 0 I do not get an error. But then - after reloading the netcdf file - my NaT becomes array(['1970-01-01T00:00:00.000000000'], dtype='datetime64[ns]')
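One plausible reading of the 1970-01-01 result, not confirmed in the thread: datetime64[ns] values are stored as 64-bit integer offsets from the Unix epoch, and NaT is a reserved integer rather than 0. A short NumPy sketch of that representation:

```python
import numpy as np

# NaT is stored as the smallest int64 value.
nat = np.array(["NaT"], dtype="datetime64[ns]")
nat_as_int = nat.view("int64")

# The integer 0, reinterpreted as datetime64[ns], is the epoch itself.
zero_as_time = np.array([0], dtype="int64").view("datetime64[ns]")
```

So a fill value of 0 is a valid timestamp (1970-01-01T00:00:00), and anything written or decoded as 0 comes back as that date instead of NaT.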

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828156283 https://github.com/pydata/xarray/issues/5223#issuecomment-828156283 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODE1NjI4Mw== max-sixty 5635139 2021-04-28T05:32:20Z 2021-04-28T05:32:20Z MEMBER

Please could I ask once more — forgive me if I'm missing something but I did ask this a week ago and still don't understand https://github.com/pydata/xarray/discussions/5200#discussioncomment-638329:

Is there a specific reason _FillValue needs to be in the attrs? (I'm not a big netcdf user so there may be)


Why is xarray not ignoring the _FillValue in attrs?

It's trying to serialize it, as it would any other attrs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
828151911 https://github.com/pydata/xarray/issues/5223#issuecomment-828151911 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyODE1MTkxMQ== jvail 6503378 2021-04-28T05:20:50Z 2021-04-28T05:22:00Z NONE

Re-Hello and thank you,

hm, so the proper way (if necessary at all) would be to set

da.encoding = {'_FillValue': np.datetime64('NAT')}

rather than putting it in attrs?

Maybe I make a mistake in xarray-simlab and I should set the encoding explicitly and my confusion comes from having _FillValue in attrs and encoding. But - as far as I can see - in the xarray docs for writing netcdf the encoding _FillValue is mentioned. Why is xarray not ignoring the _FillValue in attrs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
827675422 https://github.com/pydata/xarray/issues/5223#issuecomment-827675422 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyNzY3NTQyMg== keewis 14808389 2021-04-27T15:00:40Z 2021-04-27T18:15:02Z MEMBER

[xarray-simlab] stores the _FillValue as an attribute which in turn is used by netcdf

that might be a bug in xarray-simlab (cc @benbovy). Usually, the fill value is used to replace missing values on disk. For example, np.array([0, np.nan, 2, np.nan, np.nan, 5]) with a fill value of -1 could be encoded as [0, -1, 2, -1, -1, 5] before writing to disk, which can be saved as an int (int8, even) instead of a float. Same for datetimes: ["2020-01-01", "NaT", "2020-12-01"] with a fill value of -1 can be encoded as [0, -1, 11] with units = "months since 2020-01-01" and the standard calendar. As far as I understand it, using np.datetime64("NaT") as fill value does not make much sense because netCDF does not support datetime dtypes:

traceback when trying to save a datetime array attribute

```pytb
TypeError                                 Traceback (most recent call last)
<ipython-input-1-9d07cb2115e9> in <module>
      1 import numpy as np
      2 import xarray as xr
----> 3 xr.Dataset(attrs={"_FillValue": np.array("NaT", dtype="M")}).to_netcdf("test.nc")

.../xarray/core/dataset.py in to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1752         from ..backends.api import to_netcdf
   1753
-> 1754         return to_netcdf(
   1755             self,
   1756             path,

.../xarray/backends/api.py in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1066         # TODO: allow this work (setting up the file for writing array data)
   1067         # to be parallelized with dask
-> 1068         dump_to_store(
   1069             dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1070         )

.../xarray/backends/api.py in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1113         variables, attrs = encoder(variables, attrs)
   1114
-> 1115     store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
   1116
   1117

.../xarray/backends/common.py in store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263             variables, attributes = self.encode(variables, attributes)
    264
--> 265         self.set_attributes(attributes)
    266         self.set_dimensions(variables, unlimited_dims=unlimited_dims)
    267         self.set_variables(

.../xarray/backends/common.py in set_attributes(self, attributes)
    280         """
    281         for k, v in attributes.items():
--> 282             self.set_attribute(k, v)
    283
    284     def set_variables(self, variables, check_encoding_set, writer, unlimited_dims=None):

.../xarray/backends/netCDF4_.py in set_attribute(self, key, value)
    449             self.ds.setncattr_string(key, value)
    450         else:
--> 451             self.ds.setncattr(key, value)
    452
    453     def encode_variable(self, variable):

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.setncattr()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4._set_att()

TypeError: illegal data type for attribute b'_FillValue', must be one of dict_keys(['S1', 'i1', 'u1', 'i2', 'u2', 'i4', 'u4', 'i8', 'u8', 'f4', 'f8']), got M8
```

Also, it's strange that _FillValue is saved to attrs and not encoding (which means xarray won't actually use it to encode the arrays).

As a summary, I think you should open this issue on the issue tracker of xarray-simlab.
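The two encodings described in this comment can be reproduced with plain NumPy. This is a minimal sketch of the idea only, not how xarray or netCDF implement it; the helper names `encode_with_fill` and `encode_months_since` are hypothetical.

```python
import numpy as np

def encode_with_fill(values, fill_value, dtype):
    """Replace NaN by fill_value and cast to a compact integer dtype."""
    values = np.asarray(values, dtype="float64")
    return np.where(np.isnan(values), fill_value, values).astype(dtype)

def encode_months_since(dates, reference, fill_value):
    """Encode dates as whole months since `reference`; NaT becomes fill_value."""
    months = dates.astype("datetime64[M]")
    offsets = (months - reference.astype("datetime64[M]")).astype("int64")
    return np.where(np.isnat(months), fill_value, offsets)

floats = np.array([0, np.nan, 2, np.nan, np.nan, 5])
encoded_floats = encode_with_fill(floats, -1, "int8")  # [0, -1, 2, -1, -1, 5]

dates = np.array(["2020-01-01", "NaT", "2020-12-01"], dtype="datetime64[D]")
encoded_dates = encode_months_since(dates, np.datetime64("2020-01-01"), -1)  # [0, -1, 11]
```

In both cases the fill value is an ordinary number of the on-disk dtype, which is why a datetime-typed `_FillValue` such as NaT has nothing to map onto in a netCDF file.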

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284
827803653 https://github.com/pydata/xarray/issues/5223#issuecomment-827803653 https://api.github.com/repos/pydata/xarray/issues/5223 MDEyOklzc3VlQ29tbWVudDgyNzgwMzY1Mw== max-sixty 5635139 2021-04-27T18:02:38Z 2021-04-27T18:02:38Z MEMBER

Also, it's strange that _FillValue is saved to attrs and not encoding (which means xarray won't actually use it to encode the arrays).

I asked for reference for this in #5200. Agree this is surprising.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  'NaT' as fill value and netcdf export 868907284


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);