issues


3 rows where repo = 13221727 and user = 3487237 sorted by updated_at descending


state 2

  • closed 2
  • open 1

type 1

  • issue 3

repo 1

  • xarray · 3
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
1128759050 I_kwDOAMm_X85DR38K 6259 Be able to override calendar in `open_dataset`/`open_mfdataset`/etc OR include another calendar name kthyng 3487237 open 0     6 2022-02-09T16:25:24Z 2024-02-12T15:28:15Z   NONE      

Is your feature request related to a problem?

I think there was a version of ROMS in which the calendar was written as "gregorian_proleptic" instead of "proleptic_gregorian". Only the latter is checked for by xarray for valid calendar names. This unfortunately keeps coming up when I need to deal with model output from such ROMS simulations. I personally am using catalogs to access model output (e.g., intake, stac), making it so I need to be able to provide flags to the open_* command in order to be able to read in the model output from the catalog (i.e. rather than being able to run a command afterward to, say, overwrite the calendar with the correct name).

Describe the solution you'd like

I would like to either: 1. include "gregorian_proleptic" on the known list of calendars, or 2. be able to provide a keyword argument to the "open_*" commands to declare the calendar I want to use, overwriting what is in the file metadata.

Describe alternatives you've considered

I have used decode_times=False in this situation before and then sort of forced the datetimes into submission, but that solution won't work with a catalog setup in which all the necessary keywords to open the file(s) need to be in catalog entry.
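
The `decode_times=False` alternative described above can be sketched roughly as follows. This is a minimal illustration, not xarray functionality: `CALENDAR_ALIASES`, `fix_calendar_attrs`, and `open_with_fixed_calendar` are hypothetical names introduced here, and the wrapper assumes the time variables keep their `units` and `calendar` attributes when opened undecoded. It also still needs a step after opening, which is exactly what the catalog setup rules out.

```python
# Assumption: this alias table is illustrative and not part of xarray.
CALENDAR_ALIASES = {"gregorian_proleptic": "proleptic_gregorian"}


def fix_calendar_attrs(attrs):
    """Return a copy of attrs with a known-misspelled calendar name replaced."""
    fixed = dict(attrs)
    calendar = fixed.get("calendar")
    if isinstance(calendar, str) and calendar in CALENDAR_ALIASES:
        fixed["calendar"] = CALENDAR_ALIASES[calendar]
    return fixed


def open_with_fixed_calendar(path, **open_kwargs):
    """Hypothetical wrapper: open undecoded, repair calendar names, then decode."""
    # Imported here so fix_calendar_attrs can be used without xarray installed.
    import xarray as xr

    ds = xr.open_dataset(path, decode_times=False, **open_kwargs)
    for var in ds.variables.values():
        var.attrs.update(fix_calendar_attrs(var.attrs))
    return xr.decode_cf(ds)


print(fix_calendar_attrs({
    "units": "seconds since 2016-01-01 00:00:00",
    "calendar": "gregorian_proleptic",
}))
```

Keeping the attribute fix separate from the file access makes it possible to check the renaming logic without a network connection.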

Additional context

This code demonstrates the issue:

    import xarray as xr

    url = 'https://www.ncei.noaa.gov/thredds/dodsC/model-cbofs-files/2020/02/nos.cbofs.fields.n006.20200208.t18z.nc'
    ds = xr.open_dataset(url, drop_variables=['dstart'])

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6259/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1196270877 I_kwDOAMm_X85HTaUd 6453 In a specific case, `decode_cf` adds encoding dtype that breaks `to_netcdf` kthyng 3487237 closed 0     3 2022-04-07T16:09:14Z 2022-04-18T15:29:19Z 2022-04-18T15:29:19Z NONE      

What happened?

Although the time variable in the example datasets ds1 and ds3 looks the same, its encoding differs, and that difference prevents the Dataset from being saved to netCDF. The encoding is added by decode_cf, but only sometimes. For ds3 below (open_mfdataset with 2 files), decode_cf adds an encoding dtype of '<M8[ns]', which is the problem; it is not added for ds1 (open_mfdataset with 1 file in a list). When the encoding dtype is float, everything works (shown for ds2), even though the decoded times display as datetime64 values in every case and the setups appear essentially the same.

What did you expect to happen?

These situations should all work and save to netcdf.

Minimal Complete Verifiable Example

```Python
import pandas as pd
import xarray as xr

# Build today's URLs for two consecutive CBOFS regular-grid files
tod = pd.Timestamp.today()
locs = [
    tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n001.%Y%m%d.t00z.nc'),
    tod.strftime('https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/%Y/%m/%d/nos.cbofs.regulargrid.n002.%Y%m%d.t00z.nc'),
]

# THIS WORKS: using open_mfdataset with 1 file
ds1 = xr.open_mfdataset([locs[0]])
print('DATASET1: ', ds1['ocean_time'].attrs, ds1['ocean_time'].encoding)
print(ds1['ocean_time'].dtype)
print(xr.decode_cf(ds1).ocean_time.encoding)
xr.decode_cf(ds1).to_netcdf('test1.nc')

# THIS WORKS: using open_mfdataset with 2 files but with times not decoded
ds2 = xr.open_mfdataset(locs, decode_times=False)
print('\nDATASET2: ', ds2['ocean_time'].attrs, ds2['ocean_time'].encoding)
print(ds2['ocean_time'].dtype)
print(xr.decode_cf(ds2).ocean_time.encoding)
xr.decode_cf(ds2).to_netcdf('test2.nc')

# THIS DOES NOT WORK: using open_mfdataset with 2 files with times decoded
ds3 = xr.open_mfdataset(locs)
print('\nDATASET3: ', ds3['ocean_time'].attrs, ds3['ocean_time'].encoding)
print(ds3['ocean_time'].dtype)
print(xr.decode_cf(ds3).ocean_time.encoding)
xr.decode_cf(ds3).to_netcdf('test3.nc')
```

Relevant log output

```Python
DATASET1: {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}
datetime64[ns]
{'source': 'https://opendap.co-ops.nos.noaa.gov/thredds/dodsC/NOAA/CBOFS/MODELS/2022/04/07/nos.cbofs.regulargrid.n001.20220407.t00z.nc', 'original_shape': (1,), 'dtype': dtype('float64'), 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian'}

DATASET2: {'long_name': 'time since initialization', 'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'field': 'time, scalar, series'} {}
float64
{'units': 'seconds since 2016-01-01 00:00:00', 'calendar': 'gregorian', 'dtype': dtype('float64')}

DATASET3: {'long_name': 'time since initialization', 'field': 'time, scalar, series'} {}
datetime64[ns]
{'dtype': dtype('<M8[ns]')}


ValueError                                Traceback (most recent call last)
Input In [9], in <module>
     24 print(ds3['ocean_time'].dtype)
     25 print(xr.decode_cf(ds3).ocean_time.encoding)
---> 26 xr.decode_cf(ds3).to_netcdf('test3.nc')

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/core/dataset.py:1900, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   1897     encoding = {}
   1898 from ..backends.api import to_netcdf
-> 1900 return to_netcdf(
   1901     self,
   1902     path,
   1903     mode,
   1904     format=format,
   1905     group=group,
   1906     engine=engine,
   1907     encoding=encoding,
   1908     unlimited_dims=unlimited_dims,
   1909     compute=compute,
   1910     invalid_netcdf=invalid_netcdf,
   1911 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1072, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1067 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1068 # to avoid this mess of conditionals
   1069 try:
   1070     # TODO: allow this work (setting up the file for writing array data)
   1071     # to be parallelized with dask
-> 1072     dump_to_store(
   1073         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1074     )
   1075     if autoclose:
   1076         store.close()

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/api.py:1119, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1116 if encoder:
   1117     variables, attrs = encoder(variables, attrs)
-> 1119 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:265, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    263 self.set_attributes(attributes)
    264 self.set_dimensions(variables, unlimited_dims=unlimited_dims)
--> 265 self.set_variables(
    266     variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
    267 )

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/common.py:303, in AbstractWritableDataStore.set_variables(self, variables, check_encoding_set, writer, unlimited_dims)
    301 name = _encode_variable_name(vn)
    302 check = vn in check_encoding_set
--> 303 target, source = self.prepare_variable(
    304     name, v, check, unlimited_dims=unlimited_dims
    305 )
    307 writer.add(source, target)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:464, in NetCDF4DataStore.prepare_variable(self, name, variable, check_encoding, unlimited_dims)
    461 def prepare_variable(
    462     self, name, variable, check_encoding=False, unlimited_dims=None
    463 ):
--> 464     datatype = _get_datatype(
    465         variable, self.format, raise_on_invalid_encoding=check_encoding
    466     )
    467     attrs = variable.attrs.copy()
    469     fill_value = attrs.pop("_FillValue", None)

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:139, in _get_datatype(var, nc_format, raise_on_invalid_encoding)
    137 def _get_datatype(var, nc_format="NETCDF4", raise_on_invalid_encoding=False):
    138     if nc_format == "NETCDF4":
--> 139         return _nc4_dtype(var)
    140     if "dtype" in var.encoding:
    141         encoded_dtype = var.encoding["dtype"]

File ~/miniconda3/envs/model_catalogs/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:160, in _nc4_dtype(var)
    158     dtype = var.dtype
    159 else:
--> 160     raise ValueError(f"unsupported dtype for netCDF4 variable: {var.dtype}")
    161 return dtype

ValueError: unsupported dtype for netCDF4 variable: datetime64[ns]
```

Anything else we need to know?

I tried turning off all the keyword-based inputs in decode_cf and the situation is the same as shown above. Currently my workaround is to first check to see if the datetimes in the Dataset are float and need to be decoded (because they were read in with decode_times=False), and only then are they decoded with decode_cf. In that case, the problem doesn't arise. (This is the case of ds2.)
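
The workaround described above, decoding only when the time variable is still float, might look something like the sketch below. `decode_if_needed` is a hypothetical helper name, and the small in-memory Dataset is an assumed stand-in for the remote CBOFS files; its `units` and `calendar` values are copied from the log output above.

```python
import numpy as np
import xarray as xr


def decode_if_needed(ds, time_name="ocean_time"):
    """Run decode_cf only when the time variable still holds raw floats."""
    if np.issubdtype(ds[time_name].dtype, np.floating):
        return xr.decode_cf(ds)
    # Already decoded: skip the second decode_cf call that triggered the problem.
    return ds


# Assumed stand-in for a dataset opened with decode_times=False
raw = xr.Dataset(
    coords={
        "ocean_time": (
            "ocean_time",
            np.array([0.0, 3600.0]),
            {"units": "seconds since 2016-01-01 00:00:00", "calendar": "gregorian"},
        )
    }
)

decoded = decode_if_needed(raw)      # floats, so decode_cf runs
again = decode_if_needed(decoded)    # datetimes, so the Dataset is returned as-is
print(decoded["ocean_time"].dtype, again is decoded)
```

The check keys on the dtype rather than on the presence of `units` in the attributes, which mirrors the description above; other checks would be possible.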

Environment

INSTALLED VERSIONS

commit: None
python: 3.8.0 | packaged by conda-forge | (default, Nov 22 2019, 19:11:19) [Clang 9.0.0 (tags/RELEASE_900/final)]
python-bits: 64
OS: Darwin
OS-release: 19.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.21.1
pandas: 1.4.1
numpy: 1.22.2
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.13.1
h5py: 3.6.0
Nio: None
zarr: 2.11.0
cftime: 1.5.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.02.0
distributed: 2022.02.0
matplotlib: 3.5.1
cartopy: 0.20.2
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.8.0
pip: 22.0.3
conda: None
pytest: 7.0.1
IPython: 8.0.1
sphinx: 4.4.0

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6453/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
251332357 MDU6SXNzdWUyNTEzMzIzNTc= 1510 open_dataset leading to NetCDF: file not found kthyng 3487237 closed 0     4 2017-08-18T19:05:02Z 2017-08-18T21:27:34Z 2017-08-18T21:27:34Z NONE      

Hi all. I have been using xarray.open_dataset(url) to read in netCDF model output from a THREDDS server in a script run every 30 minutes for months now. It normally works unless there is something wrong with the URL. However, in the last few days, it can no longer find the file, even though reading the same URL with netCDF4 works.

    loc = 'http://barataria.tamu.edu:8080/thredds/dodsC/NcML/oof_archive_agg'

    import netCDF4 as netCDF
    d = netCDF.Dataset(loc)  # this works

    import xarray as xr
    ds = xr.open_dataset(loc)  # this doesn't work as of the last couple of days

This is a problem on both my Linux workstation (xarray version 0.9.5, Python 3.5.2) and my Mac (xarray version 0.9.6, Python 3.5.0rc4). Anyone have an idea what could have changed here?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1510/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
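
The filter this page applies (repo = 13221727 and user = 3487237, sorted by updated_at descending) can be reproduced against a table of this shape with Python's built-in `sqlite3` module. The sketch below is illustrative only: it assumes a trimmed-down subset of the columns above, drops the foreign-key references, and loads just the three rows listed on this page into an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: a reduced version of the issues schema, enough for the filter.
conn.execute(
    """
    CREATE TABLE [issues] (
        [id] INTEGER PRIMARY KEY,
        [number] INTEGER,
        [user] INTEGER,
        [state] TEXT,
        [updated_at] TEXT,
        [repo] INTEGER
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_repo] ON [issues] ([repo])")
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

# The three rows shown on this page (id, number, user, state, updated_at, repo)
rows = [
    (251332357, 1510, 3487237, "closed", "2017-08-18T21:27:34Z", 13221727),
    (1196270877, 6453, 3487237, "closed", "2022-04-18T15:29:19Z", 13221727),
    (1128759050, 6259, 3487237, "open", "2024-02-12T15:28:15Z", 13221727),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort chronologically as plain strings.
result = conn.execute(
    """
    SELECT [number], [state], [updated_at]
    FROM [issues]
    WHERE [repo] = ? AND [user] = ?
    ORDER BY [updated_at] DESC
    """,
    (13221727, 3487237),
).fetchall()

for row in result:
    print(row)
```

Because `updated_at` is stored as TEXT, the descending sort relies on all timestamps sharing the same ISO 8601 format and time zone suffix.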