issues


6 rows where user = 57914115 (antscloud), sorted by updated_at descending: 3 issues and 3 pull requests, 3 open and 3 closed, all in the pydata/xarray repo.

#8324: Implement cftime vectorization as discussed in PR #8322 · pull request by antscloud · open · 0 comments · created 2023-10-17 · updated 2023-10-23 · pydata/xarray

As discussed in #8322, here is the test for implementing the vectorization.

Only this test seems to fail in test_coding_times.py: https://github.com/pydata/xarray/blob/f895dc1a748b41d727c5e330e8d664a8b8780800/xarray/tests/test_coding_times.py#L1061-L1071

I don't really understand why, though; let me know if you have an idea.

Reactions: 2 (rocket: 2)
#8302: Rust-based cftime Implementation for Python · issue by antscloud · open · 2 comments · created 2023-10-13 · updated 2023-10-22 · pydata/xarray

Is your feature request related to a problem?

I developed a Rust-based project with Python bindings to parse the CF time conventions and deal with datetime operations.

You can find the project on GitHub at https://github.com/antscloud/cftime-rs.

It was something missing in the Rust ecosystem for dealing with NetCDF files. As the Rust project hit its first working version, I wanted to explore the maturin ecosystem and Rust as a backend for Python code. I ended up creating a new cftime implementation for Python that shows a significant performance improvement (somewhere between 4 and 20 times faster, depending on the use case) over cftime's original Cython code.

There are surely features missing compared to cftime, and it needs more testing, but I think it could be interesting as a replacement for some xarray operations (mainly for speed), given some of the issues under the topic-cftime label.

Please let me know if the xarray team could be interested. If you are, I can open a pull request to see whether it is possible, where it breaks the unit tests, and whether it's worth it.

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

Reactions: 2 (+1: 1, heart: 1)
#8322: Implementation of rust based cftime · pull request by antscloud · open · 1 comment · created 2023-10-17 · updated 2023-10-17 · pydata/xarray

As discussed in #8302, here is a first attempt to implement cftime_rs.

There are a lot of tests, and I struggle to understand all the processing in coding/times.py. However, with this first attempt I've been able to make test_cf_datetime pass (ignoring one test):

https://github.com/pydata/xarray/blob/8423f2c47306cc3a4a52990818964f278179491f/xarray/tests/test_coding_times.py#L127-L131

Also, there are some key differences between cftime and cftime-rs:

  • A long int is used to represent the timestamp internally, so cftime-rs will not overflow as early as numpy, python or cftime do. It can go from approximately 291,672,107,014 BC to 291,672,107,014 AD, and this depends on the calendar.
  • There is no only_use_python_datetimes argument. Instead there are 4 distinct functions: date2num(), num2date(), num2pydate() and pydate2num(). These functions only take a one-dimensional Python list and return a one-dimensional list, so a conversion should be done beforehand.
  • There are not multiple datetime types (they are hidden); instead there is a single object, PyCFDatetime.
  • There is no conda repository at the moment.
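Taking the list above at face value, here is a rough usage sketch; the import path and exact signatures are assumptions drawn from this description, not verified against the cftime-rs documentation:

```python
# Hypothetical usage of cftime-rs, based only on the description above.
import cftime_rs  # assumed import name for the Python bindings

units = "days since 2000-01-01"
calendar = "standard"

# num2date() is described as taking a one-dimensional Python list and
# returning a one-dimensional list of PyCFDatetime objects, so NumPy
# arrays would need to be flattened and converted first.
dates = cftime_rs.num2date([0, 1, 2], units, calendar)
print(dates)
```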

Finally, and regardless of this PR, I guess there could be a speed improvement from vectorizing operations, by replacing this: https://github.com/pydata/xarray/blob/df0ddaf2e68a6b033b4e39990d7006dc346fcc8c/xarray/coding/times.py#L622-L649

with something like this:

https://github.com/pydata/xarray/blob/8423f2c47306cc3a4a52990818964f278179491f/xarray/coding/times.py#L631-L670

We can use numpy instead of list comprehensions. It takes a bit more memory, though.
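To make the proposal concrete, here is a minimal standalone sketch of the two styles; it uses cftime directly (whose num2date accepts array input) rather than the actual coding/times.py code:

```python
import cftime
import numpy as np

num_dates = np.arange(1000)
units = "days since 2000-01-01"
calendar = "standard"

# Element-by-element decoding with a list comprehension (roughly the
# pattern the linked lines currently use).
dates_loop = [cftime.num2date(n, units, calendar) for n in num_dates]

# One vectorized call over the whole array: cftime.num2date accepts
# array input, trading a bit of extra memory for speed.
dates_vec = cftime.num2date(num_dates, units, calendar)
```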

Reactions: 1 (rocket: 1)
#6781: Cannot open dataset with empty list units · issue by antscloud · closed as completed · 6 comments · created 2022-07-13 · closed 2022-10-03 · pydata/xarray

What happened?

I found myself using a NetCDF file with empty units, and with xarray I was unable to use open_dataset due to the parsing of CF conventions. I reproduced the bug, and it happens in a particular situation: when the units attribute is an empty list (see the Minimal Complete Verifiable Example).

What did you expect to happen?

To parse the units attribute as an empty string?

Minimal Complete Verifiable Example

```Python
import numpy as np
import pandas as pd
import xarray as xr

temp = 15 + 8 * np.random.randn(2, 2, 3)
precip = 10 * np.random.rand(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]

# For real use cases, it's good practice to supply array attributes such as
# units, but we won't bother here for the sake of brevity.
ds = xr.Dataset(
    {
        "temperature": (["x", "y", "time"], temp),
        "precipitation": (["x", "y", "time"], precip),
    },
    coords={
        "lon": (["x", "y"], lon),
        "lat": (["x", "y"], lat),
        "time": pd.date_range("2014-09-06", periods=3),
        "reference_time": pd.Timestamp("2014-09-05"),
    },
)
ds.temperature.attrs["units"] = []

ds.to_netcdf("test.nc")

ds = xr.open_dataset("test.nc")
ds.close()
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

```Python
TypeError                                 Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 ds = xr.open_dataset("test.nc")
      2 print(ds["temperature"].attrs)
      3 ds.close()

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/api.py:495, in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483 decoders = _resolve_decoders_kwargs(
    484     decode_cf,
    485     open_backend_dataset_parameters=backend.open_dataset_parameters,
    (...)
    491     decode_coords=decode_coords,
    492 )
    494 overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 495 backend_ds = backend.open_dataset(
    496     filename_or_obj,
    497     drop_variables=drop_variables,
    498     **decoders,
    499     **kwargs,
    500 )
    501 ds = _dataset_from_backend_dataset(
    502     backend_ds,
    503     filename_or_obj,
    (...)
    510     **kwargs,
    511 )
    512 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/netCDF4_.py:564, in NetCDF4BackendEntrypoint.open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, group, mode, format, clobber, diskless, persist, lock, autoclose)
    562 store_entrypoint = StoreBackendEntrypoint()
    563 with close_on_error(store):
--> 564     ds = store_entrypoint.open_dataset(
    565         store,
    566         mask_and_scale=mask_and_scale,
    567         decode_times=decode_times,
    568         concat_characters=concat_characters,
    569         decode_coords=decode_coords,
    570         drop_variables=drop_variables,
    571         use_cftime=use_cftime,
    572         decode_timedelta=decode_timedelta,
    573     )
    574 return ds

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/backends/store.py:27, in StoreBackendEntrypoint.open_dataset(self, store, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta)
     24 vars, attrs = store.load()
     25 encoding = store.get_encoding()
---> 27 vars, attrs, coord_names = conventions.decode_cf_variables(
     28     vars,
     29     attrs,
     30     mask_and_scale=mask_and_scale,
     31     decode_times=decode_times,
     32     concat_characters=concat_characters,
     33     decode_coords=decode_coords,
     34     drop_variables=drop_variables,
     35     use_cftime=use_cftime,
     36     decode_timedelta=decode_timedelta,
     37 )
     39 ds = Dataset(vars, attrs=attrs)
     40 ds = ds.set_coords(coord_names.intersection(vars))

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:503, in decode_cf_variables(variables, attributes, concat_characters, mask_and_scale, decode_times, decode_coords, drop_variables, use_cftime, decode_timedelta)
    499     continue
    500 stack_char_dim = (
    501     concat_characters and v.dtype == "S1" and v.ndim > 0 and stackable(v.dims[-1])
    502 )
--> 503 new_vars[k] = decode_cf_variable(
    504     k,
    505     v,
    506     concat_characters=concat_characters,
    507     mask_and_scale=mask_and_scale,
    508     decode_times=decode_times,
    509     stack_char_dim=stack_char_dim,
    510     use_cftime=use_cftime,
    511     decode_timedelta=decode_timedelta,
    512 )
    513 if decode_coords in [True, "coordinates", "all"]:
    514     var_attrs = new_vars[k].attrs

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/conventions.py:354, in decode_cf_variable(name, var, concat_characters, mask_and_scale, decode_times, decode_endianness, stack_char_dim, use_cftime, decode_timedelta)
    351     var = coder.decode(var, name=name)
    353 if decode_timedelta:
--> 354     var = times.CFTimedeltaCoder().decode(var, name=name)
    355 if decode_times:
    356     var = times.CFDatetimeCoder(use_cftime=use_cftime).decode(var, name=name)

File ~/.local/src/miniconda/envs/uptodatexarray/lib/python3.10/site-packages/xarray/coding/times.py:537, in CFTimedeltaCoder.decode(self, variable, name)
    534 def decode(self, variable, name=None):
    535     dims, data, attrs, encoding = unpack_for_decoding(variable)
--> 537     if "units" in attrs and attrs["units"] in TIME_UNITS:
    538         units = pop_to(attrs, encoding, "units")
    539         transform = partial(decode_cf_timedelta, units=units)

TypeError: unhashable type: 'numpy.ndarray'
```

Anything else we need to know?

The following assignment produces the bug:

```Python
ds.temperature.attrs["units"] = []
```

But these ones do not produce the bug:

```Python
ds.temperature.attrs["units"] = "[]"
ds.temperature.attrs["units"] = ""
```

Also, I don't know how the units attribute gets encoded for writing, but I see no difference between ds.temperature.attrs["units"] = "" and ds.temperature.attrs["units"] = [] when using ncdump on the file.
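For illustration, a minimal sketch of a defensive guard around the failing membership test; this follows the shape of CFTimedeltaCoder.decode from the traceback above and is not the patch xarray actually merged:

```Python
import numpy as np

# Assumed stand-in for xarray's TIME_UNITS set of valid unit strings.
TIME_UNITS = {"days", "hours", "minutes", "seconds",
              "milliseconds", "microseconds", "nanoseconds"}

def is_time_units(units) -> bool:
    # A units attribute written as an empty list comes back as a numpy
    # array, which is unhashable; checking for str first avoids the
    # TypeError raised by `units in TIME_UNITS`.
    return isinstance(units, str) and units in TIME_UNITS

assert not is_time_units(np.array([]))  # no longer raises TypeError
assert is_time_units("days")
```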

Environment

This bug was encountered with the versions listed below.

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 5.13.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: ('fr_FR', 'UTF-8')
libhdf5: 1.10.6
libnetcdf: 4.6.1
xarray: 0.20.1
pandas: 1.4.3
numpy: 1.22.3
scipy: None
netCDF4: 1.5.7
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
setuptools: 61.2.0
pip: 22.1.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: None
Reactions: none
#6037: Fix wrong typing for tolerance in reindex · pull request by antscloud · closed · 6 comments · created 2021-12-01 · closed 2022-01-15 · pydata/xarray

In the xarray.core.dataset module (dataset.py), more particularly in the reindex method, the tolerance argument is typed as a Number:

https://github.com/pydata/xarray/blob/f08672847abec18f46df75e2f620646d27fa41a2/xarray/core/dataset.py#L2743

But _reindex calls the reindex_variable function, in which the type for tolerance is Any. That function ends up calling get_indexer_nd, which calls the pandas get_indexer function:

https://github.com/pydata/xarray/blob/f08672847abec18f46df75e2f620646d27fa41a2/xarray/core/indexes.py#L137

In pandas, according to the docs, the tolerance argument can be a scalar or a list-like object.
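A minimal sketch of the widened annotation this PR is after; the Tolerance alias and the stub are illustrative, not the actual xarray signature:

```Python
from numbers import Number
from typing import Iterable, Optional, Union

# Accept a single scalar or something list-like, mirroring what
# pandas' get_indexer ultimately allows for tolerance.
Tolerance = Optional[Union[Number, Iterable[Number]]]

def reindex(tolerance: Tolerance = None) -> None:
    """Hypothetical stub; only the annotation change matters here."""
```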

  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
Reactions: 2 (heart: 2)
#5230: Same files in open_mfdataset() unclear error message · issue by antscloud · closed as completed · 3 comments · created 2021-04-28 · closed 2021-04-30 · pydata/xarray

When using xr.open_mfdataset() by mistake with two identical files, it produces an unclear error message.

What happened:

With, of course, the time dimension existing:

```python
ds = xr.open_mfdataset(["some_file.nc", "some_file.nc"], concat_dim="time", engine="netcdf4")
```

```python
ValueError                                Traceback (most recent call last)

~/.local/src/miniconda/envs/minireobs/lib/python3.8/site-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, parallel, join, attrs_file, **kwargs)
    966 # Redo ordering from coordinates, ignoring how they were ordered
    967 # previously
--> 968 combined = combine_by_coords(
    969     datasets,
    970     compat=compat,

~/.local/src/miniconda/envs/minireobs/lib/python3.8/site-packages/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join, combine_attrs)
    762 concatenated_grouped_by_data_vars = []
    763 for vars, datasets_with_same_vars in grouped_by_vars:
--> 764     combined_ids, concat_dims = _infer_concat_order_from_coords(
    765         list(datasets_with_same_vars)
    766     )

~/.local/src/miniconda/envs/minireobs/lib/python3.8/site-packages/xarray/core/combine.py in _infer_concat_order_from_coords(datasets)
    106
    107 if len(datasets) > 1 and not concat_dims:
--> 108     raise ValueError(
    109         "Could not find any dimension coordinates to use to "
    110         "order the datasets for concatenation"

ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
```

What you expected to happen: a warning saying that we are using the same dataset? A more explicit error message (exact same dimensions)? No error and no concatenation, removing the duplicated datasets?
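As a sketch of that last option, here is a user-side workaround that deduplicates the input paths before concatenation; this is an illustration, not xarray behavior:

```python
import xarray as xr

paths = ["some_file.nc", "some_file.nc"]
# dict.fromkeys() drops duplicates while preserving order.
unique_paths = list(dict.fromkeys(paths))

# combine="nested" is required alongside concat_dim in current xarray.
ds = xr.open_mfdataset(unique_paths, concat_dim="time",
                       combine="nested", engine="netcdf4")
```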

Environment:

Output of xr.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-72-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3
xarray: 0.17.0
pandas: 1.1.1
numpy: 1.19.2
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: None
Reactions: none

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);