
issues


4 rows where comments = 3 and user = 20629530 sorted by updated_at descending


Facets: type: issue (2), pull (2) · state: closed (4) · repo: xarray (4)
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at (sort key, descending), closed_at, author_association, active_lock_reason, draft, pull_request, body, reactions, performed_via_github_app, state_reason, repo, type
#8603 · Convert 360_day calendars by choosing random dates to drop or add
id: 2075019328 · node_id: PR_kwDOAMm_X85juCQ- · user: aulemahal (20629530) · state: closed · locked: 0 · comments: 3 · created_at: 2024-01-10T19:13:31Z · updated_at: 2024-04-16T14:53:42Z · closed_at: 2024-04-16T14:53:42Z · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/8603
  • [x] Tests added
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst

Small PR to add a new "method" for converting to and from 360_day calendars. The current two methods (chosen with the align_on keyword) always remove or add the same day-of-year for all years of the same length.

This new option randomly chooses the days, one in each fifth of the year (72-day periods). It emulates the method of the LOCA datasets (see the web page and article). February 29th is always removed/added when the source/target is a leap year.

I copied the implementation from xclim (which I wrote); see the code here.
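For context, a minimal usage sketch, assuming the new method is selected through the existing align_on keyword as align_on="random" (that keyword value is my reading of the PR description, not a confirmed API):

```python
import xarray as xr

# Daily data on a standard calendar (the year is arbitrary and not a leap year).
da = xr.DataArray(
    range(365),
    dims="time",
    coords={"time": xr.cftime_range("2001-01-01", periods=365, calendar="standard")},
)

# Convert to a 360_day calendar. With align_on="random", one randomly chosen
# day in each 72-day fifth of the year is dropped, instead of the same
# day-of-year for every year of the same length.
da_360 = xr.convert_calendar(da, "360_day", align_on="random")
```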

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
#6946 · reset_index not resetting levels of MultiIndex
id: 1347026292 · node_id: I_kwDOAMm_X85QSf10 · user: aulemahal (20629530) · state: closed · locked: 0 · assignee: benbovy (4160723) · comments: 3 · created_at: 2022-08-22T21:47:04Z · updated_at: 2022-09-27T10:35:39Z · closed_at: 2022-09-27T10:35:39Z · author_association: CONTRIBUTOR

What happened?

I'm not sure my use case is the simplest way to demonstrate the issue, but let's try anyway.

I have a DataArray with two coordinates, and I stack them into a new multi-index. I want to pass the levels of that new multi-index into a function, but as dask arrays. It turns out it is not straightforward to chunk these variables, because they act like IndexVariable objects and refuse to be chunked.

Thus, I reset the multi-index and drop it, but the variables still refuse to be chunked!

What did you expect to happen?

I expected the levels to be chunkable after the stack / reset_index sequence.

Minimal Complete Verifiable Example

```python
import xarray as xr

ds = xr.tutorial.open_dataset('air_temperature')

ds = ds.stack(spatial=['lon', 'lat'])
ds = ds.reset_index('spatial', drop=True)  # I don't think the drop is important here.
lon_chunked = ds.lon.chunk()  # oops, doesn't do anything!

type(ds.lon.variable)
# xarray.core.variable.IndexVariable
# I assumed either the stack or the reset_index would have changed this type to a normal Variable.
```

MVCE confirmation

  • [X] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [X] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [X] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [X] New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

This seems related to the other issues around reset_index. I think it is related to (but not a duplicate of) #4366.
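As a possible workaround (my suggestion, not part of the original report), the leftover IndexVariable can be demoted by hand with Variable.to_base_variable(), after which chunking takes effect:

```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
ds = ds.stack(spatial=["lon", "lat"]).reset_index("spatial", drop=True)

# Replace the leftover IndexVariable with a plain Variable so that
# .chunk() is no longer a no-op. (Workaround sketch, untested against
# the exact xarray version in the report.)
ds["lon"] = ds.lon.variable.to_base_variable()
lon_chunked = ds.lon.chunk()
```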

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.5 | packaged by conda-forge | (main, Jun 14 2022, 07:04:59) [GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.49.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_CA.UTF-8
LOCALE: ('en_CA', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 2022.6.0
pandas: 1.4.3
numpy: 1.22.4
scipy: 1.9.0
netCDF4: 1.6.0
pydap: None
h5netcdf: None
h5py: 3.7.0
Nio: None
zarr: 2.12.0
cftime: 1.6.1
nc_time_axis: 1.4.1
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.5
dask: 2022.8.0
distributed: 2022.8.0
matplotlib: 3.5.2
cartopy: 0.20.3
seaborn: None
numbagg: None
fsspec: 2022.7.1
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 63.4.2
pip: 22.2.2
conda: None
pytest: None
IPython: 8.4.0
sphinx: 5.1.1
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/6946/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#5010 · DataArrays inside apply_ufunc with dask=parallelized
id: 824917345 · node_id: MDU6SXNzdWU4MjQ5MTczNDU= · user: aulemahal (20629530) · state: closed · locked: 0 · comments: 3 · created_at: 2021-03-08T20:19:41Z · updated_at: 2021-03-08T20:37:15Z · closed_at: 2021-03-08T20:35:01Z · author_association: CONTRIBUTOR

Is your feature request related to a problem? Please describe.

Currently, when using apply_ufunc with dask=parallelized, the wrapped function receives numpy arrays upon computation. Some xarray operations generate enormous amounts of chunks (the best example: da.groupby('time.dayofyear')), so any complex script using dask ends up with a huge task graph. Dask's scheduler becomes overloaded, sometimes even hangs, and sometimes uses far more RAM than its workers have.

Describe the solution you'd like

I'd like to profit from both the tools of xarray and the power of dask parallelization, and to be able to do something like this:

```python
def func(da):
    """Example of an operation not (easily) possible with numpy."""
    return da.groupby('time').mean()

xr.apply_ufunc(
    func,
    da,
    input_core_dims=[['time']],
    pass_xr=True,  # proposed new keyword
    dask='parallelized',
)
```

I'd like the wrapped func to receive DataArrays resembling the inputs (named dims, coords and all), but holding only the subset of data in that dask chunk. Doing this, the whole function gets parallelized: dask only sees one task, and I can code using xarray. Depending on the implementation, it might be less efficient than dask='allowed' for small datasets, but I think this could be beneficial for long and complex computations on large datasets.

Describe alternatives you've considered

The alternative is to reduce the size of the datasets (looping over the other dimensions), but that defeats the purpose of dask.

Another alternative, which I am currently testing, is to add a layer between apply_ufunc and the func: that layer reconstructs a DataArray and deconstructs it before returning the result, so xarray/dask only ever see plain arrays. If this works and is elegant enough, I can maybe suggest an implementation within xarray.
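A rough sketch of such a layer, under my own assumptions (the helper name xr_shim and the rolling mean used as the "xarray-only" operation are illustrative, not from the issue):

```python
import numpy as np
import xarray as xr

def xr_shim(func, core_dim, core_coord):
    # Hypothetical helper: rebuild a labelled DataArray around the bare numpy
    # block that apply_ufunc passes in, apply func, and return plain values,
    # so that dask only ever sees an ordinary numpy function.
    def _wrapped(arr):
        # apply_ufunc moves core dims to the end, so label the last axis.
        dims = [f"dim_{i}" for i in range(arr.ndim - 1)] + [core_dim]
        da = xr.DataArray(arr, dims=dims, coords={core_dim: core_coord})
        return np.asarray(func(da))
    return _wrapped

da = xr.tutorial.open_dataset("air_temperature").air.chunk({"lat": 5})

out = xr.apply_ufunc(
    xr_shim(lambda x: x.rolling(time=3).mean(), "time", da["time"].values),
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],
    dask="parallelized",
    output_dtypes=[da.dtype],
)
```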

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5010/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
#4033 · xr.infer_freq
id: 612846594 · node_id: MDExOlB1bGxSZXF1ZXN0NDEzNzEzODg2 · user: aulemahal (20629530) · state: closed · locked: 0 · comments: 3 · created_at: 2020-05-05T19:39:05Z · updated_at: 2020-05-30T18:11:36Z · closed_at: 2020-05-30T18:08:27Z · author_association: CONTRIBUTOR · draft: 0 · pull_request: pydata/xarray/pulls/4033
  • [x] Tests added
  • [x] Passes isort -rc . && black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

This PR adds an xr.infer_freq function that mirrors pandas' infer_freq but works on CFTimeIndex objects. I tried to subclass pandas' _FrequencyInferer and override as little as possible.

Two things are problematic right now, and I would like feedback on how to implement them if this PR gets the devs' approval.

1) pd.DatetimeIndex.asi8 returns integers representing nanoseconds since 1970-01-01, while xr.CFTimeIndex.asi8 returns microseconds. In order not to break the API, I patched _CFTimeFrequencyInferer to store 1000× the values. I'm not sure this is the best approach, but it works.

2) As of now, xr.infer_freq will fail on weekly indexes. This is because pandas uses datetime.weekday() at some point, but cftime objects do not implement it (they use dayofwk instead). I'm not sure what to do: cftime could implement weekday() to completely mirror Python's datetime, or pandas could use dayofwk, since it's available on Timestamp objects.

Another option, cleaner but longer, would be to reimplement _FrequencyInferer from scratch. I may have time for this, because I really think an xr.infer_freq function would be useful.
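For reference, a small sketch of the behaviour this PR targets (the exact returned string depends on the index; "D" is what I'd expect for this case):

```python
import xarray as xr

# A daily index on a 365_day (noleap) calendar, which pandas' infer_freq
# cannot handle because it is not a DatetimeIndex.
times = xr.cftime_range("2000-01-01", periods=10, freq="D", calendar="noleap")

print(xr.infer_freq(times))  # expected: "D"
```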

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4033/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
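Putting the schema to use: a sketch of how the query behind this page (comments = 3, user = 20629530, newest first) could be reproduced against a local copy of the database. The filename github.db is an assumption; substitute your own export.

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical local copy of this database
rows = conn.execute(
    """
    SELECT number, title, type, state, updated_at
    FROM issues
    WHERE comments = 3 AND user = 20629530
    ORDER BY updated_at DESC
    """
).fetchall()

for number, title, type_, state, updated_at in rows:
    print(f"#{number} [{type_}, {state}] {title} (updated {updated_at})")
```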