
issues

9 rows where repo = 13221727, state = "open" and user = 6815844, sorted by updated_at descending

Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type. Each row below shows its non-empty fields, followed by the body, reactions, repo, and type.
id: 274797981 · node_id: MDU6SXNzdWUyNzQ3OTc5ODE= · number: 1725 · title: Switch our lazy array classes to use Dask instead? · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 9 · created_at: 2017-11-17T09:12:34Z · updated_at: 2023-09-15T15:51:41Z · author_association: MEMBER

Ported from #1724, comment by @shoyer

In the long term, it would be nice to get rid of these uses of _data, maybe by switching entirely from our lazy array classes to Dask.

The subtleties of checking _data vs data are undesirable, e.g., consider the bug on these lines: https://github.com/pydata/xarray/blob/1a012080e0910f3295d0fc26806ae18885f56751/xarray/core/formatting.py#L212-L213
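To make the proposal concrete, here is a minimal sketch of what "let Dask handle the laziness" could look like; LazySource is a hypothetical stand-in for xarray's lazy array classes, not xarray's actual internals:

```python
import numpy as np
import dask.array as da


class LazySource:
    """Hypothetical stand-in for a lazily indexed on-disk variable."""

    shape = (4, 6)
    dtype = np.dtype("float64")

    def __getitem__(self, key):
        # A real backend would read only the requested block from disk.
        return np.ones(self.shape, dtype=self.dtype)[key]


# Dask takes over the laziness: nothing is "read" until .compute(),
# and there is no _data vs data distinction to check by hand.
arr = da.from_array(LazySource(), chunks=(2, 3))
print(arr.mean().compute())  # 1.0
```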

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1725/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 527237590 · node_id: MDU6SXNzdWU1MjcyMzc1OTA= · number: 3562 · title: Minimize `.item()` call · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 1 · created_at: 2019-11-22T14:44:43Z · updated_at: 2023-06-08T04:48:50Z · author_association: MEMBER

MCVE Code Sample

I want to minimize the number of .item() calls in my data analysis. The need often arises

  1. when putting a 0d DataArray into a slice:

     ```python
     da = xr.DataArray([0.5, 4.5, 2.5], dims=['x'], coords={'x': [0, 1, 2]})
     da[: da.argmax()]
     # TypeError: 'DataArray' object cannot be interpreted as an integer
     ```

  2. when using a 0d DataArray for selecting:

     ```python
     da = xr.DataArray([0.5, 4.5, 2.5], dims=['x'], coords={'x': [0, 0, 2]})
     da.sel(x=da['x'][0])
     # IndexError: arrays used as indices must be of integer (or boolean) type
     ```

In both cases I need to call .item(). It is not a big issue, but I think it would be nice if xarray became more self-contained.
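For concreteness, a minimal sketch of the workaround being described; only the .item() calls are added to the failing examples above:

```python
import xarray as xr

da = xr.DataArray([0.5, 4.5, 2.5], dims=['x'], coords={'x': [0, 1, 2]})

# Case 1: a 0d DataArray is rejected as a slice bound, so the Python
# scalar has to be extracted first.
subset = da[: da.argmax().item()]

da2 = xr.DataArray([0.5, 4.5, 2.5], dims=['x'], coords={'x': [0, 0, 2]})

# Case 2: likewise when selecting by a 0d coordinate value.
selected = da2.sel(x=da2['x'][0].item())
```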

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3562/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 675482176 · node_id: MDU6SXNzdWU2NzU0ODIxNzY= · number: 4325 · title: Optimize ndrolling nanreduce · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 5 · created_at: 2020-08-08T07:46:53Z · updated_at: 2023-04-13T15:56:52Z · author_association: MEMBER

In #4219 we added ndrolling. However, a nanreduce such as ds.rolling(x=3, y=2).mean() calls np.nanmean, which copies the strided array into a full array. This is memory-inefficient.

We can implement in-house nanreduce methods for the strided array. For example, our .nansum currently does: make a strided array -> copy the array -> replace nan by 0 -> sum. We could instead do: replace nan by 0 -> make a strided array -> sum, which is much more memory-efficient (see the sketch below).
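A minimal NumPy sketch of that reordering (illustrative only; a real xarray implementation would need to handle arbitrary axes and reductions):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(20.0)
x[3] = np.nan

# Current order: build windows first, then np.nansum materializes the
# whole (n, window) array in order to replace the NaNs.
current = np.nansum(sliding_window_view(x, 5), axis=-1)

# Proposed order: replace NaN by 0 first (one copy of the 1-D data),
# then build the strided view and reduce it without a full copy.
filled = np.where(np.isnan(x), 0.0, x)
proposed = sliding_window_view(filled, 5).sum(axis=-1)

np.testing.assert_allclose(current, proposed)
```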

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4325/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 531087939 · node_id: MDExOlB1bGxSZXF1ZXN0MzQ3NTkyNzE1 · number: 3587 · title: boundary options for rolling.construct · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 4 · created_at: 2019-12-02T12:11:44Z · updated_at: 2022-06-09T14:50:17Z · author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/3587
  • [x] Closes #2007, #2011
  • [x] Tests added
  • [x] Passes black . && mypy . && flake8
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Added some boundary options for rolling.construct. Currently the option names are inherited from np.pad: 'edge' | 'reflect' | 'symmetric' | 'wrap'. Do we want a more intuitive name, such as periodic?
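For reference, this is how those np.pad modes behave on a small array (plain NumPy, just to make the candidate option names concrete):

```python
import numpy as np

a = np.array([1, 2, 3, 4])

np.pad(a, 2, mode='edge')       # [1 1 1 2 3 4 4 4]
np.pad(a, 2, mode='reflect')    # [3 2 1 2 3 4 3 2]
np.pad(a, 2, mode='symmetric')  # [2 1 1 2 3 4 4 3]
np.pad(a, 2, mode='wrap')       # [3 4 1 2 3 4 1 2]  <- the "periodic" case
```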

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3587/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
id: 280875330 · node_id: MDU6SXNzdWUyODA4NzUzMzA= · number: 1772 · title: nonzero method for xr.DataArray · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 5 · created_at: 2017-12-11T02:25:11Z · updated_at: 2022-04-01T10:42:20Z · author_association: MEMBER

Applying np.nonzero to a DataArray returns a wrong result:

```python
In [4]: da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
   ...:                   coords={'x': [0, 1, 2, 3], 'y': ['a', 'b', 'c']})
   ...: np.nonzero(da)
Out[4]:
<xarray.DataArray (x: 2, y: 11)>
array([[0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
       [1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) <U1 'a' 'b' 'c'
```

Problem description

Apparently, the dimensions and the coordinates conflict with each other. I think we can have our own nonzero method, which may return a Dataset consisting of indexes and the appropriate coordinates.
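One possible shape for such a method, as a hypothetical sketch (the nonzero name, the point dimension, and the *_coord naming are illustrative, not an agreed API):

```python
import numpy as np
import xarray as xr


def nonzero(da: xr.DataArray) -> xr.Dataset:
    """Return nonzero positions as index variables along a new 'point'
    dimension, together with the matching coordinate labels."""
    indexes = np.nonzero(da.values)
    out = {}
    for dim, ind in zip(da.dims, indexes):
        out[dim] = ('point', ind)
        if dim in da.coords:
            out[dim + '_coord'] = ('point', da.coords[dim].values[ind])
    return xr.Dataset(out)


da = xr.DataArray(np.arange(12).reshape(4, 3), dims=['x', 'y'],
                  coords={'x': [0, 1, 2, 3], 'y': ['a', 'b', 'c']})
print(nonzero(da))  # 11 points: every element except the leading 0
```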

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-101-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.9.6-172-gc58d142
pandas: 0.21.0
numpy: 1.13.1
scipy: 0.19.1
netCDF4: None
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.0
matplotlib: 2.0.2
cartopy: None
seaborn: 0.7.1
setuptools: 36.5.0
pip: 9.0.1
conda: 4.3.30
pytest: 3.2.3
IPython: 6.0.0
sphinx: 1.6.3
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1772/reactions",
    "total_count": 6,
    "+1": 6,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 898657012 · node_id: MDU6SXNzdWU4OTg2NTcwMTI= · number: 5361 · title: Inconsistent behavior in groupby depending on the dimension order · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 1 · created_at: 2021-05-21T23:11:37Z · updated_at: 2022-03-29T11:45:32Z · author_association: MEMBER

groupby works inconsistently depending on the dimension order of a DataArray. Furthermore, in some cases, this causes a corrupted object.

```python
In [4]: data = xr.DataArray(
   ...:     np.random.randn(4, 2),
   ...:     dims=['x', 'z'],
   ...:     coords={'x': ['a', 'b', 'a', 'c'], 'y': ('x', [0, 1, 0, 2])}
   ...: )
   ...:
   ...: data.groupby('x').mean()
Out[4]:
<xarray.DataArray (x: 3, z: 2)>
array([[ 0.95447186, -1.14467028],
       [ 0.76294958,  0.3751244 ],
       [-0.41030223, -1.35344548]])
Coordinates:
  * x        (x) object 'a' 'b' 'c'
Dimensions without coordinates: z
```

Here groupby works fine (although it drops the non-dimensional coordinate y; related to #3745).

However, groupby does not give a correct result if we work on the second dimension:

```python
In [5]: data.T.groupby('x').mean()  # <-- change the dimension order and do the same thing
Out[5]:
<xarray.DataArray (z: 2, x: 3)>
array([[ 0.95447186,  0.76294958, -0.41030223],
       [-1.14467028,  0.3751244 , -1.35344548]])
Coordinates:
  * x        (x) object 'a' 'b' 'c'
    y        (x) int64 0 1 0 2  # <-- the size must be 3!!
Dimensions without coordinates: z
```

The bug was discussed in #2944 and supposedly solved, but I found it is still there.
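Until that is fixed, a possible workaround (my sketch, not from the issue thread) is to move the grouped dimension to the front before the groupby and restore the order afterwards:

```python
import numpy as np
import xarray as xr

data = xr.DataArray(
    np.random.randn(4, 2),
    dims=['x', 'z'],
    coords={'x': ['a', 'b', 'a', 'c'], 'y': ('x', [0, 1, 0, 2])},
)

transposed = data.T  # dims ('z', 'x'): the problematic order
result = (transposed
          .transpose('x', ...)   # grouped dimension first
          .groupby('x').mean()
          .transpose(..., 'x'))  # restore the original order
```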

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: 09d8a4a785fa6521314924fd785740f2d13fb8ee
python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-72-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.10.4
libnetcdf: 4.6.1

xarray: 0.16.1.dev30+g1d3dee08.d20200808
pandas: 1.1.3
numpy: 1.18.1
scipy: 1.5.2
netCDF4: 1.4.2
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.6.0
distributed: 2.7.0
matplotlib: 3.2.2
cartopy: None
seaborn: 0.10.1
numbagg: None
pint: None
setuptools: 46.1.1.post20200323
pip: 20.0.2
conda: None
pytest: 5.2.1
IPython: 7.13.0
sphinx: None
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5361/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 359240638 · node_id: MDU6SXNzdWUzNTkyNDA2Mzg= · number: 2410 · title: Updated text for indexing page · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 11 · created_at: 2018-09-11T22:01:39Z · updated_at: 2021-11-15T21:17:14Z · author_association: MEMBER

We have a bunch of terms to describe xarray's structure, such as dimension, coordinate, dimension coordinate, etc. Although this was discussed in #1295 and we have tried to use consistent terminology in our docs, it still does not look easy for users to understand our functionality.

In #2399, @horta wrote a list of definitions (https://drive.google.com/file/d/1uJ_U6nedkNe916SMViuVKlkGwPX-mGK7/view?usp=sharing). I think it would be nice to have something like this in our docs.

Any thoughts?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2410/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 254927382 · node_id: MDU6SXNzdWUyNTQ5MjczODI= · number: 1553 · title: Multidimensional reindex · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 2 · created_at: 2017-09-04T03:29:39Z · updated_at: 2020-12-19T16:00:00Z · author_association: MEMBER

From a discussion in a #1473 comment:

It would be convenient to have a multi-dimensional reindex method that takes the dimensions and coordinates of the indexers into account. The outline proposed by @shoyer (see the sketch after the list) is:

  • Given reindex arguments of the form dim=array where array is a 1D unlabeled array/list, convert them into DataArray(array, [(dim, array)]).
  • Do multi-dimensional indexing with broadcasting like sel, but fill in NaN for missing values (we could allow for customizing this with a fill_value argument).
  • Join coordinates like for sel, but coordinates from the indexers take precedence over coordinates from the object being indexed.
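A hedged sketch of step 1, with steps 2-3 shown via the existing 1-D reindex (names are illustrative; a real implementation would need the full multi-dimensional broadcasting logic):

```python
import numpy as np
import xarray as xr

# Step 1: a plain 1-D indexer passed as dim=array is promoted to a
# DataArray labeled by itself.
array = np.array([0, 2, 5])
indexer = xr.DataArray(array, coords=[('x', array)])

# Steps 2-3, reduced to the 1-D case: index with NaN fill for missing
# labels, which the existing reindex already does given a fill_value.
da = xr.DataArray([10.0, 20.0, 30.0], coords=[('x', [0, 1, 2])])
print(da.reindex(x=indexer.values, fill_value=np.nan))
# values: [10., 30., nan] -- label 5 is absent and gets the fill value
```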
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1553/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
id: 280673215 · node_id: MDU6SXNzdWUyODA2NzMyMTU= · number: 1771 · title: Needs performance check / improvements in value assignment of DataArray · user: fujiisoup (6815844) · state: open · locked: 0 · comments: 1 · created_at: 2017-12-09T03:42:41Z · updated_at: 2019-10-28T14:53:24Z · author_association: MEMBER

https://github.com/pydata/xarray/blob/5e801894886b2060efa8b28798780a91561a29fd/xarray/core/dataarray.py#L482-L489

In #1746, we added validation to xr.DataArray.__setitem__ that checks the coordinate consistency of the array, key, and values. In the current implementation, we call xr.DataArray.__getitem__ to reuse the existing coordinate-validation logic, but this performs unnecessary indexing and may hurt __setitem__ performance when the array is multidimensional.

We may need to optimize the logic here.

Is it reasonable to constantly monitor the performance of basic operations, such as Dataset construction, alignment, indexing, and assignment? (Or are these operations too lightweight to be worth a performance monitor?)
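On the monitoring question: one option is an airspeed-velocity (asv) benchmark suite. A minimal sketch of what a benchmark for this assignment path could look like (class and method names are illustrative):

```python
import numpy as np
import xarray as xr


class SetItem:
    """asv-style benchmark: methods prefixed with time_ are timed."""

    def setup(self):
        self.da = xr.DataArray(
            np.zeros((100, 100, 100)), dims=['x', 'y', 'z'])

    def time_setitem(self):
        # Exercises DataArray.__setitem__, including the coordinate
        # validation that currently goes through __getitem__.
        self.da[dict(x=slice(10, 20))] = 1.0
```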

cc @jhamman @shoyer

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1771/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);