
issues


8 rows where repo = 13221727, state = "closed" and "updated_at" is on date 2021-02-16 sorted by updated_at descending




id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
771198437 MDExOlB1bGxSZXF1ZXN0NTQyNzk3NDYy 4711 Adding vectorized indexing docs EricKeenan 44210245 closed 0     3 2020-12-18T22:10:49Z 2021-02-16T23:37:30Z 2021-02-16T23:37:30Z CONTRIBUTOR   0 pydata/xarray/pulls/4711
  • [x] closes #4630, closes #3768

4630: Adds a new vectorized indexing example to sel docstring and narrative docs.

Thanks to @dcherian for introducing me to vectorized indexing and @keewis for providing some information to get started. Also thanks to the community for the excellent contribution guide. http://xarray.pydata.org/en/stable/contributing.html

Am I missing anything here? Or is there anything that can be improved? I'm happy to see this through - thanks in advance for any feedback/tips!

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4711/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
564555854 MDU6SXNzdWU1NjQ1NTU4NTQ= 3768 Pointwise indexing ivirshup 8238804 closed 0     6 2020-02-13T09:39:27Z 2021-02-16T23:37:29Z 2021-02-16T23:37:29Z NONE      

MCVE Code Sample

```python
import xarray as xr
import numpy as np

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={'x': list('abcdefg'), 'y': 10 * np.arange(8)},
    dims=['x', 'y'],
)

# Shouldn't this be (2,)?
assert da.isel(x=[0, 1], y=[0, 1]).shape == (2, 2)
```

Expected Output

I had expected da.isel(x=[0, 1], y=[0, 1]) to have shape (2,). I had generally expected indexing with isel to behave more like numpy indexing. It's very possible I'm just missing something, or that this is more of a documentation issue than a behavior issue.

Problem Description

Going off this example in #507:

```python
In [3]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
Out[3]:
<xray.DataArray (points: 3)>
array([ 0,  9, 48])
Coordinates:
    y        (points) int64 0 10 0
    x        (points) |S1 'a' 'b' 'g'
  * points   (points) int64 0 1 2
```

and the deprecation of isel_points in favor of isel, I had expected to get numpy-like coordinate indexing using isel.

This was made a little more confusing by the documentation for setting values by index, in particular this example:

```python
In [68]: da[ind_x, ind_y] = -2  # assign -2 to (ix, iy) = (0, 0) and (1, 1)

In [69]: da
Out[69]:
<xarray.DataArray (x: 3, y: 4)>
array([[-2, -2, -1, -1],
       [-2, -2,  6,  7],
       [ 8,  9, 10, 11]])
```

To me, the comment # assign -2 to (ix, iy) = (0, 0) and (1, 1) makes it sound like values will be assigned at the coordinates (0, 0) and (1, 1), not (0, 0), (0, 1), (1, 0), and (1, 1).

All in all, I'm not sure if this is a bug, or an issue with documentation. If isel is not meant to behave like isel_points, it would be nice to see that in the documentation. If it is possible to get and set points by coordinate (without looping over single coordinates) it would be nice to see an example in the documentation where that's shown.
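
For reference, here is a sketch of how pointwise indexing can be spelled today via xarray's vectorized indexing, using the da from the MCVE above (the 'points' dimension name is arbitrary):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={'x': list('abcdefg'), 'y': 10 * np.arange(8)},
    dims=['x', 'y'],
)

# DataArray indexers sharing the 'points' dimension select (x, y) pairs
# instead of the outer product.
points = da.isel(
    x=xr.DataArray([0, 1], dims='points'),
    y=xr.DataArray([0, 1], dims='points'),
)
assert points.shape == (2,)  # pointwise, numpy-like result
```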

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 (default, Jan 4 2020, 12:18:30) [Clang 11.0.0 (clang-1100.0.33.16)]
python-bits: 64
OS: Darwin
OS-release: 19.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.15.0
pandas: 1.0.1
numpy: 1.18.1
scipy: 1.4.1
netCDF4: 1.5.2
pydap: None
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.9.2
distributed: 2.9.3
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 45.2.0
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.11.1
sphinx: 2.3.1
```
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3768/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
753874419 MDU6SXNzdWU3NTM4NzQ0MTk= 4630 .sel(...., method='nearest') fails for large requests. EricKeenan 44210245 closed 0     8 2020-11-30T23:20:18Z 2021-02-16T23:37:29Z 2021-02-16T23:37:29Z CONTRIBUTOR      

A common usage of xarray is to retrieve climate model data from the grid cells closest to a weather station. That might look like this:

```python
import xarray as xr
import numpy as np

ds = xr.tutorial.open_dataset("air_temperature")

# Define target latitude and longitude
tgt_lat = np.linspace(0, 100, num=10)
tgt_lon = np.linspace(0, 100, num=10)

# Retrieve data at target latitude and longitude
tgt_data = ds['air'].sel(lon=tgt_lon, lat=tgt_lat, method='nearest')
```

My problem is that I am trying to subset ds to 10 points in space (the length of tgt_lat and tgt_lon), but xarray in fact retrieves 100 points (10 latitude by 10 longitude). I can get around this by calling tgt_data = tgt_data.values.diagonal(), but that returns a non-xarray object. Furthermore, if I query for 10,000 points in space instead of 10, I run out of memory because xarray retrieves 100,000,000 points in space (10,000^2).

Is there a way to only retrieve the diagonal elements? If not, is this something that should be added?
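
A sketch of one way to retrieve only the paired points via vectorized indexing, reusing tgt_lat/tgt_lon from above (the 'points' dimension name is arbitrary):

```python
import xarray as xr

# Wrapping the targets in DataArrays that share a 'points' dimension makes
# .sel select pairs, returning 10 points rather than a 10 x 10 grid.
tgt_data = ds['air'].sel(
    lon=xr.DataArray(tgt_lon, dims='points'),
    lat=xr.DataArray(tgt_lat, dims='points'),
    method='nearest',
)
```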

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4630/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
788398518 MDExOlB1bGxSZXF1ZXN0NTU2OTE3MDIx 4823 Allow fsspec URLs in open_(mf)dataset martindurant 6042212 closed 0     20 2021-01-18T16:22:35Z 2021-02-16T21:26:53Z 2021-02-16T21:18:05Z CONTRIBUTOR   0 pydata/xarray/pulls/4823
  • [x] Closes #4461 and related
  • [x] Tests added
  • [x] Passes pre-commit run --all-files
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
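A rough sketch of the usage this PR enables, with a hypothetical bucket URL (the storage_options dict passed through backend_kwargs is forwarded to fsspec):

```python
import xarray as xr

# "s3://hypothetical-bucket/data.zarr" is an invented URL for illustration.
ds = xr.open_dataset(
    "s3://hypothetical-bucket/data.zarr",
    engine="zarr",
    backend_kwargs={"storage_options": {"anon": True}},
)
```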
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4823/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
603309899 MDU6SXNzdWU2MDMzMDk4OTk= 3985 xarray=0.15.1 regression: Groupby drop multi-index DancingQuanta 8419157 closed 0     4 2020-04-20T15:05:51Z 2021-02-16T15:59:46Z 2021-02-16T15:59:46Z NONE

I have written a function process_stacked_groupby that stacks all but one dimension of a dataset/dataarray and performs groupby-apply-combine over the stacked dimension. However, after upgrading to 0.15.1, the function ceased to work.

MCVE Code Sample

```python
import xarray as xr
import numpy as np

# Dimensions
N = xr.DataArray(np.arange(100), dims='N', name='N')
reps = xr.DataArray(np.arange(5), dims='reps', name='reps')
horizon = xr.DataArray([1, -1], dims='horizon', name='horizon')
horizon.attrs = {'long_name': 'Horizontal', 'units': 'H'}
vertical = xr.DataArray(np.arange(1, 4), dims='vertical', name='vertical')
vertical.attrs = {'long_name': 'Vertical', 'units': 'V'}

# Variables
x = xr.DataArray(
    np.random.randn(len(N), len(reps), len(horizon), len(vertical)),
    dims=['N', 'reps', 'horizon', 'vertical'],
    name='x',
)
y = x * 0.1
y.name = 'y'

# Merge x, y
data = xr.merge([x, y])

# Assign coords
data = data.assign_coords(reps=reps, vertical=vertical, horizon=horizon)

# Function that stacks all but one dimension and groups by the stacked dimension
def process_stacked_groupby(ds, dim, func, *args):

    # Function to apply to stacked groupby
    def apply_fn(ds, dim, func, *args):

        # Get groupby dim
        groupby_dim = list(ds.dims)
        groupby_dim.remove(dim)
        groupby_var = ds[groupby_dim]

        # Unstack groupby dim
        ds2 = ds.unstack(groupby_dim).squeeze()

        # Perform function
        ds3 = func(ds2, *args)

        # Add multi-index groupby_var to result
        ds3 = (ds3
               .reset_coords(drop=True)
               .assign_coords(groupby_var)
               .expand_dims(groupby_dim)
               )
        return ds3

    # Get list of dimensions
    groupby_dims = list(ds.dims)

    # Remove dimension not grouped
    groupby_dims.remove(dim)

    # Stack all but one dimension
    stack_dim = '_'.join(groupby_dims)
    ds2 = ds.stack({stack_dim: groupby_dims})

    # Groupby and apply
    ds2 = ds2.groupby(stack_dim, squeeze=False).map(apply_fn, args=(dim, func, *args))

    # Unstack
    ds2 = ds2.unstack(stack_dim)

    # Restore attrs
    for dim in groupby_dims:
        ds2[dim].attrs = ds[dim].attrs

    return ds2

# Function to apply on groupby
def fn(ds):
    return ds

# Run groupby with applied function
data.pipe(process_stacked_groupby, 'N', fn)
```

Expected Output

Prior to xarray=0.15.0, the above code produced the result I wanted.

The function should be able to:

  1. Stack the chosen dimensions
  2. Group by the stacked dimension
  3. Apply a function on each group:
     a. The function actually passes along another function with the unstacked group coord
     b. Add the multi-index stacked group coord back to the results of this function
  4. Combine the groups
  5. Unstack the stacked dimension

Problem Description

After upgrading to 0.15.1, the above code stopped working. The error occurs at the unstack step, ds2 = ds2.unstack(stack_dim), which raises ValueError: cannot unstack dimensions that do not have a MultiIndex: ['horizon_reps_vertical']. This is the 5th step, where the resulting combined object turns out not to contain any multi-index. Somewhere in the 4th step, the combination of the groups has lost the multi-index on the stacked dimension.
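
As a hypothetical minimal check of the symptom described (names and shapes invented for illustration), one could test whether a stacked MultiIndex survives a groupby-map round trip:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical reduced version of this report: stack two dims, group over
# the stacked dim, map an identity function, then inspect the index.
ds = xr.Dataset(
    {'v': (('a', 'b', 'c'), np.zeros((2, 3, 4)))},
    coords={'a': [0, 1], 'b': list('xyz'), 'c': np.arange(4)},
)
stacked = ds.stack(ab=['a', 'b'])
out = stacked.groupby('ab', squeeze=False).map(lambda g: g)

# Per this report, on 0.15.1 the combined result loses the MultiIndex,
# so a subsequent unstack('ab') raises ValueError.
print(isinstance(out.indexes['ab'], pd.MultiIndex))
```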

Versions

0.15.1

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3985/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
709187212 MDExOlB1bGxSZXF1ZXN0NDkzMjkyOTIw 4461 Allow fsspec/zarr/mfdataset martindurant 6042212 closed 0     18 2020-09-25T18:14:38Z 2021-02-16T15:36:54Z 2021-02-16T15:36:54Z CONTRIBUTOR   0 pydata/xarray/pulls/4461

Requires https://github.com/zarr-developers/zarr-python/pull/606

  • [ ] ~Closes #xxxx~
  • [x] Tests added
  • [x] Passes isort . && black . && mypy . && flake8
  • [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
  • [x] New functions/methods are listed in api.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4461/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
496809167 MDU6SXNzdWU0OTY4MDkxNjc= 3332 Memory usage of `da.rolling().construct` fjanoos 923438 closed 0     5 2019-09-22T17:35:06Z 2021-02-16T15:00:37Z 2021-02-16T15:00:37Z NONE      

If I were to do data_array.rolling(time=1000).construct('temp_time'), what is going on under the hood? Does it make 1000 physical copies of the original dataarray, or does it only return a view? I feel like it's the latter, but I'm seeing a memory spike (about a 20-30% increase in total process memory consumption) when I use it, so there might be something else going on? Any ideas / pointers would be appreciated. Thanks!
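
For what it's worth, a minimal sketch one could use to inspect this, assuming a numpy-backed (non-dask) array: construct() builds the window dimension with numpy stride tricks over a boundary-padded copy of the data, so the nominal size is window-times larger while the allocated memory is only on the order of one extra copy.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10_000.0), dims='time')
windows = da.rolling(time=1000).construct('temp_time')

print(windows.shape)               # (10000, 1000): nominally 1000x the data
print(windows.data.strides)        # (8, 8): overlapping strides, i.e. a strided view
print(windows.nbytes / da.nbytes)  # ~1000x nominal; far less is actually allocated
```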

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3332/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
748229907 MDU6SXNzdWU3NDgyMjk5MDc= 4598 Calling pd.to_datetime on cftime variable raybellwaves 17162724 closed 0     4 2020-11-22T12:14:27Z 2021-02-16T02:42:35Z 2021-02-16T02:42:35Z CONTRIBUTOR      

It would be nice to be able to convert cftime variables to pandas datetime to utilize the functionality there.

I understand this is an upstream issue, as pandas probably isn't aware of cftime. However, I'm curious if a method could be added to cftime such as .to_dataframe().

I've found pd.to_datetime(np.datetime64(date_cf)) is the best way to do this currently.

```python
import xarray as xr
import numpy as np
import pandas as pd

date_str = '2020-01-01'
date_np = np.datetime64(date_str)

date_np
# numpy.datetime64('2020-01-01')

date_pd = pd.to_datetime(date_np)
date_pd
# Timestamp('2020-01-01 00:00:00')

date_cf = xr.cftime_range(start=date_str, periods=1)[0]
pd.to_datetime(date_cf)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 830, in to_datetime
    result = convert_listlike(np.array([arg]), format)[0]
  File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/tools/datetimes.py", line 459, in _convert_listlike_datetimes
    result, tz_parsed = objects_to_datetime64ns(
  File "/home/ray/local/bin/anaconda3/envs/a/lib/python3.8/site-packages/pandas/core/arrays/datetimes.py", line 2044, in objects_to_datetime64ns
    result, tz_parsed = tslib.array_to_datetime(
  File "pandas/_libs/tslib.pyx", line 352, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 579, in pandas._libs.tslib.array_to_datetime
  File "pandas/_libs/tslib.pyx", line 718, in pandas._libs.tslib.array_to_datetime_object
  File "pandas/_libs/tslib.pyx", line 552, in pandas._libs.tslib.array_to_datetime
TypeError: <class 'cftime._cftime.DatetimeGregorian'> is not convertible to datetime
```
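
For reference, a sketch of the workaround above alongside CFTimeIndex.to_datetimeindex(), which xarray provides for calendars that map onto pandas datetimes:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = xr.cftime_range(start='2020-01-01', periods=3)

# Single value: round-trip through numpy, as suggested above.
ts = pd.to_datetime(np.datetime64(times[0]))

# Whole index: convert a standard-calendar CFTimeIndex to a pandas DatetimeIndex.
idx = times.to_datetimeindex()
```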

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4598/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);