
issues


16 rows where milestone = 741199 and state = "closed" sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
262642978 MDU6SXNzdWUyNjI2NDI5Nzg= 1603 Explicit indexes in xarray's data-model (Future of MultiIndex) fujiisoup 6815844 closed 0   1.0 741199 68 2017-10-04T01:51:47Z 2022-09-28T09:24:20Z 2022-09-28T09:24:20Z MEMBER      

I think we can continue the discussion we have in #1426 about MultiIndex here.

In a comment, @shoyer recommended removing MultiIndex from the public API.

I agree with this, as long as my code works with this improvement.

I think if we could gather a list of possible MultiIndex use cases here, it would be easier to discuss in depth and arrive at a consensus on the future API.

Current limitations of MultiIndex:
  • It drops scalar coordinates after selection (#1408, #1491)
  • It does not support serialization to NetCDF (#1077)
  • Stack/unstack behaviors are inconsistent (#1431)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1603/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
29136905 MDU6SXNzdWUyOTEzNjkwNQ== 60 Implement DataArray.idxmax() shoyer 1217238 closed 0   1.0 741199 14 2014-03-10T22:03:06Z 2020-03-29T01:54:25Z 2020-03-29T01:54:25Z MEMBER      

Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html
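
The requested behavior can be emulated with argmax plus a coordinate lookup; a minimal sketch (xarray did later gain DataArray.idxmax, via the PR that closed this issue):

```python
import xarray as xr

da = xr.DataArray([3, 1, 7, 2], dims="x", coords={"x": [10, 20, 30, 40]})

# idxmax: the coordinate label at which the maximum occurs
label = da.x[int(da.argmax())]
# modern xarray: da.idxmax("x") does this directly
```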

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/60/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
46756098 MDU6SXNzdWU0Njc1NjA5OA== 266 Easy iteration over slices of a DataArray andreas-h 358378 closed 0   1.0 741199 2 2014-10-24T16:20:51Z 2019-01-15T20:09:35Z 2019-01-15T20:09:34Z CONTRIBUTOR      

The DataArray object would benefit from functionality similar to iris.cube.Cube.slices. Given an array

```
Out[23]:
Coordinates:
  * sza      (sza) float64 0.0 36.87 53.13 60.0 72.54 75.52 81.37 87.13 88.28
  * vza      (vza) float64 0.0 72.54
  * raa      (raa) float64 0.0 60.0 90.0 120.0 180.0
  * wl       (wl) float64 360.0 380.0 400.0 420.0 440.0
```

it would be nice to be able to do

```
for sl in data.slices(["raa", "wl"]):
    # do magic with a DataArray of coordinates (sza, vza)
```
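
A hypothetical `iter_slices` helper (the name is mine, not xarray's) can be built from `isel` and `itertools.product`, iterating over every index combination of the given dimensions and yielding the remaining-dimension sub-arrays:

```python
import itertools

import numpy as np
import xarray as xr

def iter_slices(da, dims):
    """Yield sub-arrays of da, one per combination of indices along dims."""
    ranges = [range(da.sizes[d]) for d in dims]
    for idx in itertools.product(*ranges):
        yield da.isel(dict(zip(dims, idx)))

data = xr.DataArray(np.arange(6).reshape(2, 3), dims=("sza", "raa"))
slices = list(iter_slices(data, ["raa"]))  # three 1-D arrays over "sza"
```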

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/266/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
171828347 MDU6SXNzdWUxNzE4MjgzNDc= 974 Indexing with alignment and broadcasting shoyer 1217238 closed 0   1.0 741199 6 2016-08-18T06:39:27Z 2018-02-04T23:30:12Z 2018-02-04T23:30:11Z MEMBER      

I think we can bring all of NumPy's advanced indexing to xarray in a very consistent way, with only very minor breaks in backwards compatibility.

For boolean indexing:
  • da[key] where key is a boolean labelled array (with any number of dimensions) is made equivalent to da.where(key.reindex_like(da), drop=True). This matches the existing behavior if key is a 1D boolean array. For multi-dimensional arrays, even though the result is now multi-dimensional, this coupled with automatic skipping of NaNs means that da[key].mean() gives the same result as in NumPy.
  • da[key] = value where key is a boolean labelled array can be made equivalent to da = da.where(*align(key.reindex_like(da), value.reindex_like(da))) (that is, the three argument form of where).
  • da[key_0, ..., key_n] where all of key_i are boolean arrays gets handled in the usual way. It is an IndexingError to supply multiple labelled keys if any of them are not already aligned with the corresponding index coordinates (and share the same dimension name). If they want alignment, we suggest users simply write da[key_0 & ... & key_n].
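
The 1-D case of this equivalence can be sketched as follows (a minimal illustration, not the full multi-dimensional proposal):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4.0), dims="x")
key = da > 1.5                         # boolean labelled array

selected = da[key]                     # existing 1-D boolean indexing
equivalent = da.where(key, drop=True)  # the proposed equivalence
```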

For vectorized indexing (by integer or index value):
  • da[key_0, ..., key_n] where all of key_i are integer labelled arrays with any number of dimensions gets handled like NumPy, except instead of broadcasting numpy-style we do broadcasting xarray-style:
      • If any of key_i are unlabelled, 1D arrays (e.g., numpy arrays), we convert them into an xarray.Variable along the respective dimension. 0D arrays remain scalars. This ensures that the result of broadcasting them (in the next step) will be consistent with our current "outer indexing" behavior. Unlabelled higher dimensional arrays trigger an IndexingError.
      • We ensure all keys have the same dimensions/coordinates by mapping them to da[*broadcast(key_0, ..., key_n)] (note that broadcast now includes automatic alignment).
      • The result's dimensions and coordinates are copied from the broadcast keys.
      • The result's values are taken by mapping each set of integer locations specified by the broadcast version of key_i to the integer position on the corresponding ith axis of da.
  • Labeled indexing like ds.loc[key_0, ..., key_n] works exactly as above, except instead of doing integer lookup, we look up label values in the corresponding index.
  • Indexing with .isel and .sel/.reindex works like the two previous cases, except we look up axes by dimension name instead of axis position.
  • I haven't fully thought through the implications for assignment (da[key] = value or da.loc[key] = value), but I think it works in a straightforwardly similar fashion.

All of these methods should also work for indexing on Dataset by looping over Dataset variables in the usual way.

This framework neatly subsumes most of the major limitations with xarray's existing indexing:
  • Boolean indexing on multi-dimensional arrays works in an intuitive way, for both selection and assignment.
  • No more need for specialized methods (sel_points/isel_points) for pointwise indexing. If you want to select along the diagonal of an array, you simply need to supply indexers that use a new dimension. Instead of arr.sel_points(lat=stations.lat, lon=stations.lon, dim='station'), you would simply write arr.sel(lat=stations.lat, lon=stations.lon) -- the station dimension is taken automatically from the indexer.
  • Other use cases for NumPy's advanced indexing that currently are impossible in xarray also automatically work. For example, nearest neighbor interpolation to a completely different grid is now as simple as ds.reindex(lon=grid.lon, lat=grid.lat, method='nearest', tolerance=0.5) or ds.reindex_like(grid, method='nearest', tolerance=0.5).
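
The pointwise-selection behavior described here did land in xarray; a minimal sketch with toy data:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(16).reshape(4, 4), dims=("lat", "lon"),
                   coords={"lat": [10, 20, 30, 40],
                           "lon": [100, 110, 120, 130]})

# indexers sharing a new "station" dimension select pointwise, not outer
lats = xr.DataArray([10, 30], dims="station")
lons = xr.DataArray([100, 120], dims="station")
out = arr.sel(lat=lats, lon=lons)  # result has dims ("station",)
```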

Questions to consider:
  • How does this interact with @benbovy's enhancements for MultiIndex indexing? (#802 and #947)
  • How do we handle mixed slice and array indexing? In NumPy, this is a major source of confusion, because slicing is done before broadcasting and the order of slices in the result is handled separately from broadcast indices. I think we may be able to resolve this by mapping slices in this case to 1D arrays along their respective axes, and using our normal broadcasting rules.
  • Should we deprecate non-boolean indexing with [] and .loc[] and non-labelled arrays when some but not all dimensions are provided? Instead, we would require explicitly indexing like [key, ...] (yes, writing ...), which indicates "all trailing axes" like NumPy. This behavior has been suggested for new indexers in NumPy because it precludes a class of bugs where the array has an unexpected number of dimensions. On the other hand, it's not so necessary for us when we have explicit indexing by dimension name with .sel.

xref these comments from @MaximilianR and myself

Note: I would certainly welcome help making this happen from a contributor other than myself, though you should probably wait until I finish #964, first, which lays important groundwork.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/974/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
107424151 MDU6SXNzdWUxMDc0MjQxNTE= 585 Parallel map/apply powered by dask.array shoyer 1217238 closed 0   1.0 741199 11 2015-09-20T23:27:55Z 2017-10-13T15:58:30Z 2017-10-09T23:26:06Z MEMBER      

Dask is awesome, but it isn't always easy to use it for parallel operations. In many cases, especially when wrapping routines from external libraries, it is most straightforward to express operations in terms of a function that expects and returns xray objects loaded into memory.

Dask array has a map_blocks function/method, but its applicability is limited because dask.array doesn't have axis names for unambiguously identifying dimensions. da.atop can handle many of these cases, but it's not the easiest to use. Fortunately, we have sufficient metadata in xray that we could probably parallelize many atop operations automatically by inferring result dimensions and dtypes from applying the function once. See here for more discussion on the dask side: https://github.com/blaze/dask/issues/702

So I would like to add some convenience methods for automatic parallelization with dask of a function defined on xray objects loaded into memory. In addition to a map_blocks method/function, it would be useful to add some sort of parallel_apply method to groupby objects that works very similarly, by lazily applying a function that takes and returns xray objects loaded into memory.
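
xarray eventually gained xarray.map_blocks along these lines; a minimal sketch (executed eagerly here on an in-memory array; with a dask-backed array the function is applied per block):

```python
import numpy as np
import xarray as xr

def demean(block):
    # an ordinary function that takes and returns an in-memory xarray object
    return block - block.mean("x")

da = xr.DataArray(np.arange(4.0), dims="x")
out = xr.map_blocks(demean, da)
```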

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/585/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
171077425 MDU6SXNzdWUxNzEwNzc0MjU= 967 sortby() or sort_index() method for Dataset and DataArray shoyer 1217238 closed 0   1.0 741199 8 2016-08-14T20:40:13Z 2017-05-12T00:29:12Z 2017-05-12T00:29:12Z MEMBER      

They should function like the pandas methods of the same name.

Under the covers, I believe it would suffice to simply remap ds.sort_index('time') -> ds.isel(time=ds.indexes['time'].argsort()).
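
The proposed remap, sketched with toy data:

```python
import xarray as xr

ds = xr.Dataset({"v": ("time", [2.0, 0.0, 1.0])}, coords={"time": [3, 1, 2]})

# ds.sort_index('time')  ->  isel with the argsorted index
sorted_ds = ds.isel(time=ds.indexes["time"].argsort())
```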

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/967/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
42380798 MDU6SXNzdWU0MjM4MDc5OA== 230 set_index(keys, inplace=False) should be both a DataArray and Dataset method. shoyer 1217238 closed 0   1.0 741199 1 2014-09-10T06:03:56Z 2017-02-01T16:57:50Z 2017-02-01T16:57:50Z MEMBER      

originally mentioned in #197.

ideally this will smoothly create multi-indexes as/when necessary (#164), just like the pandas method
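
As later implemented, set_index exists on both DataArray and Dataset and builds a pandas.MultiIndex when given multiple keys; a minimal sketch:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x",
                  coords={"a": ("x", ["p", "p", "q", "q"]),
                          "b": ("x", [0, 1, 0, 1])})

indexed = da.set_index(x=["a", "b"])  # x becomes a pandas.MultiIndex
```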

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/230/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
124665607 MDU6SXNzdWUxMjQ2NjU2MDc= 700 BUG: not converting series with CategoricalIndex jreback 953992 closed 0   1.0 741199 2 2016-01-03T19:05:59Z 2017-02-01T16:56:56Z 2017-02-01T16:56:56Z MEMBER      

xray 0.6.1

```
In [1]: s = Series(range(5), index=pd.CategoricalIndex(list('aabbc'), name='foo'))

In [4]: xray.DataArray.from_series(s)
ValueError: object array method not producing an array
```
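
One workaround at the time was to cast the CategoricalIndex to a plain object index before converting; a sketch (shown with unique labels for simplicity):

```python
import pandas as pd
import xarray as xr

s = pd.Series(range(5), index=pd.CategoricalIndex(list("abcde"), name="foo"))

# cast the CategoricalIndex to a plain object index first
s2 = pd.Series(s.values, index=s.index.astype(object), name=s.name)
da = xr.DataArray.from_series(s2)
```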

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/700/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
168848449 MDU6SXNzdWUxNjg4NDg0NDk= 931 How to cite xarray in a research paper andreas-h 358378 closed 0 jhamman 2443309 1.0 741199 4 2016-08-02T10:13:09Z 2016-08-04T21:17:53Z 2016-08-04T21:17:53Z CONTRIBUTOR      

It would be helpful if the documentation had an entry (for example, in the FAQ) about how to properly cite xarray for a scientific publication. I personally like the way, e.g., the IPython folks do it: they have BibTeX code to copy and paste (see https://ipython.org/citing.html).

This issue is related to #290, but addresses the general problem and not a specific way.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/931/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
109589162 MDU6SXNzdWUxMDk1ODkxNjI= 605 Support Two-Dimensional Coordinate Variables jhamman 2443309 closed 0   1.0 741199 11 2015-10-02T23:27:18Z 2016-07-31T23:02:46Z 2016-07-31T23:02:46Z MEMBER      

The CF Conventions supports the notion of a 2d coordinate variable in the case of irregularly spaced data. An example of this sort of dataset is below. The CF Convention is to add a "coordinates" attribute with a string describing the 2d coordinates.

```
dimensions:
    xc = 128 ;
    yc = 64 ;
    lev = 18 ;
variables:
    float T(lev,yc,xc) ;
        T:long_name = "temperature" ;
        T:units = "K" ;
        T:coordinates = "lon lat" ;
    float xc(xc) ;
        xc:axis = "X" ;
        xc:long_name = "x-coordinate in Cartesian system" ;
        xc:units = "m" ;
    float yc(yc) ;
        yc:axis = "Y" ;
        yc:long_name = "y-coordinate in Cartesian system" ;
        yc:units = "m" ;
    float lev(lev) ;
        lev:long_name = "pressure level" ;
        lev:units = "hPa" ;
    float lon(yc,xc) ;
        lon:long_name = "longitude" ;
        lon:units = "degrees_east" ;
    float lat(yc,xc) ;
        lat:long_name = "latitude" ;
        lat:units = "degrees_north" ;
```

I'd like to discuss how we could support this in xray. The motivating application for this is in plotting operations, but it may also be useful in other grouping and remapping operations (e.g. #324, #475, #486).

One option would be simply to honor the "coordinates" attr in plotting and use the specified coordinates as the x/y values.

ref: http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#idp5559280
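
In today's xarray this maps onto non-dimension coordinates with two dimensions; a minimal construction sketch (toy sizes and values, not the CDL example above):

```python
import numpy as np
import xarray as xr

yc, xc = 4, 6  # small toy sizes
lat = xr.DataArray(np.linspace(-60, 60, yc)[:, None].repeat(xc, axis=1),
                   dims=("yc", "xc"))
lon = xr.DataArray(np.linspace(0, 300, xc)[None, :].repeat(yc, axis=0),
                   dims=("yc", "xc"))

# T carries lat/lon as 2-D non-dimension coordinates
T = xr.DataArray(np.zeros((2, yc, xc)), dims=("lev", "yc", "xc"),
                 coords={"lat": lat, "lon": lon})
```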

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/605/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
33559045 MDU6SXNzdWUzMzU1OTA0NQ== 130 Wrap bottleneck for fast moving window aggregations shoyer 1217238 closed 0   1.0 741199 4 2014-05-15T06:42:43Z 2016-02-20T02:35:09Z 2016-02-20T02:35:09Z MEMBER      

Like pandas, we should wrap bottleneck to create fast moving window operations and missing value operations that can be applied to xray data arrays.

As xray is designed to make it straightforward to work with high dimensional arrays, it would be particularly convenient if bottleneck had fast functions for N > 3 dimensions (see kwgoodman/bottleneck/issues/84) but we should wrap bottleneck regardless for functions like rolling_mean, rolling_sum, rolling_min, etc.
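
The rolling-window API that eventually shipped (bottleneck-accelerated when available); a sketch:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5.0), dims="time")

# leading entries are NaN until the window fills
out = da.rolling(time=3).mean()
```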

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/130/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
35956575 MDU6SXNzdWUzNTk1NjU3NQ== 164 Support pandas.MultIndex axes on xray objects shoyer 1217238 closed 0   1.0 741199 0 2014-06-18T05:53:08Z 2016-01-18T00:11:11Z 2016-01-18T00:11:11Z MEMBER      
  • Appropriate casting with xray.Coordinate
  • Call out to MultiIndex.get_locs in indexing.convert_label_indexer
  • Get multi-index support working with .loc and .sel()
  • Serialization to NetCDF
  • Consider stack/unstack and pivot like methods

Not all of these would be necessary for an MVP.

~~Right now we don't consider the possibility of a MultiIndex at all -- at the very least it would be nice to give an error message.~~
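
The stack/unstack methods from the checklist, as they later landed; a minimal sketch:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6).reshape(2, 3), dims=("x", "y"),
                  coords={"x": ["a", "b"], "y": [0, 1, 2]})

stacked = da.stack(z=("x", "y"))   # z is backed by a pandas.MultiIndex
roundtrip = stacked.unstack("z")   # recovers the original shape
```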

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/164/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
38109425 MDU6SXNzdWUzODEwOTQyNQ== 185 Plot methods shoyer 1217238 closed 0   1.0 741199 10 2014-07-17T18:07:18Z 2015-08-18T18:25:39Z 2015-08-18T18:25:39Z MEMBER      

It would be awesome to have built in plot methods, similar to pandas.DataFrame.plot and pandas.Series.plot.

Although we could just copy the basic plotting methods from pandas, the strongest need is for cases where there are no corresponding plot methods.

Notably, we should have shortcut methods for plotting 2-dimensional arrays with labels, corresponding to matplotlib's contour/contourf/imshow/pcolormesh. If we include an axis argument, such an API should even suffice for plotting data on a map via cartopy, although it wouldn't hurt to add some optional keyword arguments shortcuts (e.g., proj='orthographic').

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/185/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
59572709 MDU6SXNzdWU1OTU3MjcwOQ== 354 resample method shoyer 1217238 closed 0   1.0 741199 0 2015-03-02T23:55:58Z 2015-03-05T19:29:39Z 2015-03-05T19:29:39Z MEMBER      

This should be a shortcut for .groupby(resampled_times).mean('time') (e.g., this example), with an API similar to resample in pandas.

Something like the following should work: ds.resample('24H', dim='time', how='mean', base=12, label='right').
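
The API that eventually shipped uses a groupby-style form rather than the how= keyword proposed here; a sketch:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=48, freq="h")
ds = xr.Dataset({"t": ("time", np.arange(48.0))}, coords={"time": times})

daily = ds.resample(time="24h").mean()  # one value per 24-hour bin
```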

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/354/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
51070269 MDU6SXNzdWU1MTA3MDI2OQ== 286 Add support for attribute based variable lookups? shoyer 1217238 closed 0   1.0 741199 0 2014-12-05T07:17:31Z 2014-12-24T07:07:24Z 2014-12-24T07:07:24Z MEMBER      

e.g., ds.latitude instead of ds['latitude']

It should include autocomplete support in editors like IPython.

This would make it a little easier to use xray, but isn't a top priority for me to implement right now. Pull requests would be welcome!
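
Attribute-based lookup as described did ship; a minimal sketch:

```python
import xarray as xr

ds = xr.Dataset({"latitude": ("x", [10.0, 20.0])})

lat = ds.latitude  # equivalent to ds["latitude"] for valid Python names
```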

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/286/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
39383188 MDU6SXNzdWUzOTM4MzE4OA== 200 Support mathematical operators (+-*/, etc) for Dataset objects shoyer 1217238 closed 0   1.0 741199 0 2014-08-04T00:10:04Z 2014-09-07T04:18:05Z 2014-09-07T04:18:05Z MEMBER      

(Dataset, Dataset) operations like ds - ds should align based on the names of non-coordinates, and then pass all operations off to the DataArray objects. Even when we switch to doing automatic alignment, an exception should be raised if the intersection of non-coordinate names is empty.

(Dataset, DataArray) or (Dataset, ndarray) operations like ds - ds['x'] should simply map over the dataset non-coordinates. Note that this behavior is different from pandas, for which df - df['x'] will usually raise an exception: pandas aligns Series to DataFrame rows, following numpy's broadcasting rules.

This would be a nice complement to Dataset summary methods (#131).
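
The proposed (Dataset, DataArray) behavior, sketched with the semantics described above (mapping over each non-coordinate variable):

```python
import xarray as xr

ds = xr.Dataset({"a": ("t", [1.0, 2.0]), "b": ("t", [3.0, 4.0])})

out = ds - ds["a"]  # subtracts ds["a"] from every data variable
```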

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/200/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · About: xarray-datasette