issues

3 rows where type = "issue" and user = 5629061 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
149130368 MDU6SXNzdWUxNDkxMzAzNjg= 830 "Reverse" groupby method for split/apply/combine hottwaj 5629061 closed 0     5 2016-04-18T12:00:04Z 2020-10-04T16:06:58Z 2020-10-04T16:06:58Z NONE      

When dealing with high-dimensional data, algorithms often involve operations or aggregation on a particular dimension only, whilst keeping all other dimensions in the dataset.

For example, I might know that I want to average all data along the time axis, and I'm indifferent to the other dimensions present, i.e. I want my algorithm to work whenever there is a time axis, and to be indifferent to the presence/lack of any other dimensions.

Mapping this kind of implementation to xarray is awkward though because I can only use groupby() for the split/apply/combine operation.

For example, in xarray I have to do this:

averages = dataarray.groupby([dimensions excluding time dimension]).apply(my_method_that_works_on_time_dimension)

instead of this (where aggregate_over() is my "reverse" groupby method):

averages = dataarray.aggregate_over([time_dimension]).apply(my_method_that_works_on_time_dimension)

For the first example I have to do some extra work: I have to write additional code to fetch all the dimensions in the array, remove the time dimension from that list, and then use that list with groupby, in order to make my code depend on the time dimension only.

It would be really helpful to add an aggregate_over() method (name TBD of course!) as an alternative to groupby() that automates this extra work.
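The "extra work" described above can be sketched in plain Python. This is only an illustration under stated assumptions: `dims_to_group_by` is a hypothetical helper name, not an xarray function, and the dimension names are made up. It assumes the dimensions arrive as a tuple of names, which is how a DataArray reports them through its `.dims` attribute.

```python
def dims_to_group_by(all_dims, aggregate_dims):
    """Return every dimension in all_dims that is NOT being aggregated over.

    Hypothetical helper for illustration; it is not part of xarray.
    """
    # Fail early if the caller names a dimension the array does not have.
    missing = [d for d in aggregate_dims if d not in all_dims]
    if missing:
        raise ValueError(f"dimensions not present: {missing}")
    # Preserve the original dimension order for the remaining dimensions.
    return [d for d in all_dims if d not in aggregate_dims]


# Hypothetical dimension names, in the form a DataArray exposes via `.dims`.
example_dims = ("time", "lat", "lon")
group_dims = dims_to_group_by(example_dims, ["time"])
print(group_dims)  # ['lat', 'lon']
```

The resulting list is what would have to be handed to groupby() in the first example; the proposed aggregate_over() would hide this step from the caller.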

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/830/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
192325490 MDU6SXNzdWUxOTIzMjU0OTA= 1143 timedelta64[D] is always coerced to timedelta64[ns] hottwaj 5629061 closed 0     5 2016-11-29T16:11:53Z 2019-01-22T19:21:18Z 2019-01-22T19:21:18Z NONE      

Hi guys, the following snippets show the issue...

```
xarray.DataArray([1,2,3,4]).astype('timedelta64[D]')

# output is
"""
<xarray.DataArray (dim_0: 4)>
array([ 86400000000000, 172800000000000, 259200000000000, 345600000000000], dtype='timedelta64[ns]')
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3
"""
```

Compare this with Pandas:

```
pandas.Series([1,2,3,4]).astype('timedelta64[D]')

# output is
"""
0   1 days
1   2 days
2   3 days
3   4 days
dtype: timedelta64[D]
"""
```

This behaviour becomes more problematic when trying to convert from timedelta64[ns] to e.g. days as ints:

```
xarray.DataArray(pandas.Series([1,2,3,4]).astype('timedelta64[D]')).astype(int)

# output is
"""
<xarray.DataArray (dim_0: 4)>
array([ 86400000000000, 172800000000000, 259200000000000, 345600000000000])
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3
"""
```

Again contrast that with pandas:

```
pandas.Series([1,2,3,4]).astype('timedelta64[D]').astype(int)

# output is
"""
0    1
1    2
2    3
3    4
dtype: int64
"""
```

Other variations of timedelta e.g. timedelta64[s], timedelta64[W] etc suffer from the same problem.
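For reference, the integers in the xarray output above are the day counts expressed in nanoseconds. The plain-Python sketch below shows the arithmetic and one possible way to recover whole days from the coerced values. The names `NS_PER_DAY` and `ns_to_whole_days` are hypothetical and used only for illustration; this is not an xarray or pandas API.

```python
# Nanoseconds in one day: 24 h * 60 min * 60 s * 10**9 ns.
NS_PER_DAY = 24 * 60 * 60 * 10**9


def ns_to_whole_days(ns_values):
    """Convert integer nanosecond counts to whole days using floor division.

    Hypothetical helper for illustration only.
    """
    return [ns // NS_PER_DAY for ns in ns_values]


# The values reported in the xarray output above.
coerced = [86400000000000, 172800000000000, 259200000000000, 345600000000000]
days = ns_to_whole_days(coerced)
print(days)  # [1, 2, 3, 4]
```

On NumPy timedelta64 arrays the analogous step is dividing by `numpy.timedelta64(1, 'D')`, which yields the number of days as floats.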

Thanks

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1143/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue
207477701 MDU6SXNzdWUyMDc0Nzc3MDE= 1267 "in" operator does not work as expected on DataArray dimensions hottwaj 5629061 closed 0   0.11 2856429 2 2017-02-14T10:35:41Z 2018-10-28T17:56:17Z 2018-10-28T17:56:17Z NONE      

As an example I have a DataArray called "my_dataarray" that looks something like this:

<xarray.DataArray 'values' (Type: 3)>
array([1, 2, 3])
Coordinates:
  * Type     (Type) object 'Type 1' 'Type 2' 'Type 3'

'Type' is a dimension on my DataArray. Note that 'Type' is also a DataArray that looks like this:

OrderedDict([('Type', <xarray.IndexVariable 'Type' (Type: 3)> array(['Type 1', 'Type 2', 'Type 3'], dtype='object'))])

Let's say I run:

'Type 1' in my_dataarray.Type

The result is False, even though 'Type 1' is in the "Type" dimension.

To get the result I was expecting I need to run:

'Type 1' in my_dataarray.Type.values

Stepping through the code, the problematic line is here: https://github.com/pydata/xarray/blob/20ec32430fac63a8976699d9528b5fdc1cd4125d/xarray/core/dataarray.py#L487

The test used for __contains__(self, key) on the Type dimension is whether the key is in the _coords of Type.

This is probably the right thing to do when the DataArray is used for storing data, but probably not what we want if the DataArray is being used as a dimension - it should instead check if 'Type 1' is in the values of Type?
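The difference between the two membership tests can be shown with a toy model. The class below is not xarray's implementation; `ToyDimension` and its attributes are hypothetical stand-ins that only mimic the behaviour described above, where `__contains__` looks at coordinate names rather than at the stored values.

```python
class ToyDimension:
    """Toy stand-in for a dimension coordinate; NOT xarray's actual code."""

    def __init__(self, name, values):
        self.values = list(values)
        # Coordinates are keyed by name, as in the OrderedDict shown above.
        self._coords = {name: self.values}

    def __contains__(self, key):
        # Mimics the behaviour described in the issue: membership is tested
        # against the coordinate names, not against the values.
        return key in self._coords


type_dim = ToyDimension("Type", ["Type 1", "Type 2", "Type 3"])

value_in_container = "Type 1" in type_dim      # False: 'Type 1' is not a coordinate name
value_in_values = "Type 1" in type_dim.values  # True: checks the stored values
name_in_container = "Type" in type_dim         # True: 'Type' is a coordinate name
```

In this model, the change suggested above would amount to having `__contains__` test `key in self.values` when the object represents a dimension.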

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1267/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);