home / github

Menu
  • GraphQL API
  • Search all tables

issue_comments

Table actions
  • GraphQL API for issue_comments

18 rows where author_association = "MEMBER", issue = 179052741 and user = 1217238 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 1

  • shoyer · 18 ✖

issue 1

  • WIP: Optional indexes (no more default coordinates given by range(n)) · 18 ✖

author_association 1

  • MEMBER · 18 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
272670336 https://github.com/pydata/xarray/pull/1017#issuecomment-272670336 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI3MjY3MDMzNg== shoyer 1217238 2017-01-15T03:09:05Z 2017-01-15T03:09:05Z MEMBER

@fmaussion has raised some concerns about the new repr in #1199

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
267222684 https://github.com/pydata/xarray/pull/1017#issuecomment-267222684 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2NzIyMjY4NA== shoyer 1217238 2016-12-15T02:40:42Z 2016-12-15T02:40:42Z MEMBER

OK, in it goes!

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 1,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
267216439 https://github.com/pydata/xarray/pull/1017#issuecomment-267216439 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2NzIxNjQzOQ== shoyer 1217238 2016-12-15T02:08:42Z 2016-12-15T02:08:57Z MEMBER

or use another symbol like o that means 'no index'?

This seems like the best alternative to me. I don't like omitting the variable name because it seems that it might fall under the previous row, like a level in the MultiIndex repr.

Done. Any further concerns? I'd really like to merge this and then get the 0.9 release out shortly after.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
266015193 https://github.com/pydata/xarray/pull/1017#issuecomment-266015193 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2NjAxNTE5Mw== shoyer 1217238 2016-12-09T13:36:12Z 2016-12-09T13:36:12Z MEMBER

or use another symbol like o that means 'no index'?

This seems like the best alternative to me. I don't like omitting the variable name because it seems that it might fall under the previous row, like a level in the MultiIndex repr.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
265702329 https://github.com/pydata/xarray/pull/1017#issuecomment-265702329 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2NTcwMjMyOQ== shoyer 1217238 2016-12-08T10:07:02Z 2016-12-08T10:07:02Z MEMBER

maybe you could print the dummy coord (as in my example) if there's one or more real coord, and don't print the coords block at all if there isn't any (as in your example)? The problem of readability only happens when there's some coords - so one needs to go look at the dims and notice that there's more than meets the eye. When there's no coords at all, the only place to look at is the dims, so I think it's fairly readable.

Done -- missing coordinates are marked by - for the dtype/values, as long as there is at least one.

Are there any other concerns before I merge this?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
264825732 https://github.com/pydata/xarray/pull/1017#issuecomment-264825732 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2NDgyNTczMg== shoyer 1217238 2016-12-05T11:02:34Z 2016-12-05T11:02:34Z MEMBER

Another option: finally add a boolean drop keyword argument to isel/sel/squeeze (#242). Then the original example becomes v1.sel(somedim=somevalue, drop=True), which we can make work regardless of whether or not a coordinate value for somedim exists.

See #1153

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
261754050 https://github.com/pydata/xarray/pull/1017#issuecomment-261754050 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2MTc1NDA1MA== shoyer 1217238 2016-11-20T02:29:20Z 2016-11-20T02:29:20Z MEMBER

Another option: finally add a boolean drop keyword argument to isel/sel/squeeze (https://github.com/pydata/xarray/issues/242). Then the original example becomes v1.sel(somedim=somevalue, drop=True), which we can make work regardless of whether or not a coordinate value for somedim exists.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
261689325 https://github.com/pydata/xarray/pull/1017#issuecomment-261689325 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2MTY4OTMyNQ== shoyer 1217238 2016-11-19T03:07:40Z 2016-11-19T03:07:40Z MEMBER

drop in pandas has a keyword argument errors='raise' (or errors='ignore') that lets it switch to work like discard. A separate discard method feels cleaner to me, but I can understand an argument for consistency with pandas. Thoughts?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
261688345 https://github.com/pydata/xarray/pull/1017#issuecomment-261688345 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2MTY4ODM0NQ== shoyer 1217238 2016-11-19T02:50:12Z 2016-11-19T02:51:09Z MEMBER

I tried adding * to the repr for dimension names with coordinates. This doesn't work as well for Dataset, because the column with the dimensions no longer lines up:

<xarray.Dataset> Dimensions: (*dim2: 9, *dim3: 10, *time: 20) Coordinates: * time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 ... * dim2 (dim2) float64 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 * dim3 (dim3) %s 'a' 'b' 'c' 'd' 'e' 'f' 'g' 'h' 'i' 'j' numbers (dim3) int64 0 1 2 0 0 1 1 2 2 3 Data variables: var3 (dim3, dim1) float64 0.5565 -0.2121 0.4563 1.545 -0.2397 0.1433 ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
260802408 https://github.com/pydata/xarray/pull/1017#issuecomment-260802408 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2MDgwMjQwOA== shoyer 1217238 2016-11-15T23:20:53Z 2016-11-15T23:21:06Z MEMBER

@crusaderky thanks for giving this a try!

RE: missing dimension in the repr

I like the idea of mirroring * from the coordinate list in the list of dimensions.

I'm less sure about adding markers for empty dimensions to coordinates. That makes for a much longer repr for some simple examples, e.g.,

<xarray.DataArray (x: 3, y: 4)> array([[ 0.8167696 , 0.15151986, 0.81139993, 0.33878428], [ 0.96861902, 0.34231084, 0.55831466, 0.92723981], [ 0.16737575, 0.32391949, 0.39093643, 0.64267858]])

vs

<xarray.DataArray (x: 3, y: 4)> array([[ 0.8167696 , 0.15151986, 0.81139993, 0.33878428], [ 0.96861902, 0.34231084, 0.55831466, 0.92723981], [ 0.16737575, 0.32391949, 0.39093643, 0.64267858]]) Coordinates: * x (x) - * y (y) -

RE: drop

I understand the annoyance of v1.sel(somedim=somevalue).drop('somedim') no longer working. Unfortunately, I don't think we can make drop handle former dimensions differently, because after v1.sel(somedim=somevalue) we no longer have any representation of the fact that the dimension was indexed in the xarray data model.

What we could do is add a separate discard method (sort of like set.discard), which works like drop but ignores missing labels instead of raising an error.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
260104325 https://github.com/pydata/xarray/pull/1017#issuecomment-260104325 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI2MDEwNDMyNQ== shoyer 1217238 2016-11-12T06:04:14Z 2016-11-12T06:04:14Z MEMBER

I've gone through the docs updated every mention I could find of default indexes. So, I think this really is nearly ready to merge now -- review would be highly appreciated. I've added renderings of a few choice sections of the docs to the top post.

The last remaining design decision is how to handle the transition. I would really prefer to avoid a deprecation cycle involving issuing warnings in new users' first encounter with xarray. This means that dependent libraries will need to be updated if this causes them to break (which I think is likely).

My best idea is to issue a "beta" release, write a mailing list post and give people lots of time to test and update their packages (a few weeks to a month).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
258752617 https://github.com/pydata/xarray/pull/1017#issuecomment-258752617 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI1ODc1MjYxNw== shoyer 1217238 2016-11-07T05:50:06Z 2016-11-12T05:54:55Z MEMBER

This is ready for review. (The failing test is unrelated -- a regression in dask.array.)

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
254834925 https://github.com/pydata/xarray/pull/1017#issuecomment-254834925 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI1NDgzNDkyNQ== shoyer 1217238 2016-10-19T14:45:25Z 2016-10-19T14:45:25Z MEMBER

I have never encountered this case yet but ignoring that dimension seems like a bad idea to me. When I use a.reindex_like(b), I usually mean "make a compatible with b", so I assume the resulting index and shape is the same than (or at least compatible with) b. More importantly, I expect to be able to do binary operations between the re-indexed a and b.

@gdementen thanks for the input. I am inclined to agree. Even within the xarray codebase, we basically use x = y.reindex_like(z) as cleaner spelling for x, _ = align(y, z, join='left').

Given that, I would go with either an error, or treat the missing index like it was range() in this case, ie fill the extra values with NAN.

I think we want the error here, given that this is one of the major motivations for allowing missing coordinate labels (not assuming invalid labels).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
254245020 https://github.com/pydata/xarray/pull/1017#issuecomment-254245020 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI1NDI0NTAyMA== shoyer 1217238 2016-10-17T15:40:00Z 2016-10-17T15:40:00Z MEMBER

New design question:

What should the new behavior for reindex_like be, if the argument has dimensions of different sizes but no dimension labels? Should we raise an error, or simply ignore these dimensions? e.g.,

array1 = xr.DataArray([1, 2], dims='x') array2 = xr.DataArray([3, 4, 5], dims='x') array1.reindex_like(array2)

array2.indexes will be an empty dict, which, given that reindex_like is basically an alias for .reindex(**other.indexes) suggests that we shouldn't raise an error. But align does currently align an error in such cases, unless the dimension is explicitly excluded with exclude.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
253683651 https://github.com/pydata/xarray/pull/1017#issuecomment-253683651 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI1MzY4MzY1MQ== shoyer 1217238 2016-10-14T01:13:30Z 2016-10-14T01:13:30Z MEMBER

@benbovy This is actually a good use case for no dimension labels. E.g., from my working branch:

``` In [7]: da = xr.DataArray(z, dims=('x', 'y'), name='z')

In [8]: dz_dx_xarray = da[1:, :] - da[:-1, :]

In [9]: dz_dx dz_dx_numpy dz_dx_xarray

In [9]: dz_dx_xarray Out[9]: <xarray.DataArray 'z' (x: 4, y: 5)> array([[ 0.15224392, -0.03428312, -0.10936435, 0.06149288, -0.69317859], [-0.61928605, 0.71636887, -0.05578677, -0.39489466, 0.63472963], [ 0.05180684, -0.72471438, 0.64259117, 0.24830877, -0.24006862], [ 0.44981358, 0.19054462, -0.69880118, -0.20120161, 0.08580928]])

In [10]: da.diff('x') Out[10]: <xarray.DataArray 'z' (x: 4, y: 5)> array([[ 0.15224392, -0.03428312, -0.10936435, 0.06149288, -0.69317859], [-0.61928605, 0.71636887, -0.05578677, -0.39489466, 0.63472963], [ 0.05180684, -0.72471438, 0.64259117, 0.24830877, -0.24006862], [ 0.44981358, 0.19054462, -0.69880118, -0.20120161, 0.08580928]]) ```

This does depend on the details of how .diff is implemented though. It does a loop over variables.items() instead of explicitly accessing variables corresponding to dimensions.

e.g., array.coords['x'] would return a DataArray with values range(n) (importantly, this would not change the original array). That sounds a bit weird to me (I'm not sure to understand, actually). What are the reasons/benefits of returning a DataArray instead of raising a KeyError?

The (theoretical) benefit would be an easier transition, because previously ds[dim] always worked.

As I'm working through porting xarray's test suite, I'm realizing that this may not be the best approach. If a user is relying on ds[dim] or array.coords[dim] always working, they are probably not going to be happy with inadvertently adding new coordinates like range(n) (which implies new alignment semantics). It might be a better idea to force a hard transition.

Either way, the functionality in your reset_index() PR is going to be really essential, because these range(n) indexes are going to show up inadvertently sometimes, such as converting pandas objects or .unstack().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
249908900 https://github.com/pydata/xarray/pull/1017#issuecomment-249908900 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI0OTkwODkwMA== shoyer 1217238 2016-09-27T15:54:15Z 2016-09-27T15:54:15Z MEMBER

How does one select / slice a dataarray with no index? Does isel work but not sel?

isel will definitely work unchanged.

For .sel, there are two choices: - Raise a TypeError about how an index is required (this would be the strictest option) - Just pass on the key arguments for dimension without an index unchanged on to .isel (this is the approach I currently prefer, because it's a bit more convenient and also preserves backwards compatibility)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
249618769 https://github.com/pydata/xarray/pull/1017#issuecomment-249618769 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI0OTYxODc2OQ== shoyer 1217238 2016-09-26T16:17:28Z 2016-09-26T16:18:55Z MEMBER

@gdementen Thanks for chiming in! Yes, in practice I think "no index" for xarray will work almost exactly the same as your "wildcard index".

The only advantage I can think of now (except it was easier for me to implement it this way) of having a "wildcard axis" instead of no index/labels at all is that a subset could keep the information about which "tick" it comes from (again without blocking alignment). Not sure it's worth it though (I have actually not implemented it this way yet, but I was contemplating doing so).

I'm not a fan of this one. It's a perpetual minor annoyance with pandas to subset a meaningless range(n) index only to get non-sensical tick labels. Also, keeping track of tick labels but not using them for alignment makes it less obvious that a dimension doesn't have an index.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741
249435537 https://github.com/pydata/xarray/pull/1017#issuecomment-249435537 https://api.github.com/repos/pydata/xarray/issues/1017 MDEyOklzc3VlQ29tbWVudDI0OTQzNTUzNw== shoyer 1217238 2016-09-25T17:51:45Z 2016-09-26T04:55:13Z MEMBER

why make it the new default instead of using a keyword (something like no_index=True)?

Basically, this is about the user experience and first impressions.

There are very few cases when somebody would prefer a default index to no index at all, so I see few cases for no_index=False in the long term. It seems cleaner to simply spell this as coords={'x': range(n)}.

From the experience of new users, it's really nice to be able to incrementally try out features from a new library. Seeing extra information appear in the data model that they didn't add makes people (rightfully) nervous, because they don't know how it will work yet.

without coordinates the majority of xarray's functions are not working anymore. So what will xarray have that numpy doesn't have already? (the only thing I could think of is labeled dimensions, but there are probably more use cases?)

Labeled dimensions without coordinate labels actually get you plenty. You get better broadcasting, aggregation (e.g., .sum('x')) and even indexing (e.g., .isel(x=0)).

But the big advantage is the ability to cleanly mix dimensions with and without indexes on the same objects, which covers several more use cases for labeled arrays. Examples off hand include images (see the example from my first post) and machine learning models (where columns usually have labels corresponding to features but rows often are simply unlabeled examples).

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  WIP: Optional indexes (no more default coordinates given by range(n)) 179052741

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 219.046ms · About: xarray-datasette