issue_comments


where issue = 324350248 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
435733418 https://github.com/pydata/xarray/issues/2159#issuecomment-435733418 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQzNTczMzQxOA== shoyer 1217238 2018-11-05T02:03:00Z 2018-11-05T02:03:00Z MEMBER

> This is fine though right? We can do all of this, because it should compartmentalise fairly easily shouldn't it? You end up with logic like:

Yes, this seems totally fine to me.

> We don't need to, but I don't think it would be that hard (if the structure above is feasible), and I think it's a common use case. Also there's an argument for putting in special effort to generalize this function as much as possible, because it lowers the barrier to entry for xarray for new users. Though perhaps I'm just biased because it happens to be my use case...

Sure, no opposition from me if you want to do it! 👍

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concatenate across multiple dimensions with open_mfdataset 324350248
435544853 https://github.com/pydata/xarray/issues/2159#issuecomment-435544853 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQzNTU0NDg1Mw== shoyer 1217238 2018-11-03T00:22:46Z 2018-11-03T00:22:46Z MEMBER

@TomNicholas I agree with your steps 1/2/3 for open_mfdataset.

My concern with a single prealigned=True argument is that there are two distinct use-cases:

1. Checking coordinates along the dimension(s) being concatenated to determine the result order
2. Checking coordinates along other non-concatenated dimensions to verify consistency

Currently we always do (2) and never do (1).

We definitely want an option to disable (2) for speed, and also want an option to support (1) (what you propose here). But these are distinct use cases -- we probably want to support all permutations of 1/2.

> However this would mean that users wanting to do a multi-dimensional auto_combine on data without monotonic indexes would have to supply their datasets in some way that specifies their desired N-dimensional ordering

I'm not sure we need to support this yet -- it would be enough to have a keyword argument for falling back to the existing behavior that only supports 1D concatenation in the order provided.

> Also I'm assuming we are not going to provide functionality to handle uneven sub-lists, e.g. [[t1x1, t1x2], [t2x1, t2x2, t2x3]]?

Agreed, not important unless someone really wants/needs it.

418476055 https://github.com/pydata/xarray/issues/2159#issuecomment-418476055 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQxODQ3NjA1NQ== shoyer 1217238 2018-09-04T18:44:35Z 2018-09-04T22:16:34Z MEMBER

NumPy's handling of object arrays is unfortunately inconsistent. So maybe it isn't the best idea to use NumPy arrays for this.

Python's built-in list/dict might be better choices here. Something like:

```python
def concat_nd(datasets):
    # find the set of dimensions across which to possibly merge
    # could possibly use OrderedSet here:
    # https://github.com/pydata/xarray/blob/v0.10.8/xarray/core/utils.py#L401
    all_dims = set(dim for ds in datasets for dim in ds.dims)

    # Create a map from each dimension to a tuple giving the size of each
    # dimension on an input dataset. Not all collections of datasets have
    # consistent sizes along each dimension, but the ones we can
    # automatically concatenate do. I recommend researching how "chunks"
    # work in dask.array:
    # http://dask.pydata.org/en/latest/array-design.html
    # http://dask.pydata.org/en/latest/array-chunks.html
    chunks = {dim: ... for dim in all_dims}

    # find the sorted, de-duplicated union of all indexes along those
    # dimensions; np.unique followed by wrapping with pd.Index() might
    # work OK for the "union" function here
    combined_indexes = {dim: union([ds.indexes[dim] for ds in datasets])
                        for dim in all_dims}

    # create a map from "tile id" to dataset, e.g., of type
    # Dict[Tuple[int, ...], xarray.Dataset]; get_indexes() should use
    # pandas.Index.get_indexer to look up ds.indexes[dim] in the
    # combined index
    indexes_to_dataset = {get_indexes(ds, chunks, combined_indexes): ds
                          for ds in datasets}

    # call concat() in a loop to construct the combined dataset
```
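The "union" and get_indexer steps in the sketch can be illustrated with plain pandas; idx_a and idx_b are made-up stand-ins for two datasets' 1D indexes along one dimension:

```python
import pandas as pd

# Hypothetical indexes for two datasets along the same dimension
idx_a = pd.Index([10.0, 20.0])
idx_b = pd.Index([30.0, 40.0])

# sorted, de-duplicated union of all indexes along this dimension
combined = idx_a.union(idx_b)

# pandas.Index.get_indexer locates each dataset's coordinates inside
# the combined index; the first position acts as that dataset's "tile id"
tile_a = int(combined.get_indexer(idx_a)[0])
tile_b = int(combined.get_indexer(idx_b)[0])
```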

416389795 https://github.com/pydata/xarray/issues/2159#issuecomment-416389795 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQxNjM4OTc5NQ== shoyer 1217238 2018-08-27T22:29:22Z 2018-08-27T22:29:22Z MEMBER

@TomNicholas I think your analysis is correct here.

I suspect that in most cases we could figure out how to tile datasets by looking at 1D coordinates along each dimension (i.e., the indexes for each dataset), e.g., to find a "chunk id" along each concatenated dimension.

These could be used to build something like a NumPy object array of xarray.Dataset/DataArray objects, which could be split up into a bunch of 1D calls to xarray.concat().

I would rather avoid using the positions argument of concat. It doesn't really add any flexibility compared to reordering the inputs with xarray.core.nputils.inverse_permutation.
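For illustration, the reordering idea can be sketched with a few lines of NumPy; inverse_permutation here reimplements the behavior of the xarray helper, and is not its actual source:

```python
import numpy as np

def inverse_permutation(indices):
    # behaves like xarray.core.nputils.inverse_permutation:
    # inverse[indices] = arange(len(indices))
    inverse = np.empty_like(indices)
    inverse[indices] = np.arange(len(indices))
    return inverse

# `order[i]` is the position input i should occupy in the result;
# reordering the inputs up front makes concat's `positions` unnecessary
order = np.array([2, 0, 1])
items = ["t2", "t0", "t1"]
reordered = [items[i] for i in inverse_permutation(order)]
```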

> Final point - this common use case also has the added complexity of having ghost or guard cells around every dataset, which should be thrown away. Clearly some user input is required here (ghost_cells_x=2, ghost_cells_y=2, ghost_cells_z=0, ...), but I'm really not sure what the best way to fit that kind of logic in is. Yet more arguments to open_mfdataset?

We could potentially just encourage using the existing preprocess argument.
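A minimal sketch of that approach, assuming ghost-cell widths of 2 along hypothetical x and y dimensions; the trim_ghost_cells name and the widths are illustrative, not from the thread:

```python
import numpy as np
import xarray as xr

# Trim a fixed number of ghost/guard cells from each file before
# concatenation; dimension names and widths here are assumptions
def trim_ghost_cells(ds, ghost_x=2, ghost_y=2):
    return ds.isel(x=slice(ghost_x, -ghost_x or None),
                   y=slice(ghost_y, -ghost_y or None))

# usage sketch: xr.open_mfdataset("output_*.nc", preprocess=trim_ghost_cells)
ds = xr.Dataset({"v": (("x", "y"), np.zeros((8, 6)))})
trimmed = trim_ghost_cells(ds)
```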

393626605 https://github.com/pydata/xarray/issues/2159#issuecomment-393626605 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MzYyNjYwNQ== shoyer 1217238 2018-05-31T18:19:32Z 2018-05-31T18:19:32Z MEMBER

@aluhamaa I don't think you're missing anything here. I agree that it would be pretty straightforward, it just would take a bit of work.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
391509672 https://github.com/pydata/xarray/issues/2159#issuecomment-391509672 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MTUwOTY3Mg== shoyer 1217238 2018-05-23T21:57:56Z 2018-05-23T21:57:56Z MEMBER

@TomNicholas I think you could use the existing preprocess argument to open_mfdataset() for that.

391499524 https://github.com/pydata/xarray/issues/2159#issuecomment-391499524 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MTQ5OTUyNA== shoyer 1217238 2018-05-23T21:17:42Z 2018-05-23T21:17:42Z MEMBER

I agree with @jhamman that it would take effort from an interested developer to do this but in principle it's quite doable.

I think our logic in auto_combine (which powers open_mfdataset) could probably be extended to handle concatenation across multiple dimensions. The main implementation would need to look at coordinates along concatenated dimensions to break the operation into multiple calls to xarray.concat().
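A toy sketch of that idea: tile a 2x2 (t, x) grid of datasets, then reduce the grid with two rounds of 1D xarray.concat calls. The make helper and the coordinate values are made up for illustration:

```python
import numpy as np
import xarray as xr

# hypothetical 2x2-tile dataset with contiguous t/x coordinates
def make(t0, x0):
    return xr.Dataset(
        {"v": (("t", "x"), np.zeros((2, 2)))},
        coords={"t": [t0, t0 + 1], "x": [x0, x0 + 1]},
    )

grid = [[make(0, 0), make(0, 2)],
        [make(2, 0), make(2, 2)]]

# concatenate each row along x, then the rows along t
rows = [xr.concat(row, dim="x") for row in grid]
combined = xr.concat(rows, dim="t")
```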


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 8640.069ms · About: xarray-datasette