issue_comments


where issue = 324350248 and user = 1217238 sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
435733418 https://github.com/pydata/xarray/issues/2159#issuecomment-435733418 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQzNTczMzQxOA== shoyer 1217238 2018-11-05T02:03:00Z 2018-11-05T02:03:00Z MEMBER

> This is fine though right? We can do all of this, because it should compartmentalise fairly easily shouldn't it? You end up with logic like:

Yes, this seems totally fine to me.

> We don't need to, but I don't think it would be that hard (if the structure above is feasible), and I think it's a common use case. Also there's an argument for putting in special effort to generalize this function as much as possible, because it lowers the barrier to entry for xarray for new users. Though perhaps I'm just biased because it happens to be my use case...

Sure, no opposition from me if you want to do it! 👍

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Concatenate across multiple dimensions with open_mfdataset 324350248
435544853 https://github.com/pydata/xarray/issues/2159#issuecomment-435544853 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQzNTU0NDg1Mw== shoyer 1217238 2018-11-03T00:22:46Z 2018-11-03T00:22:46Z MEMBER

@TomNicholas I agree with your steps 1/2/3 for open_mfdataset.

My concern with a single prealigned=True argument is that there are two distinct use-cases:

1. Checking coordinates along the dimension(s) being concatenated to determine the result order
2. Checking coordinates along other non-concatenated dimensions to verify consistency

Currently we always do (2) and never do (1).

We definitely want an option to disable (2) for speed, and also want an option to support (1) (what you propose here). But these are distinct use cases -- we probably want to support all permutations of 1/2.

> However this would mean that users wanting to do a multi-dimensional auto_combine on data without monotonic indexes would have to supply their datasets in some way that specifies their desired N-dimensional ordering

I'm not sure we need to support this yet -- it would be enough to have a keyword argument for falling back to the existing behavior that only supports 1D concatenation in the order provided.

> Also I'm assuming we are not going to provide functionality to handle uneven sub-lists, e.g. [[t1x1, t1x2], [t2x1, t2x2, t2x3]]?

Agreed, not important unless someone really wants/needs it.

418476055 https://github.com/pydata/xarray/issues/2159#issuecomment-418476055 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQxODQ3NjA1NQ== shoyer 1217238 2018-09-04T18:44:35Z 2018-09-04T22:16:34Z MEMBER

NumPy's handling of object arrays is unfortunately inconsistent. So maybe it isn't the best idea to use NumPy arrays for this.

Python's built-in list/dict might be better choices here. Something like:

```python
def concat_nd(datasets):
    # find the set of dimensions across which to possibly merge
    # could possibly use OrderedSet here:
    # https://github.com/pydata/xarray/blob/v0.10.8/xarray/core/utils.py#L401
    all_dims = set(dim for ds in datasets for dim in ds.dims)

    # Create a map from each dimension to a tuple giving the size of each
    # dimension on an input dataset. Not all collections of datasets have
    # consistent sizes along each dimension, but the ones we can
    # automatically concatenate do. I recommend researching how "chunks"
    # work in dask.array:
    # http://dask.pydata.org/en/latest/array-design.html
    # http://dask.pydata.org/en/latest/array-chunks.html
    chunks = {dim: ... for dim in all_dims}

    # find the sorted, de-duplicated union of all indexes along those
    # dimensions; np.unique followed by wrapping with pd.Index() might
    # work OK for the "union" function here
    combined_indexes = {dim: union([ds.indexes[dim] for ds in datasets])
                        for dim in all_dims}

    # create a map from "tile id" to dataset, e.g., of type
    # Dict[Tuple[int, ...], xarray.Dataset]; get_indexes() should use
    # pandas.Index.get_indexer to look up ds.indexes[dim] in the
    # combined index
    indexes_to_dataset = {get_indexes(ds, chunks, combined_indexes): ds
                          for ds in datasets}

    # call concat() in a loop to construct the combined dataset
```
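The "union" and get_indexer steps in the sketch can be illustrated with plain pandas; idx_a and idx_b are made-up stand-ins for two datasets' 1D indexes along one dimension:

```python
import pandas as pd

# Hypothetical indexes for two datasets along the same dimension
idx_a = pd.Index([10.0, 20.0])
idx_b = pd.Index([30.0, 40.0])

# sorted, de-duplicated union of all indexes along this dimension
combined = idx_a.union(idx_b)

# pandas.Index.get_indexer locates each dataset's coordinates inside
# the combined index; the first position acts as that dataset's "tile id"
tile_a = int(combined.get_indexer(idx_a)[0])
tile_b = int(combined.get_indexer(idx_b)[0])
```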

416389795 https://github.com/pydata/xarray/issues/2159#issuecomment-416389795 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDQxNjM4OTc5NQ== shoyer 1217238 2018-08-27T22:29:22Z 2018-08-27T22:29:22Z MEMBER

@TomNicholas I think your analysis is correct here.

I suspect that in most cases we could figure out how to tile datasets by looking at 1D coordinates along each dimension (i.e., the indexes for each dataset), e.g., to find a "chunk id" along each concatenated dimension.

These could be used to build something like a NumPy object array of xarray.Dataset/DataArray objects, which could be split up into a bunch of 1D calls to xarray.concat().

I would rather avoid using the positions argument of concat. It doesn't really add any flexibility compared to reordering the inputs with xarray.core.nputils.inverse_permutation.
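For illustration, the reordering idea can be sketched with a few lines of NumPy; inverse_permutation here reimplements the behavior of the xarray helper, and is not its actual source:

```python
import numpy as np

def inverse_permutation(indices):
    # behaves like xarray.core.nputils.inverse_permutation:
    # inverse[indices] = arange(len(indices))
    inverse = np.empty_like(indices)
    inverse[indices] = np.arange(len(indices))
    return inverse

# `order[i]` is the position input i should occupy in the result;
# reordering the inputs up front makes concat's `positions` unnecessary
order = np.array([2, 0, 1])
items = ["t2", "t0", "t1"]
reordered = [items[i] for i in inverse_permutation(order)]
```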

> Final point - this common use case also has the added complexity of having ghost or guard cells around every dataset, which should be thrown away. Clearly some user input is required here (ghost_cells_x=2, ghost_cells_y=2, ghost_cells_z=0, ...), but I'm really not sure what the best way to fit that kind of logic in is. Yet more arguments to open_mfdataset?

We could potentially just encourage using the existing preprocess argument.
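A minimal sketch of that approach, assuming ghost-cell widths of 2 along hypothetical x and y dimensions; the trim_ghost_cells name and the widths are illustrative, not from the thread:

```python
import numpy as np
import xarray as xr

# Trim a fixed number of ghost/guard cells from each file before
# concatenation; dimension names and widths here are assumptions
def trim_ghost_cells(ds, ghost_x=2, ghost_y=2):
    return ds.isel(x=slice(ghost_x, -ghost_x or None),
                   y=slice(ghost_y, -ghost_y or None))

# usage sketch: xr.open_mfdataset("output_*.nc", preprocess=trim_ghost_cells)
ds = xr.Dataset({"v": (("x", "y"), np.zeros((8, 6)))})
trimmed = trim_ghost_cells(ds)
```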

393626605 https://github.com/pydata/xarray/issues/2159#issuecomment-393626605 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MzYyNjYwNQ== shoyer 1217238 2018-05-31T18:19:32Z 2018-05-31T18:19:32Z MEMBER

@aluhamaa I don't think you're missing anything here. I agree that it would be pretty straightforward, it just would take a bit of work.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
391509672 https://github.com/pydata/xarray/issues/2159#issuecomment-391509672 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MTUwOTY3Mg== shoyer 1217238 2018-05-23T21:57:56Z 2018-05-23T21:57:56Z MEMBER

@TomNicholas I think you could use the existing preprocess argument to open_mfdataset() for that.

391499524 https://github.com/pydata/xarray/issues/2159#issuecomment-391499524 https://api.github.com/repos/pydata/xarray/issues/2159 MDEyOklzc3VlQ29tbWVudDM5MTQ5OTUyNA== shoyer 1217238 2018-05-23T21:17:42Z 2018-05-23T21:17:42Z MEMBER

I agree with @jhamman that it would take effort from an interested developer to do this but in principle it's quite doable.

I think our logic in auto_combine (which powers open_mfdataset) could probably be extended to handle concatenation across multiple dimensions. The main implementation would need to look at coordinates along concatenated dimensions to break the operation into multiple calls to xarray.concat().
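A toy sketch of that idea: tile a 2x2 (t, x) grid of datasets, then reduce the grid with two rounds of 1D xarray.concat calls. The make helper and the coordinate values are made up for illustration:

```python
import numpy as np
import xarray as xr

# hypothetical 2x2-tile dataset with contiguous t/x coordinates
def make(t0, x0):
    return xr.Dataset(
        {"v": (("t", "x"), np.zeros((2, 2)))},
        coords={"t": [t0, t0 + 1], "x": [x0, x0 + 1]},
    )

grid = [[make(0, 0), make(0, 2)],
        [make(2, 0), make(2, 2)]]

# concatenate each row along x, then the rows along t
rows = [xr.concat(row, dim="x") for row in grid]
combined = xr.concat(rows, dim="t")
```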


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 8640.069ms · About: xarray-datasette