issue_comments
10 rows where author_association = "MEMBER", issue = 324350248 and user = 35968931 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
505505980 | https://github.com/pydata/xarray/issues/2159#issuecomment-505505980 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDUwNTUwNTk4MA== | TomNicholas 35968931 | 2019-06-25T15:50:33Z | 2019-06-25T15:50:33Z | MEMBER | Closed by #2616 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
437579539 | https://github.com/pydata/xarray/issues/2159#issuecomment-437579539 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQzNzU3OTUzOQ== | TomNicholas 35968931 | 2018-11-10T12:10:00Z | 2018-11-10T12:10:00Z | MEMBER | @shoyer see my PR trying to implement this (#2553). Inputting a list of lists into 1) I could try to somehow generalise all of the list comprehensions in 2) Write some kind of recursive iterator function which would allow me to apply the 3) Separate the logic so that the input is assumed to be a flat list unless 4) Always recursively flatten the input before opening the files, but store the structure somehow |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
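Option 4 above (flatten the nested input before opening the files, but store the structure) can be sketched in plain Python. The helper name `flatten_with_ids` is hypothetical and not part of xarray. Each leaf, such as a file path, is paired with a tuple of its indices at every nesting level (a "tile ID"), so the structure can be rebuilt after the files are opened.

```python
from typing import Any, List, Tuple

TileID = Tuple[int, ...]


def flatten_with_ids(nested: Any, prefix: TileID = ()) -> List[Tuple[TileID, Any]]:
    """Flatten a nested list, pairing each leaf with its tile ID.

    Hypothetical helper: anything that is not a list is treated as a leaf
    (e.g. a file path or an already-opened dataset).
    """
    if not isinstance(nested, list):
        return [(prefix, nested)]
    flat: List[Tuple[TileID, Any]] = []
    for i, item in enumerate(nested):
        # Recurse, extending the tile ID with this item's index
        flat.extend(flatten_with_ids(item, prefix + (i,)))
    return flat


paths = [["x0y0.nc", "x0y1.nc"],
         ["x1y0.nc", "x1y1.nc"]]
flat = flatten_with_ids(paths)
# flat == [((0, 0), 'x0y0.nc'), ((0, 1), 'x0y1.nc'),
#          ((1, 0), 'x1y0.nc'), ((1, 1), 'x1y1.nc')]
```

Because the tile IDs are plain tuples, the files can be opened in one flat loop and the results put back in place by ID afterwards.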
435706762 | https://github.com/pydata/xarray/issues/2159#issuecomment-435706762 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQzNTcwNjc2Mg== | TomNicholas 35968931 | 2018-11-04T21:10:55Z | 2018-11-05T00:56:06Z | MEMBER |
This is fine though, right? We can do all of this, because it should compartmentalise fairly easily, shouldn't it? You end up with logic like:

```python
def auto_combine(ds_sequence, infer_order_from_coords=True, check_alignment=True):
    if check_alignment:
        # Check alignment along non-concatenated dimensions (your (2))
```
We don't need to, but I don't think it would be that hard (if the structure above is feasible), and I think it's a common use case. Also there's an argument for putting in special effort to generalize this function as much as possible, because it lowers the barrier to entry for xarray for new users. Though perhaps I'm just biased because it happens to be my use case... Also if we know what form the tile_IDs should take then I can write the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
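Suppose the tile IDs take the form of integer-position tuples, one entry per concatenated dimension, as you would get by flattening a nested list. Checking that they describe a complete N-D grid, rather than uneven sub-lists or missing tiles, is then short. The sketch below assumes that form. `check_tile_ids` is a hypothetical helper, not an xarray function.

```python
from itertools import product
from typing import Iterable, Tuple


def check_tile_ids(tile_ids: Iterable[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Return the grid shape if the tile IDs fill a complete hypercube.

    Assumes each ID is a tuple of non-negative ints, one per concatenation
    dimension. Raises ValueError for ragged, missing or duplicate tiles.
    """
    ids = list(tile_ids)
    if not ids:
        raise ValueError("no tiles supplied")
    ndim = len(ids[0])
    if any(len(tid) != ndim for tid in ids):
        raise ValueError("tile IDs have inconsistent lengths")
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate tile IDs")
    # Extent along each dimension is one more than the largest index seen
    shape = tuple(max(tid[d] for tid in ids) + 1 for d in range(ndim))
    expected = set(product(*(range(n) for n in shape)))
    if set(ids) != expected:
        raise ValueError(f"tile IDs do not fill a {shape} grid")
    return shape
```

Rejecting anything that is not a full hypercube up front means the combining step itself never has to handle uneven sub-lists.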
435336049 | https://github.com/pydata/xarray/issues/2159#issuecomment-435336049 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQzNTMzNjA0OQ== | TomNicholas 35968931 | 2018-11-02T10:29:24Z | 2018-11-02T11:07:17Z | MEMBER | I was thinking about the general solution to this problem again and wanted to clarify some things. Currently I think that any general multi-dimensional version of the 1) If possible, use the values in the dimension indexes to arrange the datasets so that the indexes are monotonic, 2) Else issue a warning that some of the indexes supplied are not monotonic, 3) Then instead concatenate the supplied datasets in the order supplied (for some N-dimensional definition of "order"). The warning should tell the user that's what it's doing. This approach would then be backwards-compatible and would accommodate users whose data does not have monotonic indexes (they would just have to arrange their datasets into the correct order themselves first), while still doing the obviously correct thing in unambiguous cases. However, this would mean that users wanting to do a multi-dimensional Also I'm assuming we are not going to provide functionality to handle uneven sub-lists, e.g. Edit: I've just realised that there is a lot of related discussion in #2039, #1385, & #1823. I suppose what I'm suggesting here is essentially the N-D generalisation of the approach discussed in those issues, namely an extra argument |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
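The three-step rule above can be sketched for a single dimension as follows. To keep the sketch self-contained, each "dataset" is represented only by its (non-empty) coordinate values along the concatenation dimension. The name `order_for_concat` is hypothetical. The function sorts blocks by their first coordinate value. If the joined coordinate is then strictly increasing, it uses that order. Otherwise it warns and falls back to the supplied order. Decreasing indexes are not handled here.

```python
import warnings
from typing import List, Sequence


def order_for_concat(coords: Sequence[Sequence[float]]) -> List[int]:
    """Return the order in which to concatenate blocks along one dimension.

    coords[i] holds the (non-empty) coordinate values of block i.
    """
    # Step 1: candidate order from the first coordinate value of each block
    order = sorted(range(len(coords)), key=lambda i: coords[i][0])
    joined = [v for i in order for v in coords[i]]
    if all(a < b for a, b in zip(joined, joined[1:])):
        return order
    # Steps 2 and 3: warn, then keep the order the user supplied
    warnings.warn(
        "coordinate values are not monotonic; concatenating in the order supplied",
        UserWarning,
    )
    return list(range(len(coords)))
```

For example, `order_for_concat([[3, 4], [0, 1], [5, 6]])` returns `[1, 0, 2]`. `order_for_concat([[1, 2], [0, 5]])` warns and returns `[0, 1]`, because no ordering of those two blocks gives a monotonic index.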
427892990 | https://github.com/pydata/xarray/issues/2159#issuecomment-427892990 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQyNzg5Mjk5MA== | TomNicholas 35968931 | 2018-10-08T16:12:06Z | 2018-10-08T16:12:06Z | MEMBER | Thanks @shoyer for the description of how this should be done properly. In the meantime however, I thought I would describe how I solved the problem in my last comment. My method works but you probably wouldn't want to use it in xarray itself because it's pretty "hacky". To avoid the issue of numpy reading the
```python
import numpy as np
from xarray import concat


def _concat_nd(obj_grid, concat_dims=None, data_vars=None, **kwargs):
    # concat_dims and data_vars need one entry per axis of obj_grid.
    # Combine datasets along one dimension at a time.
    # Have to start with last axis and finish with axis=0, otherwise axes
    # will disappear before the loop reaches them
    for axis in reversed(range(obj_grid.ndim)):
        obj_grid = np.apply_along_axis(_concat_dicts, axis, arr=obj_grid,
                                       dim=concat_dims[axis],
                                       data_vars=data_vars[axis], **kwargs)
    # Every axis has been reduced, leaving a 0-d array holding one dict
    return obj_grid.item()['key']


def _concat_dicts(dict_objs, dim, data_vars, **kwargs):
    # Unwrap the datasets, concatenate them, and re-wrap the result in a dict
    objs = [dict_obj['key'] for dict_obj in dict_objs]
    return {'key': concat(objs, dim=dim, data_vars=data_vars, **kwargs)}
```
In case anyone is interested, this is how I've (hopefully temporarily) solved the N-D concatenation problem in the case of my data. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
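The same last-axis-first reduction can also be written over a plain nested list, which avoids numpy object arrays altogether. In this sketch, the hypothetical helper `combine_nested` takes `concat_funcs[axis]`, which can be any function that joins a list of blocks along that axis. With xarray this could be something like `functools.partial(xr.concat, dim="x")`. Here, small 2-D lists stand in for datasets so the example runs without xarray.

```python
from typing import Any, Callable, List, Sequence


def combine_nested(grid: Any, concat_funcs: Sequence[Callable[[List[Any]], Any]],
                   axis: int = 0) -> Any:
    """Recursively combine an N-D nested list of blocks into one block.

    The innermost lists (the last axis) are joined first; their results are
    then joined along each outer axis in turn, finishing with axis 0.
    """
    if axis == len(concat_funcs):
        return grid  # reached a single block
    # Combine each sub-list along the deeper axes first, then join them here
    parts = [combine_nested(sub, concat_funcs, axis + 1) for sub in grid]
    return concat_funcs[axis](parts)


# Stand-ins for datasets: each block is a 2-D list of rows.
def join_rows(blocks):
    # Axis 0: stack blocks vertically
    return [row for block in blocks for row in block]


def join_cols(blocks):
    # Axis 1: place blocks side by side, row by row
    return [[x for part in rows for x in part] for rows in zip(*blocks)]


a, b = [[1, 2], [5, 6]], [[3, 4], [7, 8]]
c, d = [[9, 10], [13, 14]], [[11, 12], [15, 16]]
full = combine_nested([[a, b], [c, d]], [join_rows, join_cols])
# full == [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
```

Recursing from the outside in but joining on the way back out gives the same "last axis first" order as the `apply_along_axis` loop, without ever putting datasets inside a numpy array.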
417802225 | https://github.com/pydata/xarray/issues/2159#issuecomment-417802225 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQxNzgwMjIyNQ== | TomNicholas 35968931 | 2018-08-31T22:12:28Z | 2018-08-31T22:16:15Z | MEMBER | I started having a go at writing the second half of this: the "n-dimensional-concatenation" function, which would accept a grid of xarray.Dataset/DataArray objects (assumed to be in the correct order along all dimensions) and return a single merged dataset. However, I think there's an issue with using
My plan was to call def concat_nd(obj_grid, concat_dims=None): """ Concatenates a structured ndarray of xarray Datasets along multiple dimensions.
I think this is because even just the idea of having a ndarray containing xarray datasets seems to cause problems - if I do it with a single item then xarray thinks I'm trying to convert the Dataset into a numpy array and throws the same error:
and if I do it with multiple items then numpy will dive down and extract the variables in the dataset instead of just storing a reference to the dataset:
Is this the intended behaviour of xarray? Does this mean I can't use numpy arrays of xarray objects at all for this problem? If so then what structure do you think I should use instead (list of lists etc.)? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
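One common numpy workaround for the behaviour described above is not to pass the nested list to `np.array` at all. Instead, allocate an empty `object` array and assign each element individually. Assigning to a single element of an object array stores a reference to the object without trying to convert it or iterate into it. This is a general numpy technique, not something xarray documents for this purpose. The sketch uses a hypothetical `FakeDataset` stand-in that mimics the two problems reported above: it refuses conversion via `__array__` and it is iterable. It needs numpy but not xarray.

```python
import numpy as np


class FakeDataset:
    """Hypothetical stand-in mimicking the behaviour reported above."""

    def __init__(self, name):
        self.name = name
        self.variables = ["a", "b"]

    def __array__(self, dtype=None, copy=None):
        # Refuse conversion, like the error described in the comment
        raise TypeError("cannot convert this object into a numpy array")

    def __iter__(self):
        # Iterable over its variable names, like a mapping
        return iter(self.variables)


def object_grid(nested, shape):
    """Build an object ndarray from a nested list, storing references."""
    grid = np.empty(shape, dtype=object)
    for index in np.ndindex(*shape):
        item = nested
        for i in index:
            item = item[i]
        grid[index] = item  # single-element assignment: no conversion
    return grid


datasets = [[FakeDataset("00"), FakeDataset("01")],
            [FakeDataset("10"), FakeDataset("11")]]
grid = object_grid(datasets, (2, 2))
```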
412177726 | https://github.com/pydata/xarray/issues/2159#issuecomment-412177726 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQxMjE3NzcyNg== | TomNicholas 35968931 | 2018-08-10T19:08:56Z | 2018-08-11T00:09:28Z | MEMBER | I've been looking through the functions The current behaviour isn't completely explicit, and I would like to check my understanding with a few questions: 1) If you 2) Although
will only organise the datasets into groups according to the set of dimensions they have, it doesn't order the datasets within each group according to the values in the dimension coordinates? We can show this because this (new) testcase fails:

```python
@requires_dask
def test_auto_combine_along_coords(self):
    # drop the third dimension to keep things relatively understandable
    data = create_test_data()
    for k in list(data.variables):
        if 'dim3' in data[k].dims:
            del data[k]
```

with output
3) So the call to
4) Therefore what needs to be done here is the Also,

```python
# User specifies how they split up their domain
domain_decomposition_structure = how_was_this_parallelized('output.*.nc')

# Feeds this info into open_mfdataset
full_domain = xr.open_mfdataset('output.*.nc', positions=domain_decomposition_structure)
```

This approach would be much less general but would dodge the issue of writing generalized N-D auto-concatenation logic. Final point: this common use case also has the added complexity of having ghost or guard cells around every dataset, which should be thrown away. Clearly some user input is required here ( |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
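The `positions` idea above assumes the user can say where each file sits in the domain decomposition. Here is one hypothetical sketch of what a helper like `how_was_this_parallelized` might do for a simulation that writes one file per processor. It assumes file names end in `.<rank>.nc` and that ranks are numbered row-major over an `nx` by `ny` processor grid. Both are assumptions, and real codes differ. The files are sorted numerically by rank and reshaped into a nested list matching the decomposition.

```python
import re
from typing import List, Sequence


def files_to_grid(paths: Sequence[str], nx: int, ny: int) -> List[List[str]]:
    """Arrange per-processor output files into an nx-by-ny nested list.

    Hypothetical convention: each name ends in '.<rank>.nc' and ranks are
    numbered row-major, i.e. rank = i * ny + j.
    """
    def rank(path: str) -> int:
        match = re.search(r"\.(\d+)\.nc$", path)
        if match is None:
            raise ValueError(f"no processor rank in file name: {path}")
        return int(match.group(1))

    ordered = sorted(paths, key=rank)  # numeric sort, so 10 comes after 9
    if len(ordered) != nx * ny:
        raise ValueError(f"expected {nx * ny} files, got {len(ordered)}")
    return [ordered[i * ny:(i + 1) * ny] for i in range(nx)]


files = [f"output.{r}.nc" for r in (3, 0, 5, 1, 4, 2)]
grid = files_to_grid(files, nx=2, ny=3)
# grid == [['output.0.nc', 'output.1.nc', 'output.2.nc'],
#          ['output.3.nc', 'output.4.nc', 'output.5.nc']]
```

The resulting list of lists could then be passed to an N-D combine step in place of a separate `positions` argument.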
410357191 | https://github.com/pydata/xarray/issues/2159#issuecomment-410357191 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDQxMDM1NzE5MQ== | TomNicholas 35968931 | 2018-08-03T19:44:32Z | 2018-08-03T19:44:32Z | MEMBER | Thanks @jnhansen! I actually ended up writing my own, much lower-level version of this using the netCDF library. I did that because I was finding it hard to work out how to merge multiple datasets and then write the data out to a new netCDF file in chunks: I kept accidentally loading the entire merged dataset into memory at once. This might just be because I wasn't using the dask integration properly, though. Have you tried using your function to merge netCDF files and then write out a single file which is larger than RAM? Is that even possible in xarray? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
391512018 | https://github.com/pydata/xarray/issues/2159#issuecomment-391512018 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDM5MTUxMjAxOA== | TomNicholas 35968931 | 2018-05-23T22:07:30Z | 2018-05-23T22:07:30Z | MEMBER | @shoyer At the risk of going off on a tangent - I think that only works if the number of guard cells you want to remove can be determined from the data in the dataset you're loading, because preprocess doesn't accept any further arguments. For example, say you want to remove all ghost cells except the ones at the edge of your simulation domain. If there's no information in each dataset which marks it as a dataset containing a simulation boundary region, then the preprocess function can't know to treat it differently without further arguments. I might be wrong though? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 | |
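The difficulty described here is that the correct trim depends on where a block sits in the domain, not just on its contents. If that position is known (for example as a tile ID, as discussed elsewhere in this thread), working out what to keep is simple. The sketch below uses a hypothetical `guard_slices` function that returns one slice per dimension. Each slice drops `nguard` guard cells on interior faces but keeps them on faces at the edge of the global domain. With xarray, such slices could then be applied via `Dataset.isel`.

```python
from typing import List, Sequence


def guard_slices(tile_id: Sequence[int], grid_shape: Sequence[int],
                 nguard: int) -> List[slice]:
    """Slices that strip guard cells except on the global domain boundary.

    tile_id[d] is this block's position along dimension d and grid_shape[d]
    is the number of blocks along d. Assumes nguard > 0.
    """
    slices = []
    for pos, nblocks in zip(tile_id, grid_shape):
        # Keep the lower guard cells only for the first block along d
        start = None if pos == 0 else nguard
        # Keep the upper guard cells only for the last block along d
        stop = None if pos == nblocks - 1 else -nguard
        slices.append(slice(start, stop))
    return slices


# Middle column of a 3x3 decomposition, top row, with 2 guard cells:
s = guard_slices((0, 1), (3, 3), nguard=2)
# s == [slice(None, -2, None), slice(2, -2, None)]
```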
391501504 | https://github.com/pydata/xarray/issues/2159#issuecomment-391501504 | https://api.github.com/repos/pydata/xarray/issues/2159 | MDEyOklzc3VlQ29tbWVudDM5MTUwMTUwNA== | TomNicholas 35968931 | 2018-05-23T21:25:12Z | 2018-05-23T21:25:12Z | MEMBER | Another suggestion: as one of the obvious uses for this is in collecting the output from parallelized simulations, which always have ghost cells around the domain each processor computes on, would it be worth adding an option to throw those away as the mf dataset is loaded? Or is that a task better dealt with by slicing the resultant array after the fact? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Concatenate across multiple dimensions with open_mfdataset 324350248 |
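Consider the simpler case from the original suggestion, where every file carries the same number of guard cells and all of them should be thrown away. Here a one-argument callable, the calling convention of a preprocess hook, can be built by binding the guard width with `functools.partial`. The sketch uses a 2-D nested list as a stand-in for a dataset. With xarray, the body would slice each dimension instead (for example via `isel`). As comment 391512018 points out, this only works because the trim does not depend on the block's position in the domain.

```python
from functools import partial
from typing import List


def strip_guards(block: List[List[float]], nguard: int) -> List[List[float]]:
    """Drop nguard guard cells from every edge of a 2-D block (nguard > 0)."""
    return [row[nguard:-nguard] for row in block[nguard:-nguard]]


# Bind the extra argument so the result takes a single block
preprocess = partial(strip_guards, nguard=1)

block = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
trimmed = preprocess(block)  # [[1, 2], [3, 4]]
```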
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);