issue_comments


8 rows where issue = 566490806 and user = 35968931 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
866963499 https://github.com/pydata/xarray/issues/3774#issuecomment-866963499 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDg2Njk2MzQ5OQ== TomNicholas 35968931 2021-06-23T15:58:10Z 2021-06-23T15:58:10Z MEMBER

Looking back at @shoyer's comments above, I think I'm just going to close this (and the corresponding PR #3982) for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
617352770 https://github.com/pydata/xarray/issues/3774#issuecomment-617352770 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDYxNzM1Mjc3MA== TomNicholas 35968931 2020-04-21T18:59:53Z 2020-04-21T18:59:53Z MEMBER

> I don't think it's a great idea to automatically turn scalar coords into 1d arrays.

Fair enough. Those are good reasons.

> The other choice for situations like this would be to encourage switching to combine_nested for use cases like this

The fact that you can solve this use case with combine_nested does make this feature a lot less important.

We could even put a reference to combine_nested in this error raised by combine_by_coords.

The error you get is `ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation`, so we could either try to automatically detect the coordinates the user might have wanted to combine along, or just extend the message to something like: "Could not find any dimension coordinates to use to order the datasets for concatenation. To specify dimensions or coordinates to concatenate along, use the `concat_dim` argument to `xarray.combine_nested`."

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
616633079 https://github.com/pydata/xarray/issues/3774#issuecomment-616633079 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDYxNjYzMzA3OQ== TomNicholas 35968931 2020-04-20T15:37:10Z 2020-04-20T15:37:10Z MEMBER

> Suppose there were multiple scalar coordinates that are unique for each variable. How would combine_by_coords pick a dimension to stack along?

@shoyer it would expand and stack along both, filling the (many) gaps created with NaNs.

```python
import xarray as xr

data_0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data_0.coords['trial'] = 0  # scalar coords
data_0.coords['day'] = 1

data_1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data_1.coords['trial'] = 1
data_1.coords['day'] = 0

# both scalar coords will be promoted to dims
all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
```
```
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * trial        (trial) int64 0 1
  * day          (day) int64 0 1
Data variables:
    temperature  (day, trial, time) float64 nan nan nan 50.0 ... nan nan nan
```

The gaps created will be filled in with NaNs:

```python
print(all_trials['temperature'].data)
```
```
[[[nan nan nan]
  [50. 60. 70.]]

 [[10. 20. 30.]
  [nan nan nan]]]
```

This gap-filling isn't new though - without this PR the same thing already happens with length-1 dimension coords (since PR #3649 - see my comment there)

```python
data_0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data_0.coords['trial'] = [0]  # 1D dimension coords
data_0.coords['day'] = [1]

data_1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data_1.coords['trial'] = [1]
data_1.coords['day'] = [0]

all_trials = xr.combine_by_coords([data_0, data_1])
print(all_trials)
```
```
<xarray.Dataset>
Dimensions:      (day: 2, time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * day          (day) int64 0 1
  * trial        (trial) int64 0 1
Data variables:
    temperature  (trial, day, time) float64 nan nan nan 10.0 ... nan nan nan
```

```python
# gaps will again be filled in with NaNs
print(all_trials['temperature'].data)
```
```
[[[nan nan nan]
  [10. 20. 30.]]

 [[50. 60. 70.]
  [nan nan nan]]]
```

So all my PR is doing is promoting all scalar coordinates (those which aren't equal across all datasets) to dimension coordinates before combining.

There is a chance this could unwittingly increase the overall size of people's datasets (when they have different scalar coordinates in different datasets), but that could already happen since #3649.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
615985368 https://github.com/pydata/xarray/issues/3774#issuecomment-615985368 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDYxNTk4NTM2OA== TomNicholas 35968931 2020-04-19T00:02:18Z 2020-04-19T00:02:18Z MEMBER

I opened a PR to fix this.

Also I realised that what I said about dealing with ambiguities of multiple coordinates for the same dimension doesn't make sense. In that situation there is no way to know that the user wanted the 'trial' and 'day' coords to correspond to the same dim. If you give them each their own dim it's a well-defined combination operation: you just end up combining along an extra dim 'day' and padding with a lot of NaNs. I don't think there's a problem with that behaviour, so I've just left it without any special treatment.

(I did make sure it handles the case which caused TestDask.test_preprocess_mfdataset to fail though, because that's to do with checking that the coordinates aren't just the same on every dataset.)
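The check mentioned above could be sketched like this (a hypothetical helper, not the actual PR code, with each dataset reduced to a plain `{coord_name: scalar}` mapping for brevity):

```python
def varying_scalar_coords(coord_maps):
    """Return the scalar coords worth promoting to a new dimension: those
    present in every dataset but taking different values across them."""
    common = set(coord_maps[0])
    for m in coord_maps[1:]:
        common &= set(m)
    return sorted(c for c in common
                  if len({m[c] for m in coord_maps}) > 1)

# 'z' is identical everywhere (the test_preprocess_mfdataset situation),
# so only 'trial' should become a concat dimension:
print(varying_scalar_coords([{'trial': 0, 'z': 0}, {'trial': 1, 'z': 0}]))  # → ['trial']
```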

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
615953378 https://github.com/pydata/xarray/issues/3774#issuecomment-615953378 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDYxNTk1MzM3OA== TomNicholas 35968931 2020-04-18T21:33:40Z 2020-04-18T21:33:40Z MEMBER

@dcherian it looks like the old auto_combine does handle this case, but so does the new combine_nested (which is intentional: one of the purposes of combine_nested is to make it easier to explicitly combine along new dimensions):

```python
import xarray as xr
from xarray.core.combine import _old_auto_combine

data_0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data_0.coords['trial'] = 0

data_1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data_1.coords['trial'] = 1

all_trials = _old_auto_combine([data_0, data_1], concat_dim='trial')
print(all_trials)
```
```
<xarray.Dataset>
Dimensions:      (time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * trial        (trial) int64 0 1
Data variables:
    temperature  (trial, time) int64 10 20 30 50 60 70
```

but you can get the same result with:

```python
all_trials = xr.combine_nested([data_0, data_1], concat_dim='trial')
print(all_trials)
```
```
<xarray.Dataset>
Dimensions:      (time: 3, trial: 2)
Coordinates:
  * time         (time) int64 0 1 2
  * trial        (trial) int64 0 1
Data variables:
    temperature  (trial, time) int64 10 20 30 50 60 70
```

So technically this issue doesn't need to be closed before completely removing the old auto_combine, but we probably should fix this anyway.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
590925016 https://github.com/pydata/xarray/issues/3774#issuecomment-590925016 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDU5MDkyNTAxNg== TomNicholas 35968931 2020-02-25T15:30:20Z 2020-02-25T15:30:20Z MEMBER

> is _infer_concat_order_from_coords used in other places than combine_by_coords?

combine_by_coords can be called by open_mfdataset, but other than that no.


By the way, there is an additional problem to consider: ambiguity with multiple coordinates for the same dimension.

```python
data0 = xr.Dataset({'temperature': ('time', [10,20,30])}, coords={'time': [0,1,2]})
data0.coords['trial'] = 0
data0.coords['day'] = 5

data1 = xr.Dataset({'temperature': ('time', [50,60,70])}, coords={'time': [0,1,2]})
data1.coords['trial'] = 1
data1.coords['day'] = 3
```

In this example combine_by_coords won't know which coordinate to use to order along the new dimension: should it be 'trial' or 'day'? They will give differently-ordered results.

We could pass extra optional arguments to deal with this, but for a "magic" function that doesn't seem desirable. It might be better to have some logic like:

1) Is there a dimension coordinate for this dimension? If yes, just use that.
2) Otherwise, can the datasets be unambiguously ordered along the dim by looking at non-dimension coords?
3) If yes, do that; if no, throw an informative error.

That would be backwards-compatible (currently all magic ordering is being done for dimension coords), and not make any arbitrary choices.
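The three-step logic above could be sketched roughly as follows (a hypothetical helper, not xarray source: datasets are reduced to `{coord_name: value}` mappings, and `dim_coords` marks which names are proper dimension coordinates):

```python
def infer_order(coord_maps, dim, dim_coords):
    # 1) A dimension coordinate for this dim always wins
    if dim in dim_coords:
        return sorted(range(len(coord_maps)),
                      key=lambda i: coord_maps[i][dim])
    # 2) Otherwise look for non-dimension coords that order the datasets
    #    unambiguously (a distinct value in every dataset)
    usable = [c for c in coord_maps[0]
              if all(c in m for m in coord_maps)
              and len({m[c] for m in coord_maps}) == len(coord_maps)]
    if len(usable) == 1:
        c = usable[0]
        return sorted(range(len(coord_maps)),
                      key=lambda i: coord_maps[i][c])
    # 3) No coord, or several competing candidates: informative error
    raise ValueError(
        f"Cannot unambiguously order datasets along {dim!r}; "
        f"candidate coords: {usable}"
    )
```

With both 'trial' and 'day' varying, as in the example above, this raises rather than silently picking one.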

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
588178645 https://github.com/pydata/xarray/issues/3774#issuecomment-588178645 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDU4ODE3ODY0NQ== TomNicholas 35968931 2020-02-19T11:37:58Z 2020-02-19T11:40:25Z MEMBER

I did get one test failure, `TestDask.test_preprocess_mfdataset`, which failed with:

```
def test_preprocess_mfdataset(self):
    original = Dataset({"foo": ("x", np.random.randn(10))})
    with create_tmp_file() as tmp:
        original.to_netcdf(tmp)

        def preprocess(ds):
            return ds.assign_coords(z=0)

        expected = preprocess(original)
        with open_mfdataset(
            tmp, preprocess=preprocess, combine="by_coords"
        ) as actual:
            assert_identical(expected, actual)

E       AssertionError: Left and right Dataset objects are not identical
E       Differing dimensions:
E           (x: 10) != (x: 10, z: 1)
E       Differing coordinates:
E       L   z        int64 0
E       R * z        (z) int64 0
E       Differing data variables:
E       L   foo      (x) float64 0.07341 -0.01756 0.6903 ... -0.08741 0.08472 -0.1905
E       R   foo      (z, x) float64 dask.array<chunksize=(1, 10), meta=np.ndarray>

xarray/tests/test_backends.py:2955: AssertionError
```

I think the problem here is that it's happily promoting the coord to a dimension coordinate even though there is only 1 dataset. We should probably only expand dims which are going to be used in the concatenation...

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806
588165645 https://github.com/pydata/xarray/issues/3774#issuecomment-588165645 https://api.github.com/repos/pydata/xarray/issues/3774 MDEyOklzc3VlQ29tbWVudDU4ODE2NTY0NQ== TomNicholas 35968931 2020-02-19T11:07:26Z 2020-02-19T11:34:41Z MEMBER

Hi @kazimuth , thanks for highlighting this!

This is a case which I can see the argument for handling, but which we apparently didn't think of when implementing combine_by_coords.

What this really boils down to is "should combine_by_coords combine by 0D non-dimension coordinates as well as 1D dimension coordinates?". I think yes, that makes sense.

> On the other hand, this could be seen as "magic"

This function is supposed to be "magic", but the behaviour still needs to be clearly defined.

> maybe the code that concatenates dimensions should look for full dimensions first, and point coordinates second

Yes - I actually got your example to work with just a few extra lines. Change the top of _infer_concat_order_from_coords to read like this:

```python
def _infer_concat_order_from_coords(datasets):

    # All datasets have same variables because they've been grouped as such
    ds0 = datasets[0]

    concat_dim_candidates = list(ds0.dims.keys())

    # Look for 0D coordinates which can be concatenated over but aren't dims
    non_dimension_0d_coords = [coord for coord in ds0.coords
                               if coord not in ds0.dims
                               and len(ds0[coord].dims) == 0]
    for coord in non_dimension_0d_coords:
        if all(coord in ds.coords for ds in datasets):
            datasets = [ds.expand_dims(coord) for ds in datasets]
            concat_dim_candidates.append(coord)

    concat_dims = []
    tile_ids = [() for ds in datasets]

    for dim in concat_dim_candidates:
        ...  # The rest of the function is unchanged
```

> it might also be a breaking change to combine_by_coords; I'm not sure.

I don't think it would be breaking... the only cases which would behave differently are ones which previously would have thrown an error. I would have to think about this though.

Might you be interested in submitting a PR along these lines, @kazimuth? It would require some more robust input checks, some tests for different cases, and I think some minor changes to docstrings.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow xr.combine_by_coords to work with point coordinates? 566490806

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 45.664ms · About: xarray-datasette