github: issue_comments: 14 rows where author_association = "MEMBER" and issue = 314764258 sorted by updated

14 rows where author_association = "MEMBER" and issue = 314764258 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
531818131	https://github.com/pydata/xarray/issues/2064#issuecomment-531818131	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUzMTgxODEzMQ==	dcherian 2448579	2019-09-16T15:03:12Z	2019-09-16T15:03:12Z	MEMBER	#3239 has been merged. Now `minimal` is more useful since you can specify `compat="override"` to skip compatibility checking. What's left is to change defaults to implement @shoyer's comment: So I'm thinking that we probably want to combine "all" and "minimal" into a single mode to use as the default, and remove the other behavior, which is either useless or broken. Maybe it would make sense to come up with a new name for this mode, and to make both "all" and "minimal" deprecated aliases for it? In the long term, this leaves only two "automatic" modes for xarray.concat, which should make things simpler for users trying to figure this out.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
524021001	https://github.com/pydata/xarray/issues/2064#issuecomment-524021001	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUyNDAyMTAwMQ==	dcherian 2448579	2019-08-22T18:22:37Z	2019-08-22T18:22:37Z	MEMBER	Thanks for your input @bonnland. The pandas concat() function uses the option join = {'inner', 'outer', 'left', 'right'} in order to mimic logical database join operations. If there is a reason that xarray cannot do the same, it is not obvious to me. I think the pandas options have the advantage of logical simplicity and traditional usage within database systems. We do have a `join` argument that takes these arguments + 'override' which was added recently to skip expensive comparisons. This works for "indexes" or "dimension coordinates". An example: if you have 2 dataarrays, one on a coordinate `x=[1, 2, 3]` and the other on `x=[2,3,4]`, `join` lets you control the `x` coordinate of the output. This is done by `xr.align`. What's under discussion here is what to do about variables duplicated across datasets or indeed, how do we know that these variables are duplicated across datasets when concatenating other variables.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
523960862	https://github.com/pydata/xarray/issues/2064#issuecomment-523960862	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUyMzk2MDg2Mg==	dcherian 2448579	2019-08-22T15:42:10Z	2019-08-22T15:42:10Z	MEMBER	I have a draft solution in #3239. It adds a new mode called "sensible" that acts like "all" when the concat dimension doesn't exist in the dataset and acts like "minimal" when the dimension is present. We can decide whether this is the right way i.e. add a new mode but the more fundamental problem is below. The issue is dealing with variables that should not be concatenated in "minimal" mode (e.g. time-invariant non dim coords when concatenating in time). In this case, we want to skip the equality checks in `_calc_concat_over`. This is a common reason for poor `open_mfdataset` performance. I thought the clean way to do this would be to add the `compat` kwarg to `concat` and then add `compat='override'` since the current behaviour is effectively `compat='equals'`. However, `merge` takes `compat` too and `concat` and `merge` support different `compat` arguments at present. This makes it complicated to easily thread `compat` down from `combine` or `open_mfdataset` without adding `concat_compat` and `merge_compat` which is silly. So do we want to support all the other `compat` modes in `concat`? Things like `broadcast_equals` or `no_conflicts` are funny because they're basically `merge` operations and it means `concat` acts like both `stack`, `concat` and `merge`. OTOH if you have a set of variables with the same name from different datasets and you want to pick one of those (i.e. no concatenation), then you're basically doing `merge` anyway. This would require some refactoring since `concat` assumes the first dataset is a template for the rest. @shoyer What do you think?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
519149757	https://github.com/pydata/xarray/issues/2064#issuecomment-519149757	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxOTE0OTc1Nw==	dcherian 2448579	2019-08-07T15:32:16Z	2019-08-07T15:32:16Z	MEMBER	Maybe it would make sense to come up with a new name for this mode, and to make both "all" and "minimal" deprecated aliases for it? I'm in favour of this. What should we name this mode? One comment on "existing dimensions" mode: "minimal" does the right thing, concatenating only variables with the dimension. For variables without the dimension, this will still raise a `ValueError` because `compat` can only be `'equals'` or `'identical'`. It seems to me like we need `compat='override'` and/or `compat='tolerance', tolerance=...` that would use numpy's approximate equality testing. This checking of non-dimensional coordinates is a common source of `mfdataset` issues. What do you think?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
512036050	https://github.com/pydata/xarray/issues/2064#issuecomment-512036050	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMjAzNjA1MA==	shoyer 1217238	2019-07-16T23:09:24Z	2019-07-16T23:09:24Z	MEMBER	UPDATE: @shoyer it could be that unit tests are failing because, as your final example shows, you get an error for data_vars='minimal' if any variables have different values across datasets, when adding a new concatenation dimension. If this is the reason so many unit tests are failing, then the failures are a red herring and should probably be ignored/rewritten. This seems very likely to me. The existing behavior of `data_vars='minimal'` is only useful in "existing dimensions mode". Xarray's unit test suite is definitely a good "smoke test" for understanding the impact of changes to `concat` on our users. What it tells us is that we can't change the default value from `"all"` to `"minimal"` without breaking existing code. Instead, we need to change how "all" or "minimal" works, or switch to yet another mode for the new behavior. The tests we should feel free to rewrite are cases where we set `data_vars="all"` or `data_vars="minimal"` explicitly for verifying the weird edge behaviors that I noted in my earlier comments. There shouldn't be too many of these tests.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
512000102	https://github.com/pydata/xarray/issues/2064#issuecomment-512000102	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMjAwMDEwMg==	shoyer 1217238	2019-07-16T21:44:52Z	2019-07-16T21:44:52Z	MEMBER	Specifically, what should the default behavior of concat() be, when both datasets include a variable that does not include the concatenation dimension? Currently, the concat dimension is added, and the result is a "stacked" version of the variable. Others have argued that this variable should not be included in the concat() result by default, but this appears to break compatibility with Pandas concat(). Can you give a specific example of the behavior in question?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511611430	https://github.com/pydata/xarray/issues/2064#issuecomment-511611430	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTYxMTQzMA==	shoyer 1217238	2019-07-15T23:54:47Z	2019-07-15T23:54:47Z	MEMBER	The logic for determining which variables to concatenate is in the `_calc_concat_over` helper function: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L146 Only `"different"` is supposed to load variables into memory to determine which ones to concatenate. Right now we also have `"all"` and `"minimal"` options: - `"all"` attempts to concatenate every variable that can be broadcast to a matching shape: https://github.com/pydata/xarray/blob/539fb4a98d0961c281daa5474a8e492a0ae1d8a2/xarray/core/concat.py#L188-L190 - `"minimal"` only concatenates variables that already have the matching dimension. Recall that `concat` handles two types of concatenation: existing dimensions (corresponding to `np.concatenate`) and new dimensions (corresponding to `np.stack`). Currently, this is all done together in one messy codebase, but logically it would be cleaner to separate these modes into two separate functions: - In "existing dimensions" mode: - `"all"` is currently broken, because it will also concatenate variables that don't have the dimension. - `"minimal"` does the right thing, concatenating only variables with the dimension. - In "new dimensions" mode: - `"all"` will add the dimension to all variables. - `"minimal"` raises an error if any variables have different values. If your datasets have any data variables with different values at all, it raises an error. This is pretty much useless. Here's my notebook testing this out: https://gist.github.com/shoyer/f44300eddda4f7c476c61f76d1df938b So I'm thinking that we probably want to combine "all" and "minimal" into a single mode to use as the default, and remove the other behavior, which is either useless or broken. Maybe it would make sense to come up with a new name for this mode, and to make both `"all"` and `"minimal"` deprecated aliases for it? In the long term, this leaves only two "automatic" modes for `xarray.concat`, which should make things simpler for users trying to figure this out.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
511468454	https://github.com/pydata/xarray/issues/2064#issuecomment-511468454	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDUxMTQ2ODQ1NA==	dcherian 2448579	2019-07-15T16:15:51Z	2019-07-15T16:15:51Z	MEMBER	@bonnland I don't think you want to change the default `data_vars` but instead update the heuristics as in this comment: we shouldn't implicitly add a new dimension to variables in the case where the dimension already exists in the dataset. We only need the heuristics/comparisons when an entirely new dimension is being added.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381975937	https://github.com/pydata/xarray/issues/2064#issuecomment-381975937	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTk3NTkzNw==	rabernat 1197350	2018-04-17T12:34:15Z	2018-04-17T12:34:15Z	MEMBER	I'm glad! FWIW, I think this is a relatively simple fix within xarray. @xylar, if you are game, we would love to see a PR from you. Could be a good opportunity to learn more about xarray internals.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381728814	https://github.com/pydata/xarray/issues/2064#issuecomment-381728814	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyODgxNA==	shoyer 1217238	2018-04-16T19:55:24Z	2018-04-16T19:55:24Z	MEMBER	I stand corrected. In 0.10.1, I also see the Time variable getting added to refBottomDepth when I open multiple files. So maybe this is not in fact a new problem but an existing issue that happened to behave as I expected only when opening a single file in previous versions. Sorry for not noticing that sooner. OK, in that case I think #2048 was still the right change/bug-fix, making multi-file and single-file behavior consistent. But you certainly have exposed a real issue here. But this issue raises an important basic point: we might want different behavior for variables in which concat_dim is already a dimension vs. variables for which it is not. Yes, we shouldn't implicitly add a new dimension to variables in the case where the dimension already exists in the dataset. We only need the heuristics/comparisons when an entirely new dimension is being added.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381725478	https://github.com/pydata/xarray/issues/2064#issuecomment-381725478	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyNTQ3OA==	rabernat 1197350	2018-04-16T19:44:00Z	2018-04-16T19:44:00Z	MEMBER	But this issue raises an important basic point: we might want different behavior for variables in which `concat_dim` is already a dimension vs. variables for which it is not.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381722944	https://github.com/pydata/xarray/issues/2064#issuecomment-381722944	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcyMjk0NA==	rabernat 1197350	2018-04-16T19:35:12Z	2018-04-16T19:35:12Z	MEMBER	so you're fooling xarray into not including the time dimension in your non-time variables by making them coordinates in the above example? Exactly. They are coordinates. Those variables are usually related to grid geometry or constants, as I presume is `refBottomDepth` in your example.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381717472	https://github.com/pydata/xarray/issues/2064#issuecomment-381717472	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcxNzQ3Mg==	rabernat 1197350	2018-04-16T19:15:19Z	2018-04-16T19:15:19Z	MEMBER	👍 This is a persistent problem for me as well. I often find myself writing a preprocessor function like this `python def process_coords(ds, concat_dim='time', drop=True): coord_vars = [v for v in ds.data_vars if concat_dim not in ds[v].dims] if drop: return ds.drop(coord_vars) else: return ds.set_coords(coord_vars) ds = xr.open_mfdataset('*.nc', preprocess=process_coords)` The reason to drop the coordinates is to avoid the comparison that happens when you concatenate coords.	{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
381707540	https://github.com/pydata/xarray/issues/2064#issuecomment-381707540	https://api.github.com/repos/pydata/xarray/issues/2064	MDEyOklzc3VlQ29tbWVudDM4MTcwNzU0MA==	shoyer 1217238	2018-04-16T18:42:06Z	2018-04-16T18:42:06Z	MEMBER	What happens if you open multiple files with `open_mfdataset()`, e.g., for both January and February? Does it result in a dataset with the right dimensions on each variable?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	concat_dim getting added to all variables of multifile datasets 314764258
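
The comments above distinguish which variables get concatenated under `"all"`, under `"minimal"`, and under the draft `"sensible"` mode described for #3239. The following is a rough sketch of that selection rule only, written in plain Python as a toy model. It is not xarray's `_calc_concat_over`. The helper `variables_to_concat`, the `VarDims` alias, and the names `temperature`, `nCells` and `nVertLevels` are hypothetical; only `refBottomDepth` and `Time` come from the thread.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for a dataset: variable name -> its dimension names.
# This is an illustration only, not xarray's data model.
VarDims = Dict[str, Tuple[str, ...]]


def variables_to_concat(datasets: List[VarDims], dim: str, mode: str) -> Set[str]:
    """Return the variables that would be concatenated along `dim` under `mode`."""
    # The thread notes that concat treats the first dataset as a template.
    template = datasets[0]
    has_dim = {name for name, dims in template.items() if dim in dims}
    if mode == "all":
        # Every variable is concatenated, including those lacking `dim`.
        return set(template)
    if mode == "minimal":
        # Only variables that already carry `dim` are concatenated.
        return has_dim
    if mode == "sensible":
        # Draft proposal: act like "minimal" when `dim` already exists,
        # and like "all" when `dim` is entirely new.
        return has_dim if has_dim else set(template)
    raise ValueError(f"unknown mode: {mode!r}")


# Two hypothetical monthly files with identical layout.
january = {"temperature": ("Time", "nCells"), "refBottomDepth": ("nVertLevels",)}
february = dict(january)
files = [january, february]

existing_all = variables_to_concat(files, "Time", "all")
existing_minimal = variables_to_concat(files, "Time", "minimal")
existing_sensible = variables_to_concat(files, "Time", "sensible")
new_minimal = variables_to_concat(files, "run", "minimal")
new_sensible = variables_to_concat(files, "run", "sensible")
```

In this model, `existing_all` includes `refBottomDepth`, which mirrors the reported symptom of `Time` being added to a variable that never had it. `new_minimal` is empty, so every variable would have to pass an equality check across files, which is the case the thread calls "pretty much useless".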

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
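
As a minimal sketch of how the filter in the page title maps onto this schema, the snippet below builds the table in an in-memory SQLite database with Python's standard `sqlite3` module and runs the equivalent query. The first two sample rows reuse ids and timestamps from the table above; the third row is hypothetical and exists only to show the `author_association` filter excluding something. The exact SQL that the page itself runs is an assumption.

```python
import sqlite3

SCHEMA = """
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
"""

conn = sqlite3.connect(":memory:")
# The referenced users/issues tables are not created in this sketch,
# so keep foreign-key enforcement switched off.
conn.execute("PRAGMA foreign_keys = OFF")
conn.executescript(SCHEMA)

sample = [
    (531818131, 2448579, "2019-09-16T15:03:12Z", "MEMBER", 314764258),
    (381707540, 1217238, "2018-04-16T18:42:06Z", "MEMBER", 314764258),
    (1, 1, "2019-01-01T00:00:00Z", "NONE", 314764258),  # hypothetical row
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [updated_at], [author_association], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    sample,
)

# ISO 8601 timestamps in one fixed format sort chronologically as plain text.
rows = conn.execute(
    "SELECT [id], [updated_at] FROM issue_comments "
    "WHERE [author_association] = ? AND [issue] = ? "
    "ORDER BY [updated_at] DESC",
    ("MEMBER", 314764258),
).fetchall()
```

Here `rows` holds the two `MEMBER` comments, newest first, and the hypothetical `NONE` row is filtered out.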