issue_comments

6 rows where issue = 771382653 (Allow sel's method and tolerance to vary per-dimension), sorted by updated_at descending

keewis (MEMBER) · 2020-12-19T16:58:20Z · https://github.com/pydata/xarray/issues/4714#issuecomment-748498256

> I think reindex would need to be changed

That's true; I only tried the special case where the data that would be used to do the forward fill is included in the result.

> I guess this works but it's a bit cumbersome

Yeah, to_dataset is probably not the right tool for pointwise indexing.

> it does not fail if one of the sensors in the query list is missing

If I understand correctly, you would like to index with arbitrary values for time, but would like an error for missing values of sensor. Unfortunately, I don't think that is possible using a single call to sel. Instead, you could set the fill_value parameter of reindex to some other value (for example, -np.inf) and then drop those values after the pointwise indexing.
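
A minimal sketch of that workaround, assuming a hypothetical toy sensor_data (the data, sensor names, and query values below are invented for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 3 times x 2 sensors; sensor "C" does not exist.
sensor_data = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    coords={'time': [0, 1, 2], 'sensor': ['A', 'B']},
    dims=['time', 'sensor'],
)

# Reindex with a sentinel fill value instead of the default NaN ...
filled = sensor_data.reindex(sensor=['A', 'B', 'C'], fill_value=-np.inf)

# ... do the pointwise indexing ...
picked = filled.sel(
    {'sensor': xr.DataArray(['A', 'A', 'A', 'B', 'C'], dims=['sample']),
     'time': xr.DataArray([0, 1, 2, 0, 0], dims=['sample'])},
    method='ffill',
)

# ... then drop the samples that hit the sentinel (the missing sensor "C").
result = picked.where(picked != -np.inf, drop=True)
print(result.values)  # [1. 3. 5. 2.]
```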

batterseapower (NONE) · 2020-12-19T15:13:36Z (edited 2020-12-19T15:14:59Z) · https://github.com/pydata/xarray/issues/4714#issuecomment-748486801

Thanks for the response. I think reindex would need to be changed as well, because this code:

```python
sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}, method='ffill')
```

is not equivalent to this code:

```python
sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}).ffill(dim='time').ffill(dim='sensor')
```
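
To make the non-equivalence concrete, here is a hypothetical toy example (data and values invented for illustration): with method='ffill' the fill happens during reindexing, while the chained ffill calls only see the already-reindexed result, which no longer contains the rows to fill from.

```python
import xarray as xr

# Toy data: observations at times 0 and 2, so time=1 must be forward-filled.
sensor_data = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0]],
    coords={'time': [0, 2], 'sensor': ['A', 'B']},
    dims=['time', 'sensor'],
)

# ffill during reindexing: time=1 is filled from the time=0 row.
a = sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']}, method='ffill')
print(a.values)  # [[1. 2.]]

# ffill after reindexing: the time=0 row is already gone, so there is
# nothing left to fill from and time=1 stays NaN.
b = (sensor_data.reindex({'time': [1], 'sensor': ['A', 'B']})
     .ffill(dim='time').ffill(dim='sensor'))
print(b.values)  # [[nan nan]]
```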

So if I understand your to_dataset idea correctly, you are proposing:

```python
ds = sensor_data.to_dataset(dim='sensor')
xr.concat(
    [
        ds[sensor].sel({'time': time}, method='ffill', drop=True)
        for sensor, time in zip(['A', 'A', 'A', 'B', 'C'], [0, 1, 2, 0, 0])
    ],
    dim='sample',
)
```

I guess this works, but it's a bit cumbersome and unlikely to be fast. I think there must be something I'm not understanding here; I'm not familiar with all the nuances of the xarray API.

Your idea of reindex followed by sel is an interesting one, but it does something slightly different from what I was asking for: it does not fail if one of the sensors in the query list is missing, but rather inserts a NaN. I suppose you could fix this with an extra check afterwards, assuming that your original pre-reindex data contained no NaNs.

In general, min(S*N, T*N) could be much larger than S*T, so for big queries it's quite possible that you wouldn't have enough space to allocate the intermediate even if you could fit hundreds of copies of the original S*T matrix. Using a dask cluster would make this situation less likely, of course, but it seems better to avoid all this copying (even on a beefy cluster), if only for performance reasons.
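
As a rough, purely hypothetical illustration of that scaling argument:

```python
# All sizes invented, just to put numbers on the argument above.
T, S, N = 10_000, 100, 10_000_000   # times, sensors, (time, sensor) query pairs

original = S * T                    # 1e6 elements in the source matrix
intermediate = min(S * N, T * N)    # 1e9 elements: 1000x the source matrix
result = N                          # 1e7 elements: all a fused sel would need
```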

keewis (MEMBER) · 2020-12-19T14:41:47Z · https://github.com/pydata/xarray/issues/4714#issuecomment-748483357

reindex does not have to be changed, since we can just call e.g. ffill with the dim parameter for this to work:

```python
arr.reindex(...).ffill(dim="dim")
```

This really depends on how you intend to use the result of the indexing. For example, if you don't really need the big matrix, you could just convert the DataArray to a Dataset where the sensor dimension becomes the names of the variables (using to_dataset(dim="sensor"), or construct it that way in the first place). If you do need the matrix, this might be slightly better (you still end up allocating a T * (S + n) array):

```python
arr.reindex(sensor=["A", "B", "C"]).sel({"sensor": ..., "time": ...}, method="ffill")
```

but if you really care about the memory allocated at once, you might be better off using dask:

```python
arr.chunk({"time": 100}).reindex(...).sel(...)
```

If none of that is an option, I guess we might be able to add a method_kwargs parameter (not sure if there is a better option, though).

batterseapower (NONE) · 2020-12-19T14:06:36Z · https://github.com/pydata/xarray/issues/4714#issuecomment-748479287

Thanks for the suggestion. One issue with this alternative is that it creates a potentially large intermediate object.

If you have T times and S sensors, and want to sample them at N (time, sensor) pairs, then the intermediate object in your approach has size T*N (if you index sensors first) or S*N (if you index time first). If both dimensions could be indexed in one sel call, we would only need to allocate memory for the result, of size N, which is considerably better.
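
For concreteness, a hypothetical check of where that intermediate shows up in the two-call approach (suggested in the comment below); all sizes are toy values:

```python
import numpy as np
import xarray as xr

T, S, N = 1000, 3, 5  # toy sizes
sensor_data = xr.DataArray(
    np.zeros((T, S)),
    coords={'time': np.arange(T), 'sensor': ['A', 'B', 'C']},
    dims=['time', 'sensor'],
)

# The first call indexes sensors only, keeping the full time axis.
step1 = sensor_data.sel(
    {'sensor': xr.DataArray(['A', 'A', 'A', 'B', 'C'], dims=['sample'])}
)
print(step1.shape)  # (1000, 5): the T*N intermediate

# The second call collapses to the N requested samples.
step2 = step1.sel(
    {'time': xr.DataArray([0, 1, 2, 0, 0], dims=['sample'])}, method='ffill'
)
print(step2.shape)  # (5,)
```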

mathause (MEMBER) · 2020-12-19T13:55:07Z · https://github.com/pydata/xarray/issues/4714#issuecomment-748478029

Could you split it into two calls, or does this not do what you want?

```python
sensor_data.sel(
    {'sensor': xr.DataArray(['A', 'A', 'A', 'B', 'C'], dims=['sample'])}
).sel(
    {'time': xr.DataArray([0, 1, 2, 0, 0], dims=['sample'])}, method='ffill'
)
```
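
A self-contained, hypothetical run of this suggestion (toy data invented for illustration), showing how the second call forward-fills along time:

```python
import xarray as xr

# Toy data: two observation times, three sensors.
sensor_data = xr.DataArray(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    coords={'time': [0, 10], 'sensor': ['A', 'B', 'C']},
    dims=['time', 'sensor'],
)

out = sensor_data.sel(
    {'sensor': xr.DataArray(['A', 'A', 'B'], dims=['sample'])}
).sel(
    {'time': xr.DataArray([0, 5, 10], dims=['sample'])}, method='ffill'
)
print(out.values)  # [1. 1. 5.]: time=5 forward-fills from time=0
```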

batterseapower (NONE) · 2020-12-19T13:53:53Z · https://github.com/pydata/xarray/issues/4714#issuecomment-748477889

I guess it would also make sense to have this in reindex if you did decide to add it.


Table schema:

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
```