issue_comments
7 rows where issue = 180451196 and user = 1217238 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
261046945 | https://github.com/pydata/xarray/pull/1024#issuecomment-261046945 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI2MTA0Njk0NQ== | shoyer 1217238 | 2016-11-16T19:30:53Z | 2016-11-16T19:30:53Z | MEMBER | @kynan I think this is fixed in #1128, which has a slightly more robust solution. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
260393416 | https://github.com/pydata/xarray/pull/1024#issuecomment-260393416 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI2MDM5MzQxNg== | shoyer 1217238 | 2016-11-14T16:57:59Z | 2016-11-14T16:57:59Z | MEMBER | Thanks for your patience! This is a nice improvement. I have an idea for a variation that might make for a cleaner (less dask-specific) way to handle the remaining caching logic -- I'll add you as a reviewer on that PR. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
258679497 | https://github.com/pydata/xarray/pull/1024#issuecomment-258679497 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI1ODY3OTQ5Nw== | shoyer 1217238 | 2016-11-06T13:01:50Z | 2016-11-06T13:01:50Z | MEMBER | Awesome, thanks for your help! On Sat, Nov 5, 2016 at 6:56 PM crusaderky notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
258620328 | https://github.com/pydata/xarray/pull/1024#issuecomment-258620328 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI1ODYyMDMyOA== | shoyer 1217238 | 2016-11-05T15:53:06Z | 2016-11-05T15:53:06Z | MEMBER |
Yes, please do! @crusaderky I think we are OK going ahead here with Option D. If we do eventually extend xarray with out of core indexes, that will be done with a separate layer (not in |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
258524115 | https://github.com/pydata/xarray/pull/1024#issuecomment-258524115 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI1ODUyNDExNQ== | shoyer 1217238 | 2016-11-04T19:19:00Z | 2016-11-04T19:19:00Z | MEMBER |
Right now, xarray is not going to be a great fit for such cases, because we already cache an index in memory for any labeled indexing operations. So at best, you could do something like

I doubt very many people are relying on this, though indeed, this would include some users of an array database we wrote at my former employer, which did not support different chunking schemes for different variables (which could make coordinate lookup very slow). CC @ToddSmall in case he has opinions here.

For out-of-core operations with labels on big unstructured meshes, you really need a generalization of the |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
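The in-memory index caching mentioned in the comment above (the reason labeled indexing forces coordinate data into memory) can be sketched in plain Python. `LabeledArray` here is a hypothetical stand-in, not xarray's actual implementation:

```python
# Minimal sketch (hypothetical class, not xarray's real code) of why an
# index ends up cached in memory for labeled indexing: the label -> position
# mapping is built once, on first use, and reused for every later lookup.
class LabeledArray:
    def __init__(self, labels, values):
        self.labels = list(labels)
        self.values = list(values)
        self._index = None  # built lazily, then kept in memory

    @property
    def index(self):
        if self._index is None:
            # This is the step that forces the label data into memory.
            self._index = {label: pos for pos, label in enumerate(self.labels)}
        return self._index

    def sel(self, label):
        # Label-based selection: O(1) once the index exists.
        return self.values[self.index[label]]


arr = LabeledArray(["a", "b", "c"], [10, 20, 30])
print(arr.sel("b"))  # -> 20
```

For an array database with slow coordinate lookup, building `_index` is exactly the expensive step the thread is debating when (and whether) to trigger automatically.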
256125722 | https://github.com/pydata/xarray/pull/1024#issuecomment-256125722 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI1NjEyNTcyMg== | shoyer 1217238 | 2016-10-25T18:25:30Z | 2016-10-25T18:25:30Z | MEMBER | I'm going to ping the mailing list for input, but I think it would be pretty safe. On Tue, Oct 25, 2016 at 11:11 AM, crusaderky notifications@github.com wrote:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 | |
255286001 | https://github.com/pydata/xarray/pull/1024#issuecomment-255286001 | https://api.github.com/repos/pydata/xarray/issues/1024 | MDEyOklzc3VlQ29tbWVudDI1NTI4NjAwMQ== | shoyer 1217238 | 2016-10-21T03:36:01Z | 2016-10-21T03:36:01Z | MEMBER |
I'm nervous about eager loading, especially for non-index coordinates. They can have more than one dimension, and thus can contain a lot of data. So potentially eagerly loading non-index coordinates could break existing use cases. On the other hand, non-index coordinates are indeed checked for equality in most xarray operations (e.g., for the coordinate merge in align). So it is indeed useful not to have to recompute them all the time. Even eagerly loading indexes is potentially problematic, if loading the index values is expensive. So I'm conflicted:

- I like the current caching behavior for

I'm going to start throwing out ideas for how to deal with this:

Option A

Add two new (public?) methods, something like

Hypothetically, we could even have options for turning this caching systematically on/off (e.g.,

Your proposal is basically an extreme version of this, where we call

Advantages:

- It's fairly predictable when caching happens (especially if we opt for calling

Downsides:

- Caching is more aggressive than necessary -- we cache indexes even if that coord isn't actually indexed.

Option B

Like Option A, but somehow infer the full set of variables that need to be cached (e.g., in a

This solves the downside of A, but diminishes the predictability. We're basically back to how things work now.

Option C

Cache dask.array in

Advantages:

- Much simpler and easier to implement than the alternatives.
- Implicit conversions are greatly diminished.

Downsides:

- Non-index coordinates get thrown away after being evaluated once. If you're doing lots of operations of the form

Option D

Load the contents of an

This has the most predictable performance, but might cause trouble for some edge use cases? I need to think about this a little more, but right now I am leaning towards Option C or D. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Disable automatic cache with dask 180451196 |
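The two options the comment above leans towards can be sketched side by side. This is an illustrative sketch only: the `Lazy` class and `eager_load_coords` helper are made-up names, not xarray API. Option C caches the result of the first compute in place; Option D eagerly loads coordinate values at construction time while leaving data variables lazy.

```python
# Hedged sketch of Option C vs. Option D from the discussion above.
# Names here (Lazy, eager_load_coords) are illustrative, not xarray's API.
class Lazy:
    """Option C in miniature: compute on demand, then cache the result."""

    def __init__(self, compute):
        self._compute = compute
        self._cached = None

    def values(self):
        if self._cached is None:
            # First access triggers the (possibly expensive) computation;
            # later accesses reuse the cached result.
            self._cached = self._compute()
        return self._cached


def eager_load_coords(data_vars, coords):
    """Option D in miniature: load coordinate values into memory up front,
    leaving data variables lazy until explicitly computed."""
    return data_vars, {name: lazy.values() for name, lazy in coords.items()}


coords = {"x": Lazy(lambda: [0, 1, 2])}
data = {"t": Lazy(lambda: [9, 9, 9])}
data, coords = eager_load_coords(data, coords)
print(coords["x"], data["t"]._cached)  # coords loaded eagerly, data still lazy
```

The trade-off the thread describes falls out directly: Option C never loads anything the user doesn't touch, while Option D makes index performance predictable at the cost of loading coordinates that may never be used.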
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
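The schema above can be exercised with Python's built-in sqlite3 module. This sketch drops the REFERENCES clauses (the referenced tables aren't shown here) and reproduces this page's query: comments where issue = 180451196 and user = 1217238, newest first.

```python
import sqlite3

# Recreate a simplified version of the schema above in an in-memory
# database, then run the query behind this page's view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
  [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT,
  [user] INTEGER, [created_at] TEXT, [updated_at] TEXT,
  [author_association] TEXT, [body] TEXT, [reactions] TEXT,
  [performed_via_github_app] TEXT, [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
""")
conn.executemany(
    "INSERT INTO issue_comments (id, user, issue, updated_at) VALUES (?, ?, ?, ?)",
    [
        (255286001, 1217238, 180451196, "2016-10-21T03:36:01Z"),
        (261046945, 1217238, 180451196, "2016-11-16T19:30:53Z"),
    ],
)
rows = conn.execute(
    "SELECT id FROM issue_comments WHERE issue = ? AND user = ? "
    "ORDER BY updated_at DESC",
    (180451196, 1217238),
).fetchall()
print(rows)  # -> [(261046945,), (255286001,)]
```

Because the timestamps are ISO 8601 strings, plain text ordering on `updated_at` sorts chronologically, which is what makes the descending sort in the view work.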