
issue_comments


6 rows where issue = 226549366 sorted by updated_at descending


user 3

  • cchwala 2
  • shoyer 2
  • fmaussion 2

author_association 2

  • MEMBER 4
  • CONTRIBUTOR 2

issue 1

  • `decode_cf_datetime()` slow because `pd.to_timedelta()` is slow if floats are passed · 6
cchwala (CONTRIBUTOR) · 2017-05-09T06:26:36Z · comment 300072972 · https://github.com/pydata/xarray/issues/1399#issuecomment-300072972

Okay. I will try to come up with a PR within the next few days.

shoyer (MEMBER) · 2017-05-08T16:24:50Z · comment 299916837 · https://github.com/pydata/xarray/issues/1399#issuecomment-299916837

> This does not make me as nervous as Fabien since my data is always quite recent, but I see that this is far from ideal for a tool for climate scientists.

@spencerkclark has been working on a patch to natively support other datetime precisions in xarray (see https://github.com/pydata/xarray/pull/1252).

> The only thing that bothers me is that I am not sure whether the "number of nanoseconds" is always the same in every day or hour from datetime64's point of view, due to leap seconds or other particularities.

For better or worse, NumPy's datetime64 ignores leap seconds.

> Does this sound reasonable or did I forget to take into account any side effects?

This sounds pretty reasonable to me. The main challenge here will be guarding against integer overflow -- you might need to do the math twice, once with floats (to check for overflow) and then with integers.

You could also experiment with doing the conversion with NumPy instead of pandas, using .astype('timedelta64[{}]'.format(units)).
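
A minimal sketch of the two suggestions above (the float-based overflow check and the NumPy cast); the array, unit code, and error handling are illustrative, not taken from xarray:

    import numpy as np

    # Hypothetical raw offsets read from a netCDF file, e.g. "seconds since ...".
    raw = np.array([0.0, 1.0, 86400.0])
    unit = "s"  # NumPy timedelta64 unit code matching the CF time unit

    # Do the math twice: first in float64, to check that the nanosecond values
    # would fit into int64 (datetime64[ns] is backed by int64) ...
    ns_per_unit = np.timedelta64(1, unit) / np.timedelta64(1, "ns")
    if np.any(np.abs(raw * ns_per_unit) > np.iinfo("int64").max):
        raise OverflowError("time values do not fit into datetime64[ns]")

    # ... then cast with NumPy instead of pandas. Casting floats to timedelta64
    # truncates any fractional part, so this only matches exactly for integral
    # raw values.
    deltas = raw.astype("timedelta64[{}]".format(unit))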

fmaussion (MEMBER) · 2017-05-08T10:26:26Z · comment 299830748 · https://github.com/pydata/xarray/issues/1399#issuecomment-299830748

> As long as pandas and xarray rely on datetime64[ns], you cannot avoid nanoseconds, right?

Yes, you can ignore my comment!

cchwala (CONTRIBUTOR) · 2017-05-08T09:32:58Z · comment 299819380 · https://github.com/pydata/xarray/issues/1399#issuecomment-299819380

Hmm... The "nanosecond" issue seems to need a fix very much at the foundation. As long as pandas and xarray rely on datetime64[ns], you cannot avoid nanoseconds, right? pd.to_datetime() forces the conversion to nanoseconds even if you pass integers with a time unit other than ns. This does not make me as nervous as Fabien since my data is always quite recent, but I see that this is far from ideal for a tool for climate scientists.
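
For illustration only (this snippet is not part of the comment), the forced nanosecond dtype is easy to see:

    import numpy as np
    import pandas as pd

    # Integer inputs with a coarser unit still come back as datetime64[ns].
    idx = pd.to_datetime(np.array([1, 2], dtype="int64"), unit="h")
    print(idx.dtype)  # datetime64[ns]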

An intermediate fix (@shoyer, do you actually want one?) that I could think of for the performance issue right now would be to do the conversion to datetime64[ns] depending on the time unit, e.g.

  • multiply the raw values (most likely floats) by the number of nanoseconds in the time unit for units smaller than days (or hours?) and use these values as integers in pd.to_datetime() (see the sketch after this list)
  • else, fall back to using netCDF4/netcdftime for months and years (as suggested by shoyer), casting the raw values to floats
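
A rough sketch of the first bullet, assuming a fixed nanoseconds-per-unit table; the helper name and values are made up for illustration and are not code from xarray:

    import numpy as np
    import pandas as pd

    # Nanoseconds per unit, for the "small" units whose length is fixed
    # (NumPy's datetime64 ignores leap seconds, per shoyer's reply above).
    NS_PER_UNIT = {
        "seconds": 10**9,
        "minutes": 60 * 10**9,
        "hours": 3600 * 10**9,
        "days": 86400 * 10**9,
    }

    def decode_small_units(raw_values, unit, reference_date):
        """Hypothetical helper: go through integer nanoseconds instead of
        passing floats to pandas."""
        ns = (np.asarray(raw_values, dtype="float64") * NS_PER_UNIT[unit]).astype("int64")
        return pd.Timestamp(reference_date) + pd.to_timedelta(ns, unit="ns")

    # e.g. raw values stored as "hours since 2000-01-01"
    decode_small_units([0.0, 0.5, 1.0], "hours", "2000-01-01")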

The only thing that bothers me is that I am not sure whether the "number of nanoseconds" is always the same in every day or hour from datetime64's point of view, due to leap seconds or other particularities.

@shoyer: Does this sound reasonable or did I forget to take into account any side effects?

shoyer (MEMBER) · 2017-05-05T16:23:17Z · comment 299510444 · https://github.com/pydata/xarray/issues/1399#issuecomment-299510444

Good catch! We should definitely speed this up.

> Hence, it would be great to automatically do the conversion from raw time value floats to integers in nanoseconds where possible (likely limited to resolutions below days or hours to avoid coping with different numbers of nanoseconds within e.g. different months).

Yes, very much agreed.

For units such as months or years, we are already giving the wrong result when we use pandas:

    In [18]: pd.to_timedelta(1, unit='M')
    Out[18]: Timedelta('30 days 10:29:06')

In these cases, we should fall back to using netCDF4/netcdftime instead. We may also need to add more checks for numeric overflow.

> As an alternative, maybe avoid forcing the cast to floats and indicate in the docstring that the raw values should be integers to speed up the conversion.

Yes, this might also work. I no longer recall why we cast all inputs to floats (maybe just for consistency), but I suspect that one of our time conversion libraries (probably netCDF4/netcdftime) expects a float array. Certainly we will still need to support floating point times saved in netCDF files, which are pretty common in my experience.

fmaussion (MEMBER) · 2017-05-05T14:42:19Z · comment 299483553 · https://github.com/pydata/xarray/issues/1399#issuecomment-299483553

Hi Christian!

> As an alternative, maybe avoid forcing the cast to floats and indicate in the docstring that the raw values should be integers to speed up the conversion.

This sounds much less error-prone to me. In particular, I am getting a bit nervous when I hear "nanoseconds" ;-) (see https://github.com/pydata/xarray/issues/789)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);