issue_comments

5 rows where author_association = "CONTRIBUTOR" and user = 1554921, sorted by updated_at descending

issue (4 values)
  • Multi-dimensional coordinate mixup when writing to netCDF (2)
  • Fix multidimensional coordinates (1)
  • to_netcdf(compute=False) can be slow (1)
  • Writing Datasets to netCDF4 with "inconsistent" chunks (1)

user (1 value)
  • neishm (5)

author_association (1 value)
  • CONTRIBUTOR (5)

id: 400825442
html_url: https://github.com/pydata/xarray/issues/2254#issuecomment-400825442
issue_url: https://api.github.com/repos/pydata/xarray/issues/2254
node_id: MDEyOklzc3VlQ29tbWVudDQwMDgyNTQ0Mg==
user: neishm (1554921)
created_at: 2018-06-27T20:53:27Z
updated_at: 2018-06-27T20:53:27Z
author_association: CONTRIBUTOR
body:

So yes, it looks like we could fix this by checking chunks on each array independently like you suggest. There's no reason why all dask arrays need to have the same chunking for storing with to_netcdf().

I could throw together a pull request if that's all that's involved.

> This is because you need to indicate chunks for variables separately, via encoding: http://xarray.pydata.org/en/stable/io.html#writing-encoded-data

Thanks! I was able to write chunked output to the netCDF file by adding chunksizes to the encoding attribute of the variables. I found I also had to specify original_shape as a workaround for #2198.
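To make that concrete, here is a minimal sketch of the approach described above. The dataset, variable name, shapes, and chunk sizes are invented for illustration; only the chunksizes and original_shape encoding keys come from the comment itself.

```python
import dask.array as da
import xarray as xr

# Hypothetical dataset with one dask-chunked variable; names and sizes are
# illustrative only.
ds = xr.Dataset({"foo": (("y", "x"), da.zeros((100, 200), chunks=(50, 100)))})

encoding = {
    "foo": {
        "chunksizes": (50, 100),       # per-variable netCDF4 chunking
        "original_shape": (100, 200),  # workaround mentioned above for #2198
    }
}
ds.to_netcdf("chunked.nc", encoding=encoding)
```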

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Writing Datasets to netCDF4 with "inconsistent" chunks (336273865)

id: 399495668
html_url: https://github.com/pydata/xarray/issues/2242#issuecomment-399495668
issue_url: https://api.github.com/repos/pydata/xarray/issues/2242
node_id: MDEyOklzc3VlQ29tbWVudDM5OTQ5NTY2OA==
user: neishm (1554921)
created_at: 2018-06-22T16:10:45Z
updated_at: 2018-06-22T16:10:45Z
author_association: CONTRIBUTOR
body:

True, I would expect some performance hit due to writing chunk-by-chunk; however, that same performance hit is present in both of the test cases.

In addition to the snippet @shoyer mentioned, I found that xarray also intentionally uses autoclose=True when writing chunks to netCDF: https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/netCDF4_.py#L45-L48

However, ensure_open only uses autoclose if the file isn't already open:

https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/common.py#L496-L503

So if the file is already open before getting to BaseNetCDF4Array.__setitem__, it will remain open. If the file isn't yet open, it will be opened, but then immediately closed after writing the chunk. I suspect this is what's happening in the delayed version - the starting state of NetCDF4DataStore._isopen is False for some reason, and so it is doomed to re-close itself for each chunk processed.

If I remove the autoclose=True from BaseNetCDF4Array.__setitem__, the file remains open and performance is comparable between the two tests.
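For illustration only, a minimal sketch of the two write patterns being contrasted, written against the netCDF4 library directly rather than xarray's actual BaseNetCDF4Array code path; the file names, variable, and chunk layout are made up.

```python
import numpy as np
import netCDF4

def create(fname, n=1000):
    # Create a simple file with one 1-D variable to write into.
    with netCDF4.Dataset(fname, "w") as nc:
        nc.createDimension("x", n)
        nc.createVariable("foo", "f8", ("x",))

chunks = [(i, np.full(10, float(i))) for i in range(0, 1000, 10)]

# Pattern 1: reopen (and close) the file for every chunk, analogous to the
# autoclose behaviour when the file starts out closed.
create("reopen.nc")
for start, data in chunks:
    with netCDF4.Dataset("reopen.nc", "a") as nc:
        nc.variables["foo"][start:start + len(data)] = data

# Pattern 2: keep the file open across all chunk writes.
create("keep_open.nc")
with netCDF4.Dataset("keep_open.nc", "a") as nc:
    for start, data in chunks:
        nc.variables["foo"][start:start + len(data)] = data
```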

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: to_netcdf(compute=False) can be slow (334633212)

id: 350292555
html_url: https://github.com/pydata/xarray/issues/1763#issuecomment-350292555
issue_url: https://api.github.com/repos/pydata/xarray/issues/1763
node_id: MDEyOklzc3VlQ29tbWVudDM1MDI5MjU1NQ==
user: neishm (1554921)
created_at: 2017-12-08T15:34:01Z
updated_at: 2017-12-08T15:34:01Z
author_association: CONTRIBUTOR
body:

I think I've duplicated the logic from _construct_dataarray into _encode_coordinates. Test cases are passing, and my actual files are writing out properly. Hopefully nothing else got broken along the way.

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Multi-dimensional coordinate mixup when writing to netCDF (279832457)

id: 350090601
html_url: https://github.com/pydata/xarray/pull/1768#issuecomment-350090601
issue_url: https://api.github.com/repos/pydata/xarray/issues/1768
node_id: MDEyOklzc3VlQ29tbWVudDM1MDA5MDYwMQ==
user: neishm (1554921)
created_at: 2017-12-07T20:51:27Z
updated_at: 2017-12-07T20:51:27Z
author_association: CONTRIBUTOR
body:

No fix yet, just added a test case.

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Fix multidimensional coordinates (280274296)

id: 350015214
html_url: https://github.com/pydata/xarray/issues/1763#issuecomment-350015214
issue_url: https://api.github.com/repos/pydata/xarray/issues/1763
node_id: MDEyOklzc3VlQ29tbWVudDM1MDAxNTIxNA==
user: neishm (1554921)
created_at: 2017-12-07T16:11:55Z
updated_at: 2017-12-07T16:11:55Z
author_association: CONTRIBUTOR
body:

I can try putting together a pull request, hopefully without breaking any existing use cases. I just tested switching the any condition to all in the above code, and it does fix my one test case...

...However, it breaks other cases, such as if there's another axis in the data (such as a time axis). I think the all condition would require "time" to be one of the dimensions of the coordinates.

Here's an updated test case:

```python
import xarray as xr
import numpy as np

zeros1 = np.zeros((1,5,3))
zeros2 = np.zeros((1,6,3))
zeros3 = np.zeros((1,5,4))
d = xr.Dataset({
    'lon1': (['x1','y1'], zeros1.squeeze(0), {}),
    'lon2': (['x2','y1'], zeros2.squeeze(0), {}),
    'lon3': (['x1','y2'], zeros3.squeeze(0), {}),
    'lat1': (['x1','y1'], zeros1.squeeze(0), {}),
    'lat2': (['x2','y1'], zeros2.squeeze(0), {}),
    'lat3': (['x1','y2'], zeros3.squeeze(0), {}),
    'foo1': (['time','x1','y1'], zeros1, {'coordinates': 'lon1 lat1'}),
    'foo2': (['time','x2','y1'], zeros2, {'coordinates': 'lon2 lat2'}),
    'foo3': (['time','x1','y2'], zeros3, {'coordinates': 'lon3 lat3'}),
    'time': ('time', [0.], {'units': 'hours since 2017-01-01'}),
})
d = xr.conventions.decode_cf(d)
```

The resulting Dataset:

```
<xarray.Dataset>
Dimensions:  (time: 1, x1: 5, x2: 6, y1: 3, y2: 4)
Coordinates:
    lat1     (x1, y1) float64 ...
  * time     (time) datetime64[ns] 2017-01-01
    lat3     (x1, y2) float64 ...
    lat2     (x2, y1) float64 ...
    lon1     (x1, y1) float64 ...
    lon3     (x1, y2) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x1, x2, y1, y2
Data variables:
    foo1     (time, x1, y1) float64 ...
    foo2     (time, x2, y1) float64 ...
    foo3     (time, x1, y2) float64 ...
```

saved to netCDF using

```python
d.to_netcdf("test.nc")
```

With the any condition, I have too many coordinates:

```
~$ ncdump -h test.nc
netcdf test {
dimensions:
    x1 = 5 ;
    y1 = 3 ;
    time = 1 ;
    y2 = 4 ;
    x2 = 6 ;
variables:
    ...
    double foo1(time, x1, y1) ;
        foo1:_FillValue = NaN ;
        foo1:coordinates = "lat1 lat3 lat2 lon1 lon3 lon2" ;
    double foo2(time, x2, y1) ;
        foo2:_FillValue = NaN ;
        foo2:coordinates = "lon1 lon2 lat1 lat2" ;
    double foo3(time, x1, y2) ;
        foo3:_FillValue = NaN ;
        foo3:coordinates = "lon1 lon3 lat1 lat3" ;
    ...
}
```

With the all condition, I don't get any variable coordinates (they're dumped into the global attributes):

```
~$ ncdump -h test.nc
netcdf test {
dimensions:
    x1 = 5 ;
    y1 = 3 ;
    time = 1 ;
    y2 = 4 ;
    x2 = 6 ;
variables:
    ...
    double foo1(time, x1, y1) ;
        foo1:_FillValue = NaN ;
    double foo2(time, x2, y1) ;
        foo2:_FillValue = NaN ;
    double foo3(time, x1, y2) ;
        foo3:_FillValue = NaN ;

// global attributes:
        :_NCProperties = "version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18" ;
        :coordinates = "lat1 lat3 lat2 lon1 lon3 lon2" ;
}
```

So the update may be a bit trickier to get right. I know the DataArray objects (foo1, foo2, foo3) already have the right coordinates associated with them before writing to netCDF, so maybe the logic in _encode_coordinates could be changed to use v.coords somehow? I'll see if I can get something working for my test cases...
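A tiny illustration (not the actual _encode_coordinates code) of why both conditions misbehave, using the dimensions from the test case above:

```python
foo1_dims = ("time", "x1", "y1")   # dims of the data variable foo1
lon1_dims = ("x1", "y1")           # the coordinate that should attach to foo1
lon2_dims = ("x2", "y1")           # a coordinate that belongs to foo2

# "any" condition: a single shared dimension is enough, so lon2 wrongly
# matches foo1 as well.
print(any(d in lon2_dims for d in foo1_dims))   # True  -> too many coordinates

# "all" condition: every dim of foo1, including "time", must appear in the
# coordinate's dims, so even lon1 fails and foo1 ends up with no coordinates.
print(all(d in lon1_dims for d in foo1_dims))   # False -> coordinates dropped
```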

reactions:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Multi-dimensional coordinate mixup when writing to netCDF (279832457)

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
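A minimal sketch of querying this table directly for the filtered view above, using Python's sqlite3 module; the database filename github.db is an assumption.

```python
import sqlite3

# Assumed database file; adjust to the actual Datasette database path.
conn = sqlite3.connect("github.db")
conn.row_factory = sqlite3.Row

# Same filter as this page: CONTRIBUTOR comments by user 1554921 (neishm),
# newest first.
rows = conn.execute(
    """
    SELECT id, issue_url, created_at, updated_at, body
    FROM issue_comments
    WHERE author_association = ? AND [user] = ?
    ORDER BY updated_at DESC
    """,
    ("CONTRIBUTOR", 1554921),
).fetchall()

for row in rows:
    print(row["id"], row["updated_at"], row["issue_url"])
```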