issue_comments


7 rows where user = 15016780 sorted by updated_at descending


issue (2 values)

  • `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph · 4
  • Different data values from xarray open_mfdataset when using chunks · 3

user (1 value)

  • abarciauskas-bgse · 7

author_association (1 value)

  • NONE · 7
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at (sorted descending), author_association, body, reactions, performed_via_github_app, issue
576422784 https://github.com/pydata/xarray/issues/3686#issuecomment-576422784 https://api.github.com/repos/pydata/xarray/issues/3686 MDEyOklzc3VlQ29tbWVudDU3NjQyMjc4NA== abarciauskas-bgse 15016780 2020-01-20T20:35:47Z 2020-01-20T20:35:47Z NONE

Closing, as using `mask_and_scale=False` produced precise results.
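For readers following along, a minimal sketch of the workaround this comment refers to, assuming a hypothetical local MUR SST granule (the filename below is a placeholder, not from the thread):

```
import xarray as xr

# Placeholder filename; the thread's fileObjs list is not shown here.
path = "20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc"

# Default behaviour: scale_factor/add_offset are applied on read and the packed
# int16 values are decoded (to float32 for these MUR files), which is where the
# precision loss enters.
ds_decoded = xr.open_dataset(path)

# Workaround from the comment: skip decoding, keep the packed integers, and
# apply the scaling manually (in float64) only where full precision is needed.
ds_packed = xr.open_dataset(path, mask_and_scale=False)

print(ds_decoded["analysed_sst"].dtype)  # likely float32 after CF decoding
print(ds_packed["analysed_sst"].dtype)   # int16, as stored on disk
```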

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Different data values from xarray open_mfdataset when using chunks  548475127
573458081 https://github.com/pydata/xarray/issues/3686#issuecomment-573458081 https://api.github.com/repos/pydata/xarray/issues/3686 MDEyOklzc3VlQ29tbWVudDU3MzQ1ODA4MQ== abarciauskas-bgse 15016780 2020-01-12T21:17:11Z 2020-01-12T21:17:11Z NONE

Thanks @rabernat. I would like to use `assert_allclose` to test the output, but at first pass it seems that might be prohibitively slow for large datasets. Do you recommend sampling or other good testing strategies (e.g. to assert the xarray datasets are equal to some precision)?
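Not part of the original exchange, but one way to read the sampling idea: compare a random subset of grid points with `xarray.testing.assert_allclose` instead of the full arrays. The file list and chunking below are placeholders:

```
import numpy as np
import xarray as xr

fileObjs = ["file1.nc", "file2.nc"]  # placeholder list of source granules

ds_a = xr.open_mfdataset(fileObjs, combine="by_coords")
ds_b = xr.open_mfdataset(fileObjs, chunks={"time": 1}, combine="by_coords")

# Compare a random sample of grid points rather than every element, so the
# check stays cheap even for very large datasets.
rng = np.random.default_rng(0)
lat_idx = rng.choice(ds_a.sizes["lat"], size=100, replace=False)
lon_idx = rng.choice(ds_a.sizes["lon"], size=100, replace=False)

xr.testing.assert_allclose(
    ds_a["analysed_sst"].isel(lat=lat_idx, lon=lon_idx),
    ds_b["analysed_sst"].isel(lat=lat_idx, lon=lon_idx),
    rtol=1e-6,
)
```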

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Different data values from xarray open_mfdataset when using chunks  548475127
573444233 https://github.com/pydata/xarray/issues/3686#issuecomment-573444233 https://api.github.com/repos/pydata/xarray/issues/3686 MDEyOklzc3VlQ29tbWVudDU3MzQ0NDIzMw== abarciauskas-bgse 15016780 2020-01-12T18:37:59Z 2020-01-12T18:37:59Z NONE

@dmedv Thanks for this, it all makes sense to me and I see the same results. However, I wasn't able to "convert back" using `scale_factor` and `add_offset`:

```
import xarray as xr  # needed for the xr calls below
from netCDF4 import Dataset

d = Dataset(fileObjs[0])
v = d.variables['analysed_sst']

print("Result with mask_and_scale=True")
ds_unchunked = xr.open_dataset(fileObjs[0])
print(ds_unchunked.analysed_sst.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print("Result with mask_and_scale=False")
ds_unchunked = xr.open_dataset(fileObjs[0], mask_and_scale=False)
scaled = ds_unchunked.analysed_sst * v.scale_factor + v.add_offset
scaled.sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values
```

^^ That returns a different result than what I expect. I wonder if this is because of the `_FillValue` missing from trying to convert back.

However this led me to another seemingly related issue: https://github.com/pydata/xarray/issues/2304

Loss of precision seems to be the key here, so coercing the float32s to float64s appears to get the same results from both chunked and unchunked versions - but still not

``` print("results from unchunked dataset") ds_unchunked = xr.open_mfdataset(fileObjs, combine='by_coords') ds_unchunked['analysed_sst'] = ds_unchunked['analysed_sst'].astype(np.float64) print(ds_unchunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print(f"results from chunked dataset using {chunks}") ds_chunked = xr.open_mfdataset(fileObjs, chunks=chunks, combine='by_coords') ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64) print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values)

print("results from chunked dataset using 'auto'") ds_chunked = xr.open_mfdataset(fileObjs, chunks={'time': 'auto', 'lat': 'auto', 'lon': 'auto'}, combine='by_coords') ds_chunked['analysed_sst'] = ds_chunked['analysed_sst'].astype(np.float64) print(ds_chunked.analysed_sst[1,:,:].sel(lat=slice(20,50),lon=slice(-170,-110)).mean().values) ```

returns:

results from unchunked dataset
290.1375818862207
results from chunked dataset using {'time': 1, 'lat': 1799, 'lon': 3600}
290.1375818862207
results from chunked dataset using 'auto'
290.1375818862207

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Different data values from xarray open_mfdataset when using chunks  548475127
531617569 https://github.com/pydata/xarray/issues/3306#issuecomment-531617569 https://api.github.com/repos/pydata/xarray/issues/3306 MDEyOklzc3VlQ29tbWVudDUzMTYxNzU2OQ== abarciauskas-bgse 15016780 2019-09-16T01:22:09Z 2019-09-16T01:22:09Z NONE

Thanks @rabernat. I tried what you suggested (with a small subset, since the source files are quite large) and it seems to work on smaller subsets when writing locally, which leads me to suspect that running the same process with larger datasets might be overloading memory, though I can't confirm the root cause yet. This isn't blocking my current strategy, so closing for now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph 493058488
531493820 https://github.com/pydata/xarray/issues/3306#issuecomment-531493820 https://api.github.com/repos/pydata/xarray/issues/3306 MDEyOklzc3VlQ29tbWVudDUzMTQ5MzgyMA== abarciauskas-bgse 15016780 2019-09-14T16:34:56Z 2019-09-14T16:34:56Z NONE

I recall this also happening when storing locally, but I can't reproduce that at the moment since the Kubernetes cluster I am using now is not a Pangeo hub and is not set up to use EFS.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph 493058488
531486715 https://github.com/pydata/xarray/issues/3306#issuecomment-531486715 https://api.github.com/repos/pydata/xarray/issues/3306 MDEyOklzc3VlQ29tbWVudDUzMTQ4NjcxNQ== abarciauskas-bgse 15016780 2019-09-14T15:03:04Z 2019-09-14T15:03:04Z NONE

@rabernat good points. One thing I'm not sure how to make reproducible is writing to a remote file store, since I think it usually requires access to a write-protected cloud storage provider. Any tips on this?

I have what should be an otherwise working example here: https://gist.github.com/abarciauskas-bgse/d0aac2ae9bf0b06f52a577d0a6251b2d - let me know if this is an ok format to share for reproducing the issue.
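Not an answer from the thread, but one way to exercise the same `to_zarr` call without credentials for a write-protected bucket is to write into an in-memory mapping, which zarr accepts in place of an fsspec/S3 store (whether this reproduces the stall is untested):

```
import xarray as xr

# Tiny stand-in dataset; the real case uses large MUR granules from open_mfdataset.
ds = xr.Dataset(
    {"analysed_sst": (("time", "lat", "lon"), [[[290.0, 291.0], [289.5, 290.5]]])}
).chunk({"time": 1})

# A plain dict acts as a MutableMapping store, so the write path can be run
# locally without any cloud storage provider.
store = {}
ds.to_zarr(store, mode="w")

print(sorted(store)[:5])  # a few of the keys zarr wrote
```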

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph 493058488
531435069 https://github.com/pydata/xarray/issues/3306#issuecomment-531435069 https://api.github.com/repos/pydata/xarray/issues/3306 MDEyOklzc3VlQ29tbWVudDUzMTQzNTA2OQ== abarciauskas-bgse 15016780 2019-09-14T01:42:22Z 2019-09-14T01:42:22Z NONE

Update: I've made some progress on determining the source of this issue. It seems related to the source dataset's variables. When I use 2 OPeNDAP URLs with 4 parameterized variables, things work fine.

Using 2 URLs like:

https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/2002/152/20020601090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc?time[0:1:0],lat[0:1:17998],lon[0:1:35999],analysed_sst[0:1:0][0:1:17998][0:1:35999],analysis_error[0:1:0][0:1:17998][0:1:35999],mask[0:1:0][0:1:17998][0:1:35999],sea_ice_fraction[0:1:0][0:1:17998][0:1:35999]

I get back a dataset:

<xarray.Dataset>
Dimensions:         (lat: 17999, lon: 36000, time: 2)
Coordinates:
  * lat             (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
  * lon             (lon) float32 -179.99 -179.98 -179.97 ... 179.99 180.0
  * time            (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
    analysed_sst    (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    analysis_error  (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
Attributes:
    Conventions: CF-1.5
    title: Daily MUR SST, Final product

However, if I omit the parameterized data variables, using URLs such as:

https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/JPL/MUR/v4.1/090000-JPL-L4_GHRSST-SSTfnd-MUR-GLOB-v02.0-fv04.1.nc

I get back an additional variable:

<xarray.Dataset>
Dimensions:           (lat: 17999, lon: 36000, time: 2)
Coordinates:
  * lat               (lat) float32 -89.99 -89.98 -89.97 ... 89.97 89.98 89.99
  * lon               (lon) float32 -179.99 -179.98 -179.97 ... 179.99 180.0
  * time              (time) datetime64[ns] 2018-04-22T09:00:00 2018-04-23T09:00:00
Data variables:
    analysed_sst      (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    analysis_error    (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    mask              (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    sea_ice_fraction  (time, lat, lon) float32 dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
    dt_1km_data       (time, lat, lon) timedelta64[ns] dask.array<shape=(2, 17999, 36000), chunksize=(1, 1000, 1000)>
Attributes:
    Conventions: CF-1.5
    title: Daily MUR SST, Final product

In the first case (with the parameterized variables) I get the expected result (data is stored on S3). In the second case (no parameterized variables), `store` is never included in the graph and the workers seem to stall.
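For illustration only (this is not what the commenter did): the extra variable that appears only in the second case could also be excluded on the xarray side with `drop_variables`, instead of through the OPeNDAP constraint expression. The URL list below is a placeholder:

```
import xarray as xr

# Placeholder for the un-parameterized OPeNDAP URLs described above.
urls = ["https://podaac-opendap.jpl.nasa.gov:443/opendap/.../analysed_sst_granule.nc"]

# Drop the variable that only shows up in the stalling case; open_mfdataset
# forwards drop_variables to each underlying open_dataset call.
ds = xr.open_mfdataset(urls, combine="by_coords", drop_variables=["dt_1km_data"])
print(list(ds.data_vars))
```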

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `ds.load()` with local files stalls and fails, and `to_zarr` does not include `store` in the dask graph 493058488

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
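A minimal sketch of reproducing this page's filter against the schema above using Python's sqlite3 module; the database filename is a placeholder for wherever the Datasette SQLite file lives:

```
import sqlite3

conn = sqlite3.connect("github.db")  # placeholder path to the Datasette database

# Same filter as the view above: comments by user 15016780, newest update first.
rows = conn.execute(
    """
    SELECT id, issue_url, created_at, updated_at
    FROM issue_comments
    WHERE [user] = 15016780
    ORDER BY updated_at DESC
    """
).fetchall()

for row in rows:
    print(row)
```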