issues

4 rows where repo = 13221727, state = "closed" and user = 1117224 sorted by updated_at descending

issue #2018: MemoryError when using save_mfdataset()
id: 309100522 · node_id: MDU6SXNzdWUzMDkxMDA1MjI= · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 1
created: 2018-03-27T19:22:28Z · updated: 2020-03-28T07:51:17Z · closed: 2020-03-28T07:51:17Z · author_association: NONE

Code Sample, a copy-pastable example if possible

```python
import xarray as xr
import dask.array

# Dummy data that on disk is about ~200GB
da = xr.DataArray(dask.array.random.normal(0, 1,
                                           size=(12, 408, 1367, 304, 448),
                                           chunks=(1, 1, 1, 304, 448)),
                  dims=('ensemble', 'init_time', 'fore_time', 'x', 'y'))

# Perform some calculation on the dask data
da_sum = da.sum(dim='x').sum(dim='y') * (25 * 25) / (10 ** 6)

# Write to multiple files
c_e, datasets = zip(*da_sum.to_dataset(name='sic').groupby('ensemble'))
paths = ['file_%s.nc' % e for e in c_e]
xr.save_mfdataset(datasets, paths)
```
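
For context on the last step: `groupby('ensemble')` yields `(label, sub-dataset)` pairs, and `zip(*...)` transposes them into a tuple of labels and a tuple of datasets, which is exactly the pairing `save_mfdataset` expects. A minimal illustrative sketch with a tiny in-memory array (names are illustrative only):

```python
import numpy as np
import xarray as xr

# Tiny in-memory stand-in for the ~200GB array above
ds = xr.DataArray(np.arange(6).reshape(3, 2),
                  coords={'ensemble': [0, 1, 2]},
                  dims=('ensemble', 'time'),
                  name='sic').to_dataset()

# groupby yields (label, sub-dataset) pairs; zip(*...) transposes them
c_e, datasets = zip(*ds.groupby('ensemble'))
paths = ['file_%s.nc' % e for e in c_e]
print(c_e)    # (0, 1, 2)
print(paths)  # ['file_0.nc', 'file_1.nc', 'file_2.nc']
```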

Problem description

This results in a MemoryError, even though dask should be able to handle writing this out-of-memory DataArray to multiple netCDF files that each fit within memory. Related SO post here
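
Not part of the original report, but one possible mitigation to sketch: newer xarray releases let `save_mfdataset` defer computation via `compute=False`, returning a `dask.delayed` object so the per-file writes are scheduled through dask together rather than eagerly (the report used xarray 0.10.2, which predates this keyword):

```python
# Sketch only, reusing `datasets` and `paths` from the code sample above;
# assumes a newer xarray where save_mfdataset accepts compute=False.
delayed_writes = xr.save_mfdataset(datasets, paths, compute=False)
delayed_writes.compute()  # dask executes the writes chunk by chunk
```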

Expected Output

12 netcdf files (grouped by the ensemble dim).

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.12
machine: x86_64
processor:
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

xarray: 0.10.2
pandas: 0.22.0
numpy: 1.14.1
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: 0.5.0
h5py: 2.7.1
Nio: None
zarr: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.17.1
distributed: 1.21.1
matplotlib: 2.2.2
cartopy: None
seaborn: 0.8.1
setuptools: 38.5.1
pip: 9.0.1
conda: None
pytest: None
IPython: 6.2.1
sphinx: None
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2018/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue
pull #1070: Feature/rasterio
id: 186326698 · node_id: MDExOlB1bGxSZXF1ZXN0OTE2Mzk0OTY= · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 11 · draft: 0 · pull_request: pydata/xarray/pulls/1070
created: 2016-10-31T16:14:55Z · updated: 2017-05-22T08:47:40Z · closed: 2017-05-22T08:47:40Z · author_association: NONE

@jhamman started a backend for RasterIO that I have been working on. There are two issues I am stuck on and could use some help with:

1) Lat/long coords are not being decoded correctly (they are missing from the output dataset). The lat/lon projection is correctly calculated and added here (https://github.com/NicWayand/xray/blob/feature/rasterio/xarray/backends/rasterio_.py#L117), but it appears (with my limited knowledge of xarray) that the lat/long coords contained within obj are lost at this line (https://github.com/NicWayand/xray/blob/feature/rasterio/xarray/conventions.py#L930).

2) Lazy-loading needs to be enabled. How can I set this up and test it? Are there examples from other backends I could follow?
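
(Not from the PR itself: the pattern xarray's backend documentation later settled on is a `BackendArray` subclass whose `__getitem__` reads only the requested window, so nothing touches disk until the array is indexed. Below is a rough sketch with hypothetical names, not the API as it existed in 2016.)

```python
import numpy as np
from xarray.backends import BackendArray

class LazyRasterioArray(BackendArray):
    """Hypothetical lazy wrapper: no pixels are read at construction;
    each __getitem__ reads only the requested window from disk."""

    def __init__(self, riods):
        self.riods = riods  # an already-open rasterio dataset (assumed)
        self.shape = (riods.count, riods.height, riods.width)
        self.dtype = np.dtype(riods.dtypes[0])

    def __getitem__(self, key):
        # A real backend normalizes `key` with xarray's indexing helpers;
        # this sketch assumes plain (band, row, col) slices.
        band, row, col = key
        bands = [b + 1 for b in range(*band.indices(self.shape[0]))]
        window = ((row.start or 0, row.stop or self.shape[1]),
                  (col.start or 0, col.stop or self.shape[2]))
        return self.riods.read(bands, window=window)
```

xarray then wraps such an array in its lazy indexing machinery (e.g. `indexing.LazilyIndexedArray`) so that slicing stays deferred until values are actually needed.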


{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1070/reactions",
    "total_count": 4,
    "+1": 4,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
pull #961: Update time-series.rst
id: 170688064 · node_id: MDExOlB1bGxSZXF1ZXN0ODA5ODgxNzA= · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 3 · draft: 0 · pull_request: pydata/xarray/pulls/961
created: 2016-08-11T16:26:58Z · updated: 2017-04-03T05:31:06Z · closed: 2017-04-03T05:31:06Z · author_association: NONE

I thought it would be helpful for users to know that timezones are not handled here, rather than having to google and find this: https://github.com/pydata/xarray/issues/552
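
For readers landing here, the gist of the linked issue: xarray stores times as numpy `datetime64`, which carries no timezone, so tz-aware timestamps are typically converted to naive UTC before being used as coordinates. A small illustrative sketch (not from the PR):

```python
import pandas as pd
import xarray as xr

# tz-aware timestamps must be made naive before living in a datetime64
# coordinate; converting to UTC first is the usual choice
times = pd.date_range('2016-08-11', periods=3, freq='h', tz='US/Pacific')
naive_utc = times.tz_convert('UTC').tz_localize(None)
da = xr.DataArray([1, 2, 3], coords={'time': naive_utc}, dims='time')
```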

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/961/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
issue #970: Multiple preprocessing functions in open_mfdataset?
id: 171504099 · node_id: MDU6SXNzdWUxNzE1MDQwOTk= · user: NicWayand (1117224) · state: closed · locked: 0 · comments: 3
created: 2016-08-16T20:01:22Z · updated: 2016-08-17T07:01:02Z · closed: 2016-08-16T21:46:43Z · author_association: NONE

I would like to have multiple functions applied during an open_mfdataset call.

Using one works great:

```python
ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=lambda x: x.load())
```

Does the current behavior support multiple preprocess functions? (Apologies if this is documented somewhere; I couldn't find any examples with multiple calls.)

Something like:

```python
ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=[lambda x: x.load(), lambda y: y['time']=100])
```
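
(Not from the issue: since `preprocess` accepts a single callable, one common way to get this effect is to compose several functions into one. The sketch below uses a hypothetical `compose` helper, reuses `files` from the snippets above, and replaces the invalid assignment-in-a-lambda with `assign`.)

```python
def compose(*funcs):
    """Hypothetical helper: chain several preprocess callables into one."""
    def _preprocess(ds):
        for f in funcs:
            ds = f(ds)
        return ds
    return _preprocess

ds = xr.open_mfdataset(files, concat_dim='time', engine='pynio',
                       preprocess=compose(lambda x: x.load(),
                                          lambda y: y.assign(time=100)))
```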

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/970/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed · repo: xarray (13221727) · type: issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
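
For reference, the filtered view at the top of this page corresponds to a query like the following against a local copy of this database (the `github.db` filename is hypothetical):

```python
import sqlite3

conn = sqlite3.connect('github.db')  # hypothetical local copy
rows = conn.execute(
    """
    SELECT number, type, title, updated_at
    FROM issues
    WHERE repo = 13221727 AND state = 'closed' AND user = 1117224
    ORDER BY updated_at DESC
    """
).fetchall()
for number, type_, title, updated_at in rows:
    print(number, type_, title, updated_at)
```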