issues: 196541604

id: 196541604
node_id: MDU6SXNzdWUxOTY1NDE2MDQ=
number: 1173
title: Some queries
user: 7300413
state: closed
locked: 0
comments: 11
created_at: 2016-12-19T22:53:32Z
updated_at: 2019-01-13T06:27:38Z
closed_at: 2019-01-13T06:00:22Z
author_association: NONE

body:

Hello @shoyer @pwolfram @mrocklin @rabernat,

I was trying to write a design/requirements doc with reference to the Columbia meetup, and I had a few queries on which I wanted your input (basically to ask whether they make sense or not!):

  1. If you serialize a labeled n-d data array using netCDF or HDF5, it gets written into a single file, which is not really a good option if you want to eventually do distributed processing of the data. Things like HDFS/Lustre can split files, but that is not really what we want. How do you think this issue could be solved within the xarray+dask framework?
  2. Is it a matter of adding some code to the dataset.to_netcdf() method, or of adding a new method that would split the DataArray (based on some user-supplied guidelines) into multiple files?
  3. Or does it make more sense to add a new serialization format like Zarr? (A sketch of what options 2 and 3 might look like follows this list.)
  4. Continuing along similar lines, how does xarray+dask currently decide how to distribute the workload between dask workers? Are there any heuristics to handle data locality? Or does experience say that network I/O is fast enough that this is not an issue? I'm asking because of this article by Matt: http://blaze.pydata.org/blog/2015/10/28/distributed-hdfs/
  5. If this is desirable, how would one go about implementing it?
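
To make questions 2 and 3 concrete, here is a minimal sketch of what the two write paths could look like with the current xarray API; it is an illustration, not a statement of how the feature should be implemented. The variable and dimension names ("temp", "time", "x", "y"), the chunk size, and the output paths are invented for this example. xarray.save_mfdataset is an existing helper that writes a list of datasets to multiple netCDF files, and Dataset.to_zarr (added to xarray after this issue was opened; it requires the zarr package) writes a chunked Zarr store.

import numpy as np
import xarray as xr

# A small labelled 3-d dataset, chunked along "time" so dask can operate on
# it lazily. Names and sizes here are purely illustrative.
ds = xr.Dataset(
    {"temp": (("time", "x", "y"), np.random.rand(100, 50, 50))},
    coords={"time": np.arange(100)},
).chunk({"time": 10})

# Question 2: split the dataset along "time" and write one netCDF file per
# slice with the existing xarray.save_mfdataset helper.
slices = [ds.isel(time=slice(i, i + 10)) for i in range(0, ds.sizes["time"], 10)]
paths = [f"part-{i:03d}.nc" for i in range(len(slices))]
xr.save_mfdataset(slices, paths)

# The pieces can later be reopened lazily as one dask-backed dataset.
reopened = xr.open_mfdataset("part-*.nc", combine="by_coords")

# Question 3: write the same dataset to a chunked Zarr store instead of
# many netCDF files (requires the zarr package).
ds.to_zarr("example.zarr", mode="w")

Either layout keeps the on-disk chunks aligned with the dask chunks, which is what lets separate workers read separate pieces in parallel.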
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1173/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
repo: 13221727
type: issue

Links from other tables

  • 2 rows from issues_id in issues_labels
  • 11 rows from issue in issue_comments