

issue_comments: 253133290


html_url: https://github.com/pydata/xarray/pull/1024#issuecomment-253133290
issue_url: https://api.github.com/repos/pydata/xarray/issues/1024
id: 253133290
node_id: MDEyOklzc3VlQ29tbWVudDI1MzEzMzI5MA==
user: 6213168
created_at: 2016-10-12T06:49:23Z
updated_at: 2016-10-12T06:49:23Z
author_association: MEMBER
issue: 180451196

body:

I've been thinking about this... Maybe the simple, clean solution is to invoke compute() on all coords as soon as they are assigned to the DataArray / Dataset?
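
A minimal sketch of that idea using only public xarray API (the dataset, variable names, and dask arrays below are invented for illustration, not taken from the PR):

    import dask.array as da
    import xarray as xr

    # A dataset whose (non-index) coordinate is backed by a dask array.
    ds = xr.Dataset(
        {"temperature": (("x", "y"), da.random.random((4, 3), chunks=(2, 3)))},
        coords={"lon": (("x", "y"), da.random.random((4, 3), chunks=(2, 3)))},
    )

    # "Invoke compute() on all coords as soon as they are assigned" would
    # amount to eagerly loading every coordinate while data variables stay lazy:
    ds = ds.assign_coords(
        {name: coord.compute() for name, coord in ds.coords.items()}
    )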

On 12 Oct 2016 02:18, "Stephan Hoyer" notifications@github.com wrote:

@shoyer commented on this pull request.

Apologies for the delay here -- my comments were stuck as a "pending" GitHub review.

I am still wondering what the right behavior is for variables used as indexes. (These can be dask arrays, too.)

I think there is a good case for skipping these variables in .chunk(), but we probably do want to make indexes still cache as pandas.Index objects, because otherwise repeated evaluation of dask arrays to build the index for alignment or indexing gets expensive.
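
To make the cost concrete, here is a hedged sketch of that caching idea (not xarray's actual implementation; the helper and cache below are invented): evaluate a dask-backed index variable once, keep the result as a pandas.Index, and reuse it for later alignment or indexing.

    import dask.array as da
    import numpy as np
    import pandas as pd

    lazy_labels = da.arange(1_000_000, chunks=100_000)  # dask-backed index data

    _index_cache = {}

    def get_index(name, lazy_array):
        # Compute the dask array only on first use, then reuse the pandas.Index.
        if name not in _index_cache:
            _index_cache[name] = pd.Index(np.asarray(lazy_array))
        return _index_cache[name]

    idx1 = get_index("x", lazy_labels)  # triggers one dask evaluation
    idx2 = get_index("x", lazy_labels)  # cache hit: no recomputation
    assert idx1 is idx2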

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

    @@ -792,13 +806,19 @@ def chunks(self):
             array.
             """
             chunks = {}
    -        for v in self.variables.values():
    +        for v in self.data_vars.values():

I am concerned about skipping non-data_vars here. Coordinates could still be chunked, e.g., if they were loaded from a file, or created directly from dask arrays.
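
As a hedged illustration of that concern (the names below are invented), a coordinate created directly from a dask array is itself chunked, so a chunks report restricted to data_vars would miss it:

    import dask.array as da
    import xarray as xr

    ds = xr.Dataset(
        {"t": (("x",), da.zeros(6, chunks=3))},
        coords={"lon": (("x",), da.arange(6, chunks=2))},
    )

    print(ds["t"].chunks)           # ((3, 3),)
    print(ds.coords["lon"].chunks)  # ((2, 2, 2),) -- the coordinate is chunked too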

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

                 if v.chunks is not None:
                     new_chunks = list(zip(v.dims, v.chunks))
                     if any(chunk != chunks[d] for d, chunk in new_chunks
                            if d in chunks):
                         raise ValueError('inconsistent chunks')
                     chunks.update(new_chunks)
    +        if chunks:

I guess this method is inconsistent with Variable.chunks, but it currently always returns a dict.

I would either skip this change or use something like my version.
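
For reference, the consistency check quoted above boils down to something like this standalone sketch (assuming variable-like objects with .dims and .chunks, as in xarray/dask; this is not the code in the PR):

    def collect_chunks(variables):
        # Map each dimension name to its chunk sizes, erroring on conflicts.
        chunks = {}
        for v in variables:
            if v.chunks is None:
                continue
            for dim, dim_chunks in zip(v.dims, v.chunks):
                if dim in chunks and chunks[dim] != dim_chunks:
                    raise ValueError('inconsistent chunks')
                chunks[dim] = dim_chunks
        return chunks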

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

    @@ -851,6 +871,9 @@ def selkeys(dict_, keys):
                 return dict((d, dict_[d]) for d in keys if d in dict_)

             def maybe_chunk(name, var, chunks):
    +            if name not in self.data_vars:

I see your point about performance, but I think that mostly holds true for indexes. So I would be inclined to adjust this to only skip variables in self.dims (aka indexes used for alignment).

I am still concerned about skipping coords that are already dask arrays; in that case, .chunk() should probably adjust their chunks anyway.
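
A hedged sketch of what that suggestion might look like (not the code in the PR; it assumes a Variable-like object with .dims, .chunks and a .chunk() method): only in-memory index variables (names that are also dimensions) are left alone, while coords that are already dask arrays still get rechunked.

    def maybe_chunk(name, var, chunks, dims):
        # Keep only the requested chunk sizes for dimensions this variable has.
        var_chunks = {dim: chunks[dim] for dim in var.dims if dim in chunks}
        if name in dims and var.chunks is None:
            # An in-memory index variable used for alignment: skip it.
            return var
        if var_chunks or var.chunks is not None:
            # Data variables and already-chunked coords are (re)chunked.
            return var.chunk(var_chunks)
        return var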


reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}