

issue_comments: 253133290


html_url: https://github.com/pydata/xarray/pull/1024#issuecomment-253133290
issue_url: https://api.github.com/repos/pydata/xarray/issues/1024
id: 253133290
node_id: MDEyOklzc3VlQ29tbWVudDI1MzEzMzI5MA==
user: 6213168
created_at: 2016-10-12T06:49:23Z
updated_at: 2016-10-12T06:49:23Z
author_association: MEMBER
issue: 180451196

body:

I've been thinking about this... Maybe the simple, clean solution is to invoke compute() on all coords as soon as they are assigned to the DataArray / Dataset?
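
A minimal sketch of that idea using only public xarray API (the dataset, variable names, and dask arrays below are invented for illustration, not taken from the PR):

    import dask.array as da
    import xarray as xr

    # A dataset whose (non-index) coordinate is backed by a dask array.
    ds = xr.Dataset(
        {"temperature": (("x", "y"), da.random.random((4, 3), chunks=(2, 3)))},
        coords={"lon": (("x", "y"), da.random.random((4, 3), chunks=(2, 3)))},
    )

    # "Invoke compute() on all coords as soon as they are assigned" would
    # amount to eagerly loading every coordinate while data variables stay lazy:
    ds = ds.assign_coords(
        {name: coord.compute() for name, coord in ds.coords.items()}
    )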

On 12 Oct 2016 02:18, "Stephan Hoyer" notifications@github.com wrote:

@shoyer commented on this pull request.

Apologies for the delay here -- my comments were stuck as a "pending" GitHub review.

I am still wondering what the right behavior is for variables used as indexes. (These can be dask arrays, too.)

I think there is a good case for skipping these variables in .chunk(), but we probably do want to make indexes still cache as pandas.Index objects, because otherwise repeated evaluation of dask arrays to build the index for alignment or indexing gets expensive.
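
To make the cost concrete, here is a hedged sketch of that caching idea (not xarray's actual implementation; the helper and cache below are invented): evaluate a dask-backed index variable once, keep the result as a pandas.Index, and reuse it for later alignment or indexing.

    import dask.array as da
    import numpy as np
    import pandas as pd

    lazy_labels = da.arange(1_000_000, chunks=100_000)  # dask-backed index data

    _index_cache = {}

    def get_index(name, lazy_array):
        # Compute the dask array only on first use, then reuse the pandas.Index.
        if name not in _index_cache:
            _index_cache[name] = pd.Index(np.asarray(lazy_array))
        return _index_cache[name]

    idx1 = get_index("x", lazy_labels)  # triggers one dask evaluation
    idx2 = get_index("x", lazy_labels)  # cache hit: no recomputation
    assert idx1 is idx2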

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

    @@ -792,13 +806,19 @@ def chunks(self):
             array.
             """
             chunks = {}
    -        for v in self.variables.values():
    +        for v in self.data_vars.values():

I am concerned about skipping non-data_vars here. Coordinates could still be chunked, e.g., if they were loaded from a file, or created directly from dask arrays.
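
As a hedged illustration of that concern (the names below are invented), a coordinate created directly from a dask array is itself chunked, so a chunks report restricted to data_vars would miss it:

    import dask.array as da
    import xarray as xr

    ds = xr.Dataset(
        {"t": (("x",), da.zeros(6, chunks=3))},
        coords={"lon": (("x",), da.arange(6, chunks=2))},
    )

    print(ds["t"].chunks)           # ((3, 3),)
    print(ds.coords["lon"].chunks)  # ((2, 2, 2),) -- the coordinate is chunked too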

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

                 if v.chunks is not None:
                     new_chunks = list(zip(v.dims, v.chunks))
                     if any(chunk != chunks[d] for d, chunk in new_chunks
                            if d in chunks):
                         raise ValueError('inconsistent chunks')
                     chunks.update(new_chunks)
    +        if chunks:

I guess this method is inconsistent with Variable.chunks, but it currently always returns a dict.

I would either skip this change or use something like my version.
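
For reference, the consistency check quoted above boils down to something like this standalone sketch (assuming variable-like objects with .dims and .chunks, as in xarray/dask; this is not the code in the PR):

    def collect_chunks(variables):
        # Map each dimension name to its chunk sizes, erroring on conflicts.
        chunks = {}
        for v in variables:
            if v.chunks is None:
                continue
            for dim, dim_chunks in zip(v.dims, v.chunks):
                if dim in chunks and chunks[dim] != dim_chunks:
                    raise ValueError('inconsistent chunks')
                chunks[dim] = dim_chunks
        return chunks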

In xarray/core/dataset.py https://github.com/pydata/xarray/pull/1024#pullrequestreview-3794240:

    @@ -851,6 +871,9 @@ def selkeys(dict_, keys):
                 return dict((d, dict_[d]) for d in keys if d in dict_)

             def maybe_chunk(name, var, chunks):
    +            if name not in self.data_vars:

I see your point about performance, but I think that mostly holds true for indexes. So I would be inclined to adjust this to only skip variables in self.dims (aka indexes used for alignment).

I am still concerned about skipping coords that are already dask arrays; in that case, .chunk() should probably adjust their chunks anyway.
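
A hedged sketch of what that suggestion might look like (not the code in the PR; it assumes a Variable-like object with .dims, .chunks and a .chunk() method): only in-memory index variables (names that are also dimensions) are left alone, while coords that are already dask arrays still get rechunked.

    def maybe_chunk(name, var, chunks, dims):
        # Keep only the requested chunk sizes for dimensions this variable has.
        var_chunks = {dim: chunks[dim] for dim in var.dims if dim in chunks}
        if name in dims and var.chunks is None:
            # An in-memory index variable used for alignment: skip it.
            return var
        if var_chunks or var.chunks is not None:
            # Data variables and already-chunked coords are (re)chunked.
            return var.chunk(var_chunks)
        return var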


reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}