
issues


6 rows where comments = 4, type = "issue" and user = 6213168 sorted by updated_at descending


#2027 · square-bracket slice a Dataset with a DataArray
id: 309686915 · node_id: MDU6SXNzdWUzMDk2ODY5MTU= · user: crusaderky (6213168) · author_association: MEMBER
state: open · comments: 4 · created_at: 2018-03-29T09:39:57Z · updated_at: 2022-04-18T03:51:25Z · repo: xarray (13221727) · type: issue

Given this:

```python
ds = xarray.Dataset(
    data_vars={
        'vote': ('pupil', [5, 7, 8]),
        'age': ('pupil', [15, 14, 16]),
    },
    coords={'pupil': ['Alice', 'Bob', 'Charlie']})
```

```
<xarray.Dataset>
Dimensions:  (pupil: 3)
Coordinates:
  * pupil    (pupil) <U7 'Alice' 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 5 7 8
    age      (pupil) int64 15 14 16
```

Why does this work:

```python
ds.age[ds.vote >= 6]
```

```
<xarray.DataArray 'age' (pupil: 2)>
array([14, 16])
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
```

But this doesn't?

```python
ds[ds.vote >= 6]
```

```
KeyError: False
```

`ds.vote >= 6` is a DataArray with dims=('pupil',) and dtype=bool, so I can't think of any ambiguity in what I want to achieve.

Workaround:

```python
ds.sel(pupil=ds.vote >= 6)
```

```
<xarray.Dataset>
Dimensions:  (pupil: 2)
Coordinates:
  * pupil    (pupil) <U7 'Bob' 'Charlie'
Data variables:
    vote     (pupil) int64 7 8
    age      (pupil) int64 14 16
```
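For convenience, the snippets above combine into a single runnable script (a minimal sketch; the failing line behaves as reported):

```python
import xarray

# The example Dataset from the report above.
ds = xarray.Dataset(
    data_vars={
        'vote': ('pupil', [5, 7, 8]),
        'age': ('pupil', [15, 14, 16]),
    },
    coords={'pupil': ['Alice', 'Bob', 'Charlie']},
)

print(ds.age[ds.vote >= 6])        # works: boolean mask on a DataArray
# ds[ds.vote >= 6]                 # raises KeyError: False, as described
print(ds.sel(pupil=ds.vote >= 6))  # workaround: boolean indexer on the dim
```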

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2027/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3397 · "How Do I..." formatting issues
id: 506885041 · node_id: MDU6SXNzdWU1MDY4ODUwNDE= · user: crusaderky (6213168) · author_association: MEMBER
state: closed (completed) · comments: 4 · created_at: 2019-10-14T21:32:27Z · updated_at: 2019-10-16T21:41:06Z · closed_at: 2019-10-16T21:41:06Z · repo: xarray (13221727) · type: issue

@dcherian The new page http://xarray.pydata.org/en/stable/howdoi.html (#3357) is somewhat painful to read on readthedocs. The table overflows the screen, forcing the reader to scroll left and right non-stop.

Maybe a better alternative would be Sphinx definition-list syntax (which allows automatic reflowing)?

```rst
How do I ...
============

Add variables from other datasets to my dataset?
    :py:meth:`Dataset.merge`
```

(that's a 4-space indent)

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3397/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#926 · stack() on dask array produces inefficient chunking
id: 168469112 · node_id: MDU6SXNzdWUxNjg0NjkxMTI= · user: crusaderky (6213168) · author_association: MEMBER
state: closed (completed) · comments: 4 · created_at: 2016-07-30T14:12:34Z · updated_at: 2019-02-01T16:04:43Z · closed_at: 2019-02-01T16:04:43Z · repo: xarray (13221727) · type: issue

When the stack() method is used on an xarray with a dask backend, one would expect every output chunk to be produced by exactly one input chunk.

This is not the case, as stack() actually produces an extremely fragmented dask array: https://gist.github.com/crusaderky/07991681d49117bfbef7a8870e3cba67
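A minimal sketch of how to observe this (array and chunk sizes are hypothetical, not taken from the gist):

```python
import numpy as np
import xarray

# A 4x4 array split into four 2x2 chunks; sizes are illustrative only.
a = xarray.DataArray(np.zeros((4, 4)), dims=['x', 'y']).chunk({'x': 2, 'y': 2})

# stack() flattens (x, y) into a single dimension; inspecting .chunks shows
# how fragmented the resulting dask array is.
stacked = a.stack(z=('x', 'y'))
print(stacked.chunks)
```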

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/926/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#979 · align() should align chunks
id: 172291585 · node_id: MDU6SXNzdWUxNzIyOTE1ODU= · user: crusaderky (6213168) · author_association: MEMBER
state: closed (completed) · comments: 4 · created_at: 2016-08-20T21:25:01Z · updated_at: 2019-01-24T17:19:30Z · closed_at: 2019-01-24T17:19:30Z · repo: xarray (13221727) · type: issue

In the xarray docs I read:

> With the current version of dask, there is no automatic alignment of chunks when performing operations between dask arrays with different chunk sizes. If your computation involves multiple dask arrays with different chunks, you may need to explicitly rechunk each array to ensure compatibility.

While chunk auto-alignment could be done within the dask library, that would be limited to arrays with the same dimensionality and the same dims order. For example, it would not be possible to have a dask library call align the chunks on xarrays with the following dims:

  • (time, latitude, longitude)
  • (time)
  • (longitude, latitude)

even if it makes perfect sense in xarray.

I think xarray.align() should take care of it automatically.

A safe algorithm would be to always scale down the chunksize when in conflict. This would prevent having chunks larger than expected, and should minimise (in a greedy way) the number of operations. It's also a good idea on dask.distributed, where merging two chunks could cause one of them to travel on the network, which is very expensive.

E.g., to reconcile the chunk sizes a: (5, 10, 6) and b: (5, 7, 9), the algorithm would rechunk both arrays to (5, 7, 3, 6).
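A minimal sketch of that reconciliation rule, taking the union of the two layouts' block boundaries (common_chunks is a hypothetical helper, not an xarray API):

```python
from itertools import accumulate

def common_chunks(a, b):
    """Refine two chunk layouts along one dimension: every boundary of either
    layout becomes a boundary of the result, so chunks only ever shrink."""
    assert sum(a) == sum(b), "both layouts must cover the same extent"
    # Merge the cumulative end offsets of both layouts, then rebuild sizes.
    bounds = sorted(set(accumulate(a)) | set(accumulate(b)))
    return tuple(hi - lo for lo, hi in zip([0] + bounds, bounds))

print(common_chunks((5, 10, 6), (5, 7, 9)))  # -> (5, 7, 3, 6)
```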

Finally, when served with a numpy-based array and a dask-based array, align() should convert the numpy array to dask. The critical use case that would benefit from this behaviour is when align() is invoked inside a broadcast() between a tiny constant you just loaded from csv/pandas/a pure-Python list/whatever (e.g. dims=(time,) shape=(100,)) and a huge dask-backed array (e.g. dims=(time, scenario) shape=(100, 2**30) chunks=(25, 2**20)).

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/979/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#2273 · to_netcdf uses deprecated and unnecessary dask call
id: 339611449 · node_id: MDU6SXNzdWUzMzk2MTE0NDk= · user: crusaderky (6213168) · author_association: MEMBER
state: closed (completed) · comments: 4 · created_at: 2018-07-09T21:20:20Z · updated_at: 2018-07-31T20:03:41Z · closed_at: 2018-07-31T19:42:20Z · repo: xarray (13221727) · type: issue

```python
ds = xarray.Dataset({'x': 1})
ds.to_netcdf('foo.nc')
```

```
dask/utils.py:1010: UserWarning: Deprecated, see dask.base.get_scheduler instead
```

Stack trace:

```
xarray/backends/common.py(44)get_scheduler()
     43     from dask.utils import effective_get
---> 44     actual_get = effective_get(get, collection)
```

There are two separate problems here:

  • dask recently changed API from get(get=callable) to get(scheduler=str). Should we
      • just increase the minimum version of dask (I doubt anybody will complain),
      • go through the hoops of dynamically invoking a different API depending on the dask version :sweat: (see the sketch after this list), or
      • silence the warning now, and then increase the minimum version of dask the day that dask removes the old API entirely (risky)?
  • xarray is calling dask even when it's unnecessary, as none of the variables in the example Dataset has a dask backend. I don't think there are any CI suites for NetCDF without dask. I'm also wondering if they would bring any actual added value, as dask is small, has no exotic dependencies, and is pure Python; so I doubt anybody will have problems installing it whatever their setup is.
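A minimal sketch of the second option, selecting the import at runtime from the installed dask version (the 0.18.0 cutoff is an assumption, not verified against dask's changelog):

```python
import dask
from distutils.version import LooseVersion

# Assumption: the scheduler=str API and dask.base.get_scheduler (named in the
# warning above) appeared around dask 0.18; adjust the cutoff as needed.
if LooseVersion(dask.__version__) >= LooseVersion('0.18.0'):
    from dask.base import get_scheduler
else:
    from dask.utils import effective_get as get_scheduler

# NB: the two functions take different keyword arguments (scheduler= vs get=),
# so call sites still need a matching branch; this only selects the import.
```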

@shoyer opinion?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2273/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#978 · broadcast() broken on dask backend
id: 172290413 · node_id: MDU6SXNzdWUxNzIyOTA0MTM= · user: crusaderky (6213168) · author_association: MEMBER
state: closed (completed) · comments: 4 · created_at: 2016-08-20T20:56:33Z · updated_at: 2016-12-09T20:28:42Z · closed_at: 2016-12-09T20:28:42Z · repo: xarray (13221727) · type: issue

```python
>>> a = xarray.DataArray([1, 2]).chunk(1)
>>> a
<xarray.DataArray (dim_0: 2)>
dask.array<xarray-..., shape=(2,), dtype=int64, chunksize=(1,)>
Coordinates:
  * dim_0    (dim_0) int64 0 1
>>> xarray.broadcast(a)
(<xarray.DataArray (dim_0: 2)>
array([1, 2])
Coordinates:
  * dim_0    (dim_0) int64 0 1,)
```

The problem is actually somewhere in the constructor of DataArray. In alignment.py:362 we have `return DataArray(data, ...)`, where data is a Variable with a dask backend, yet the returned DataArray object has a numpy backend. As a workaround, changing that line to `return DataArray(data.data, ...)` (thus passing a dask array) fixes the problem.

After that, however, there's a new issue: whenever broadcast() adds a dimension to an array, it creates it as a single chunk, as opposed to copying the chunking of the other arrays. This can easily cause a host to go out of memory, and makes it harder to work with the arrays afterwards because chunks won't match.
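A sketch of how one might observe that second problem (array names and chunk sizes are hypothetical):

```python
import xarray

a = xarray.DataArray([1, 2], dims=['x']).chunk(1)
b = xarray.DataArray([10, 20, 30], dims=['y']).chunk(1)

# broadcast() gives both arrays dims ('x', 'y'). Per the report, the dimension
# each array gains arrives as one big chunk instead of mirroring the other
# array's fine-grained layout.
a2, b2 = xarray.broadcast(a, b)
print(a2.chunks)
print(b2.chunks)
```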

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/978/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
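The filter this page applies can be reproduced directly against that schema. A minimal sketch, assuming the data sits in a SQLite file named github.db (hypothetical name):

```python
import sqlite3

conn = sqlite3.connect("github.db")  # hypothetical database file name
rows = conn.execute(
    """
    SELECT number, title, state, updated_at
    FROM issues
    WHERE comments = 4 AND type = 'issue' AND [user] = 6213168
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, state, updated_at in rows:
    print(f"#{number} [{state}] {title} (updated {updated_at})")
```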