
issues


5 rows where comments = 8 and user = 6213168 sorted by updated_at descending


Facets:

  • type: issue 4 · pull 1
  • state: closed 4 · open 1
  • repo: xarray 5
id: 272004812 · node_id: MDU6SXNzdWUyNzIwMDQ4MTI= · number: 1699
title: apply_ufunc(dask='parallelized') output_dtypes for datasets
user: crusaderky (6213168) · state: open · locked: 0 · comments: 8
created_at: 2017-11-07T22:18:23Z · updated_at: 2020-04-06T15:31:17Z
author_association: MEMBER · repo: xarray (13221727) · type: issue

When a Dataset has variables with different dtypes, there's no way to tell apply_ufunc that the same function applied to different variables will produce different dtypes:

```
ds1 = xarray.Dataset(data_vars={'a': ('x', [1, 2]), 'b': ('x', [3.0, 4.5])}).chunk()
ds2 = xarray.apply_ufunc(lambda x: x + 1, ds1, dask='parallelized', output_dtypes=[float])
ds2

<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) float64 dask.array<shape=(2,), chunksize=(2,)>
    b        (x) float64 dask.array<shape=(2,), chunksize=(2,)>

ds2.compute()

<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    a        (x) int64 2 3
    b        (x) float64 4.0 5.5
```

Proposed solution

When the output is a dataset, apply_ufunc could accept either output_dtypes=[t] (if all output variables will have the same dtype) or output_dtypes=[{var1: t1, var2: t2, ...}]. In the example above, it would be output_dtypes=[{'a': int, 'b': float}].
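
Under that proposal, the call from the example above would look something like this (hypothetical API; the mapping form of output_dtypes is not implemented):

```
# Hypothetical usage under the proposed API (not implemented in xarray):
# one dtype per output data variable instead of a single dtype.
ds2 = xarray.apply_ufunc(
    lambda x: x + 1, ds1,
    dask='parallelized',
    output_dtypes=[{'a': int, 'b': float}],
)
```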

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1699/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 193294729 · node_id: MDU6SXNzdWUxOTMyOTQ3Mjk= · number: 1152
title: Scalar coords seep into index coords
user: crusaderky (6213168) · state: closed · locked: 0 · comments: 8
created_at: 2016-12-03T15:43:53Z · updated_at: 2019-02-01T16:02:12Z · closed_at: 2019-02-01T16:02:12Z
author_association: MEMBER · state_reason: completed · repo: xarray (13221727) · type: issue

Is this by design? I can't make any sense of it:

```
a = xarray.DataArray([1, 2, 3], dims=['x'], coords={'x': [1, 2, 3], 'y': 10})
a.coords['x']

<xarray.DataArray 'x' (x: 3)>
array([1, 2, 3])
Coordinates:
  * x        (x) int64 1 2 3
    y        int64 10
```
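
If the goal is merely to extract the index coordinate without the scalar coord attached, a workaround sketch (using the current API, which postdates this issue) would be:

```
# Workaround sketch, not a fix for the underlying behaviour: explicitly
# drop the scalar coordinate 'y' from the extracted index coordinate.
# (drop_vars is the current spelling; older xarray versions used .drop)
x = a.coords['x'].drop_vars('y')
```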

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1152/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 317421267 · node_id: MDU6SXNzdWUzMTc0MjEyNjc= · number: 2079
title: New feature: interp1d
user: crusaderky (6213168) · state: closed · locked: 0 · comments: 8
created_at: 2018-04-24T22:45:03Z · updated_at: 2018-05-06T19:30:32Z · closed_at: 2018-05-06T19:30:32Z
author_association: MEMBER · state_reason: completed · repo: xarray (13221727) · type: issue

I've written a series of wrappers for the 1-dimensional scipy interpolators.

Prototype code and colourful demo plots: https://gist.github.com/crusaderky/b0aa6b8fdf6e036cb364f6f40476cc67

Features

  • Interpolate an ND array along any arbitrary dimension
  • Nearest-neighbour, linear, quadratic, cubic, Akima, PCHIP, and custom interpolators are supported
  • dask supported on both the interpolated array and x_new
  • Supports ND x_new arrays
  • The CPU-heavy interpolator generation (splrep) is executed only once and can then be applied to multiple x_new (splev); see the sketch after this list
  • Pickleable and distributed-friendly
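
A minimal sketch of that fit-once / evaluate-many pattern in plain scipy (the data and names here are illustrative, not taken from the gist):

```
import numpy as np
from scipy.interpolate import splrep, splev

x = np.linspace(0, 10, 50)
y = np.sin(x)

# CPU-heavy spline construction, done once...
tck = splrep(x, y, k=3)

# ...then applied cheaply to as many x_new as needed.
y_fine = splev(np.linspace(0, 10, 500), tck)
y_points = splev(np.array([0.5, 2.5, 7.1]), tck)
```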

Design hacks

  • Depends on the dask module, even when all inputs are plain numpy.
  • Abuses attrs and the ability to invoke a.attrname to get the user experience of a new DataArray method.
  • Abuses the fact that the chunks of a dask.array.Array can contain anything and you won't notice until you compute them.

Limitations

  • Can't dump to netcdf. Not solvable without hacking into the implementation details of scipy.
  • Datasets are not supported. Trivial to fix after solving #1699.
  • Chunks are not supported on x_new. Trivial to fix after solving #1995.
  • Chunks are not supported along the interpolated dimension. This is very complicated to solve. If x and x_new were always monotonic ascending, it would be (not trivially) solvable with dask.array.ghost.ghost. If you make no assumptions about monotonicity, things become way more complicated. A solution would need to go in the dask module, and then be invoked trivially from here with dask='allowed'. See the sketch after this list.
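
For the monotonic case, the ghosting approach looks roughly like this (dask.array.ghost has since been renamed dask.array.overlap; np.gradient stands in for a real per-block interpolator):

```
import numpy as np
import dask.array as da

y = da.from_array(np.sin(np.linspace(0, 10, 100)), chunks=25)

# Each block is padded with depth=1 elements from its neighbours before
# the function runs, so boundary-crossing operations (interpolation, or
# np.gradient here) see past the chunk edges; the padding is trimmed off
# the result automatically.
dy = da.map_overlap(np.gradient, y, depth=1, boundary='reflect')
dy.compute()
```
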
reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2079/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 253349435 · node_id: MDExOlB1bGxSZXF1ZXN0MTM3OTYwNDEw · number: 1532
title: Avoid computing dask variables on __repr__ and __getattr__
user: crusaderky (6213168) · state: closed · locked: 0 · milestone: 0.10 (2415632) · comments: 8
created_at: 2017-08-28T14:37:20Z · updated_at: 2017-09-21T22:30:02Z · closed_at: 2017-09-21T20:55:43Z
author_association: MEMBER · draft: 0 · pull_request: pydata/xarray/pulls/1532 · repo: xarray (13221727) · type: pull
  • [x] Fixes #1522
  • [x] Tests added / passed
  • [x] Passes git diff upstream/master | flake8 --diff
  • [x] Fully documented, including whats-new.rst for all changes and api.rst for new API

Stop dataset data vars and non-index dataset/dataarray coords from being loaded by repr() and getattr(). The latter is particularly acute when working in Jupyter, which performs a dozen or so getattr() calls when printing an object.
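
In spirit, the guard looks something like this (a hand-written sketch, not the PR's actual diff):

```
# Sketch (not the PR's code): format a real preview only for in-memory
# data, and a cheap placeholder for dask-backed variables.
import numpy as np
import dask.array as da

def preview(values):
    if isinstance(values, da.Array):
        # Never call .compute() here; just describe the lazy array.
        return f"dask.array<shape={values.shape}, chunksize={values.chunksize}>"
    return np.array2string(np.asarray(values))
```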

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1532/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 252543868 · node_id: MDU6SXNzdWUyNTI1NDM4Njg= · number: 1522
title: Dataset.__repr__ computes dask variables
user: crusaderky (6213168) · state: closed · locked: 0 · comments: 8
created_at: 2017-08-24T09:37:12Z · updated_at: 2017-09-21T20:55:43Z · closed_at: 2017-09-21T20:55:43Z
author_association: MEMBER · state_reason: completed · repo: xarray (13221727) · type: issue

DataArray.__repr__ and Variable.__repr__ print a placeholder if the data uses the dask backend. Not so Dataset.__repr__, which tries computing the data before printing a tiny preview of it. This issue is extremely annoying when working in Jupyter, and particularly acute if the chunks are very big or are at the end of a very long chain of computation.

For data variables, the expected behaviour is to print a placeholder, just like DataArray does. For coords, we could either:

  • print a placeholder (same treatment as data variables)
  • automatically invoke load() when the coord is added to the dataset

See #1521 for discussion.
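
A minimal reproduction sketch of the behaviour described above (the array size is illustrative):

```
import dask.array as da
import xarray

ds = xarray.Dataset({'a': ('x', da.zeros(100_000_000, chunks=1_000_000))})

# Before the fix, repr() computed the dask-backed variable in full just
# to render a tiny preview of it; a placeholder should be printed instead.
repr(ds)
```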

reactions:
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1522/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
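
For reference, the filter behind this page ("comments = 8 and user = 6213168 sorted by updated_at descending") can be reproduced directly against the underlying SQLite database; the database file name below is an assumption:

```
import sqlite3

conn = sqlite3.connect("github.db")  # assumed file name for this Datasette instance
rows = conn.execute(
    """
    SELECT id, number, title, state, updated_at
    FROM issues
    WHERE comments = 8 AND "user" = 6213168
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```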