issues


4 rows where comments = 14, type = "issue", and user = 6213168, sorted by updated_at descending




#3370 Hundreds of Sphinx errors
id: 502130982 · node_id: MDU6SXNzdWU1MDIxMzA5ODI= · user: crusaderky (6213168) · state: closed (completed) · comments: 14 · author_association: MEMBER · repo: xarray (13221727) · type: issue
created: 2019-10-03T15:17:09Z · updated: 2022-04-17T20:33:05Z · closed: 2022-04-17T20:33:05Z

sphinx-build emits a ton of errors that need to be polished out:

https://readthedocs.org/projects/xray/builds/ -> latest -> open last step

Options for the long term:

- Change the "Docs" azure pipelines job to crash if there are new failures. From past experience, though, this should come together with a sensible way to whitelist errors that can't be fixed; otherwise PRs will systematically fail on such a check and development will be severely slowed down. (A sketch of such a check follows below.)
- Add a task to the release process where, immediately before closing a release, the maintainer manually goes through the sphinx-build log and fixes any new issues. This would be a major extra piece of work for the maintainer.

I am honestly not excited by either of the above. Alternative suggestions are welcome.
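To make the first option concrete, here is a minimal sketch of such a gate. The allowlist path and the exact sphinx-build invocation are my assumptions, not xarray's actual CI setup:

```python
"""Sketch of a CI gate that fails only on *new* sphinx-build warnings."""
import subprocess
import sys

# Run the docs build; sphinx-build reports problems on stderr as
# "<file>:<line>: WARNING: ..." / "... ERROR: ...".
result = subprocess.run(
    ["sphinx-build", "-b", "html", "doc", "doc/_build/html"],
    capture_output=True,
    text=True,
)

emitted = {
    line.strip()
    for line in result.stderr.splitlines()
    if " WARNING: " in line or " ERROR: " in line
}

# Known, currently unfixable errors live in the allowlist, one per line.
# Note: exact-matching full lines is naive (paths differ across machines);
# a real check would normalise or pattern-match them.
with open("doc/sphinx-warnings-allowlist.txt") as f:  # hypothetical file
    allowed = {line.strip() for line in f if line.strip()}

new = emitted - allowed
for line in sorted(new):
    print(line)

sys.exit(1 if new else 0)  # crash the job only on non-allowlisted problems
```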

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3370/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#3092 black formatting
id: 466750687 · node_id: MDU6SXNzdWU0NjY3NTA2ODc= · user: crusaderky (6213168) · state: closed (completed) · comments: 14 · author_association: MEMBER · repo: xarray (13221727) · type: issue
created: 2019-07-11T08:43:55Z · updated: 2019-08-08T22:34:53Z · closed: 2019-08-08T22:34:53Z

I, like many others, have irreversibly fallen in love with black. Can we apply it to the existing codebase and enforce it with a CI test? The only (big) problem is that developers will need to manually apply it to any open branches and then merge from master; even then, merging likely won't be trivial. How did the dask project tackle the issue?
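For context, the usual shape of such an enforcement step (a sketch only; the issue itself doesn't prescribe one, and this may differ from what xarray ultimately adopted) is a CI job equivalent to running black in check mode:

```python
# Sketch of a CI step equivalent to ``black --check --diff .``:
# black exits non-zero when any file would be reformatted, failing the job.
import subprocess
import sys

result = subprocess.run(["black", "--check", "--diff", "."])
sys.exit(result.returncode)
```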

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3092/reactions",
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#906 unstack() sorts data alphabetically
id: 166439490 · node_id: MDU6SXNzdWUxNjY0Mzk0OTA= · user: crusaderky (6213168) · state: closed (completed) · comments: 14 · author_association: MEMBER · repo: xarray (13221727) · type: issue
created: 2016-07-19T21:25:26Z · updated: 2019-02-23T12:47:00Z · closed: 2019-02-23T12:47:00Z

DataArray.unstack() sorts the data alphabetically by label. Besides being bad for performance, this is very problematic whenever the order matters and the labels are not in alphabetical order to begin with.

```python
import xarray
import pandas

index = [
    ['x1', 'first' ],
    ['x1', 'second'],
    ['x1', 'third' ],
    ['x1', 'fourth'],
    ['x0', 'first' ],
    ['x0', 'second'],
    ['x0', 'third' ],
    ['x0', 'fourth'],
]
index = pandas.MultiIndex.from_tuples(index, names=['x', 'count'])
s = pandas.Series(list(range(8)), index)
a = xarray.DataArray(s)
a
```

```
<xarray.DataArray (dim_0: 8)>
array([0, 1, 2, 3, 4, 5, 6, 7], dtype=int64)
Coordinates:
  * dim_0    (dim_0) object ('x1', 'first') ('x1', 'second') ('x1', 'third') ...
```

```python
a.unstack('dim_0')
```

```
<xarray.DataArray (x: 2, count: 4)>
array([[4, 7, 5, 6],
       [0, 3, 1, 2]], dtype=int64)
Coordinates:
  * x        (x) object 'x0' 'x1'
  * count    (count) object 'first' 'fourth' 'second' 'third'
```
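Continuing the example above, a possible stopgap (my sketch, not something proposed in the issue) is to reindex the unstacked result back into the intended order:

```python
# Hypothetical workaround: restore the intended label order by reindexing
# the unstacked result along both dimensions.
x_order = ['x1', 'x0']
count_order = ['first', 'second', 'third', 'fourth']
a.unstack('dim_0').reindex(x=x_order, count=count_order)
```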

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/906/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
#1521 open_mfdataset reads coords from disk multiple times
id: 252541496 · node_id: MDU6SXNzdWUyNTI1NDE0OTY= · user: crusaderky (6213168) · state: closed (completed) · comments: 14 · author_association: MEMBER · repo: xarray (13221727) · type: issue
created: 2017-08-24T09:29:57Z · updated: 2017-10-09T21:15:31Z · closed: 2017-10-09T21:15:31Z

I have 200 copies of the dataset below, split along the 'scenario' axis:

```
<xarray.Dataset>
Dimensions:      (fx_id: 39, instr_id: 16095, scenario: 2501)
Coordinates:
    currency     (instr_id) object 'GBP' 'USD' 'GBP' 'GBP' 'GBP' 'EUR' 'CHF' ...
  * fx_id        (fx_id) object 'USD' 'EUR' 'JPY' 'ARS' 'AUD' 'BRL' 'CAD' ...
  * instr_id     (instr_id) object 'property_standard_gbp' ...
  * scenario     (scenario) object 'Base Scenario' 'SSMC_1' 'SSMC_2' ...
    type         (instr_id) object 'Common Stock' 'Fixed Amortizing Bond' ...
Data variables:
    fx_rates     (fx_id, scenario) float64 1.236 1.191 1.481 1.12 1.264 ...
    instruments  (instr_id, scenario) float64 1.0 1.143 0.9443 1.013 1.176 ...
Attributes:
    base_currency: GBP
```

I individually dump them to disk with Dataset.to_netcdf(fname, engine='h5netcdf'). Then I try loading them back up with open_mfdataset, but it's painfully slow:

```python
%%time
xarray.open_mfdataset('*.nc', engine='h5netcdf')
```

```
Wall time: 30.3 s
```

The problem is caused by the coords being read from disk multiple times. Workaround:

```python
%%time

def load_coords(ds):
    for coord in ds.coords.values():
        coord.load()
    return ds

xarray.open_mfdataset('*.nc', engine='h5netcdf', preprocess=load_coords)
```

```
Wall time: 12.3 s
```

Proposed solutions:

1. Implement the above workaround directly inside open_mfdataset().
2. Change open_dataset() to always eagerly load the coords into memory, regardless of the chunks parameter. Is there any valid use case where lazy coords are actually desirable?

An additional, more radical observation: very frequently, the user knows in advance that all coords are aligned. In that case, the user could explicitly ask xarray to blindly trust this assumption and skip loading the coords not based on concat_dim from every dataset beyond the first.
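For what it's worth, a sketch of how this "trust the coords" idea can be expressed with the coords/compat parameters that later xarray releases added to open_mfdataset (parameter availability depends on the xarray version, so treat this as an assumption rather than a confirmed fix for this issue):

```python
import xarray

# Sketch, assuming a recent xarray: take non-varying coords from the first
# dataset instead of comparing them across files ('minimal' + 'override'
# skips the per-file coord loads that made the original call slow).
ds = xarray.open_mfdataset(
    '*.nc',
    engine='h5netcdf',
    data_vars='minimal',
    coords='minimal',
    compat='override',
)
```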

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1521/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);