issue_comments


3 rows where issue = 359240638 and user = 514522 sorted by updated_at descending

user 1

  • horta · 3 ✖

issue 1

  • Updated text for indexing page · 3 ✖

author_association 1

  • CONTRIBUTOR 3
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
422368970 https://github.com/pydata/xarray/issues/2410#issuecomment-422368970 https://api.github.com/repos/pydata/xarray/issues/2410 MDEyOklzc3VlQ29tbWVudDQyMjM2ODk3MA== horta 514522 2018-09-18T12:17:06Z 2018-09-18T12:17:06Z CONTRIBUTOR

I will first try to have both together. I'm well aware that learning by example works well (that is true for me at least, and apparently for most people: see the tldr library), so at first I will try to combine everything in one page:

  1. Starts with examples, going from simple ones to more complicated ones, with no definitions whatsoever.
  2. Begins a section defining terms and giving examples that elucidate them (the first section we have here)
  3. Ends with a formal description of the algorithm (the second section we have here)

I prefer starting with 2 and 3 for me to actually understand xarray...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Updated text for indexing page 359240638
421998857 https://github.com/pydata/xarray/issues/2410#issuecomment-421998857 https://api.github.com/repos/pydata/xarray/issues/2410 MDEyOklzc3VlQ29tbWVudDQyMTk5ODg1Nw== horta 514522 2018-09-17T12:40:52Z 2018-09-17T12:40:52Z CONTRIBUTOR

I have mainly updated the Indexing and selecting data section. I'm proposing an indexing notation that uses the [] operator versus the () function call to differentiate between the two kinds of dimension lookup. But more importantly, I'm working out a precise definition of data array indexing in the section Formal indexing definition.

Xarray definition

A data array a has D dimensions, indexed from 0 to D - 1. It contains an array of dimensionality D. The first dimension of that array is associated with the first dimension of the data array, and so forth. That array is returned by the data array attribute values. A named data array is a data array whose name attribute is a string:

```python
>>> import xarray as xr
>>>
>>> a = xr.DataArray([[0, 1], [2, 3], [4, 5]])
>>> a.name = "My name"
>>> a
<xarray.DataArray 'My name' (dim_0: 3, dim_1: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Dimensions without coordinates: dim_0, dim_1
```

Each data array dimension has a unique name of string type. The names can be accessed via the data array's dims attribute, which is a tuple. The name of dimension i is a.dims[i]:

```python
>>> a.dims[0]
'dim_0'
```
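The definitions above can be checked directly. The following is a minimal sketch, assuming a recent xarray release; it uses only the ndim, dims, sizes, and values attributes of DataArray:

```python
import xarray as xr

a = xr.DataArray([[0, 1], [2, 3], [4, 5]])
a.name = "My name"

D = a.ndim            # number of dimensions of the data array
dim_names = a.dims    # tuple holding one name per dimension

# The wrapped array has the same dimensionality as the data array ...
assert a.values.ndim == D
assert len(dim_names) == D

# ... and its i-th axis has the length of the i-th named dimension.
for i, dim in enumerate(dim_names):
    assert a.values.shape[i] == a.sizes[dim]
```

The loop states the "first dimension with first dimension, and so forth" association explicitly: position i in a.dims names axis i of a.values.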

A data array can have zero or more coordinates, represented by a dict-like coords attribute. A coordinate is a named data array, also referred to as a coordinate data array. Coordinate data arrays have unique names among the coordinate data arrays of the same data array. A coordinate data array of name x can be retrieved by a.coords[x].

A coordinate can have zero or more dimensions associated with it. A dimension data array is a unidimensional coordinate data array associated with one, and only one, dimension having the same name as the coordinate data array itself. A dimension data array always has one, and only one, coordinate. That coordinate again has a dimension data array associated with it:

```python
>>> import numpy as np
>>>
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["x", "y"], coords={"x": list("abc")})
>>> a
<xarray.DataArray (x: 3, y: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
Dimensions without coordinates: y
```

The above data array a has two dimensions: "x" and "y". It has a single coordinate "x", with its associated dimension data array a.coords["x"]. The definition of a dimension data array implies the following recursion:

```python
>>> a.coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
>>> a.coords["x"].coords["x"]
<xarray.DataArray 'x' (x: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * x        (x) <U1 'a' 'b' 'c'
```

Coordinate data arrays are meant to provide labels to array positions, allowing for convenient access to array elements:

```python
>>> a.loc["b", :]
<xarray.DataArray (y: 2)>
array([2, 3])
Coordinates:
    x        <U1 'b'
Dimensions without coordinates: y
```

Note that there is no asterisk symbol for coordinate "x" of the above resulting data array, as the coordinate is not associated with any dimension. In other words, the coordinate data array a.loc["b", :].coords["x"] is not a dimension data array.
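The distinction can also be observed programmatically rather than by reading the repr. This is a small sketch, assuming a recent xarray release; the variable names are only illustrative:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=["x", "y"],
    coords={"x": list("abc")},
)

# Before selection, "x" is both a dimension and a coordinate of `a`.
x_is_dim_before = "x" in a.dims

# Selecting a single label drops the "x" dimension from the result.
row = a.loc["b", :]
x_is_dim_after = "x" in row.dims

# The coordinate survives, but as a zero-dimensional (scalar) data array.
x_is_coord_after = "x" in row.coords
x_coord_ndim = row.coords["x"].ndim
x_coord_value = row.coords["x"].item()
```

Here row.coords["x"] has no dimension of its own, which is why it no longer qualifies as a dimension data array.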

Indexing and selecting data

There are four different but equally powerful ways of selecting data from a data array. They differ only in the type of dimension lookup and index lookup, each of which is either position-based or label-based:

| Dimension lookup | Index lookup   | Data array          |
|------------------|----------------|---------------------|
| Position-based   | Position-based | a[:, 0]             |
| Position-based   | Label-based    | a.loc[:, "UK"]      |
| Label-based      | Position-based | a(country=0)        |
| Label-based      | Label-based    | a.loc(country="UK") |

A dimension position-based lookup is determined by the position used in the index operator: a[first_dim, second_dim, ...] and a.loc[first_dim, second_dim, ...]. An index position-based lookup is determined by the provided integers or slices: a[0, [3, 9], :, ...] and a(country=0, time=[3, 9], space=slice(None)).

A dimension label-based lookup is determined by the provided dimension name: a(country=0) and a.loc(country="UK"). An index label-based lookup is determined by the provided index labels or slices [1]: a.loc[:, "UK"] and a.loc(country="UK").

[1] An index label is any object of a NumPy data type.

Consider the following data array:

```python
>>> a = xr.DataArray(np.arange(6).reshape((3, 2)), dims=["year", "country"], coords={"year": [1990, 1994, 1998], "country": ["UK", "US"]})
>>> a
<xarray.DataArray (year: 3, country: 2)>
array([[0, 1],
       [2, 3],
       [4, 5]])
Coordinates:
  * year     (year) int64 1990 1994 1998
  * country  (country) <U2 'UK' 'US'
```

The expressions a[:, 0], a.loc[:, "UK"], a(country=0), and a.loc(country="UK") will all produce the same result:

```python
>>> a.loc[:, "UK"]
<xarray.DataArray (year: 3)>
array([0, 2, 4])
Coordinates:
  * year     (year) int64 1990 1994 1998
    country  <U2 'UK'
```
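Note that a(country=0) and a.loc(country="UK") are the notation proposed in this comment, not calls that xarray accepts. As a sketch of how the same four lookups can be written with methods that do exist in xarray, assuming isel and sel as the closest counterparts of the two label-based dimension lookups:

```python
import numpy as np
import xarray as xr

a = xr.DataArray(
    np.arange(6).reshape((3, 2)),
    dims=["year", "country"],
    coords={"year": [1990, 1994, 1998], "country": ["UK", "US"]},
)

# Dimension by position, index by position.
by_pos_pos = a[:, 0]
# Dimension by position, index by label.
by_pos_label = a.loc[:, "UK"]
# Dimension by label, index by position (stand-in for the proposed a(country=0)).
by_label_pos = a.isel(country=0)
# Dimension by label, index by label (stand-in for the proposed a.loc(country="UK")).
by_label_label = a.sel(country="UK")

results = [by_pos_pos, by_pos_label, by_label_pos, by_label_label]
# `identical` compares values, dimensions, coordinates, name, and attributes.
all_same = all(r.identical(by_pos_pos) for r in results)
print(all_same)
```

The label-based dimension forms do not depend on the order of the dimensions, so they keep working if the array is transposed.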

Formal indexing definition

Let A be the dimensionality of a, the data array being indexed. Let b be the resulting data array. The Python indexing operations are formally translated into an A-tuple (i_1, i_2, ..., i_A), in which each i_j is a named data array whose values are index labels. This data array, as usual, can have 0, 1, or more dimensions. Its construction is described later in this section.

Let (r_1, r_2, ..., r_A) be the tuple representing the indices of a. Precisely, for those dimensions without an associated dimension data array, temporarily create dimension data arrays with labels from 0 to the dimension size minus one. Therefore, r_j is a dimension data array for dimension j. Also, it is required that the values of i_j are a subset of the values of r_j.

For each j, define the lists I_j = [(i_j0, 0), (i_j1, 1), ...] and R_j = [(r_j0, 0), (r_j1, 1), ...] with pairs of data array values and positions. Perform a SQL JOIN as follows:

  1. Apply the Cartesian product between the values of I_j and the values of R_j.
  2. Preserve only those tuples whose two pairs have equal first elements (that is, equal values).

Consider i_0 and r_0 defined as follows:

```python
>>> i_0
<xarray.DataArray (apples: 2, oranges: 1)>
array([['a'],
       ['c']], dtype='<U1')
Dimensions without coordinates: apples, oranges
>>> r_0
<xarray.DataArray (dim_0: 3)>
array(['a', 'b', 'c'], dtype='<U1')
Coordinates:
  * dim_0    (dim_0) <U1 'a' 'b' 'c'
```

Performing operations 1 and 2 will result in the list IR_j = [(('a', 0), ('a', 0)), (('c', 1), ('c', 2))].
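The join for a single dimension can be sketched in plain Python. The function name join_on_value is hypothetical, and the lists hold the flattened values of i_0 and r_0 paired with zero-based positions, so 'c' sits at position 2 of r_0:

```python
from itertools import product


def join_on_value(I_j, R_j):
    """Cartesian product of I_j and R_j, keeping only pairs with equal values."""
    return [(i, r) for i, r in product(I_j, R_j) if i[0] == r[0]]


# (value, position) pairs for the flattened indexer i_0 and the labels r_0.
I_0 = [("a", 0), ("c", 1)]
R_0 = [("a", 0), ("b", 1), ("c", 2)]

IR_0 = join_on_value(I_0, R_0)
print(IR_0)
# → [(('a', 0), ('a', 0)), (('c', 1), ('c', 2))]
```

Each element of IR_0 pairs a position in the indexer (where the value goes in b) with a position in the labels (where the value comes from in a).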

The positions IR_j[0][1][1], IR_j[1][1][1], and so forth are used to access the values of a and assign them to the positions IR_j[0][0][1], IR_j[1][0][1], and so forth of b. Precisely, for each (t_0, t_1, ..., t_A) in itertools.product(IR_0, IR_1, ..., IR_A), assign:

b[reshape(t_0[0][1], i_0.shape), ..., reshape(t_A[0][1], i_A.shape)] = a[t_0[1][1], ..., t_A[1][1]]
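The whole procedure can be sketched on plain NumPy arrays. This follows the description in this comment, not xarray's actual implementation. The function name index_by_labels is hypothetical, reshape is interpreted as np.unravel_index (turning a flat position into a multi-dimensional one), and the shape of b is assumed to be the concatenation of the indexer shapes:

```python
from itertools import product

import numpy as np


def index_by_labels(a, indexers, labels):
    """Index `a` with one label array per dimension, via the join described above."""
    joined = []
    for i_j, r_j in zip(indexers, labels):
        # (value, position) pairs for the flattened indexer and for the labels.
        I_j = [(v, p) for p, v in enumerate(np.ravel(i_j))]
        R_j = [(v, p) for p, v in enumerate(r_j)]
        joined.append([(i, r) for i, r in product(I_j, R_j) if i[0] == r[0]])

    # Assumption: the result shape concatenates the shapes of all indexers.
    out_shape = sum((np.shape(i_j) for i_j in indexers), ())
    b = np.empty(out_shape, dtype=a.dtype)

    for t in product(*joined):
        # Target position in b: unravel each flat indexer position.
        target = sum(
            (np.unravel_index(t_j[0][1], np.shape(i_j)) for t_j, i_j in zip(t, indexers)),
            (),
        )
        # Source position in a: the matched label positions.
        source = tuple(t_j[1][1] for t_j in t)
        b[target] = a[source]
    return b


a = np.arange(6).reshape((3, 2))
i_0 = np.array([["a"], ["c"]])  # shape (2, 1), labels along the first dimension
i_1 = np.array([1])             # shape (1,), labels along the second dimension
b = index_by_labels(a, [i_0, i_1], [["a", "b", "c"], [0, 1]])
print(b.shape, b.tolist())
```

In this example the second dimension has no dimension data array, so its labels are the temporary positions 0 and 1.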

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Updated text for indexing page 359240638
420446944 https://github.com/pydata/xarray/issues/2410#issuecomment-420446944 https://api.github.com/repos/pydata/xarray/issues/2410 MDEyOklzc3VlQ29tbWVudDQyMDQ0Njk0NA== horta 514522 2018-09-11T22:25:23Z 2018-09-11T22:25:23Z CONTRIBUTOR

Thanks guys! Just to be clear, this is a work in progress. I realise that I made some wrong assumptions, and there is more to add to it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Updated text for indexing page 359240638

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 11.688ms · About: xarray-datasette