issue_comments


4 rows where issue = 195050684 and user = 743508 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
266598007 https://github.com/pydata/xarray/issues/1161#issuecomment-266598007 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU5ODAwNw== mangecoeur 743508 2016-12-13T00:29:16Z 2016-12-13T00:29:16Z CONTRIBUTOR

Seems to run a lot faster for me too...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266596464 https://github.com/pydata/xarray/issues/1161#issuecomment-266596464 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU5NjQ2NA== mangecoeur 743508 2016-12-13T00:20:12Z 2016-12-13T00:20:12Z CONTRIBUTOR

Done with PR #1162

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266587849 https://github.com/pydata/xarray/issues/1161#issuecomment-266587849 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjU4Nzg0OQ== mangecoeur 743508 2016-12-12T23:32:19Z 2016-12-12T23:33:03Z CONTRIBUTOR

Thanks, I've been looking around and I think I'm getting close. However, I'm not sure of the best way to turn the array slice I get from vindex into a DataArray variable. I'm thinking I might put together a draft PR for comments. This is what I have so far:

```python

def isel_points(self, dim='points', **indexers):
    """Returns a new dataset with each array indexed pointwise along the
    specified dimension(s).

    This method selects pointwise values from each array and is akin to
    the NumPy indexing behavior of `arr[[0, 1], [0, 1]]`, except this
    method does not require knowing the order of each array's dimensions.

    Parameters
    ----------
    dim : str or DataArray or pandas.Index or other list-like object, optional
        Name of the dimension to concatenate along. If dim is provided as a
        string, it must be a new dimension name, in which case it is added
        along axis=0. If dim is provided as a DataArray or Index or
        list-like object, its name, which must not be present in the
        dataset, is used as the dimension to concatenate along and the
        values are added as a coordinate.
    **indexers : {dim: indexer, ...}
        Keyword arguments with names matching dimensions and values given
        by array-like objects. All indexers must be the same length and
        1 dimensional.

    Returns
    -------
    obj : Dataset
        A new Dataset with the same contents as this dataset, except each
        array and dimension is indexed by the appropriate indexers. With
        pointwise indexing, the new Dataset will always be a copy of the
        original.

    See Also
    --------
    Dataset.sel
    Dataset.isel
    Dataset.sel_points
    DataArray.isel_points
    """
    from .dataarray import DataArray

    indexer_dims = set(indexers)

    def relevant_keys(mapping):
        return [k for k, v in mapping.items()
                if any(d in indexer_dims for d in v.dims)]

    data_vars = relevant_keys(self.data_vars)
    coords = relevant_keys(self.coords)

    # all the indexers should be iterables
    keys = indexers.keys()
    indexers = [(k, np.asarray(v)) for k, v in iteritems(indexers)]
    # Check that indexers are valid dims, integers, and 1D
    for k, v in indexers:
        if k not in self.dims:
            raise ValueError("dimension %s does not exist" % k)
        if v.dtype.kind != 'i':
            raise TypeError('Indexers must be integers')
        if v.ndim != 1:
            raise ValueError('Indexers must be 1 dimensional')

    # all the indexers should have the same length
    lengths = set(len(v) for k, v in indexers)
    if len(lengths) > 1:
        raise ValueError('All indexers must be the same length')

    # Existing dimensions are not valid choices for the dim argument
    if isinstance(dim, basestring):
        if dim in self.dims:
            # dim is an invalid string
            raise ValueError('Existing dimension names are not valid '
                             'choices for the dim argument in sel_points')
    elif hasattr(dim, 'dims'):
        # dim is a DataArray or Coordinate
        if dim.name in self.dims:
            # dim already exists
            raise ValueError('Existing dimensions are not valid choices '
                             'for the dim argument in sel_points')

    if not utils.is_scalar(dim) and not isinstance(dim, DataArray):
        dim = as_variable(dim, name='points')

    variables = OrderedDict()
    indexers_dict = dict(indexers)
    non_indexed = list(set(self.dims) - indexer_dims)

    # TODO need to figure out how to make sure we get the indexed vs
    # non indexed dimensions in the right order
    for name, var in self.variables.items():
        slc = []

        for k in var.dims:
            if k in indexers_dict:
                slc.append(indexers_dict[k])
            else:
                slc.append(slice(None, None))
        if hasattr(var.data, 'vindex'):
            variables[name] = DataArray(var.data.vindex[tuple(slc)], name=name)
        else:
            variables[name] = var[tuple(slc)]

    points_len = lengths.pop()

    new_variables = OrderedDict()
    for name, var in variables.items():
        if name not in self.dims:
            coords = [variables[k] for k in non_indexed]
            new_variables[name] = DataArray(
                var, coords=[np.arange(points_len)] + coords,
                dims=[dim] + non_indexed)

    return xr.merge([v for k, v in new_variables.items() if k not in selection.dims])
    # TODO: This would be sped up with vectorized indexing. This will
    # require dask to support pointwise indexing as well.

    return concat([self.isel(**d) for d in
                   [dict(zip(keys, inds)) for inds in
                    zip(*[v for k, v in indexers])]],
                  dim=dim, coords=coords, data_vars=data_vars)

```
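The docstring above compares pointwise selection to NumPy's `arr[[0, 1], [0, 1]]`, and the TODO in the draft worries about where the new points dimension ends up. As a minimal sketch of that idea in plain NumPy (not the xarray or dask implementation), the hypothetical helper `pointwise_index` below builds one index per named dimension, the same way the draft builds `slc`. The helper name, the dimension names `t`, `x`, `y`, and the sample arrays are all assumptions made up for illustration.

```python
import numpy as np


def pointwise_index(arr, dims, **indexers):
    """Hypothetical helper: pick points along named dims of a NumPy array.

    `dims` names the axes of `arr` in order. Dimensions listed in
    `indexers` get an integer array, all others get a full slice.
    """
    key = tuple(
        np.asarray(indexers[d]) if d in indexers else slice(None)
        for d in dims
    )
    return arr[key]


# 2D case: equivalent to arr2[[0, 1], [0, 1]], i.e. elements (0, 0) and (1, 1).
arr2 = np.arange(12).reshape(3, 4)
picked_2d = pointwise_index(arr2, ("x", "y"), x=[0, 1], y=[0, 1])

# 3D case with adjacent indexed dims: the points axis replaces (x, y) in place.
arr3 = np.arange(24).reshape(2, 3, 4)
picked_adjacent = pointwise_index(arr3, ("t", "x", "y"), x=[0, 1], y=[0, 1])

# 3D case with a slice between the indexed dims: NumPy moves the points
# axis to the front, so the result is (points, x) rather than (x, points).
picked_split = pointwise_index(arr3, ("t", "x", "y"), t=[0, 1], y=[0, 1])
```

In plain NumPy the position of the points axis depends on whether the indexed dimensions are adjacent, which is one reason the ordering of indexed versus non-indexed dimensions needs explicit handling.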

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684
266519121 https://github.com/pydata/xarray/issues/1161#issuecomment-266519121 https://api.github.com/repos/pydata/xarray/issues/1161 MDEyOklzc3VlQ29tbWVudDI2NjUxOTEyMQ== mangecoeur 743508 2016-12-12T18:59:15Z 2016-12-12T18:59:15Z CONTRIBUTOR

OK, I will have a look. Where is this implemented? (I always seem to have trouble pinpointing the dask-specific bits in the codebase :S )

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Generated Dask graph is huge - performance issue? 195050684


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
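
The filter described at the top of this page (rows where issue = 195050684 and user = 743508, sorted by updated_at descending) can be reproduced against this schema. The following is a rough sketch using Python's standard `sqlite3` module and an in-memory database. Only the `id`, `user`, `created_at`, `updated_at`, and `issue` values from the four rows above are loaded, and the `users` and `issues` tables are deliberately left out, which SQLite tolerates because foreign key enforcement is off by default.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
""")

# (id, user, created_at, updated_at, issue), copied from the rows on this page.
rows = [
    (266519121, 743508, "2016-12-12T18:59:15Z", "2016-12-12T18:59:15Z", 195050684),
    (266587849, 743508, "2016-12-12T23:32:19Z", "2016-12-12T23:33:03Z", 195050684),
    (266596464, 743508, "2016-12-13T00:20:12Z", "2016-12-13T00:20:12Z", 195050684),
    (266598007, 743508, "2016-12-13T00:29:16Z", "2016-12-13T00:29:16Z", 195050684),
]
conn.executemany(
    "INSERT INTO issue_comments ([id], [user], [created_at], [updated_at], [issue]) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

# ISO 8601 timestamps in the same format sort correctly as plain text.
result = conn.execute(
    "SELECT [id], [updated_at] FROM issue_comments "
    "WHERE [issue] = ? AND [user] = ? "
    "ORDER BY [updated_at] DESC",
    (195050684, 743508),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```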
Powered by Datasette · Queries took 10.434ms · About: xarray-datasette