issue_comments


5 rows where issue = 216215022 and user = 1386642 sorted by updated_at descending


Columns: id · html_url · issue_url · node_id · user · created_at · updated_at (sorted descending) · author_association · body · reactions · performed_via_github_app · issue
337970838 https://github.com/pydata/xarray/issues/1317#issuecomment-337970838 https://api.github.com/repos/pydata/xarray/issues/1317 MDEyOklzc3VlQ29tbWVudDMzNzk3MDgzOA== nbren12 1386642 2017-10-19T16:56:37Z 2017-10-19T16:56:37Z CONTRIBUTOR

Sorry. I guess I should have made my last comment in the PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for reshaping DataArrays as 2D "data matrices" for use in machine learning 216215022
337796691 https://github.com/pydata/xarray/issues/1317#issuecomment-337796691 https://api.github.com/repos/pydata/xarray/issues/1317 MDEyOklzc3VlQ29tbWVudDMzNzc5NjY5MQ== nbren12 1386642 2017-10-19T04:32:03Z 2017-10-19T04:32:03Z CONTRIBUTOR

After using my own version of this code for the past month or so, it has occurred to me that this API probably will not support stacking arrays with different sizes along shared dimensions. For instance, I need to "stack" humidity below an altitude of 10 km with temperature between 0 and 16 km. IMO, the easiest way to do this would be to change these methods into top-level functions which can take any dict or iterable of DataArrays. We could leave that for a later PR, of course.
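A minimal sketch of what such a top-level helper could look like, assuming a hypothetical name `stack_to_matrix` and hypothetical variable/dimension names (nothing here is part of xarray's API):

```python
import numpy as np
import xarray as xr


def stack_to_matrix(arrays, sample_dims):
    """Hypothetical top-level helper: turn a dict of DataArrays into one
    2D (sample, feature) matrix, even when the arrays have different sizes
    along their non-sample ("feature") dimensions."""
    columns = []
    for name, da in arrays.items():
        feature_dims = [d for d in da.dims if d not in sample_dims]
        stacked = da.stack(sample=sample_dims, feature=feature_dims)
        columns.append(stacked.transpose("sample", "feature").values)
    # each variable contributes a different number of feature columns,
    # so concatenate them manually along the feature axis
    return np.hstack(columns)


# e.g. humidity below 10 km stacked with temperature between 0 and 16 km
# (q, T and the dimension names are assumed for illustration):
# X = stack_to_matrix({"q": q.sel(z=slice(0, 10e3)),
#                      "T": T.sel(z=slice(0, 16e3))},
#                     sample_dims=["time", "x", "y"])
```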

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for reshaping DataArrays as 2D "data matrices" for use in machine learning 216215022
330282841 https://github.com/pydata/xarray/issues/1317#issuecomment-330282841 https://api.github.com/repos/pydata/xarray/issues/1317 MDEyOklzc3VlQ29tbWVudDMzMDI4Mjg0MQ== nbren12 1386642 2017-09-18T16:45:55Z 2017-09-18T16:46:37Z CONTRIBUTOR

@shoyer I wrote a class that does this a while ago. It is available here: data_matrix.py. It is used like this:

```python
# D is a dataset
# the signature for DataMatrix.__init__ is
# DataMatrix(feature_dims, sample_dims, variables)
mat = DataMatrix(['z'], ['x'], ['a', 'b'])
y = mat.dataset_to_mat(D)
x = mat.mat_to_dataset(y)
```

One of the problems I had to handle was concatenating/stacking DataArrays with different numbers of dimensions: `stack` and `unstack` combined with `to_array` can only handle the case where the desired feature variables all have the same dimensionality. ATM my code stacks the desired dimensions for each variable and then manually calls `np.hstack` to produce the final matrix, but I bet it would be easy to create a pandas Index object which can handle this use case.
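To illustrate the limitation (with made-up shapes, not taken from the issue): `to_array` broadcasts variables with fewer dimensions before stacking, so a 1-D variable gets repeated rather than contributing a single feature column:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "a": ("x", np.arange(3.0)),           # dims (x,)
    "b": (("x", "z"), np.ones((3, 4))),   # dims (x, z)
})
arr = ds.to_array()                        # broadcasts: dims ('variable', 'x', 'z')
mat = arr.stack(features=["variable", "z"]).transpose("x", "features")
print(mat.shape)  # (3, 8) -- 'a' was repeated along z; the desired shape is (3, 5)
```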

Would you be open to a PR along these lines?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for reshaping DataArrays as 2D "data matrices" for use in machine learning 216215022
288607926 https://github.com/pydata/xarray/issues/1317#issuecomment-288607926 https://api.github.com/repos/pydata/xarray/issues/1317 MDEyOklzc3VlQ29tbWVudDI4ODYwNzkyNg== nbren12 1386642 2017-03-23T03:32:50Z 2017-03-23T03:40:22Z CONTRIBUTOR

I had the chance to play around with stack and unstack, and it appears that these actually do nearly all the work needed here, so you can disregard my last comment. The only logic which is somewhat unwieldy is the code which creates a DataArray from the `eofs` dask array. This is a complete example using the air dataset:

```python
air = load_dataset("air_temperature").air

A = air.stack(features=['lat', 'lon']).chunk()
A -= A.mean('features')

_, _, eofs = svd_compressed(A.data, 4)

# wrap eofs in a DataArray
dims = ['modes', 'features']
coords = {}

for i, dim in enumerate(dims):
    if dim in A.dims:
        coords[dim] = A[dim]
    elif dim in coords:
        pass
    else:
        coords[dim] = np.arange(eofs.shape[i])

eofs = xr.DataArray(eofs, dims=dims, coords=coords).unstack('features')
```

This is pretty compact as is, so maybe the ugly final bit could be replaced with a convenience function like `unstack_array(eofs, dims, coords)` or a method call `A.unstack_array(eofs, dims, new_coords={})`.
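A sketch of what that convenience function could look like, following the loop above (the name `unstack_array` and the `template` argument are assumptions, not an existing xarray method):

```python
import numpy as np
import pandas as pd
import xarray as xr


def unstack_array(data, dims, template):
    """Hypothetical helper: wrap a bare (dask/numpy) array in a DataArray,
    borrowing coordinates from a stacked template where the names match,
    then unstack any stacked (MultiIndex) dimensions."""
    coords = {}
    for i, dim in enumerate(dims):
        if dim in template.dims:
            coords[dim] = template[dim]
        else:
            coords[dim] = np.arange(data.shape[i])
    out = xr.DataArray(data, dims=dims, coords=coords)
    for dim in dims:
        if dim in out.indexes and isinstance(out.indexes[dim], pd.MultiIndex):
            out = out.unstack(dim)
    return out


# the "ugly final bit" of the example above would then collapse to:
# eofs = unstack_array(eofs, ['modes', 'features'], A)
```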

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for reshaping DataArrays as 2D "data matrices" for use in machine learning 216215022
288590846 https://github.com/pydata/xarray/issues/1317#issuecomment-288590846 https://api.github.com/repos/pydata/xarray/issues/1317 MDEyOklzc3VlQ29tbWVudDI4ODU5MDg0Ng== nbren12 1386642 2017-03-23T01:32:55Z 2017-03-23T01:32:55Z CONTRIBUTOR

Cool! Thanks for that link. As far as the API is concerned, I think I like the ReshapeCoder approach a little better because it does not require keeping track of a `feature_dims` list throughout the code, like my class does. It could also generalize beyond just creating a 2D array.

To produce a dataset B(samples, features) from a dataset A(x, y, z, t), how do you feel about a syntax like this?

```python
rs = Reshaper(dict(samples=('t',), features=('x', 'y', 'z')), coords=A.coords)

B = rs.encode(A)

_, _, eofs = svd(B.data)

# eofs is now a 2D dask array so we need to give
# it dimension information
eof_dims = ['mode', 'features']
rs.decode(eofs, eof_dims)

# to decode an xarray object we don't need to pass dimension info
rs.decode(B)
```

On the other hand, it would be nice to be able to reshape data through a syntax like `A.reshape.encode(dict(...))`.
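For concreteness, a minimal sketch of what such a Reshaper could look like, built on stack/unstack (the class, its remembered coordinates, and all names here are hypothetical, not part of xarray):

```python
import numpy as np
import xarray as xr


class Reshaper:
    """Hypothetical sketch of the encode/decode API proposed above."""

    def __init__(self, dim_groups, coords):
        # dim_groups, e.g. dict(samples=('t',), features=('x', 'y', 'z'))
        self.dim_groups = dim_groups
        self.coords = coords              # stored to mirror the proposed signature
        self._encoded_coords = {}         # stacked coordinates remembered by encode()

    def encode(self, A):
        B = A.stack(**{new: list(old) for new, old in self.dim_groups.items()})
        self._encoded_coords = {d: B[d] for d in self.dim_groups if d in B.dims}
        return B

    def decode(self, data, dims=None):
        if not isinstance(data, xr.DataArray):
            # a bare 2D array (e.g. the dask output of an SVD): attach dimension
            # names, reusing the stacked coordinates remembered from encode()
            coords = {d: self._encoded_coords.get(d, np.arange(n))
                      for d, n in zip(dims, data.shape)}
            data = xr.DataArray(data, dims=dims, coords=coords)
        # undo the stacking for any dimensions created by encode()
        for d in self.dim_groups:
            if d in data.dims:
                data = data.unstack(d)
        return data
```

The main difference from the snippet above is that this version remembers the stacked coordinates at encode() time rather than reconstructing them from `coords`.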
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API for reshaping DataArrays as 2D "data matrices" for use in machine learning 216215022


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);