issue_comments


11 rows where issue = 243964948 sorted by updated_at descending



id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1458213875 https://github.com/pydata/xarray/issues/1482#issuecomment-1458213875 https://api.github.com/repos/pydata/xarray/issues/1482 IC_kwDOAMm_X85W6pPz dcherian 2448579 2023-03-07T13:54:23Z 2023-03-07T13:54:23Z MEMBER

@Material-Scientist We have decent support for pydata/sparse arrays; it seems like those would work for you.

We do not support the pandas extension arrays at the moment.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
1457965323 https://github.com/pydata/xarray/issues/1482#issuecomment-1457965323 https://api.github.com/repos/pydata/xarray/issues/1482 IC_kwDOAMm_X85W5skL Material-Scientist 40465719 2023-03-07T10:58:12Z 2023-03-07T10:58:12Z NONE

> As I am not aware of implementation details I am not sure there is a useful link, but maybe progress in #3213 supporting sparse arrays can solve also the jagged array issue.
>
> Long time ago I asked there a question about how xarray supports sparse arrays. But what I actually meant were "Jagged Arrays". I just was not aware of that term and stumbled over it some days ago the very first time.

I also recently came across awkward/jagged/ragged arrays, and that's exactly how I would like to operate on multi-dimensional (2D in the referenced case) sparse data:

Instead of allocating memory filled with NaNs, the empty slots are simply not materialized, by using the pd.SparseDtype("float", np.nan) dtype.

You basically create a dense duck array from sparse dtypes, as the pandas sparse user guide shows.

So all the shape, dtype, and ndim requirements are satisfied, and xarray could implement this as a duck array.

And while you can already wrap sparse duck arrays with xr.Variable, I'm not sure whether the wrapper maintains the dtype.
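The pd.SparseDtype behaviour described above can be sketched like this (the values are illustrative):

```python
import numpy as np
import pandas as pd

# NaN slots are not materialized: only the two real values are stored
s = pd.Series([1.0, np.nan, np.nan, 2.0],
              dtype=pd.SparseDtype("float", np.nan))

print(s.dtype)           # Sparse[float64, nan]
print(s.sparse.density)  # 0.5 -> half of the slots are actually stored
```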

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
1014487633 https://github.com/pydata/xarray/issues/1482#issuecomment-1014487633 https://api.github.com/repos/pydata/xarray/issues/1482 IC_kwDOAMm_X848d9pR fmfreeze 18172466 2022-01-17T12:51:45Z 2022-01-17T12:51:45Z NONE

As I am not aware of the implementation details, I am not sure there is a useful link, but maybe the progress in #3213 on supporting sparse arrays can also solve the jagged-array issue.

A long time ago I asked a question there about how xarray supports sparse arrays. But what I actually meant was "jagged arrays"; I just was not aware of that term and stumbled over it for the first time a few days ago.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
320997555 https://github.com/pydata/xarray/issues/1482#issuecomment-320997555 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMyMDk5NzU1NQ== shoyer 1217238 2017-08-08T15:46:55Z 2017-08-08T15:46:55Z MEMBER

I understand why this could be useful, but I don't see how we could possibly make it work.

The notion of a "fixed dimension size" is fundamental to both NumPy arrays (upon which xarray is based) and the xarray Dataset/DataArray. There are various workarounds (e.g., using padding or a MultiIndex), but first-class support for jagged arrays would break our existing data model too severely.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
320891992 https://github.com/pydata/xarray/issues/1482#issuecomment-320891992 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMyMDg5MTk5Mg== mitar 585279 2017-08-08T08:44:36Z 2017-08-08T08:44:36Z NONE

> then what advantage is there (aside from convenience) of dumping them in some giant array with forced dimensions/shape per slice?

I was mostly thinking of using xarray as a basic data format for reusable code. If I build ML pipelines from reusable components, I have to pass data around. Initially the data might be in jagged arrays, and then, with various preprocessing steps before training a model, I can get it into a more suitable format where the images are all the same size, which is easier to work with. I hoped I could use the same format in all of these places where I need to pass data around.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316376598 https://github.com/pydata/xarray/issues/1482#issuecomment-316376598 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjM3NjU5OA== darothen 4992424 2017-07-19T12:54:30Z 2017-07-19T12:54:30Z NONE

@mitar it depends on your data/application, right? But that information would also be helpful in figuring out alternative pathways. If you're always going to process the images individually or sequentially, then what advantage is there (aside from convenience) of dumping them in some giant array with forced dimensions/shape per slice?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316372189 https://github.com/pydata/xarray/issues/1482#issuecomment-316372189 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjM3MjE4OQ== mitar 585279 2017-07-19T12:37:43Z 2017-07-19T12:37:43Z NONE

Hm, padding might use a lot of extra space, no?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316371416 https://github.com/pydata/xarray/issues/1482#issuecomment-316371416 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjM3MTQxNg== darothen 4992424 2017-07-19T12:34:32Z 2017-07-19T12:34:32Z NONE

The problem is that these sorts of arrays break the common data model on top of which xarray (and NetCDF) is built.

> If I understand correctly, I could batch all images of the same size into its own dimension? That might be also acceptable.

Yes, if you can pre-process all the images, align them on some common set of dimensions (maybe just xi and yi, denoting the integer index in the x and y directions), and pad the unused space for each image with NaNs, then you can concatenate everything into a Dataset.
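A small sketch of that padding approach (the image sizes, and names like xi/yi, are illustrative):

```python
import numpy as np
import xarray as xr

# Two hypothetical images of different sizes
img0 = np.arange(12.0).reshape(3, 4)
img1 = np.arange(20.0).reshape(4, 5)

def to_padded_dataarray(img, shape):
    # Embed each image in a NaN-padded canvas on shared integer dims xi/yi
    canvas = np.full(shape, np.nan)
    canvas[: img.shape[0], : img.shape[1]] = img
    return xr.DataArray(canvas, dims=["xi", "yi"])

shape = (4, 5)  # the largest extent along each dimension
data = xr.concat(
    [to_padded_dataarray(img, shape) for img in (img0, img1)],
    dim="image_index",  # concat creates this new dimension
)
print(data.shape)  # (2, 4, 5)
```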

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316328685 https://github.com/pydata/xarray/issues/1482#issuecomment-316328685 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjMyODY4NQ== fujiisoup 6815844 2017-07-19T09:33:41Z 2017-07-19T09:42:23Z MEMBER

I have a similar use case and I often use a MultiIndex, which (partly) makes it possible to handle hierarchical data structures.

For example,

```python
In [1]: import xarray as xr
   ...: import numpy as np
   ...:
   ...: # image 0, size [3, 4]
   ...: data0 = xr.DataArray(np.arange(12).reshape(3, 4), dims=['x', 'y'],
   ...:                      coords={'x': np.linspace(0, 1, 3),
   ...:                              'y': np.linspace(0, 1, 4),
   ...:                              'image_index': 0})
   ...: # image 1, size [4, 5]
   ...: data1 = xr.DataArray(np.arange(20).reshape(4, 5), dims=['x', 'y'],
   ...:                      coords={'x': np.linspace(0, 1, 4),
   ...:                              'y': np.linspace(0, 1, 5),
   ...:                              'image_index': 1})
   ...:
   ...: data = xr.concat([data0.expand_dims('image_index').stack(xy=['x', 'y', 'image_index']),
   ...:                   data1.expand_dims('image_index').stack(xy=['x', 'y', 'image_index'])],
   ...:                  dim='xy')

In [2]: data
Out[2]:
<xarray.DataArray (xy: 32)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  0,  1,  2,  3,
        4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
Coordinates:
  * xy           (xy) MultiIndex
  - x            (xy) float64 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 1.0 1.0 1.0 ...
  - y            (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.3333 0.6667 1.0 ...
  - image_index  (xy) int64 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 ...

In [3]: data.sel(image_index=0)  # gives data0
Out[3]:
<xarray.DataArray (xy: 12)>
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
Coordinates:
  * xy           (xy) MultiIndex
  - x            (xy) float64 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 1.0 1.0 1.0 1.0
  - y            (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.3333 0.6667 1.0 0.0 ...

In [4]: data.sel(x=0.0)  # x==0.0 for both images
Out[4]:
<xarray.DataArray (xy: 9)>
array([0, 1, 2, 3, 0, 1, 2, 3, 4])
Coordinates:
  * xy           (xy) MultiIndex
  - y            (xy) float64 0.0 0.3333 0.6667 1.0 0.0 0.25 0.5 0.75 1.0
  - image_index  (xy) int64 0 0 0 0 1 1 1 1 1
```

<s>I think the above solution is essentially equivalent to

> all images of the same size into its own dimension</s>

EDIT: I didn't understand the comment correctly. The above corresponds to flattening all the images out and combining them along one large dimension.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316323858 https://github.com/pydata/xarray/issues/1482#issuecomment-316323858 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjMyMzg1OA== mitar 585279 2017-07-19T09:15:00Z 2017-07-19T09:15:00Z NONE

> If you want to store them all in a Dataset, you'll have to give a different dimension name for each new dimension, which can be clumsy.

But I cannot combine multiple dimensions into the same Variable, no? So if I have a dataset with multiple variables, it seems each variable has to have uniform dimensions for all its values? Maybe I am misunderstanding the dimensions concept.

> What kind of "support" exactly were you thinking of?

Maybe examples of how to create such a jagged dataset? For example, how to have a variable that stores 2D images of different sizes.

If I understand correctly, I could batch all images of the same size into their own dimension? That might also be acceptable.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948
316317971 https://github.com/pydata/xarray/issues/1482#issuecomment-316317971 https://api.github.com/repos/pydata/xarray/issues/1482 MDEyOklzc3VlQ29tbWVudDMxNjMxNzk3MQ== fmaussion 10050469 2017-07-19T08:52:20Z 2017-07-19T08:52:20Z MEMBER

"Supported", yes, in the sense that you can create a DataArray for each of your differently sized arrays without any problem. If you want to store them all in a Dataset, you'll have to give a different dimension name for each new dimension, which can be clumsy.

However, it is true that xarray shines at handling more structured data, and most examples in the docs involve dataset variables sharing similar dimensions. What kind of "support" exactly were you thinking of?
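The per-variable dimension naming described above can be sketched like this (the variable and dimension names are illustrative):

```python
import numpy as np
import xarray as xr

# Differently sized images can share one Dataset as long as each
# variable gets its own dimension names (hypothetical x0/y0, x1/y1)
ds = xr.Dataset({
    "image0": (("x0", "y0"), np.zeros((3, 4))),
    "image1": (("x1", "y1"), np.zeros((4, 5))),
})
print(ds["image0"].shape)  # (3, 4)
print(ds["image1"].shape)  # (4, 5)
```

The cost, as noted, is that no dimension is shared, so operations like selection and alignment cannot relate the two variables.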

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support for jagged array 243964948

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 3523.798ms · About: xarray-datasette