
issue_comments


19 rows where user = 1053153 sorted by updated_at descending




issue 9

  • Handle empty containers in zarr chunk checks 4
  • xarray.DataArray.where always returns array of float64 regardless of input dtype 3
  • Scalar non-dimension coords forget their heritage 3
  • Handle empty containers in zarr chunk checks 3
  • multiple arrays with common nan-shaped dimension 2
  • Reusing coordinate doesn't show in the dimensions 1
  • zarr and xarray chunking compatibility and `to_zarr` performance 1
  • Implement interp for interpolating between chunks of data (dask) 1
  • Stack: avoid re-chunking (dask) and insert new coordinates arbitrarily 1

user 1

  • chrisroat · 19

author_association 1

  • CONTRIBUTOR 19
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1003391064 https://github.com/pydata/xarray/pull/5526#issuecomment-1003391064 https://api.github.com/repos/pydata/xarray/issues/5526 IC_kwDOAMm_X847zohY chrisroat 1053153 2021-12-31T14:36:26Z 2021-12-31T14:36:26Z CONTRIBUTOR

Is this change still of potential value?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 929518413
870868874 https://github.com/pydata/xarray/pull/5526#issuecomment-870868874 https://api.github.com/repos/pydata/xarray/issues/5526 MDEyOklzc3VlQ29tbWVudDg3MDg2ODg3NA== chrisroat 1053153 2021-06-29T19:48:48Z 2021-06-29T19:48:48Z CONTRIBUTOR

In my original PR, I wrote:

> In putting this together, I noted that using open_zarr to re-read the data triggers this bug, while open_dataset(..., engine='zarr') does not. I'm not sure if my proposed fix is a band-aid, or if something in open_zarr is the real culprit.

I just want to be sure we think this is the correct fix, and that there won't be any unintended consequences. I'm really not familiar with the interaction of chunks between zarr, dask, and xarray. I don't feel 100% sure this is the correct solution, though it fixes the immediate problem.
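
For reference, a minimal sketch of the two read paths being compared (the store path is a placeholder, not taken from the PR):

```
import xarray as xr

# Same store, two read paths; the comment above notes they behave differently
# with respect to the chunk checks this PR touches.
ds_a = xr.open_zarr('store.zarr')                     # triggered the bug
ds_b = xr.open_dataset('store.zarr', engine='zarr')   # did not
```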

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 929518413
867876696 https://github.com/pydata/xarray/pull/5526#issuecomment-867876696 https://api.github.com/repos/pydata/xarray/issues/5526 MDEyOklzc3VlQ29tbWVudDg2Nzg3NjY5Ng== chrisroat 1053153 2021-06-24T18:54:23Z 2021-06-24T18:54:23Z CONTRIBUTOR

@jhamman Ready for review, though not urgent.

Mostly, I'm curious if this is the right way to go about solving the linked issue, or if there is something deeper in xarray to update.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 929518413
867798517 https://github.com/pydata/xarray/pull/5019#issuecomment-867798517 https://api.github.com/repos/pydata/xarray/issues/5019 MDEyOklzc3VlQ29tbWVudDg2Nzc5ODUxNw== chrisroat 1053153 2021-06-24T16:52:29Z 2021-06-24T16:52:29Z CONTRIBUTOR

Sure thing. I'll revive it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 827233565
840975200 https://github.com/pydata/xarray/pull/5019#issuecomment-840975200 https://api.github.com/repos/pydata/xarray/issues/5019 MDEyOklzc3VlQ29tbWVudDg0MDk3NTIwMA== chrisroat 1053153 2021-05-14T03:08:02Z 2021-05-14T03:08:02Z CONTRIBUTOR

Is this solution desired, or is there a deeper fix?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 827233565
821655160 https://github.com/pydata/xarray/issues/5168#issuecomment-821655160 https://api.github.com/repos/pydata/xarray/issues/5168 MDEyOklzc3VlQ29tbWVudDgyMTY1NTE2MA== chrisroat 1053153 2021-04-16T22:38:08Z 2021-04-16T22:38:08Z CONTRIBUTOR

It may run even deeper -- there seem to be several checks on dimension sizes that would need special casing. Even simply doing a variable[dim] lookup fails!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple arrays with common nan-shaped dimension 859577556
821285344 https://github.com/pydata/xarray/issues/5168#issuecomment-821285344 https://api.github.com/repos/pydata/xarray/issues/5168 MDEyOklzc3VlQ29tbWVudDgyMTI4NTM0NA== chrisroat 1053153 2021-04-16T16:13:09Z 2021-04-16T16:13:09Z CONTRIBUTOR

There seems to be some support, but now you have me worried. I have used xarray mainly for labelling, but not for much computation -- I'm dropping into dask because I need map_overlap.

FWIW, calling dask.compute(arr) works with unknown chunk sizes, but now I see arr.compute() does not. This fooled me into thinking I could use unknown chunk sizes. Now I see that writing to zarr does not work, either. This might torpedo my current design.

I see the compute_chunk_sizes method, but that seems to trigger computation. I'm running on a dask cluster -- is there anything I can do to salvage the pattern arr_with_nan_shape.to_dataset().to_zarr(compute=False) (with or without xarray)?
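
For context, a minimal sketch (not from the issue) of how unknown chunk sizes arise in dask and why compute_chunk_sizes is unattractive on a cluster:

```
import dask.array as da
import numpy as np

x = da.from_array(np.arange(10), chunks=5)
y = x[x > 3]                   # boolean indexing yields unknown ("nan-shaped") chunks
print(y.chunks)                # ((nan, nan),)

y = y.compute_chunk_sizes()    # resolves the sizes, but forces computation
print(y.chunks)                # ((1, 5),)
```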

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  multiple arrays with common nan-shaped dimension 859577556
795014523 https://github.com/pydata/xarray/pull/5019#issuecomment-795014523 https://api.github.com/repos/pydata/xarray/issues/5019 MDEyOklzc3VlQ29tbWVudDc5NTAxNDUyMw== chrisroat 1053153 2021-03-10T07:24:58Z 2021-03-10T07:24:58Z CONTRIBUTOR

I do not see the failures in my client that are seen in the checks, so there must be some mismatch in the environments.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 827233565
794986294 https://github.com/pydata/xarray/pull/5019#issuecomment-794986294 https://api.github.com/repos/pydata/xarray/issues/5019 MDEyOklzc3VlQ29tbWVudDc5NDk4NjI5NA== chrisroat 1053153 2021-03-10T06:56:36Z 2021-03-10T07:01:08Z CONTRIBUTOR

In putting this together, I noted that using open_zarr to re-read the data triggers this bug, while open_dataset(..., engine='zarr') does not. I'm not sure if my proposed fix is a band-aid, or if something in open_zarr is the real culprit.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handle empty containers in zarr chunk checks 827233565
708686170 https://github.com/pydata/xarray/issues/4501#issuecomment-708686170 https://api.github.com/repos/pydata/xarray/issues/4501 MDEyOklzc3VlQ29tbWVudDcwODY4NjE3MA== chrisroat 1053153 2020-10-14T22:07:38Z 2020-10-14T22:07:38Z CONTRIBUTOR

My mental model of what's happening may not be correct. I did want sel(), isel(), and squeeze() to all operate the same way (and maybe someday even work on non-dim coordinates!). Replacing squeeze() with isel() in my initial example gives the same failure; I would want this to work:

```
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'e': ('y', [10])})
arr2 = arr1.isel(y=0).expand_dims('y')
xr.testing.assert_identical(arr1, arr2)
```

```
AssertionError: Left and right DataArray objects are not identical

Differing coordinates:
L   e        (y) int64 10
R   e        int64 10
```

The non-dim coordinate e has forgotten that it was associated with y. I'd prefer that this association remained.

Where it gets really interesting is in the following example where the non-dim coordinate moves from one dim to another. I understand the logic here (since the isel() were done in a way that correlates 'y' and 'z'). In my proposal, this would not happen without explicit user intervention -- which may actually be desired here (it's sorta surprising):

```
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((2, 2, 5)), dims=['z', 'y', 'x'], coords={'e': ('y', [10, 20])})
print(arr.coords)
print()

arr0 = arr.isel(z=0, y=0)
arr1 = arr.isel(z=1, y=1)

arr_concat = xr.concat([arr0, arr1], 'z')
print(arr_concat.coords)
```

```
Coordinates:
    e        (y) int64 10 20

Coordinates:
    e        (z) int64 10 20
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Scalar non-dimension coords forget their heritage 718716799
708561579 https://github.com/pydata/xarray/issues/4501#issuecomment-708561579 https://api.github.com/repos/pydata/xarray/issues/4501 MDEyOklzc3VlQ29tbWVudDcwODU2MTU3OQ== chrisroat 1053153 2020-10-14T17:51:15Z 2020-10-14T17:51:15Z CONTRIBUTOR

> One problem with this -- at least for now -- is that xarray currently doesn't allow coordinates on DataArray objects to have dimensions that don't appear on the DataArray itself.

Ah, then that would be the desired behavior here.

> It might also be surprising that this would make squeeze('y') inconsistent with isel(y=0).

The suggestion here is that both of these would behave the same. The MCVE was just for the squeeze case, but I expect that isel and sel would both allow non-dim coords to maintain the reference to their original dim (even if it becomes a non-dim coord itself).
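
A sketch of the consistency being asked for, reusing the shape of the earlier MCVE (today the two calls already agree; the proposal is that both also keep the non-dim coord's association):

```
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'e': ('y', [10])})

# Both drop the size-1 'y' dimension and should treat 'e' identically.
xr.testing.assert_identical(arr.squeeze('y'), arr.isel(y=0))
```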

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Scalar non-dimension coords forget their heritage 718716799
706640485 https://github.com/pydata/xarray/issues/4501#issuecomment-706640485 https://api.github.com/repos/pydata/xarray/issues/4501 MDEyOklzc3VlQ29tbWVudDcwNjY0MDQ4NQ== chrisroat 1053153 2020-10-11T02:37:00Z 2020-10-11T02:37:00Z CONTRIBUTOR

I'm not a huge fan of adding arguments for a case that rarely comes up (I presume). 

One difference in your example is that the 'e' coord is never based on 'y', so I would not want it expanded -- so I'd still like that test to pass.

The case I'm interested in is where the non-dimension coords are based on existing dimension coords that get squeezed.

So in this example:

```
import numpy as np
import xarray as xr

arr1 = xr.DataArray(np.zeros((1, 5)), dims=['y', 'x'], coords={'y': [42], 'e': ('y', [10])})
arr1.squeeze()
```

The squeezed array looks like:

```
<xarray.DataArray (x: 5)>
array([0., 0., 0., 0., 0.])
Coordinates:
    y        int64 42
    e        int64 10
Dimensions without coordinates: x
```

What I think would be more useful:

```
<xarray.DataArray (x: 5)>
array([0., 0., 0., 0., 0.])
Coordinates:
    y        int64 42
    e        (y) int64 10    <---- Note the (y)
Dimensions without coordinates: x
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Scalar non-dimension coords forget their heritage 718716799
685173725 https://github.com/pydata/xarray/issues/4389#issuecomment-685173725 https://api.github.com/repos/pydata/xarray/issues/4389 MDEyOklzc3VlQ29tbWVudDY4NTE3MzcyNQ== chrisroat 1053153 2020-09-01T22:46:02Z 2020-09-01T22:46:02Z CONTRIBUTOR

There has been some discussion on the dask chunking issue here:

  • https://github.com/dask/dask/issues/3650
  • https://github.com/dask/dask/issues/5544

Regarding the position of the inserted variable, it is not related to the chunking. It seems possible to do this. Would this be an acceptable change? If so, the first problem is the signature, as multiple dimensions may be passed in. :/
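
A small sketch (not from the issue) of the re-chunking that stack currently triggers on a dask-backed array:

```
import numpy as np
import xarray as xr

arr = xr.DataArray(np.zeros((4, 4)), dims=['y', 'x']).chunk({'y': 2, 'x': 2})
print(arr.data.chunks)              # ((2, 2), (2, 2))

stacked = arr.stack(z=('y', 'x'))   # the underlying dask reshape forces a re-chunk
print(stacked.data.chunks)          # chunking no longer matches the input
```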

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stack: avoid re-chunking (dask) and insert new coordinates arbitrarily 688640232
667255046 https://github.com/pydata/xarray/pull/4155#issuecomment-667255046 https://api.github.com/repos/pydata/xarray/issues/4155 MDEyOklzc3VlQ29tbWVudDY2NzI1NTA0Ng== chrisroat 1053153 2020-07-31T17:56:15Z 2020-07-31T17:56:15Z CONTRIBUTOR

Hi! This work is interesting to me, as I was implementing an image-processing algorithm in dask which needs an intermediate 1-d linear interpolation step. This bottlenecks the calculation through a single node. Your work here on distributed interpolation is intriguing, and I'm wondering if it would be useful in my work and if it could possibly become part of dask itself.

Here is the particular function, which you'll note has a dask.delayed wrapper around np.interp.
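
The link to that function has not survived here; below is a hypothetical sketch of the pattern described, a single dask.delayed call around np.interp, which is exactly what funnels the work through one node:

```
import dask
import dask.array as da
import numpy as np

@dask.delayed
def interp_1d(x, xp, fp):
    # Hypothetical stand-in for the linked function: plain np.interp,
    # wrapped so the whole interpolation runs as one task.
    return np.interp(x, xp, fp)

xp = np.linspace(0.0, 1.0, 1000)
fp = np.sin(xp)
x = np.linspace(0.0, 1.0, 10000)

result = da.from_delayed(interp_1d(x, xp, fp), shape=x.shape, dtype=fp.dtype)
print(result.compute()[:3])
```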

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Implement interp for interpolating between chunks of data (dask) 638909879
650903191 https://github.com/pydata/xarray/issues/3390#issuecomment-650903191 https://api.github.com/repos/pydata/xarray/issues/3390 MDEyOklzc3VlQ29tbWVudDY1MDkwMzE5MQ== chrisroat 1053153 2020-06-29T04:52:11Z 2020-06-29T04:52:11Z CONTRIBUTOR

> What about the case of no missing values, when other wouldn't be needed? Could the same dtype be returned then? This is my case, since I'm re-purposing where to do sel for non-dimension coordinates.

> Could you give a concrete example of what this would look like?
>
> It seems rather unlikely to me to have an example of where with drop=True where the condition is exactly aligned with the grid, such that there are no missing values.
>
> I guess it could happen if you're trying to index out exactly one element along a dimension?

That's exactly right. I am just selecting one slice of a data array, using data.where(data.coords['stain'] == 'DAPI').

> In the long term, the cleaner solution for this will be some form of support for more flexible / multi-dimensional indexing.

Agreed. Once I actually get things running, I'll be ready to try and contribute fixes for all my TODOs that reference xarray github issues. :)
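
A minimal sketch of the where-as-sel pattern discussed above ('stain' and 'DAPI' follow the comment; the array itself is invented):

```
import numpy as np
import xarray as xr

data = xr.DataArray(np.arange(6, dtype=np.uint16).reshape(2, 3),
                    dims=['c', 'x'],
                    coords={'stain': ('c', ['DAPI', 'GFP'])})

sel = data.where(data.coords['stain'] == 'DAPI', drop=True)
print(sel.dtype)   # float64 -- promoted even though no values are missing
```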

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.DataArray.where always returns array of float64 regardless of input dtype 505493879
650889746 https://github.com/pydata/xarray/issues/3390#issuecomment-650889746 https://api.github.com/repos/pydata/xarray/issues/3390 MDEyOklzc3VlQ29tbWVudDY1MDg4OTc0Ng== chrisroat 1053153 2020-06-29T03:49:27Z 2020-06-29T03:49:27Z CONTRIBUTOR

What about the case of no missing values, when other wouldn't be needed? Could the same dtype be returned then? This is my case, since I'm re-purposing where to do sel for non-dimension coordinates.

I'm capable of just recasting for my use case, if this is becoming an idea that would be difficult to maintain/document.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.DataArray.where always returns array of float64 regardless of input dtype 505493879
649861589 https://github.com/pydata/xarray/issues/3390#issuecomment-649861589 https://api.github.com/repos/pydata/xarray/issues/3390 MDEyOklzc3VlQ29tbWVudDY0OTg2MTU4OQ== chrisroat 1053153 2020-06-25T23:08:52Z 2020-06-25T23:37:47Z CONTRIBUTOR

If drop=True, would it be problematic to return the same dtype or allow other?

My use case is a simple slicing of a dataset -- no missing values. The use of where is due to one of the selections being on a non-dimension coordinate (#2028).

I can work around it using astype, but I'll say I was mildly surprised by this behavior. I now understand why it's there. Our code is old and the data is intermediate and never deeply inspected -- I only noticed this when we started using a memory-intensive algorithm and was surprised by how much space was taken by our supposedly uint16 data. :)
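
A sketch of the astype workaround mentioned (invented data, same idea):

```
import numpy as np
import xarray as xr

arr = xr.DataArray(np.arange(4, dtype=np.uint16), dims=['x'],
                   coords={'label': ('x', ['a', 'a', 'b', 'b'])})

out = arr.where(arr.coords['label'] == 'a', drop=True)  # comes back as float64
out = out.astype(np.uint16)                             # recast to the original dtype
print(out.dtype)                                        # uint16
```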

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xarray.DataArray.where always returns array of float64 regardless of input dtype 505493879
637635620 https://github.com/pydata/xarray/issues/2300#issuecomment-637635620 https://api.github.com/repos/pydata/xarray/issues/2300 MDEyOklzc3VlQ29tbWVudDYzNzYzNTYyMA== chrisroat 1053153 2020-06-02T15:42:43Z 2020-06-02T15:42:43Z CONTRIBUTOR

If there is a non-dimension coordinate, the error is also triggered.

```
import xarray as xr
import numpy as np

ds = xr.Dataset({'foo': (['bar'], np.zeros((100,)))})

# Problem also affects non-dimension coords
ds.coords['baz'] = ('bar', ['mah'] * 100)

ds.to_zarr('test.zarr', mode='w')
ds2 = xr.open_zarr('test.zarr')

ds3 = ds2.chunk({'bar': 2})

ds3.foo.encoding = {}
ds3.coords['baz'].encoding = {}  # Need this, too.

ds3.to_zarr('test3.zarr', mode='w')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  zarr and xarray chunking compatibility and `to_zarr` performance 342531772
528699124 https://github.com/pydata/xarray/issues/1499#issuecomment-528699124 https://api.github.com/repos/pydata/xarray/issues/1499 MDEyOklzc3VlQ29tbWVudDUyODY5OTEyNA== chrisroat 1053153 2019-09-06T04:06:05Z 2019-09-06T04:07:01Z CONTRIBUTOR

Just flying by and dropping a note because I ran into this with Imaris Open files created by a microscope camera. I wanted to use one of my favorite packages (xarray) to dig into the data, and noted the dimension reuse. Not a big blocker, but this functionality of the data format might be growing in usage.

More details on Imaris: http://open.bitplane.com/Default.aspx?tabid=268

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Reusing coordinate doesn't show in the dimensions 247697176


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
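For reference, a sketch of the query behind this page, run against a local copy of the database (the file name is an assumption):

```
import sqlite3

conn = sqlite3.connect('github.db')
rows = conn.execute(
    "SELECT * FROM issue_comments WHERE user = ? ORDER BY updated_at DESC",
    (1053153,),
).fetchall()
print(len(rows))  # 19
```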