issue_comments


5 rows where user = 22492773 sorted by updated_at descending


id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
935769790 https://github.com/pydata/xarray/issues/2511#issuecomment-935769790 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843xra- pl-marasco 22492773 2021-10-06T08:47:24Z 2021-10-06T08:47:24Z NONE

@bzah I've been testing your code and I can confirm the increase in timing once .compute() isn't used. I've also noticed that, with your modification, the dask array seems to be computed more than once per sample. I ran some tests using a modified version of the code from #3237, and here are my observations:

Assuming we have only one sample object after the resample, the expected result is 1 compute, and that's what we get if we call .compute() before .argmax(). If .compute() is removed, I get 3 computations in total. As a confirmation, if you increase the number of samples, the number of computes is a multiple of 3.

I still don't know the reason, or whether this is correct, but it looks odd to me; it could, however, explain the increase in time.

@dcherian @shyer do you know if all this makes any sense? Should .isel() automatically trigger the computation, or should it return a lazy array?

Here is the code I've been using (it works only with the modification proposed by @bzah):

```
import numpy as np
import pandas as pd  # needed for pd.date_range below
import dask
import xarray as xr


class Scheduler:
    """From: https://stackoverflow.com/questions/53289286/"""

    def __init__(self, max_computes=20):
        self.max_computes = max_computes
        self.total_computes = 0

    def __call__(self, dsk, keys, **kwargs):
        # Count every graph execution that dask hands to this scheduler.
        self.total_computes += 1
        if self.total_computes > self.max_computes:
            raise RuntimeError(
                "Too many dask computations were scheduled: {}".format(
                    self.total_computes
                )
            )
        return dask.get(dsk, keys, **kwargs)


scheduler = Scheduler()

with dask.config.set(scheduler=scheduler):

    COORDS = dict(dim_0=pd.date_range("2042-01-01", periods=31, freq='D'),
                  dim_1=range(0, 500),
                  dim_2=range(0, 500))

    da = xr.DataArray(np.random.rand(31 * 500 * 500).reshape((31, 500, 500)),
                      dims=("dim_0", "dim_1", "dim_2"),
                      coords=COORDS).chunk(dict(dim_0=-1, dim_1=100, dim_2=100))

    print(da)

    resampled = da.resample(dim_0="MS")

    for label, sample in resampled:

        # sample = sample.compute()
        idx = sample.argmax('dim_0')
        sampled = sample.isel(dim_0=idx)

print("Total number of computes: %d" % scheduler.total_computes)
```

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
932169790 https://github.com/pydata/xarray/issues/2511#issuecomment-932169790 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843j8g- pl-marasco 22492773 2021-10-01T12:04:55Z 2021-10-01T12:04:55Z NONE

@bzah I tested your patch with the following code:

```
import numpy as np
import xarray as xr
from distributed import Client

client = Client()

da = xr.DataArray(
    np.random.rand(20 * 3500 * 3500).reshape((20, 3500, 3500)),
    dims=('time', 'x', 'y'),
).chunk(dict(time=-1, x=100, y=100))

idx = da.argmax('time').compute()
da.isel(time=idx)
```
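As a reference for what the snippet above is expected to return, here is a pure-Python sketch of an argmax over `time` followed by pointwise selection with the resulting index. It does not use xarray or dask; the nested-list layout `data[time][x][y]` and the helper names `argmax_over_time` and `select_by_index` are only illustrative assumptions.

```python
def argmax_over_time(data):
    """For each (x, y), return the time index holding the largest value."""
    n_time, n_x, n_y = len(data), len(data[0]), len(data[0][0])
    return [
        [max(range(n_time), key=lambda t: data[t][x][y]) for y in range(n_y)]
        for x in range(n_x)
    ]


def select_by_index(data, idx):
    """Pick data[idx[x][y]][x][y] for each (x, y), i.e. pointwise indexing."""
    return [
        [data[idx[x][y]][x][y] for y in range(len(idx[0]))]
        for x in range(len(idx))
    ]


# Tiny 3 x 2 x 2 example laid out as data[time][x][y].
data = [
    [[1, 9], [4, 2]],  # time 0
    [[7, 3], [0, 8]],  # time 1
    [[5, 6], [6, 1]],  # time 2
]
idx = argmax_over_time(data)
selected = select_by_index(data, idx)
print(idx)       # [[1, 0], [2, 1]]
print(selected)  # [[7, 9], [6, 8]]
```

The selected values are the per-pixel maxima over time, which is what the lazy and the eager variants should both produce; only the number of computations differs.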

In my case it seems to take the same time with or without the patch, but I would like to know whether it is the same for you.

L.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
930309991 https://github.com/pydata/xarray/issues/2511#issuecomment-930309991 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843c2dn pl-marasco 22492773 2021-09-29T15:56:33Z 2021-09-29T15:56:33Z NONE

> @pl-marasco Ok that's strange. I should have saved my use case :/ I will try to reproduce it and will provide a gist of it soon.

What I noticed in my use case is that it triggers a computation. Is that the reason you consider it slow? Could it be related to #3237?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
930124657 https://github.com/pydata/xarray/issues/2511#issuecomment-930124657 https://api.github.com/repos/pydata/xarray/issues/2511 IC_kwDOAMm_X843cJNx pl-marasco 22492773 2021-09-29T12:22:06Z 2021-09-29T12:22:06Z NONE

@bzah I've been testing your solution and it doesn't seem as slow as you mentioned. Do you have a specific test we could run so that we can make a more robust comparison?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
781553004 https://github.com/pydata/xarray/issues/4213#issuecomment-781553004 https://api.github.com/repos/pydata/xarray/issues/4213 MDEyOklzc3VlQ29tbWVudDc4MTU1MzAwNA== pl-marasco 22492773 2021-02-18T18:37:07Z 2021-02-19T07:38:04Z NONE

@TomNicholas I landed on this discussion while looking for a solution to what I consider the exact same problem. Overlap is something that every user of Sentinel 2 Level 1c will run into. All the observations are delivered to users as a series of tiles following the MGRS grid system. Each tile has an area that overlaps with the bordering ones, and its size varies according to the position of the tile in relation to the reference system. The approach you are describing can indeed solve the problem, but it would require analysing the bounding boxes and then selecting through .sel(). In Rasterio this can be easily done through the .merge module. For a quick example of how it is used, have a look here

You are right in pointing out that there are multiple ways to treat the overlapping values, but I would stick with the most common ones, which are also reported in the link you mentioned. In other words, (min, max, average, first, last) would already be a huge plus.
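To make the five options concrete, here is a minimal pure-Python sketch of how overlapping cells could be resolved. It is not how xarray or Rasterio implement merging; the `{coordinate: value}` tile layout, the `REDUCERS` table and the `merge_tiles` helper are all hypothetical names chosen for illustration.

```python
from statistics import mean

# Hypothetical reducers applied to cells covered by more than one tile.
REDUCERS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "min": min,
    "max": max,
    "average": mean,
}


def merge_tiles(tiles, method="first"):
    """Merge tiles given as {coordinate: value} dicts, resolving overlaps."""
    reduce = REDUCERS[method]
    stacked = {}
    for tile in tiles:
        for coord, value in tile.items():
            # Collect every value seen at this coordinate, in tile order.
            stacked.setdefault(coord, []).append(value)
    return {coord: reduce(values) for coord, values in sorted(stacked.items())}


tile_a = {0: 10, 1: 12, 2: 14}
tile_b = {2: 18, 3: 20}  # coordinate 2 overlaps with tile_a

merged_first = merge_tiles([tile_a, tile_b], method="first")
merged_avg = merge_tiles([tile_a, tile_b], method="average")
print(merged_first)  # {0: 10, 1: 12, 2: 14, 3: 20}
print(merged_avg[2])  # 16
```

With "first" the overlapping cell keeps the value from the tile listed first, while "average" blends both tiles; non-overlapping cells are unaffected by the choice.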

About dask: it indeed helps a lot to create delayed objects of the tiles (consider that, at least for S2, the data are in jp2 and we are forced to use open_rasterio instead of open_mfdataset), so the solution should be compatible with this kind of approach. If you need further explanation, or if I wasn't clear, please let me know.

About Pangeo: a topic should indeed be opened there, and eventually we can move the discussion, but, at least in my opinion, for the moment the right place to discuss this is within xarray.

It seems that Sinergise has used the average algorithm to solve the same issue for the AWS service, so users of the AWS S2 products should not need to care about the overlap issue.

Edit: update on AWS service

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray combine_by_coords return the monotonic global index error 654150730


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
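
The filter used for this page (rows where user = 22492773 sorted by updated_at descending) can be reproduced against a table of this shape with the standard `sqlite3` module. The sketch below assumes an in-memory database with a reduced set of columns; the three comment rows reuse ids and timestamps from the table above, and the fourth row is made up to show the filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of the issue_comments schema (only the columns used here).
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [id] INTEGER PRIMARY KEY,
        [user] INTEGER,
        [updated_at] TEXT,
        [issue] INTEGER
    )
    """
)
conn.execute(
    "CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user])"
)

rows = [
    (781553004, 22492773, "2021-02-19T07:38:04Z", 654150730),
    (930124657, 22492773, "2021-09-29T12:22:06Z", 374025325),
    (935769790, 22492773, "2021-10-06T08:47:24Z", 374025325),
    (1, 999, "2021-12-31T00:00:00Z", 374025325),  # made-up row, other user
]
conn.executemany("INSERT INTO [issue_comments] VALUES (?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT [id], [updated_at] FROM [issue_comments] "
    "WHERE [user] = ? ORDER BY [updated_at] DESC",
    (22492773,),
).fetchall()
print([row[0] for row in result])  # [935769790, 930124657, 781553004]
```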
Powered by Datasette · Queries took 12.691ms · About: xarray-datasette