issue_comments


15 rows where user = 8881170 sorted by updated_at descending


issue 8

  • apply_ufunc(dask='parallelized') with multiple outputs 4
  • Add map_blocks example to docs 3
  • Bottleneck and dask objects ignore `min_periods` on `rolling` 2
  • Add docstring example for xr.open_mfdataset 2
  • `where` function mis-broadcasts and alters data type on dataset 1
  • xr.DataArray.values fails with latest versions of netcdf4 1
  • Xarray operations produce read-only array 1
  • Add template xarray object kwarg to map_blocks 1

user 1

  • bradyrx · 15

author_association 1

  • CONTRIBUTOR 15
Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · performed_via_github_app · issue
1194615618 https://github.com/pydata/xarray/pull/6825#issuecomment-1194615618 https://api.github.com/repos/pydata/xarray/issues/6825 IC_kwDOAMm_X85HNGNC bradyrx 8881170 2022-07-25T20:52:55Z 2022-07-25T20:52:55Z CONTRIBUTOR

Thanks @dcherian!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add docstring example for xr.open_mfdataset 1317320059
1194581162 https://github.com/pydata/xarray/pull/6825#issuecomment-1194581162 https://api.github.com/repos/pydata/xarray/issues/6825 IC_kwDOAMm_X85HM9yq bradyrx 8881170 2022-07-25T20:22:28Z 2022-07-25T20:22:28Z CONTRIBUTOR

Is there some #noqa equivalent to avoid testing the docstring example here? Or should I be pointing to a test dataset to open?
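The standard way to do this is doctest's `+SKIP` directive; a minimal sketch (the docstring content here is illustrative, not the actual PR text):

```python
def open_mfdataset(paths, **kwargs):
    """Open multiple files as a single dataset.

    Examples
    --------
    >>> import xarray as xr
    >>> ds = xr.open_mfdataset("my/files/*.nc")  # doctest: +SKIP
    """
    ...
```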

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add docstring example for xr.open_mfdataset 1317320059
791465015 https://github.com/pydata/xarray/issues/4922#issuecomment-791465015 https://api.github.com/repos/pydata/xarray/issues/4922 MDEyOklzc3VlQ29tbWVudDc5MTQ2NTAxNQ== bradyrx 8881170 2021-03-05T14:47:46Z 2021-03-05T14:47:46Z CONTRIBUTOR

> I feel like this should not work, i.e. rolling window length (6) < size along axis (3). So the bottleneck error seems right.

This is normally the case, but with `min_periods=1` it should just return the given value so long as there's at least one observation (as in case 2, where the boundary values come through unchanged and the middle value is smoothed).

Thanks for the pointer on #4977!
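For reference, the expected `min_periods=1` semantics can be seen with pandas, where a window longer than the series still yields a value wherever at least one observation is present; a minimal sketch:

```python
import pandas as pd

# Window of 6 on a length-3 series: no error, and every position has
# at least one observation, so every position gets a smoothed value.
s = pd.Series([1.0, 2.0, 3.0])
print(s.rolling(window=6, min_periods=1).mean())
# 0    1.0
# 1    1.5
# 2    2.0
# dtype: float64
```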

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Bottleneck and dask objects ignore `min_periods` on `rolling` 811321550
790986252 https://github.com/pydata/xarray/issues/4922#issuecomment-790986252 https://api.github.com/repos/pydata/xarray/issues/4922 MDEyOklzc3VlQ29tbWVudDc5MDk4NjI1Mg== bradyrx 8881170 2021-03-04T22:21:37Z 2021-03-04T22:32:01Z CONTRIBUTOR

@dcherian, to add to the complexity here, it's even weirder than originally reported. See my test cases below. This might alter how this bug is approached.

```python
import xarray as xr

def _rolling(ds):
    return ds.rolling(time=6, center=False, min_periods=1).mean()

# Length-3 array to test that min_periods kicks in, despite asking
# for 6 time steps of smoothing.
ds = xr.DataArray([1, 2, 3], dims='time')
ds['time'] = xr.cftime_range(start='2021-01-01', freq='D', periods=3)
```

1. With bottleneck installed, min_periods is ignored as a kwarg with in-memory arrays.

(bottleneck installed)

```python
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
# ValueError: Moving window (=6) must between 1 and 3, inclusive

# Group into single-day climatology groups and apply.
ds.groupby('time.dayofyear').map(_rolling)
# ValueError: Moving window (=6) must between 1 and 1, inclusive
```

2. With bottleneck uninstalled, min_periods works with in-memory arrays.

(bottleneck uninstalled)

```python
# Just apply rolling to the base array.
ds.rolling(time=6, center=False, min_periods=1).mean()
# <xarray.DataArray (time: 3)>
# array([1. , 1.5, 2. ])
# Coordinates:
#   * time     (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00

# Group into single-day climatology groups and apply.
ds.groupby('time.dayofyear').map(_rolling)
# <xarray.DataArray (time: 3)>
# array([1., 2., 3.])
# Coordinates:
#   * time     (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00
```

3. Regardless of bottleneck, dask objects ignore min_periods when going through a groupby object.

This specifically seems like an issue with `.map()`.

(independent of bottleneck installation)

```python
# Just apply rolling to the base array.
ds.chunk().rolling(time=6, center=False, min_periods=1).mean().compute()
# <xarray.DataArray (time: 3)>
# array([1. , 1.5, 2. ])
# Coordinates:
#   * time     (time) object 2021-01-01 00:00:00 ... 2021-01-03 00:00:00

# Group into single-day climatology groups and apply.
ds.chunk().groupby('time.dayofyear').map(_rolling)
# ValueError: For window size 6, every chunk should be larger than 3, but the
# smallest chunk size is 1. Rechunk your array with a larger chunk size or a
# chunk size that more evenly divides the shape of your array.
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Bottleneck and dask objects ignore `min_periods` on `rolling` 811321550
655142333 https://github.com/pydata/xarray/issues/3813#issuecomment-655142333 https://api.github.com/repos/pydata/xarray/issues/3813 MDEyOklzc3VlQ29tbWVudDY1NTE0MjMzMw== bradyrx 8881170 2020-07-07T21:22:30Z 2020-07-07T21:22:30Z CONTRIBUTOR

FYI, this is also seen with `xr.apply_ufunc`, but only when `vectorize=True`. It seems like the ndarrays' writeable flag is turned off when `vectorize=True`. This is also solved by `.copy()`, which is good practice anyway to avoid mutating the original ndarrays. Perhaps a `copy=bool` kwarg could be added to `apply_ufunc` to create copies of the input ndarrays? I'd be happy to lead that PR if it makes sense.

Example:

```python
import numpy as np
import xarray as xr

def match_nans(a, b):
    """Pairwise matching of nans between two time series."""
    # Try with and without the `.copy()` commands.
    # a = a.copy()
    # b = b.copy()
    if np.isnan(a).any() or np.isnan(b).any():
        idx = np.logical_or(np.isnan(a), np.isnan(b))
        a[idx], b[idx] = np.nan, np.nan
    return a, b

A = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
B = xr.DataArray(np.random.rand(10, 5), dims=['time', 'space'])
A[0, 1] = np.nan
B[5, 0] = np.nan

xr.apply_ufunc(
    match_nans,
    A,
    B,
    input_core_dims=[['time'], ['time']],
    output_core_dims=[['time'], ['time']],
    # Try with and without vectorize.
    vectorize=True,
)
```
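A minimal sketch of the copying idea as a user-side decorator (not an existing xarray API; `with_copies` is a hypothetical name):

```python
import functools

import numpy as np

def with_copies(func):
    """Copy every ndarray argument before calling func, so the wrapped
    ufunc never writes into read-only (or shared) input buffers."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [np.array(a) if isinstance(a, np.ndarray) else a for a in args]
        return func(*args, **kwargs)
    return wrapper

# e.g. xr.apply_ufunc(with_copies(match_nans), A, B, ..., vectorize=True)
```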

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Xarray operations produce read-only array 573031381
628135082 https://github.com/pydata/xarray/issues/1815#issuecomment-628135082 https://api.github.com/repos/pydata/xarray/issues/1815 MDEyOklzc3VlQ29tbWVudDYyODEzNTA4Mg== bradyrx 8881170 2020-05-13T17:27:06Z 2020-05-13T17:27:06Z CONTRIBUTOR

> **So would you be re-doing the same computation by running `.compute()` separately on these objects?**
>
> Yes, but you can do `dask.compute(xarray_obj1, xarray_obj2, ...)` or combine those objects appropriately into a Dataset and then call compute on that.

Good call. I figured there was a workaround.
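For reference, a minimal sketch of that workaround; the objects here are illustrative, and `dask.compute` evaluates them in a single pass over their shared task graph:

```python
import dask
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims='x').chunk(5)
obj1 = da.mean()  # lazy
obj2 = da.std()   # lazy, built on the same chunks
mean, std = dask.compute(obj1, obj2)  # shared work is done once
```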

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc(dask='parallelized') with multiple outputs 287223508
628070696 https://github.com/pydata/xarray/issues/1815#issuecomment-628070696 https://api.github.com/repos/pydata/xarray/issues/1815 MDEyOklzc3VlQ29tbWVudDYyODA3MDY5Ng== bradyrx 8881170 2020-05-13T15:33:56Z 2020-05-13T15:33:56Z CONTRIBUTOR

One issue I see is that this would return multiple dask objects, correct? So to get the results from them, you'd have to run .compute() on each separately. I think it's a valid assumption to expect that the multiple output objects would share a lot of the same computational pipeline. So would you be re-doing the same computation by running .compute() separately on these objects?

The earlier-mentioned code snippets provide a nice path forward, since you can just run compute on one object and then split its `result` (or however you name it) dimension into multiple individual objects (see the sketch below). Thoughts?
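A minimal sketch of that pack-then-split pattern, assuming a toy function that stacks two reductions along a new `result` dimension (in recent xarray versions `output_sizes` lives inside `dask_gufunc_kwargs`):

```python
import numpy as np
import xarray as xr

def min_and_max(x):
    # Pack two "outputs" along a new trailing axis.
    return np.stack([x.min(axis=-1), x.max(axis=-1)], axis=-1)

da = xr.DataArray(np.random.rand(4, 10), dims=['space', 'time']).chunk({'space': 2})
packed = xr.apply_ufunc(
    min_and_max,
    da,
    input_core_dims=[['time']],
    output_core_dims=[['result']],
    dask='parallelized',
    output_dtypes=[float],
    dask_gufunc_kwargs={'output_sizes': {'result': 2}},
)
packed = packed.compute()  # one computation for both outputs
lo, hi = packed.isel(result=0), packed.isel(result=1)
```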

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc(dask='parallelized') with multiple outputs 287223508
624158963 https://github.com/pydata/xarray/pull/3816#issuecomment-624158963 https://api.github.com/repos/pydata/xarray/issues/3816 MDEyOklzc3VlQ29tbWVudDYyNDE1ODk2Mw== bradyrx 8881170 2020-05-05T16:28:26Z 2020-05-05T16:28:26Z CONTRIBUTOR

I missed this originally @dcherian, but thanks for the great work here. The docs changes are a great help.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add template xarray object kwarg to map_blocks 573768194
614244205 https://github.com/pydata/xarray/issues/1815#issuecomment-614244205 https://api.github.com/repos/pydata/xarray/issues/1815 MDEyOklzc3VlQ29tbWVudDYxNDI0NDIwNQ== bradyrx 8881170 2020-04-15T19:45:50Z 2020-04-15T19:45:50Z CONTRIBUTOR

I think ideally it would be nice to return multiple DataArrays or a Dataset of variables. But I'm really happy with this solution. I'm using it on a 600GB dataset of particle trajectories and was able to write a ufunc to go through and return each particle's x, y, z location when it met a certain condition.

I think having something simple like the stackoverflow snippet I posted would be great for the docs as an apply_ufunc example. I'd be happy to lead this if folks think it's a good idea.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc(dask='parallelized') with multiple outputs 287223508
614216243 https://github.com/pydata/xarray/issues/1815#issuecomment-614216243 https://api.github.com/repos/pydata/xarray/issues/1815 MDEyOklzc3VlQ29tbWVudDYxNDIxNjI0Mw== bradyrx 8881170 2020-04-15T18:49:51Z 2020-04-15T18:49:51Z CONTRIBUTOR

This looks essentially the same as @stefraynaud's answer, but I came across this stackoverflow response: https://stackoverflow.com/questions/52094320/with-xarray-how-to-parallelize-1d-operations-on-a-multidimensional-dataset.

@andersy005, I imagine you're far past this now, and this might have been related to discussions with Genevieve and me anyway.

```python
import numpy as np
import xarray as xr
from scipy import stats

def new_linregress(x, y):
    # Wrapper around scipy.stats.linregress to use in apply_ufunc.
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    return np.array([slope, intercept, r_value, p_value, std_err])

# Returns a new DataArray with a "parameter" dimension of length 5.
result = xr.apply_ufunc(
    new_linregress,
    ds[x],
    ds[y],
    input_core_dims=[['year'], ['year']],
    output_core_dims=[['parameter']],
    vectorize=True,
    dask='parallelized',
    output_dtypes=['float64'],
    output_sizes={'parameter': 5},
)
```

{
    "total_count": 3,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 3,
    "rocket": 0,
    "eyes": 0
}
  apply_ufunc(dask='parallelized') with multiple outputs 287223508
573107748 https://github.com/pydata/xarray/pull/3667#issuecomment-573107748 https://api.github.com/repos/pydata/xarray/issues/3667 MDEyOklzc3VlQ29tbWVudDU3MzEwNzc0OA== bradyrx 8881170 2020-01-10T16:32:47Z 2020-01-10T16:32:47Z CONTRIBUTOR

Thanks @dcherian -- done in https://github.com/pydata/xarray/pull/3682.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add map_blocks example to docs 546451185
572688941 https://github.com/pydata/xarray/pull/3667#issuecomment-572688941 https://api.github.com/repos/pydata/xarray/issues/3667 MDEyOklzc3VlQ29tbWVudDU3MjY4ODk0MQ== bradyrx 8881170 2020-01-09T18:23:14Z 2020-01-09T18:23:14Z CONTRIBUTOR

Oops, forgot to add to whats-new, but this is a pretty minor addition.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add map_blocks example to docs 546451185
572137657 https://github.com/pydata/xarray/pull/3667#issuecomment-572137657 https://api.github.com/repos/pydata/xarray/issues/3667 MDEyOklzc3VlQ29tbWVudDU3MjEzNzY1Nw== bradyrx 8881170 2020-01-08T16:04:54Z 2020-01-08T16:04:54Z CONTRIBUTOR

What's going on here? I use Travis on my repos, so I'm not familiar with the Azure setup. I only modified a docstring, so I'm not sure why it would break the testing suite, unless it's testing my code snippet in the docs?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Add map_blocks example to docs 546451185
561261583 https://github.com/pydata/xarray/issues/3580#issuecomment-561261583 https://api.github.com/repos/pydata/xarray/issues/3580 MDEyOklzc3VlQ29tbWVudDU2MTI2MTU4Mw== bradyrx 8881170 2019-12-03T17:02:39Z 2019-12-03T17:02:39Z CONTRIBUTOR

I can't seem to replicate this issue for some reason. I have the same versions of xarray, numpy, and netCDF4 installed.

```python-traceback
IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().
```

This implies that it's having issues slicing numpy-style with a dask array. I bet if you load it into memory and slice that way it'll work. But at ~22GB you might not be able to do that.

The preferred way to slice in xarray is to use `.sel()` and `.isel()` to leverage the label-aware nature of xarray. So you should have no problem doing this operation explicitly with the following:

`fullda['sst'].isel(M=0, S=0, X=0, Y=0)`. You of course don't need to slice the `L` dimension since you are taking the full thing, but the equivalent notation there is `fullda['sst'].isel(L=slice(0, None))`. A short sketch follows below.
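A minimal sketch of that suggestion, reusing `fullda` and the dimension names from the report; the file path is hypothetical:

```python
import xarray as xr

# Hypothetical path standing in for the ~22GB dataset from the report.
fullda = xr.open_dataset('path/to/forecasts.nc')

# Label-aware positional slicing down to one series along L, then load
# only that slice into memory instead of the whole array.
series = fullda['sst'].isel(M=0, S=0, X=0, Y=0).load()
print(series.values)
```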

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.DataArray.values fails with latest versions of netcdf4 529644880
494059784 https://github.com/pydata/xarray/issues/2969#issuecomment-494059784 https://api.github.com/repos/pydata/xarray/issues/2969 MDEyOklzc3VlQ29tbWVudDQ5NDA1OTc4NA== bradyrx 8881170 2019-05-20T16:30:02Z 2019-05-20T16:30:02Z CONTRIBUTOR

Thanks for the feedback and link to the other issue. I wasn't sure what to search to find other issues on this. The coordinate transformation seems like the most straightforward approach.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `where` function mis-broadcasts and alters data type on dataset 445175953

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);