home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

17 rows where author_association = "CONTRIBUTOR" and user = 367900 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: issue_url, reactions, created_at (date), updated_at (date)

issue 8

  • Fix open_dataset regression 4
  • Sum and prod with min_count forces evaluation 3
  • Handling of non-string dimension names 3
  • Fix behaviour of min_count in reducing functions 2
  • plugins.py:98: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select. 2
  • Backend caching should not use a relative path 1
  • HTML formatting of non-string attribute names fails 1
  • Convert attribute and dimension names to strings when generating HTML repr 1

user 1

  • bcbnz · 17 ✖

author_association 1

  • CONTRIBUTOR · 17 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
1109945309 https://github.com/pydata/xarray/issues/6514#issuecomment-1109945309 https://api.github.com/repos/pydata/xarray/issues/6514 IC_kwDOAMm_X85CKGvd bcbnz 367900 2022-04-26T15:33:40Z 2022-04-26T15:33:40Z CONTRIBUTOR

The Compatibility Note I linked above appears to point out the the entry_points(group="xarray.backends") is available since Python 3.6, so we not need the versione check at all.

There's some unfortunate version numbers there. That refers to version 3.6 of the third-party importlib_metadata library, a later version of which was used for the importlib.metadata library included with Python 3.10. If you try to run entry_points(group="xarray.backends") on Python 3.9 it fails with a TypeError.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  plugins.py:98: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select. 1215082284
1109723579 https://github.com/pydata/xarray/issues/6514#issuecomment-1109723579 https://api.github.com/repos/pydata/xarray/issues/6514 IC_kwDOAMm_X85CJQm7 bcbnz 367900 2022-04-26T12:16:48Z 2022-04-26T12:16:48Z CONTRIBUTOR

Just came to report this after seeing it in some tests of one of my projects. Based on a change I made for plugins in that project, I think that changing

https://github.com/pydata/xarray/blob/d479009d79374dc4a56c9f4346b1af38f5ac182c/xarray/backends/plugins.py#L96-L99

to

python @functools.lru_cache(maxsize=1) def list_engines(): try: entrypoints = entry_points(group="xarray.backends") # Python >= 3.10 except TypeError: entrypoints = entry_points().get("xarray.backends", ()) return build_engines(entrypoints)

will take care of this. Not tested as I don't have access to an older Python environment right now, but I can make a pull request later if this would be a suitable workaround and nobody else does so first.

{
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  plugins.py:98: DeprecationWarning: SelectableGroups dict interface is deprecated. Use select. 1215082284
820156925 https://github.com/pydata/xarray/pull/5135#issuecomment-820156925 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgyMDE1NjkyNQ== bcbnz 367900 2021-04-15T06:36:38Z 2021-04-15T06:36:38Z CONTRIBUTOR

@aurghs it fixes it for both the test script in #5132 (using all three netCDF engines, netcdf4, h5netcdf and scipy) and for my original unit tests where I found the problem.

{
    "total_count": 3,
    "+1": 3,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
818840091 https://github.com/pydata/xarray/pull/5149#issuecomment-818840091 https://api.github.com/repos/pydata/xarray/issues/5149 MDEyOklzc3VlQ29tbWVudDgxODg0MDA5MQ== bcbnz 367900 2021-04-13T15:45:10Z 2021-04-13T15:45:10Z CONTRIBUTOR

Added conversion of dimension names since non-string dimensions are intended to be supported. This will still fail for datasets as the dimension sorting does not yet support this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Convert attribute and dimension names to strings when generating HTML repr 856901056
818821420 https://github.com/pydata/xarray/issues/5148#issuecomment-818821420 https://api.github.com/repos/pydata/xarray/issues/5148 MDEyOklzc3VlQ29tbWVudDgxODgyMTQyMA== bcbnz 367900 2021-04-13T15:20:50Z 2021-04-13T15:29:54Z CONTRIBUTOR

I've changed the title.

I think improving the error message is the way to go.

Agreed. ~~Just playing with it, the attributes also need to be strings for netCDF (I'm not familiar with the other backends) so an equivalent error message should be added for them.~~ There is already a good error message for attributes, at least for netCDF, I wasn't looking at it properly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handling of non-string dimension names 856900805
818800250 https://github.com/pydata/xarray/issues/5148#issuecomment-818800250 https://api.github.com/repos/pydata/xarray/issues/5148 MDEyOklzc3VlQ29tbWVudDgxODgwMDI1MA== bcbnz 367900 2021-04-13T14:53:14Z 2021-04-13T14:55:03Z CONTRIBUTOR

This is also something that the backends will need to consider:

```python traceback

xr.DataArray(np.ones(5)).rename(dim_0=1).to_netcdf("/tmp/test.nc") ... /usr/lib/python3.9/site-packages/xarray/backends/netCDF4_.py in set_dimension(self, name, length, is_unlimited) 439 def set_dimension(self, name, length, is_unlimited=False): 440 dim_length = length if not is_unlimited else None --> 441 self.ds.createDimension(name, size=dim_length) 442 443 def set_attribute(self, key, value):

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.createDimension()

src/netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dimension.init()

TypeError: expected bytes, int found ` I guess they could either convert to string (which will result in a different array when loaded) or a better error message (netCDF files only support strings for dimension names`` or similar).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handling of non-string dimension names 856900805
818800770 https://github.com/pydata/xarray/issues/5148#issuecomment-818800770 https://api.github.com/repos/pydata/xarray/issues/5148 MDEyOklzc3VlQ29tbWVudDgxODgwMDc3MA== bcbnz 367900 2021-04-13T14:53:55Z 2021-04-13T14:53:55Z CONTRIBUTOR

Should I change the title here to be more general or would you prefer to close this and continue discussion in the other issue you linked?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Handling of non-string dimension names 856900805
818688260 https://github.com/pydata/xarray/issues/5146#issuecomment-818688260 https://api.github.com/repos/pydata/xarray/issues/5146 MDEyOklzc3VlQ29tbWVudDgxODY4ODI2MA== bcbnz 367900 2021-04-13T12:15:26Z 2021-04-13T12:15:26Z CONTRIBUTOR

I've created #5149 to fix this for attributes. Dimensions are inconsistent as to whether they are always strings or not, I've opened #5148 about this inconsistency.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  HTML formatting of non-string attribute names fails 856728083
816032139 https://github.com/pydata/xarray/pull/5135#issuecomment-816032139 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxNjAzMjEzOQ== bcbnz 367900 2021-04-08T18:10:58Z 2021-04-08T18:11:16Z CONTRIBUTOR

LGTM. I don't know how we would test this...

For ensuring absolute paths, my MVCE in #5132 could be adapted.

Expansion of ~ might be trickier, but the os.path.expanduser docstring says

On Unix, an initial ~ is replaced by the environment variable HOME if it is set

and

On Windows, USERPROFILE will be used if set

so tests run with appropriate environment variables could be used with OS checking.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
816024426 https://github.com/pydata/xarray/pull/5135#issuecomment-816024426 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxNjAyNDQyNg== bcbnz 367900 2021-04-08T17:58:23Z 2021-04-08T17:58:23Z CONTRIBUTOR

@aurghs & @dcherian note that at the moment most backends accept pathlib.Path (but it is not tested), so the isintance(..., str) doesn't run expandpath in that case.

What do you suggest?

For Python 3.6+ expanduser works with paths, returning a string:

```

os.path.expanduser(pathlib.Path("~/file.nc")) '/home/username/file.nc' ``` If its preferred to keep a Path instance, then the Path.expanduser method is the equivalent.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
816011259 https://github.com/pydata/xarray/pull/5135#issuecomment-816011259 https://api.github.com/repos/pydata/xarray/issues/5135 MDEyOklzc3VlQ29tbWVudDgxNjAxMTI1OQ== bcbnz 367900 2021-04-08T17:36:57Z 2021-04-08T17:36:57Z CONTRIBUTOR

I think this would also fix #5132 if os.path.abspath was added around the os.path.expanduser calls.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix open_dataset regression 853644364
816010232 https://github.com/pydata/xarray/issues/5132#issuecomment-816010232 https://api.github.com/repos/pydata/xarray/issues/5132 MDEyOklzc3VlQ29tbWVudDgxNjAxMDIzMg== bcbnz 367900 2021-04-08T17:35:15Z 2021-04-08T17:35:15Z CONTRIBUTOR

@kmuehlbauer that doesn't as-is but is related. os.path.expanduser("test.nc") returns test.nc. Adding os.path.abspath around the expanduser call would probably do the trick. I'll comment over there.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Backend caching should not use a relative path 853473276
779750462 https://github.com/pydata/xarray/pull/4911#issuecomment-779750462 https://api.github.com/repos/pydata/xarray/issues/4911 MDEyOklzc3VlQ29tbWVudDc3OTc1MDQ2Mg== bcbnz 367900 2021-02-16T10:40:28Z 2021-02-16T10:40:28Z CONTRIBUTOR

Not for this PR but I wonder if xr.DataArray([1]).sum(min_count=2) should actually return NA.

It would make it more consistent, the current behaviour is

  • xr.DataArray([1]).sum(min_count=2) -> 1
  • xr.DataArray([1.0]).sum(min_count=2) -> nan

due to the different dtype defaults for skipna. I guess this would require rework of the dispatch logic in xarray.core.duck_array_ops._create_nan_agg_method to make it call nansum/nanprod when appropriate (e.g., should setting min_count imply skipna=True?).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix behaviour of min_count in reducing functions 808558647
779709464 https://github.com/pydata/xarray/pull/4911#issuecomment-779709464 https://api.github.com/repos/pydata/xarray/issues/4911 MDEyOklzc3VlQ29tbWVudDc3OTcwOTQ2NA== bcbnz 367900 2021-02-16T09:36:32Z 2021-02-16T09:36:32Z CONTRIBUTOR
  • Changed to use duck_array_ops.where
  • Docstring updated and entry added to breaking changes of whats new (I assumed this was appropriate). Note its a bit more nuanced than @mathause stated: for an integer array, skipna defaults to False and so xarray.core.duck_array_ops._create_nan_agg_method does not call nansum. This means you have to force skipna=True to see the difference:
    • master: xr.DataArray([1]).sum(min_count=1).dtype -> int64
    • this PR: xr.DataArray([1]).sum(min_count=1).dtype -> int64
    • master: xr.DataArray([1]).sum(skipna=True, min_count=1).dtype -> int64
    • this PR: xr.DataArray([1]).sum(skipna=True, min_count=1).dtype -> float64

Hopefully this is clear in the docstring changes, suggestions welcome.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Fix behaviour of min_count in reducing functions 808558647
778341660 https://github.com/pydata/xarray/issues/4898#issuecomment-778341660 https://api.github.com/repos/pydata/xarray/issues/4898 MDEyOklzc3VlQ29tbWVudDc3ODM0MTY2MA== bcbnz 367900 2021-02-12T17:44:55Z 2021-02-12T17:55:48Z CONTRIBUTOR

@dcherian it looks like that works. A better test script:

```python import numpy as np import xarray as xr from xarray.tests import raise_if_dask_computes

def worker(da): if da.shape == (0, 0): return da

return da.where(da > 1)

np.random.seed(1023) da = xr.DataArray( np.random.normal(size=(20, 500)), dims=("x", "y"), coords=(np.arange(20), np.arange(500)), )

da = da.chunk(dict(x=5)) lazy = da.map_blocks(worker)

with raise_if_dask_computes(): result = lazy.sum("x", skipna=True, min_count=5)

result.load()

assert np.isnan(result[0]) assert not np.isnan(result[6]) ```

If I then remove the if null_mask.any() check and the following block, and replace it with

python dtype, fill_value = dtypes.maybe_promote(result.dtype) result = result.astype(dtype) result = np.where(null_mask, fill_value, result) it passes. I can start working on a pull request with these tests and changes if that looks acceptable to you.

~~How would you suggest handling the possible type promotion from the current dtype, fill_value = dtypes.maybe_promote(result.dtype) line? Currently it only tries promoting if the mask is True anywhere. Always promote, or just use the fill value and hope it works out?~~

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sum and prod with min_count forces evaluation 807089005
778329536 https://github.com/pydata/xarray/issues/4898#issuecomment-778329536 https://api.github.com/repos/pydata/xarray/issues/4898 MDEyOklzc3VlQ29tbWVudDc3ODMyOTUzNg== bcbnz 367900 2021-02-12T17:23:51Z 2021-02-12T17:23:51Z CONTRIBUTOR

A quick check with the debugger and it is the null_mask.any() call that is causing it to compute.

I think I've found another problem with _maybe_null_out if it is reducing over all dimensions. With the altered MCVE

```python import numpy as np import xarray as xr

def worker(da): if da.shape == (0, 0): return da

res = xr.full_like(da, np.nan)
res[0, 0] = 1
return res

da = xr.DataArray( np.random.normal(size=(20, 500)), dims=("x", "y"), coords=(np.arange(20), np.arange(500)), )

da = da.chunk(dict(x=5)) lazy = da.map_blocks(worker) result_allaxes = lazy.sum(skipna=True, min_count=5) result_allaxes.load() ```

I would expect result_allaxes to be nan since there are four blocks and therefore four non-nan values, less than min_count. Instead it is 4.

The problem seems to be the dtype check:

https://github.com/pydata/xarray/blob/5296ed18272a856d478fbbb3d3253205508d1c2d/xarray/core/nanops.py#L39

The test returns True for float64 and so the block isn't run. Another MCVE:

```python import numpy as np from xarray.core import dtypes

print(dtypes.NAT_TYPES) print(np.dtype("float64") in dtypes.NAT_TYPES) ```

Output: console (numpy.datetime64('NaT'), numpy.timedelta64('NaT')) True where I think False would be expected. Should I open a separate issue for this or can we track it here too?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sum and prod with min_count forces evaluation 807089005
778312719 https://github.com/pydata/xarray/issues/4898#issuecomment-778312719 https://api.github.com/repos/pydata/xarray/issues/4898 MDEyOklzc3VlQ29tbWVudDc3ODMxMjcxOQ== bcbnz 367900 2021-02-12T16:55:11Z 2021-02-12T16:55:11Z CONTRIBUTOR

grepping the code, the only other function that calls _maybe_null_out is prod, and I can confirm the problem also exists there. Updated the title, MCVE for prod:

```python import numpy as np import xarray as xr

def worker(da): if da.shape == (0, 0): return da

raise RuntimeError("I was evaluated")

da = xr.DataArray( np.random.normal(size=(20, 500)), dims=("x", "y"), coords=(np.arange(20), np.arange(500)), )

da = da.chunk(dict(x=5)) lazy = da.map_blocks(worker) result1 = lazy.prod("x", skipna=True) result2 = lazy.prod("x", skipna=True, min_count=5) ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Sum and prod with min_count forces evaluation 807089005

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 1039.779ms · About: xarray-datasette