issue_comments: 518761396

html_url: https://github.com/pydata/xarray/pull/3159#issuecomment-518761396
issue_url: https://api.github.com/repos/pydata/xarray/issues/3159
id: 518761396
node_id: MDEyOklzc3VlQ29tbWVudDUxODc2MTM5Ng==
user: 1217238
created_at: 2019-08-06T17:13:27Z
updated_at: 2019-08-06T17:13:27Z
author_association: MEMBER
body:
  • Use a scalar array

This is the case that I'm not sure we want to support.

I think the rule we want is something like "scalar values are repeated automatically," but 0-dimensional arrays are kind of a strange case: are they really scalars or multi-dimensional arrays? My inclination is to treat them like multi-dimensional arrays, in which case we should raise an error rather than silently hide a likely mistake.

In particular, one thing that an xarray user might expect, but which I think we don't want to support, is full broadcasting of multi-dimensional arrays to match the shape of the coordinates.
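The proposed rule can be sketched as a small validation helper. This is a hypothetical illustration, assuming NumPy and a made-up function name (`fill_scalar_data`) that does not exist in xarray: true scalars are repeated to the coordinate shape, while anything array-like, including a 0-dimensional array, must already match that shape exactly, so no broadcasting is attempted.

```python
import numpy as np


def fill_scalar_data(data, shape):
    """Hypothetical helper: expand ``data`` to ``shape`` for a new array.

    Only true scalars are repeated automatically. Anything array-like,
    including a 0-dimensional ndarray, must already have exactly
    ``shape``; no broadcasting of multi-dimensional arrays is attempted.
    """
    shape = tuple(shape)
    if np.isscalar(data):
        # A genuine scalar (e.g. 1.5, "a", np.float64(2)): repeat it
        return np.full(shape, data)
    arr = np.asarray(data)
    if arr.shape != shape:
        # 0-d arrays are not treated as scalars, so they raise unless shape == ()
        raise ValueError(
            f"data with shape {arr.shape} does not match "
            f"coordinate shape {shape}"
        )
    return arr
```

Under this sketch, `fill_scalar_data(1.5, (2, 3))` returns a 2x3 array of 1.5, while `fill_scalar_data(np.array(1.5), (2, 3))` raises `ValueError`.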

  • Use None to get an empty array

Rather than using None, I would suggest using a custom sentinel value. Somebody might actually want an array full of None values! If users want an empty DataArray, make them omit the argument entirely, e.g., xr.DataArray(coords=coords, dims=dims).

The way we do this in xarray is with a ReprObject, e.g., see here for apply_ufunc: https://github.com/pydata/xarray/blob/1757dffac2fa493d7b9a074b84cf8c830a706688/xarray/core/computation.py#L26 https://github.com/pydata/xarray/blob/1757dffac2fa493d7b9a074b84cf8c830a706688/xarray/core/computation.py#L692

There is also the question of what values should be inside such an empty array. Here I think there are roughly two options:
1. Fill the unspecified array with np.nan, to indicate invalid values.
2. Just use np.empty, which means the array can contain arbitrary, meaningless data.

It looks like you've currently implemented option (2), but again I'm not sure that is the most sensible default behavior for xarray. The performance gains from not filling in array values with a constant are typically very small (writing constant values into memory is very fast). Pandas also seems to use NaN as the default value:

```
>>> pandas.Series(index=[1, 2])
1   NaN
2   NaN
dtype: float64
```
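To make the two options concrete, here is a minimal NumPy comparison; the shape is an arbitrary example, and a float dtype is assumed since NaN cannot be stored in an integer array.

```python
import numpy as np

shape = (2, 3)

# Option 1: fill with NaN so unset entries are clearly marked as missing
filled = np.full(shape, np.nan)

# Option 2: np.empty skips initialization; the contents are whatever
# happened to be in memory and must not be relied on
unfilled = np.empty(shape)

# Only option 1 gives a predictable result
print(np.isnan(filled).all())  # prints True
```

Nothing useful can be asserted about the values in `unfilled`, which is exactly why it makes a surprising default.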

reactions:
{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: 472100381