
issues: 1266308714


number: 6680 · node_id: I_kwDOAMm_X85LelZq
title: Datatype for a 'shape specification' of a Dataset / DataArray
user: 4502 · state: open · locked: 0 · comments: 2 · author_association: NONE
created_at: 2022-06-09T15:25:36Z · updated_at: 2022-07-13T18:17:06Z

Is your feature request related to a problem?

Often with xarray I find myself having to create a template Dataset or DataArray with dummy data in it just to specify the dimensions/sizes/coordinates/variable names that are required in a given situation.
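To make the problem concrete, here is a minimal sketch of what a data-free spec for a single DataArray might hold, assuming a plain frozen dataclass is an acceptable representation. `ShapeSpec`, `from_sizes`, `shape` and `sizes` are hypothetical names chosen for illustration, not part of xarray. Because every field is a tuple, instances compare by value and are hashable (provided the coordinate labels themselves are hashable).

```python
from __future__ import annotations

from collections.abc import Hashable, Mapping, Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeSpec:
    """Hypothetical shape spec for one DataArray: dims, sizes and coords, no data.

    Not an xarray API; frozen=True plus tuple-only fields makes it hashable.
    """

    dims: tuple[tuple[Hashable, int], ...]  # ordered (dim name, size) pairs
    coords: tuple[tuple[Hashable, tuple], ...] = ()  # (dim name, labels) pairs

    @classmethod
    def from_sizes(
        cls,
        sizes: Mapping[Hashable, int],
        coords: Mapping[Hashable, Sequence] = None,
    ) -> ShapeSpec:
        coords = dict(coords or {})
        for name, labels in coords.items():
            # Coordinate labels must match the size of the dimension they index.
            if name not in sizes or len(labels) != sizes[name]:
                raise ValueError(f"coordinate {name!r} does not match sizes")
        # Convert mutable inputs (dicts, lists) to tuples so the spec stays hashable.
        return cls(
            dims=tuple(sizes.items()),
            coords=tuple((name, tuple(labels)) for name, labels in coords.items()),
        )

    @property
    def shape(self) -> tuple[int, ...]:
        return tuple(size for _, size in self.dims)

    @property
    def sizes(self) -> dict:
        return dict(self.dims)


spec = ShapeSpec.from_sizes({"time": 3, "x": 2}, coords={"x": [10, 20]})
print(spec.shape)  # (3, 2), with no dummy array allocated
```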

Describe the solution you'd like

It would be very useful to have a datatype that represents a shape specification (dimensions, sizes and coordinates) independently of the data so that we can do things like:

  • Implement xarray equivalents of functions like np.ones, np.zeros, np.random.normal(size=...) that are given a shape specification which the return value should conform to. (I have some less trivial examples of this too: functions that currently need to be given a template for the return value but depend only on the template's shape.)
  • Test if two DataArrays / Datasets have the same shape
  • Memoize or cache things based on shape (this implies the shape spec would need to be hashable)
  • Make it easier to use xarray with libraries like tree / PyTree, which can flatten a Dataset into its underlying arrays together with a specification of the data structure's shape that can later be used to unflatten it again. (Right now I have to implement my own shape specification objects to do this.)
  • Manipulate shape specifications, e.g. by adding or removing dimensions, without having to manipulate dummy template data in somewhat arbitrary ways (e.g. template.isel(dim_to_be_dropped=0, drop=True)).
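Several of the uses listed above can be sketched with a spec that stores nothing but names and sizes. The following is a minimal, stdlib-only illustration, assuming a Dataset spec is just the dimension sizes plus each variable's dims (coordinates are omitted for brevity). `DatasetSpec`, `shape_of`, `without_dims`, `with_dim`, `total_size`, `flatten` and `unflatten` are hypothetical names; none of them is an xarray API, and they deliberately avoid the names of xarray's own `drop_dims` / `expand_dims`, whose semantics differ.

```python
from __future__ import annotations

from collections.abc import Hashable, Mapping, Sequence
from dataclasses import dataclass
from functools import lru_cache
from math import prod


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical Dataset shape spec (not an xarray API): sizes plus each variable's dims."""

    sizes: tuple[tuple[Hashable, int], ...]  # (dim name, size) pairs
    variables: tuple[tuple[Hashable, tuple[Hashable, ...]], ...]  # (var name, dims)

    @classmethod
    def build(cls, sizes: Mapping, variables: Mapping) -> DatasetSpec:
        return cls(
            tuple(sizes.items()),
            tuple((name, tuple(dims)) for name, dims in variables.items()),
        )

    def shape_of(self, name: Hashable) -> tuple[int, ...]:
        sizes = dict(self.sizes)
        return tuple(sizes[d] for d in dict(self.variables)[name])

    def without_dims(self, *dims: Hashable) -> DatasetSpec:
        # Remove the dims from the spec and from every variable: the effect that
        # template.isel(dim=0, drop=True) achieves with dummy data.
        return DatasetSpec(
            tuple((d, n) for d, n in self.sizes if d not in dims),
            tuple(
                (name, tuple(d for d in vdims if d not in dims))
                for name, vdims in self.variables
            ),
        )

    def with_dim(self, dim: Hashable, size: int) -> DatasetSpec:
        # Add a new leading dim to the spec and to every variable.
        return DatasetSpec(
            ((dim, size),) + self.sizes,
            tuple((name, (dim,) + vdims) for name, vdims in self.variables),
        )


@lru_cache(maxsize=None)
def total_size(spec: DatasetSpec) -> int:
    # Memoized on the spec itself, which works only because the spec is hashable.
    sizes = dict(spec.sizes)
    return sum(prod(sizes[d] for d in vdims) for _, vdims in spec.variables)


def flatten(values: Mapping, spec: DatasetSpec) -> list:
    # Leaves in the spec's variable order, in the spirit of PyTree flattening.
    return [values[name] for name, _ in spec.variables]


def unflatten(leaves: Sequence, spec: DatasetSpec) -> dict:
    if len(leaves) != len(spec.variables):
        raise ValueError("number of leaves does not match the spec")
    return {name: leaf for (name, _), leaf in zip(spec.variables, leaves)}


spec = DatasetSpec.build({"time": 4, "x": 3}, {"temp": ("time", "x"), "mask": ("x",)})
print(spec.without_dims("time").shape_of("temp"))  # (3,)
print(total_size(spec))  # 15
```

Because equality and hashing are by value, `spec_a == spec_b` answers "same shape?" directly. In this sketch the order in which dims and variables are given affects equality; a real design would probably need to decide whether to normalise that ordering.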

Describe alternatives you've considered

I realise that using lazy dask arrays largely removes the performance overhead of manipulating fake data, but (A) it still feels kinda ugly and adds boilerplate to construct the fake data, and (B) not everyone wants to depend on dask.

Additional context

No response

reactions: none (total_count: 0; https://api.github.com/repos/pydata/xarray/issues/6680/reactions)
repo: 13221727 · type: issue
