issues

5 rows where state = "open" and user = 306380 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2045596856 I_kwDOAMm_X8557VS4 8555 Docs look odd in dark mode mrocklin 306380 open 0     1 2023-12-18T02:31:26Z 2023-12-19T15:32:11Z   MEMBER      

What happened?

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • [ ] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • [ ] Complete example — the example is self-contained, including all data and the text of any traceback.
  • [ ] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • [ ] New issue — a search of GitHub Issues suggests this is not a duplicate.
  • [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8555/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
1821467933 I_kwDOAMm_X85skWUd 8021 Specify chunks in bytes mrocklin 306380 open 0     4 2023-07-26T02:29:43Z 2023-10-06T10:09:33Z   MEMBER      

Is your feature request related to a problem?

I'm playing around with xarray performance and would like a way to easily tweak chunk sizes. I'm able to do this by backing out what xarray chooses in an open_zarr call and then providing the right chunks= argument. I'll admit, though, that I wouldn't mind giving Xarray a value like "1 GiB" and having it use that when determining "auto" chunk sizes.

Dask array does this in two ways. We can provide a value in chunks, like the following:

```python
x = da.random.random(..., chunks="1 GiB")
```

We also refer to a value in Dask config

```python
In [1]: import dask

In [2]: dask.config.get("array.chunk-size")
Out[2]: '128MiB'
```

This is not very important (I'm unblocked) but I thought I'd mention it in case someone is looking for some fun work 🙂
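
For illustration only, here is a minimal sketch of what "chunks in bytes" could mean in practice. It is not Dask's or xarray's algorithm: `parse_size` and `chunks_for_budget` are hypothetical helpers, and the strategy (shrink only the leading axis until one chunk fits the budget) is an assumption chosen for brevity.

```python
import math

# Binary size suffixes accepted by this sketch (an assumption, not a full parser).
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}


def parse_size(text):
    """Turn a string such as '1 GiB' or '128MiB' into a byte count (hypothetical helper)."""
    text = text.strip()
    # Try longer suffixes first so 'GiB' is not mistaken for 'B'.
    for unit in sorted(UNITS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * UNITS[unit])
    return int(text)


def chunks_for_budget(shape, itemsize, budget):
    """Shrink the leading axis until one chunk fits in `budget` bytes (hypothetical helper)."""
    limit = parse_size(budget)
    row_bytes = itemsize * math.prod(shape[1:])  # bytes in one slice along axis 0
    rows = max(1, min(shape[0], limit // row_bytes))
    return (rows,) + tuple(shape[1:])


# A float64 array of shape (10_000, 1_000, 1_000) with a 1 GiB budget per chunk.
chunks = chunks_for_budget((10_000, 1_000, 1_000), itemsize=8, budget="1 GiB")
print(chunks)  # (134, 1000, 1000)
```

A tuple like this could then be passed as an explicit chunks= argument, which is the manual step described above.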

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8021/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
908971901 MDU6SXNzdWU5MDg5NzE5MDE= 5426 Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter mrocklin 306380 open 0     17 2021-06-02T01:55:23Z 2021-11-16T15:08:03Z   MEMBER      

I'm looking at a pangeo gallery workflow that suffers from poor load balancing because objects of type xarray.core.indexing.ImplicitToExplicitIndexingAdapter are being interpreted as 48B when in fact, I suspect, they are significantly larger to move around.

I'm seeing number of processing tasks charts that look like the following, which is a common sign of the load balancer not making good decisions, which is most commonly caused by poor data size measurements:
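
To make the size-measurement problem concrete, here is a small standard-library sketch. It uses `functools.singledispatch` as a stand-in for a type-based size dispatch such as the one named in the title, and `LazyWrapper` is a hypothetical class, not the xarray adapter. The point is only that a shallow measurement of a thin wrapper ignores whatever the wrapper holds.

```python
import sys
from functools import singledispatch


class LazyWrapper:
    """Hypothetical thin adapter that holds a reference to a large payload."""

    def __init__(self, payload):
        self.payload = payload


@singledispatch
def estimate_size(obj):
    # Fallback: shallow size only, which ignores anything the object references.
    return sys.getsizeof(obj)


@estimate_size.register(LazyWrapper)
def _estimate_wrapper(obj):
    # Type-specific rule: count the wrapper itself plus the data it carries.
    return sys.getsizeof(obj) + estimate_size(obj.payload)


wrapper = LazyWrapper(bytes(1_000_000))
shallow = sys.getsizeof(wrapper)   # tens of bytes; the payload is not counted
deep = estimate_size(wrapper)      # just over 1_000_000 bytes
print(shallow, deep)
```

A scheduler that sees only the shallow figure would treat moving this object as nearly free, which matches the symptom described above.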

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/5426/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
207021356 MDU6SXNzdWUyMDcwMjEzNTY= 1262 Logical DTypes mrocklin 306380 open 0     11 2017-02-12T01:26:23Z 2020-12-26T14:26:00Z   MEMBER      

tl;dr: Can XArray enable user-defined logical dtypes on top of physical NumPy arrays?

The Need for New Datatypes

NumPy's dtypes (int, float, etc.) are appropriate for many, but not all cases. There are a variety of situations where we want numpy-like array semantics (broadcasting, memory layout) but with different element properties. Use cases include the following:

  1. Datetimes with timezones
  2. Categorical values (such as for land-use in climate data)
  3. IPv4 or IPv6 addresses
  4. ...

Currently dtypes need to be added directly to the NumPy source code. This is a high barrier for many community members, requires general approval (there can be only one datetime implementation) (this is good and bad), and limits experimentation. There is value to supporting user-definable datatypes.

This is hard to do in NumPy

Ideally we would implement extensible user-defined dtypes within NumPy (and there may be long-standing plans to do just this). However, changing NumPy today is hard, both because it's hard to find developers who are comfortable operating at that level and because the backwards compatibility pressure on NumPy is large.

So as an alternative, we might consider lightly wrapping NumPy arrays in a new object that also includes extra dtype information. For example we might wrap an int64 numpy array with some datetime/timezone metadata to achieve a logical datetime array using a physical int64 array. We continue using NumPy as is but use this higher layer when necessary for more complex dtypes.

However "lightly wrapping" NumPy arrays is hard to do while still maintaining a closed system where all operations remain consistent (raw NumPy arrays inevitably leak through). Additionally, asking communities to switch to new libraries is socially quite challenging.
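
As a rough illustration of the wrapping idea described above, the sketch below pairs a physical array of 64-bit integers with timezone metadata. To stay dependency-free it swaps in the standard-library `array` module for a NumPy int64 array, and `LogicalDatetimeArray` is a hypothetical name rather than anything that exists in NumPy or XArray.

```python
from array import array
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LogicalDatetimeArray:
    """Hypothetical wrapper: int64 seconds since the Unix epoch plus a timezone."""

    seconds: array  # typecode "q" stores signed 64-bit integers
    tz: timezone

    def __getitem__(self, i):
        # Interpret the physical integer as a timezone-aware datetime.
        return datetime.fromtimestamp(self.seconds[i], tz=self.tz)

    def shift(self, delta_seconds):
        # An element-wise operation that returns the wrapper, so the metadata survives.
        shifted = array("q", (s + delta_seconds for s in self.seconds))
        return LogicalDatetimeArray(shifted, self.tz)


utc_plus_2 = timezone(timedelta(hours=2))
stamps = LogicalDatetimeArray(array("q", [0, 3600]), utc_plus_2)
first = stamps[0]
later = stamps.shift(60)
print(first.isoformat())  # 1970-01-01T02:00:00+02:00
```

The leak mentioned above is visible here too: any code that reaches for `stamps.seconds` directly gets plain integers back and loses the timezone.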

XArray is well placed

Fortunately XArray appears to have already solved some of these technical and social challenges. XArray lightly wraps NumPy arrays in a consistent manner. NumPy-like operations on XArrays remain XArrays. Interactions with other NumPy arrays are well defined. XArray has also attracted an active user/developer community and has attained general respect from the broader ecosystem. XArray seems to be hackable, benefits from a decently active community, and is not yet under as much backwards compatibility pressure.

So question: Is it sensible to add logical dtype information to XArray? Can this be done with only moderate effort and maintenance costs to the XArray project? If the answer is "yes, probably", then what is the right way to go about this?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/1262/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue
456239422 MDU6SXNzdWU0NTYyMzk0MjI= 3022 LazilyOuterIndexedArray doesn't support slicing with slice objects mrocklin 306380 open 0     2 2019-06-14T13:05:56Z 2019-06-14T21:48:12Z   MEMBER      

Code Sample, a copy-pastable example if possible

```python
from xarray.core.indexing import LazilyOuterIndexedArray
import numpy as np

x = LazilyOuterIndexedArray(np.ones((5, 5)))
x[:3]
```

```python-traceback
AttributeError                            Traceback (most recent call last)
<ipython-input-4-42bee9beb30a> in <module>
----> 1 x[:3]

~/workspace/xarray/xarray/core/indexing.py in __getitem__(self, indexer)
    518             array = LazilyVectorizedIndexedArray(self.array, self.key)
    519             return array[indexer]
--> 520         return type(self)(self.array, self._updated_key(indexer))
    521
    522     def __setitem__(self, key, value):

~/workspace/xarray/xarray/core/indexing.py in _updated_key(self, new_key)
    483
    484     def _updated_key(self, new_key):
--> 485         iter_new_key = iter(expanded_indexer(new_key.tuple, self.ndim))
    486         full_key = []
    487         for size, k in zip(self.array.shape, self.key.tuple):

AttributeError: 'slice' object has no attribute 'tuple'
```

Problem description

Dask array meta computations like to run x[:0, :0] on input arrays. This breaks with this class.
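
The failure mode in the traceback can be reproduced without xarray. The toy classes below are hypothetical stand-ins, not xarray's indexing classes or its eventual fix. They show one possible approach, under the assumption that normalizing raw keys at the top of `__getitem__` is acceptable, so later code can rely on a `.tuple` attribute.

```python
class BasicKey:
    """Hypothetical stand-in for an explicit indexer object exposing `.tuple`."""

    def __init__(self, key):
        self.tuple = tuple(key)


def as_explicit_key(key):
    """Wrap raw slices, ints, or tuples so later code can rely on `.tuple`."""
    if isinstance(key, BasicKey):
        return key
    if not isinstance(key, tuple):
        key = (key,)
    return BasicKey(key)


class LazyRows:
    """Toy lazy wrapper over a list of rows; only indexing along rows is handled."""

    def __init__(self, rows):
        self.rows = rows

    def __getitem__(self, key):
        # Without this normalization, `key.tuple` fails for a raw slice.
        key = as_explicit_key(key)
        return LazyRows(self.rows[key.tuple[0]])


x = LazyRows([[1.0] * 5 for _ in range(5)])
head = x[:3]    # works with a raw slice object
empty = x[:0]   # the kind of zero-length slice described above

# A raw slice has no `.tuple`, which is the error shown in the traceback.
try:
    slice(None, 3).tuple
except AttributeError as err:
    message = str(err)
```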

This is on master

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3022/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
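
As a sketch of how the filter at the top of this page maps onto the schema, the snippet below loads the five listed rows into an in-memory SQLite database and runs the same conditions. The table is trimmed to six of the columns above and the foreign keys and indexes are left out; both are simplifications for the example, not the layout of the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed version of the [issues] table: only the columns the query needs.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT
    )
    """
)

# Values copied from the five rows listed on this page.
rows = [
    (207021356, 1262, "Logical DTypes", 306380, "open", "2020-12-26T14:26:00Z"),
    (456239422, 3022, "LazilyOuterIndexedArray doesn't support slicing with slice objects",
     306380, "open", "2019-06-14T21:48:12Z"),
    (908971901, 5426,
     "Implement dask.sizeof for xarray.core.indexing.ImplicitToExplicitIndexingAdapter",
     306380, "open", "2021-11-16T15:08:03Z"),
    (1821467933, 8021, "Specify chunks in bytes", 306380, "open", "2023-10-06T10:09:33Z"),
    (2045596856, 8555, "Docs look odd in dark mode", 306380, "open", "2023-12-19T15:32:11Z"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps in one fixed format sort correctly as plain text.
query = """
    SELECT [number], [title]
    FROM [issues]
    WHERE [state] = :state AND [user] = :user
    ORDER BY [updated_at] DESC
"""
result = conn.execute(query, {"state": "open", "user": 306380}).fetchall()
numbers = [number for number, _title in result]
print(numbers)  # [8555, 8021, 5426, 1262, 3022]
```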