issues


4 rows where repo = 13221727, state = "open" and user = 4711805 sorted by updated_at descending
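The same selection can be reproduced directly against the underlying SQLite database, for example with the short Python sketch below (the database filename `xarray.db` is hypothetical; the columns come from the `issues` schema shown at the bottom of the page):

```python
import sqlite3

conn = sqlite3.connect("xarray.db")  # hypothetical path to the Datasette database
rows = conn.execute(
    """
    SELECT id, number, title, updated_at
    FROM issues
    WHERE repo = 13221727 AND state = 'open' AND "user" = 4711805
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```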

id: 406812274 · node_id: MDU6SXNzdWU0MDY4MTIyNzQ= · number: 2745 · title: reindex doesn't preserve chunks · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 1 · created_at: 2019-02-05T14:37:24Z · updated_at: 2023-12-04T20:46:36Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

The following code creates a small (100x100) chunked DataArray, and then re-indexes it into a huge one (100000x100000):

```python
import xarray as xr
import numpy as np

n = 100
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros(n*n).reshape(n, n), coords=[x, y], dims=['x', 'y']).chunk(n, n)

n2 = 100000
x2 = np.arange(n2)
y2 = np.arange(n2)
da2 = da.reindex({'x': x2, 'y': y2})
da2
```

But the re-indexed DataArray has chunksize=(100000, 100000) instead of chunksize=(100, 100):

```
<xarray.DataArray (x: 100000, y: 100000)>
dask.array<shape=(100000, 100000), dtype=float64, chunksize=(100000, 100000)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
  * y        (y) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
```

This immediately leads to a memory error when trying to store it, e.g. to a zarr archive:

```python
ds2 = da2.to_dataset(name='foo')
ds2.to_zarr(store='foo', mode='w')
```

Trying to re-chunk to 100x100 before storing doesn't help; it just takes a lot more time before crashing with a memory error:

```python
da3 = da2.chunk(n, n)
ds3 = da3.to_dataset(name='foo')
ds3.to_zarr(store='foo', mode='w')
```
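For the record, one possible way around the giant chunks, sketched here and not taken from the issue (the chunk sizes and the NaN fill value are illustrative assumptions), is to build the enlarged array from lazily created, explicitly chunked dask blocks and concatenate them onto the original, instead of reindexing onto the full target grid:

```python
import dask.array as dsa
import numpy as np
import xarray as xr

n, n2 = 100, 100_000
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros((n, n)), coords=[x, y], dims=['x', 'y']).chunk({'x': n, 'y': n})

# Pad along x with a lazily created block of fill values whose chunks we control,
# then do the same along y; no (100000, 100000) chunk is ever created.
pad_x = xr.DataArray(dsa.full((n2 - n, n), np.nan, chunks=(10_000, n)),
                     coords=[np.arange(n, n2), y], dims=['x', 'y'])
da_x = xr.concat([da, pad_x], dim='x')

pad_y = xr.DataArray(dsa.full((n2, n2 - n), np.nan, chunks=(10_000, 10_000)),
                     coords=[da_x.x.values, np.arange(n, n2)], dims=['x', 'y'])
da2 = xr.concat([da_x, pad_y], dim='y')

print(da2.chunks)  # many modest chunks rather than a single (100000, 100000) block
```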

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2745/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 414641120 · node_id: MDU6SXNzdWU0MTQ2NDExMjA= · number: 2789 · title: Appending to zarr with string dtype · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 2 · created_at: 2019-02-26T14:31:42Z · updated_at: 2022-04-09T02:18:05Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```

The code above prints `['foo']` (string type). The encoding chosen by zarr is `"dtype": "|S3"`, which corresponds to bytes, but it seems to be decoded to a string, which is what we want.

```
$ cat ds/da/.zarray
{
    "chunks": [ 1 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "|S3",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 1 ],
    "zarr_format": 2
}
```

The problem is that if I want to append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)

ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints `['foo' 'bar']`: the encoding was kept as `"dtype": "|S3"`, which is fine for a string of 3 characters but not for 6, so the appended value gets truncated.

If I want to specify the encoding with the maximum length, e.g.:

```python
ds.to_zarr('ds', encoding={'da': {'dtype': '|S6'}})
```

It solves the length problem, but now my strings are kept as bytes: `[b'foo' b'barbar']`. If I specify a Unicode encoding:

```python
ds.to_zarr('ds', encoding={'da': {'dtype': 'U6'}})
```

It is not taken into account: the zarr encoding remains `"dtype": "|S3"` and I am back to my length problem: `['foo' 'bar']`.

The solution with `'dtype': '|S6'` is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.
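A minimal sketch of that route, added here for illustration (not from the issue; the store name `ds_bytes` is hypothetical): write with a bytes dtype wide enough for the longest string, then decode back to str right after opening.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(['foo', 'barbar'])
ds = da.to_dataset(name='da')
# Fixed-width bytes encoding, wide enough for the longest string.
ds.to_zarr('ds_bytes', mode='w', encoding={'da': {'dtype': '|S6'}})

ds2 = xr.open_zarr('ds_bytes')
# Values come back as bytes (e.g. b'foo'); decode them to str in one go.
decoded = np.char.decode(ds2['da'].values.astype('S'), 'utf-8')
print(decoded)  # ['foo' 'barbar']
```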

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 777670351 · node_id: MDU6SXNzdWU3Nzc2NzAzNTE= · number: 4756 · title: feat: reindex multiple DataArrays · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 1 · created_at: 2021-01-03T16:23:01Z · updated_at: 2021-01-03T19:05:03Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

When e.g. creating a Dataset from multiple DataArrays that are supposed to share the same grid, but are not exactly aligned (as is often the case with floating point coordinates), we usually end up with undesirable NaNs inserted in the data set. For instance, consider the following data arrays that are not exactly aligned:

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482830-542bbe80-4de3-11eb-814b-bb1f705967c4.png)
![image](https://user-images.githubusercontent.com/4711805/103482836-61e14400-4de3-11eb-804b-f549c2551562.png)

They show gaps when combined in a data set:

```python
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482959-3f9bf600-4de4-11eb-9513-900319cb485a.png)
![image](https://user-images.githubusercontent.com/4711805/103482966-47f43100-4de4-11eb-853b-2b44f7bc8d7f.png)

I think this is a frequent enough situation that we would like a function to re-align all the data arrays together. There is a `reindex_like` method, which accepts a tolerance, but calling it successively on every data array, like so:

```python
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
```

would result in the intersection of the coordinates, rather than their union. What I would like is a function like the following:

```python
import numpy as np
from functools import reduce

def reindex_all(arrays, dims, tolerance):
    coords = {}
    for dim in dims:
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        diff = coord[:-1] - coord[1:]
        keep = np.abs(diff) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]
    return reindexed

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
```

I have not found something equivalent. If you think this is worth it, I could try and send a PR to implement such a feature.
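As a quick check of the intersection-versus-union point above, here is a short sketch (added for illustration, not part of the original proposal) of what `reindex_all` produces for the example arrays:

```python
# Assumes the da1/da2 arrays and the reindex_all sketch defined above.
da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)

# np.union1d([0, 1, 2], [1.1, 2.1, 3.1]) gives [0, 1, 1.1, 2, 2.1, 3.1];
# values closer than the 0.2 tolerance collapse, leaving [0, 1.1, 2.1, 3.1].
print(da1r.x.values)  # [0.  1.1 2.1 3.1]
print(da2r.x.values)  # [0.  1.1 2.1 3.1]
```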

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4756/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 415614806 · node_id: MDU6SXNzdWU0MTU2MTQ4MDY= · number: 2793 · title: Fit bounding box to coarser resolution · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 2 · created_at: 2019-02-28T13:07:09Z · updated_at: 2019-04-11T14:37:47Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

When using coarsen, we often need to align the original DataArray with the coarser coordinates. For instance:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(np.arange(4*4).reshape(4, 4), coords=[np.arange(4, 0, -1) + 0.5, np.arange(4) + 0.5], dims=['lat', 'lon'])
```

```
<xarray.DataArray (lat: 4, lon: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
  * lat      (lat) float64 4.5 3.5 2.5 1.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5
```

```python
da.coarsen(lat=2, lon=2).mean()
```

```
<xarray.DataArray (lat: 2, lon: 2)>
array([[ 2.5,  4.5],
       [10.5, 12.5]])
Coordinates:
  * lat      (lat) float64 4.0 2.0
  * lon      (lon) float64 1.0 3.0
```

But if the coarser coordinates are aligned like

```
lat: ... 5 3 1 ...
lon: ... 1 3 5 ...
```

then directly applying `coarsen` will not work (here on the `lat` dimension). The following function extends the original DataArray so that it is aligned with the coarser coordinates:

```python
def adjust_bbox(da, dims):
    """Adjust the bounding box of a DataArray to a coarser resolution.

    Args:
        da: the DataArray to adjust.
        dims: a dictionary where keys are the names of the dimensions on which to adjust,
            and the values are of the form [unsigned_coarse_resolution, signed_original_resolution].
    Returns:
        The DataArray bounding box adjusted to the coarser resolution.
    """
    coords = {}
    for k, v in dims.items():
        every, step = v
        offset = step / 2
        dim0 = da[k].values[0] - offset
        dim1 = da[k].values[-1] + offset
        if step < 0:  # decreasing coordinate
            dim0 = dim0 + (every - dim0 % every) % every
            dim1 = dim1 - dim1 % every
        else:  # increasing coordinate
            dim0 = dim0 - dim0 % every
            dim1 = dim1 + (every - dim1 % every) % every
        coord0 = np.arange(dim0 + offset, da[k].values[0] - offset, step)
        coord1 = da[k].values
        coord2 = np.arange(da[k].values[-1] + step, dim1, step)
        coord = np.hstack((coord0, coord1, coord2))
        coords[k] = coord
    return da.reindex(**coords).fillna(0)
```

```python
da = adjust_bbox(da, {'lat': (2, -1), 'lon': (2, 1)})
```

```
<xarray.DataArray (lat: 6, lon: 4)>
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [ 0.,  0.,  0.,  0.]])
Coordinates:
  * lat      (lat) float64 5.5 4.5 3.5 2.5 1.5 0.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5
```

```python
da.coarsen(lat=2, lon=2).mean()
```

```
<xarray.DataArray (lat: 3, lon: 2)>
array([[0.25, 1.25],
       [6.5 , 8.5 ],
       [6.25, 7.25]])
Coordinates:
  * lat      (lat) float64 5.0 3.0 1.0
  * lon      (lon) float64 1.0 3.0
```

Now `coarsen` gives the right result. But `adjust_bbox` is rather complicated and specific to this use case (evenly spaced coordinate points...). Do you know of a better/more general way of doing it?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2793/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```