issues


4 rows where repo = 13221727, state = "open" and user = 4711805 sorted by updated_at descending
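The same selection can be reproduced directly against the underlying SQLite database, for example with the short Python sketch below (the database filename `xarray.db` is hypothetical; the columns come from the `issues` schema shown at the bottom of the page):

```python
import sqlite3

conn = sqlite3.connect("xarray.db")  # hypothetical path to the Datasette database
rows = conn.execute(
    """
    SELECT id, number, title, updated_at
    FROM issues
    WHERE repo = 13221727 AND state = 'open' AND "user" = 4711805
    ORDER BY updated_at DESC
    """
).fetchall()
for row in rows:
    print(row)
```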

id: 406812274 · node_id: MDU6SXNzdWU0MDY4MTIyNzQ= · number: 2745 · title: reindex doesn't preserve chunks · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 1 · created_at: 2019-02-05T14:37:24Z · updated_at: 2023-12-04T20:46:36Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

The following code creates a small (100x100) chunked DataArray, and then re-indexes it into a huge one (100000x100000):

```python
import xarray as xr
import numpy as np

n = 100
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros(n*n).reshape(n, n), coords=[x, y], dims=['x', 'y']).chunk(n, n)

n2 = 100000
x2 = np.arange(n2)
y2 = np.arange(n2)
da2 = da.reindex({'x': x2, 'y': y2})
da2
```

But the re-indexed DataArray has chunksize=(100000, 100000) instead of chunksize=(100, 100):

```
<xarray.DataArray (x: 100000, y: 100000)>
dask.array<shape=(100000, 100000), dtype=float64, chunksize=(100000, 100000)>
Coordinates:
  * x        (x) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
  * y        (y) int64 0 1 2 3 4 5 6 ... 99994 99995 99996 99997 99998 99999
```

This immediately leads to a memory error when trying to store it, e.g. to a zarr archive:

```python
ds2 = da2.to_dataset(name='foo')
ds2.to_zarr(store='foo', mode='w')
```

Trying to re-chunk to 100x100 before storing doesn't help; it just takes a lot more time before crashing with a memory error:

```python
da3 = da2.chunk(n, n)
ds3 = da3.to_dataset(name='foo')
ds3.to_zarr(store='foo', mode='w')
```
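For the record, one possible way around the giant chunks, sketched here and not taken from the issue (the chunk sizes and the NaN fill value are illustrative assumptions), is to build the enlarged array from lazily created, explicitly chunked dask blocks and concatenate them onto the original, instead of reindexing onto the full target grid:

```python
import dask.array as dsa
import numpy as np
import xarray as xr

n, n2 = 100, 100_000
x = np.arange(n)
y = np.arange(n)
da = xr.DataArray(np.zeros((n, n)), coords=[x, y], dims=['x', 'y']).chunk({'x': n, 'y': n})

# Pad along x with a lazily created block of fill values whose chunks we control,
# then do the same along y; no (100000, 100000) chunk is ever created.
pad_x = xr.DataArray(dsa.full((n2 - n, n), np.nan, chunks=(10_000, n)),
                     coords=[np.arange(n, n2), y], dims=['x', 'y'])
da_x = xr.concat([da, pad_x], dim='x')

pad_y = xr.DataArray(dsa.full((n2, n2 - n), np.nan, chunks=(10_000, 10_000)),
                     coords=[da_x.x.values, np.arange(n, n2)], dims=['x', 'y'])
da2 = xr.concat([da_x, pad_y], dim='y')

print(da2.chunks)  # many modest chunks rather than a single (100000, 100000) block
```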

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2745/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 414641120 · node_id: MDU6SXNzdWU0MTQ2NDExMjA= · number: 2789 · title: Appending to zarr with string dtype · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 2 · created_at: 2019-02-26T14:31:42Z · updated_at: 2022-04-09T02:18:05Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

```python
import xarray as xr

da = xr.DataArray(['foo'])
ds = da.to_dataset(name='da')
ds.to_zarr('ds')  # no special encoding specified

ds = xr.open_zarr('ds')
print(ds.da.values)
```

The code above prints `['foo']` (string type). The encoding chosen by zarr is `"dtype": "|S3"`, which corresponds to bytes, but it seems to be decoded to a string, which is what we want.

```
$ cat ds/da/.zarray
{
    "chunks": [ 1 ],
    "compressor": {
        "blocksize": 0,
        "clevel": 5,
        "cname": "lz4",
        "id": "blosc",
        "shuffle": 1
    },
    "dtype": "|S3",
    "fill_value": null,
    "filters": null,
    "order": "C",
    "shape": [ 1 ],
    "zarr_format": 2
}
```

The problem is that if I want to append to the zarr archive, like so:

```python
import zarr

ds = zarr.open('ds', mode='a')
da_new = xr.DataArray(['barbar'])
ds.da.append(da_new)

ds = xr.open_zarr('ds')
print(ds.da.values)
```

It prints `['foo' 'bar']`: the encoding was kept as `"dtype": "|S3"`, which is fine for a string of 3 characters but not for 6, so the appended value gets truncated.

If I want to specify the encoding with the maximum length, e.g.:

```python
ds.to_zarr('ds', encoding={'da': {'dtype': '|S6'}})
```

It solves the length problem, but now my strings are kept as bytes: `[b'foo' b'barbar']`. If I specify a Unicode encoding:

```python
ds.to_zarr('ds', encoding={'da': {'dtype': 'U6'}})
```

It is not taken into account: the zarr encoding remains `"dtype": "|S3"` and I am back to my length problem: `['foo' 'bar']`.

The solution with `'dtype': '|S6'` is acceptable, but I need to encode my strings to bytes when indexing, which is annoying.
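A minimal sketch of that route, added here for illustration (not from the issue; the store name `ds_bytes` is hypothetical): write with a bytes dtype wide enough for the longest string, then decode back to str right after opening.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(['foo', 'barbar'])
ds = da.to_dataset(name='da')
# Fixed-width bytes encoding, wide enough for the longest string.
ds.to_zarr('ds_bytes', mode='w', encoding={'da': {'dtype': '|S6'}})

ds2 = xr.open_zarr('ds_bytes')
# Values come back as bytes (e.g. b'foo'); decode them to str in one go.
decoded = np.char.decode(ds2['da'].values.astype('S'), 'utf-8')
print(decoded)  # ['foo' 'barbar']
```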

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2789/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 777670351 · node_id: MDU6SXNzdWU3Nzc2NzAzNTE= · number: 4756 · title: feat: reindex multiple DataArrays · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 1 · created_at: 2021-01-03T16:23:01Z · updated_at: 2021-01-03T19:05:03Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

When e.g. creating a Dataset from multiple DataArrays that are supposed to share the same grid, but are not exactly aligned (as is often the case with floating point coordinates), we usually end up with undesirable NaNs inserted in the data set. For instance, consider the following data arrays that are not exactly aligned:

```python
import xarray as xr

da1 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[0, 1, 2], [0, 1, 2]], dims=['x', 'y']).rename('da1')
da2 = xr.DataArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]], coords=[[1.1, 2.1, 3.1], [1.1, 2.1, 3.1]], dims=['x', 'y']).rename('da2')
da1.plot.imshow()
da2.plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482830-542bbe80-4de3-11eb-814b-bb1f705967c4.png)
![image](https://user-images.githubusercontent.com/4711805/103482836-61e14400-4de3-11eb-804b-f549c2551562.png)

They show gaps when combined in a data set:

```python
ds = xr.Dataset({'da1': da1, 'da2': da2})
ds['da1'].plot.imshow()
ds['da2'].plot.imshow()
```

![image](https://user-images.githubusercontent.com/4711805/103482959-3f9bf600-4de4-11eb-9513-900319cb485a.png)
![image](https://user-images.githubusercontent.com/4711805/103482966-47f43100-4de4-11eb-853b-2b44f7bc8d7f.png)

I think this is a frequent enough situation that we would like a function to re-align all the data arrays together. There is a `reindex_like` method, which accepts a tolerance, but calling it successively on every data array, like so:

```python
da1r = da1.reindex_like(da2, method='nearest', tolerance=0.2)
da2r = da2.reindex_like(da1r, method='nearest', tolerance=0.2)
```

would result in the intersection of the coordinates, rather than their union. What I would like is a function like the following:

```python
import numpy as np
from functools import reduce

def reindex_all(arrays, dims, tolerance):
    coords = {}
    for dim in dims:
        coord = reduce(np.union1d, [array[dim] for array in arrays[1:]], arrays[0][dim])
        diff = coord[:-1] - coord[1:]
        keep = np.abs(diff) > tolerance
        coords[dim] = np.append(coord[:-1][keep], coord[-1])
    reindexed = [array.reindex(coords, method='nearest', tolerance=tolerance) for array in arrays]
    return reindexed

da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)
dsr = xr.Dataset({'da1': da1r, 'da2': da2r})
dsr['da1'].plot.imshow()
dsr['da2'].plot.imshow()
```

I have not found something equivalent. If you think this is worth it, I could try and send a PR to implement such a feature.
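As a quick check of the intersection-versus-union point above, here is a short sketch (added for illustration, not part of the original proposal) of what `reindex_all` produces for the example arrays:

```python
# Assumes the da1/da2 arrays and the reindex_all sketch defined above.
da1r, da2r = reindex_all([da1, da2], ['x', 'y'], 0.2)

# np.union1d([0, 1, 2], [1.1, 2.1, 3.1]) gives [0, 1, 1.1, 2, 2.1, 3.1];
# values closer than the 0.2 tolerance collapse, leaving [0, 1.1, 2.1, 3.1].
print(da1r.x.values)  # [0.  1.1 2.1 3.1]
print(da2r.x.values)  # [0.  1.1 2.1 3.1]
```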

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4756/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
id: 415614806 · node_id: MDU6SXNzdWU0MTU2MTQ4MDY= · number: 2793 · title: Fit bounding box to coarser resolution · user: davidbrochart (4711805) · state: open · locked: 0 · comments: 2 · created_at: 2019-02-28T13:07:09Z · updated_at: 2019-04-11T14:37:47Z · author_association: CONTRIBUTOR · repo: xarray (13221727) · type: issue

When using coarsen, we often need to align the original DataArray with the coarser coordinates. For instance:

```python
import xarray as xr
import numpy as np

da = xr.DataArray(np.arange(4*4).reshape(4, 4), coords=[np.arange(4, 0, -1) + 0.5, np.arange(4) + 0.5], dims=['lat', 'lon'])
```

```
<xarray.DataArray (lat: 4, lon: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
  * lat      (lat) float64 4.5 3.5 2.5 1.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5
```

```python
da.coarsen(lat=2, lon=2).mean()
```

```
<xarray.DataArray (lat: 2, lon: 2)>
array([[ 2.5,  4.5],
       [10.5, 12.5]])
Coordinates:
  * lat      (lat) float64 4.0 2.0
  * lon      (lon) float64 1.0 3.0
```

But if the coarser coordinates are aligned like

```
lat: ... 5 3 1 ...
lon: ... 1 3 5 ...
```

then directly applying `coarsen` will not work (here on the `lat` dimension). The following function extends the original DataArray so that it is aligned with the coarser coordinates:

```python
def adjust_bbox(da, dims):
    """Adjust the bounding box of a DataArray to a coarser resolution.

    Args:
        da: the DataArray to adjust.
        dims: a dictionary where keys are the names of the dimensions on which to adjust,
            and the values are of the form [unsigned_coarse_resolution, signed_original_resolution].
    Returns:
        The DataArray bounding box adjusted to the coarser resolution.
    """
    coords = {}
    for k, v in dims.items():
        every, step = v
        offset = step / 2
        dim0 = da[k].values[0] - offset
        dim1 = da[k].values[-1] + offset
        if step < 0:  # decreasing coordinate
            dim0 = dim0 + (every - dim0 % every) % every
            dim1 = dim1 - dim1 % every
        else:  # increasing coordinate
            dim0 = dim0 - dim0 % every
            dim1 = dim1 + (every - dim1 % every) % every
        coord0 = np.arange(dim0 + offset, da[k].values[0] - offset, step)
        coord1 = da[k].values
        coord2 = np.arange(da[k].values[-1] + step, dim1, step)
        coord = np.hstack((coord0, coord1, coord2))
        coords[k] = coord
    return da.reindex(**coords).fillna(0)
```

```python
da = adjust_bbox(da, {'lat': (2, -1), 'lon': (2, 1)})
```

```
<xarray.DataArray (lat: 6, lon: 4)>
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  1.,  2.,  3.],
       [ 4.,  5.,  6.,  7.],
       [ 8.,  9., 10., 11.],
       [12., 13., 14., 15.],
       [ 0.,  0.,  0.,  0.]])
Coordinates:
  * lat      (lat) float64 5.5 4.5 3.5 2.5 1.5 0.5
  * lon      (lon) float64 0.5 1.5 2.5 3.5
```

```python
da.coarsen(lat=2, lon=2).mean()
```

```
<xarray.DataArray (lat: 3, lon: 2)>
array([[0.25, 1.25],
       [6.5 , 8.5 ],
       [6.25, 7.25]])
Coordinates:
  * lat      (lat) float64 5.0 3.0 1.0
  * lon      (lon) float64 1.0 3.0
```

Now `coarsen` gives the right result. But `adjust_bbox` is rather complicated and specific to this use case (evenly spaced coordinate points...). Do you know of a better/more general way of doing it?

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/2793/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}

```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
```