issues
8 rows where state = "open" and user = 39069044 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1307112340 | I_kwDOAMm_X85N6POU | 6799 | `interp` performance with chunked dimensions | slevang 39069044 | open | 0 | 9 | 2022-07-17T14:25:17Z | 2024-04-26T21:41:31Z | CONTRIBUTOR |
### What is your issue?

I'm trying to perform 2D interpolation on a large 3D array that is heavily chunked along the interpolation dimensions and not the third dimension. The application could be extracting a timeseries from a reanalysis dataset chunked in space but not time, to compare to observed station data with more precise coordinates. I use the advanced interpolation method as described in the documentation, with the interpolation coordinates specified by DataArrays with a shared dimension, like so:

```python
%load_ext memory_profiler
import numpy as np
import dask.array as da
import xarray as xr

# Synthetic dataset chunked in the two interpolation dimensions
nt = 40000
nx = 200
ny = 200
ds = xr.Dataset(
    data_vars={
        'foo': (('t', 'x', 'y'), da.random.random(size=(nt, nx, ny), chunks=(-1, 10, 10))),
    },
    coords={
        't': np.linspace(0, 1, nt),
        'x': np.linspace(0, 1, nx),
        'y': np.linspace(0, 1, ny),
    },
)

# Interpolate to some random 2D locations
ni = 10
xx = xr.DataArray(np.random.random(ni), dims='z', name='x')
yy = xr.DataArray(np.random.random(ni), dims='z', name='y')
interpolated = ds.foo.interp(x=xx, y=yy)
%memit interpolated.compute()
```

With just 10 interpolation points, this example calculation uses about … . This could definitely work better, as each interpolated point usually only requires a single chunk of the input dataset, and at most 4 if it is right on the corner of a chunk. For example we can instead do it in a loop and get very reasonable memory usage, but this isn't very scalable:
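A minimal sketch of what that loop could look like, reconstructing the elided snippet under the assumption that each target point is interpolated and computed separately (the original code is not preserved in this export; variable names reuse those above):

```python
# Hypothetical reconstruction: interpolate one target point at a time so
# each compute only pulls the few chunks that point actually needs.
results = []
for i in range(ni):
    results.append(ds.foo.interp(x=xx[i], y=yy[i]).compute())
interpolated = xr.concat(results, dim='z')
```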
I tried adding a … . Any tips to make this calculation work better with existing options, or otherwise ways we might improve the … ? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6799/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
2220689594 | PR_kwDOAMm_X85rcmw1 | 8904 | Handle extra indexes for zarr region writes | slevang 39069044 | open | 0 | 8 | 2024-04-02T14:34:00Z | 2024-04-03T19:20:37Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/8904 |
Small follow-up to #8877. If we're going to drop the indices anyways for region writes, we may as well not raise if they are still in the dataset. This makes the user experience of region writes simpler:

```python
ds = xr.tutorial.open_dataset("air_temperature")
ds.to_zarr("test.zarr")

region = {"time": slice(0, 10)}

# This fails unless we remember to ds.drop_vars(["lat", "lon"])
ds.isel(**region).to_zarr("test.zarr", region=region)
```

I find this annoying because I often have a dataset with a bunch of unrelated indexes and have to remember which ones to drop, or use some verbose … .

cc @dcherian |
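For illustration, a hedged sketch of the kind of verbose workaround described above (not code from the PR; it assumes every indexed coordinate whose name is not a region dimension should be dropped):

```python
# Hypothetical workaround: drop every indexed coordinate that isn't one of
# the region's dimensions, then write just that region.
to_drop = [name for name in ds.indexes if name not in region]
ds.isel(**region).drop_vars(to_drop).to_zarr("test.zarr", region=region)
```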
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8904/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull |
2126356395 | I_kwDOAMm_X85-vZ-r | 8725 | `broadcast_like()` doesn't copy chunking structure | slevang 39069044 | open | 0 | 2 | 2024-02-09T02:07:19Z | 2024-03-26T18:33:13Z | CONTRIBUTOR |
### What is your issue?

```python
import dask.array
import xarray as xr

da1 = xr.DataArray(dask.array.ones((3, 3), chunks=(1, 1)), dims=["x", "y"])
da2 = xr.DataArray(dask.array.ones((3,), chunks=(1,)), dims=["x"])

da2.broadcast_like(da1).chunksizes
# Frozen({'x': (1, 1, 1), 'y': (3,)})
```

I was surprised to not find any other issues around this. It feels like a major limitation of the method for a lot of use cases. Is there an easy hack around this? |
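One possible hack (an editorial suggestion, not from the issue thread): re-chunk the broadcast result to match the template's chunk structure after the fact. This fixes the chunksizes but adds a rechunk layer on top, rather than broadcasting chunk-for-chunk:

```python
# Rechunk after broadcasting so the result matches da1's chunk layout.
rechunked = da2.broadcast_like(da1).chunk(dict(da1.chunksizes))
rechunked.chunksizes
# Frozen({'x': (1, 1, 1), 'y': (1, 1, 1)})
```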
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8725/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
1875857414 | I_kwDOAMm_X85vz1AG | 8129 | Sort the values of an nD array | slevang 39069044 | open | 0 | 11 | 2023-08-31T16:20:40Z | 2023-09-01T15:37:34Z | CONTRIBUTOR |
### Is your feature request related to a problem?

As far as I know, there is no straightforward API in xarray to do what … .

### Describe the solution you'd like

Would there be interest in implementing a `sort_values` method? Note: this 1D example is not really relevant, see the 2D version and more obvious implementation in comments below for what I really want.
The goal is to handle arrays that we want to monotonize, like so:

```python
da = xr.DataArray([1, 3, 2, 4], coords={"x": [1, 2, 3, 4]})
da.sort_values("x")

# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) int64 1 2 3 4
```
In addition to the existing `sortby`, which sorts by coordinate values:

```python
da = xr.DataArray([1, 3, 2, 4], coords={"x": [1, 3, 2, 4]})
da.sortby("x")

# <xarray.DataArray (x: 4)>
# array([1, 2, 3, 4])
# Coordinates:
#   * x        (x) int64 1 2 3 4
```

### Describe alternatives you've considered

I don't know if … .

### Additional context

Some past related threads on this topic:
https://github.com/pydata/xarray/issues/3957
https://stackoverflow.com/questions/64518239/sorting-dataset-along-axis-with-dask |
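For the nD case, a minimal sketch of one possible implementation, wrapping np.sort in apply_ufunc along a single dimension (illustrative only; the function name and signature are assumptions, not a settled API):

```python
import numpy as np
import xarray as xr

def sort_values(da, dim):
    # Sort values independently along `dim`; apply_ufunc moves the core
    # dim to the last axis, so np.sort operates with axis=-1.
    return xr.apply_ufunc(
        np.sort,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        kwargs={"axis": -1},
        dask="parallelized",
        output_dtypes=[da.dtype],
    )

da = xr.DataArray([[1, 3, 2, 4], [4, 2, 3, 1]], dims=["y", "x"])
sort_values(da, "x")
# array([[1, 2, 3, 4],
#        [1, 2, 3, 4]])
```

Note that any index on the sorted dimension would no longer correspond to the shuffled values, which is one of the things a real API would need to decide how to handle.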
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8129/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
1397104515 | I_kwDOAMm_X85TRh-D | 7130 | Passing keyword arguments to external functions | slevang 39069044 | open | 0 | 3 | 2022-10-05T02:51:35Z | 2023-03-26T19:15:00Z | CONTRIBUTOR |
### What is your issue?

Follow-on from #6891 and #6978 to discuss how we could homogenize the passing of keyword arguments to wrapped external functions across xarray methods. There are quite a few methods like this, where we are ultimately passing data to numpy, scipy, or some other library and want the option to send variable-length kwargs to that underlying function. There are two different ways of doing this today: … .

I could only find a few examples of the latter: … .

Allowing direct passage with `**kwargs` … |
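To make the two patterns concrete, a pair of toy signatures (purely illustrative; these are not xarray functions):

```python
# Pattern 1: a dedicated `kwargs` dict argument, forwarded explicitly.
def fit(data, func, kwargs=None):
    kwargs = kwargs if kwargs is not None else {}
    return func(data, **kwargs)

# Pattern 2: variadic keyword arguments, forwarded directly.
def fit_direct(data, func, **kwargs):
    return func(data, **kwargs)
```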
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7130/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
1581046647 | I_kwDOAMm_X85ePNt3 | 7522 | Differences in `to_netcdf` for dask and numpy backed arrays | slevang 39069044 | open | 0 | 7 | 2023-02-11T23:06:37Z | 2023-03-01T23:12:11Z | CONTRIBUTOR |
### What is your issue?

I make use of … .

This works great, in that a many-GB file can be lazy-loaded as a dataset in a few hundred milliseconds, by only parsing the netcdf headers with under-the-hood byte range requests. But only if the netcdf is written from dask-backed arrays. Somehow, writing from numpy-backed arrays produces a different netcdf that requires reading deeper into the file to parse as a dataset. I spent some time digging into the backends and see xarray is ultimately passing off the store write to … .

This should work as an MCVE:

```python
import os
import string

import fsspec
import numpy as np
import xarray as xr

fs = fsspec.filesystem("gs")
bucket = "gs://<your-bucket>"

# create a ~160MB dataset with 20 variables
variables = {v: (["x", "y"], np.random.random(size=(1000, 1000))) for v in string.ascii_letters[:20]}
ds = xr.Dataset(variables)

# Save one version from numpy backed arrays and one from dask backed arrays
ds.compute().to_netcdf("numpy.nc")
ds.chunk().to_netcdf("dask.nc")

# Copy these to a bucket of your choice
fs.put("numpy.nc", bucket)
fs.put("dask.nc", bucket)
```

Then time reading in these files as datasets with fsspec:

```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "numpy.nc")))
# 2.15 s ± 40.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "dask.nc")))
# 187 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
|
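One way to poke at the difference (inspection code added here for illustration, not from the issue; it assumes the files are netCDF4/HDF5 so h5py can open them): compare each variable's storage layout and byte offset in the two files:

```python
import h5py

# Print each variable's HDF5 storage layout (chunks is None for contiguous
# storage) and data byte offset; a layout that forces deeper reads to
# locate variable data would explain the slower remote open.
for fname in ["numpy.nc", "dask.nc"]:
    with h5py.File(fname, "r") as f:
        for name, dset in f.items():
            print(fname, name, dset.chunks, dset.id.get_offset())
```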
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7522/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue |
1359368857 | PR_kwDOAMm_X84-PSvu | 6978 | fix passing of curvefit kwargs | slevang 39069044 | open | 0 | 5 | 2022-09-01T20:26:01Z | 2022-10-11T18:50:45Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/6978 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6978/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull |
1043746973 | PR_kwDOAMm_X84uC1vs | 5933 | Reimplement `.polyfit()` with `apply_ufunc` | slevang 39069044 | open | 0 | 6 | 2021-11-03T15:29:58Z | 2022-10-06T21:42:09Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/5933 |
Reimplement `.polyfit()` with `apply_ufunc` … .

There is a bunch of fiddly code here for handling the differing outputs from … .

A few minor departures from the previous implementation:

1. The … .

No new tests have been added, since the previous suite was fairly comprehensive. It would be great to get some performance reports on real-world data, such as the climate model detrending application in #5629. |
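For reference, a stripped-down sketch of the general technique (an illustration of wrapping np.polyfit in apply_ufunc; the PR's actual code handles many more cases, per the "fiddly code" note above):

```python
import numpy as np
import xarray as xr

def simple_polyfit(da, dim, deg):
    # Fit a polynomial along `dim` for every other point; vectorize=True
    # loops over the non-core dims, dask="parallelized" maps over chunks.
    return xr.apply_ufunc(
        lambda y, x: np.polyfit(x, y, deg),
        da,
        da[dim],
        input_core_dims=[[dim], [dim]],
        output_core_dims=[["degree"]],
        vectorize=True,
        dask="parallelized",
        output_dtypes=[np.float64],
        dask_gufunc_kwargs={"output_sizes": {"degree": deg + 1}},
    )

da = xr.DataArray(
    np.random.randn(10, 5), dims=["t", "x"], coords={"t": np.arange(10.0)}
)
coeffs = simple_polyfit(da, "t", deg=1)  # dims ("x", "degree")
```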
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5933/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
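For reference, the filter shown at the top of this page corresponds to a query like the following (a sketch using Python's sqlite3 against a local copy of the database; the filename is an assumption):

```python
import sqlite3

# Reproduce this page's filter: open issues/PRs by user 39069044,
# most recently updated first.
conn = sqlite3.connect("github.db")  # hypothetical local copy of the database
rows = conn.execute(
    """
    SELECT number, title, updated_at
    FROM issues
    WHERE state = 'open' AND user = 39069044
    ORDER BY updated_at DESC
    """
).fetchall()
```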