issues
22 rows where user = 39069044 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1307112340 | I_kwDOAMm_X85N6POU | 6799 | `interp` performance with chunked dimensions | slevang 39069044 | open | 0 | 9 | 2022-07-17T14:25:17Z | 2024-04-26T21:41:31Z | CONTRIBUTOR | What is your issue?

I'm trying to perform 2D interpolation on a large 3D array that is heavily chunked along the interpolation dimensions and not the third dimension. The application could be extracting a timeseries from a reanalysis dataset chunked in space but not time, to compare to observed station data with more precise coordinates. I use the advanced interpolation method as described in the documentation, with the interpolation coordinates specified by DataArray's with a shared dimension like so:

```python
%load_ext memory_profiler
import numpy as np
import dask.array as da
import xarray as xr

# Synthetic dataset chunked in the two interpolation dimensions
nt = 40000
nx = 200
ny = 200
ds = xr.Dataset(
    data_vars = {
        'foo': (('t', 'x', 'y'), da.random.random(size=(nt, nx, ny), chunks=(-1, 10, 10))),
    },
    coords = {
        't': np.linspace(0, 1, nt),
        'x': np.linspace(0, 1, nx),
        'y': np.linspace(0, 1, ny),
    }
)

# Interpolate to some random 2D locations
ni = 10
xx = xr.DataArray(np.random.random(ni), dims='z', name='x')
yy = xr.DataArray(np.random.random(ni), dims='z', name='y')
interpolated = ds.foo.interp(x=xx, y=yy)
%memit interpolated.compute()
```

With just 10 interpolation points, this example calculation uses far more memory than expected. This could definitely work better, as each interpolated point usually only requires a single chunk of the input dataset, and at most 4 if it is right on the corner of a chunk. For example we can instead do it in a loop and get very reasonable memory usage, but this isn't very scalable:
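A sketch of the loop approach described above (reconstructed, reusing the names from the example; the original snippet was elided):

```python
# Interpolate one point at a time and combine; memory stays low but the
# overhead grows linearly with the number of points
results = []
for i in range(ni):
    point = ds.foo.interp(x=xx.isel(z=i), y=yy.isel(z=i))
    results.append(point.compute())
interpolated = xr.concat(results, dim='z')
```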
I tried adding a […]. Any tips to make this calculation work better with existing options, or otherwise ways we might improve the `interp` implementation? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6799/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
2220689594 | PR_kwDOAMm_X85rcmw1 | 8904 | Handle extra indexes for zarr region writes | slevang 39069044 | open | 0 | 8 | 2024-04-02T14:34:00Z | 2024-04-03T19:20:37Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/8904 |
Small follow up to #8877. If we're going to drop the indices anyways for region writes, we may as well not raise if they are still in the dataset. This makes the user experience of region writes simpler:

```python
ds = xr.tutorial.open_dataset("air_temperature")
ds.to_zarr("test.zarr")
region = {"time": slice(0, 10)}

# This fails unless we remember to ds.drop_vars(["lat", "lon"])
ds.isel(**region).to_zarr("test.zarr", region=region)
```

I find this annoying because I often have a dataset with a bunch of unrelated indexes and have to remember which ones to drop, or use some verbose `drop_vars` logic.

cc @dcherian |
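For illustration, a sketch of the kind of verbose logic referred to above (assumed, with `ds` and `region` as in the example):

```python
# Drop every indexed coordinate that is not part of the region being written
to_drop = [name for name in ds.indexes if name not in region]
ds.isel(**region).drop_vars(to_drop).to_zarr("test.zarr", region=region)
```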
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8904/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
2171912634 | PR_kwDOAMm_X85o3Ify | 8809 | Pass variable name to `encode_zarr_variable` | slevang 39069044 | closed | 0 | 6 | 2024-03-06T16:21:53Z | 2024-04-03T14:26:49Z | 2024-04-03T14:26:48Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/8809 |
The change from https://github.com/pydata/xarray/pull/8672 mostly fixed the issue of serializing a reset multiindex in the backends, but there was an additional niche issue that turned up in xeofs that was causing serialization to still fail on the zarr backend. The issue is that zarr is the only backend that uses a custom version of `encode_cf_variable` (`encode_zarr_variable`). As a minimal fix, this PR just passes the variable name through to `encode_zarr_variable`.

The exact workflow this turned up in involves DataTree and looks like this:

```python
import numpy as np
import xarray as xr
from datatree import DataTree

# ND DataArray that gets stacked along a multiindex
da = xr.DataArray(np.ones((3, 3)), coords={"dim1": [1, 2, 3], "dim2": [4, 5, 6]})
da = da.stack(feature=["dim1", "dim2"])

# Extract just the stacked coordinates for saving in a dataset
ds = xr.Dataset(data_vars={"feature": da.feature})

# Reset the multiindex, which should make things serializable
ds = ds.reset_index("feature")

dt1 = DataTree()
dt2 = DataTree(name="feature", data=ds)
dt1["foo"] = dt2

# Somehow in this step, dt1.foo.feature.dim1.variable becomes an IndexVariable again
print(type(dt1.foo.feature.dim1.variable))

# Works
dt1.to_netcdf("test.nc", mode="w")

# Fails
dt1.to_zarr("test.zarr", mode="w")
```

But we can reproduce in xarray with the test added here. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8809/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
2126356395 | I_kwDOAMm_X85-vZ-r | 8725 | `broadcast_like()` doesn't copy chunking structure | slevang 39069044 | open | 0 | 2 | 2024-02-09T02:07:19Z | 2024-03-26T18:33:13Z | CONTRIBUTOR | What is your issue?

```python
import dask.array
import xarray as xr

da1 = xr.DataArray(dask.array.ones((3, 3), chunks=(1, 1)), dims=["x", "y"])
da2 = xr.DataArray(dask.array.ones((3,), chunks=(1,)), dims=["x"])

da2.broadcast_like(da1).chunksizes
# Frozen({'x': (1, 1, 1), 'y': (3,)})
```

Was surprised not to find any other issues around this. Feels like a major limitation of the method for a lot of use cases. Is there an easy hack around this? |
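One possible workaround (a sketch, not from the original report): re-impose the template's chunk structure after broadcasting:

```python
# Broadcast first, then rechunk to match the template's chunks
result = da2.broadcast_like(da1).chunk(da1.chunksizes)
result.chunksizes
# Frozen({'x': (1, 1, 1), 'y': (1, 1, 1)})
```

This restores the expected chunk structure, though dask may still materialize the larger broadcast chunks as an intermediate step.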
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8725/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1648260939 | I_kwDOAMm_X85iPndL | 7702 | Allow passing coordinates in `to_zarr(region=...)` rather than passing indexes | slevang 39069044 | closed | 0 | 3 | 2023-03-30T20:23:00Z | 2023-11-14T18:34:51Z | 2023-11-14T18:34:51Z | CONTRIBUTOR | Is your feature request related to a problem?

If I want to write to a region of data in a zarr, I usually have some boilerplate code like this:
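Something along these lines (a sketch with hypothetical names `ds_existing` and `ds_new`, since the original snippet was elided):

```python
import numpy as np
import xarray as xr

# ds_new: an in-memory dataset covering some slice of the stored time axis
ds_existing = xr.open_zarr("existing.zarr")

# Translate coordinate labels into the integer positions required by region=
start = int(np.searchsorted(ds_existing.time.values, ds_new.time.values[0]))
stop = start + ds_new.sizes["time"]

ds_new.to_zarr("existing.zarr", region={"time": slice(start, stop)})
```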
Describe the solution you'd like

It would be nice to automate this within `to_zarr(region=...)` by allowing coordinates, rather than integer indexes, to specify the region. There may be pitfalls I'm not thinking of, and I don't know exactly what the API would look like.
Describe alternatives you've considered

No response

Additional context

No response |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7702/reactions", "total_count": 7, "+1": 7, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1985969769 | PR_kwDOAMm_X85fDaBX | 8434 | Automatic region detection and transpose for `to_zarr()` | slevang 39069044 | closed | 0 | 15 | 2023-11-09T16:15:08Z | 2023-11-14T18:34:50Z | 2023-11-14T18:34:50Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/8434 |
A quick pass at implementing these two improvements for zarr region writes:

1. automatic detection of the write region from the dataset's coordinates, and
2. automatic transposition of dimensions to match the on-disk store.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8434/reactions", "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1060265915 | I_kwDOAMm_X84_Ml-7 | 6013 | Memory leak with `open_zarr` default chunking option | slevang 39069044 | closed | 0 | 3 | 2021-11-22T15:06:33Z | 2023-11-10T03:08:35Z | 2023-11-10T02:32:49Z | CONTRIBUTOR | What happened:
I've been using xarray to open zarr datasets within a Flask app, and spent some time debugging a memory leak. What I found is that the default `chunks='auto'` option of `open_zarr` is responsible. For whatever reason this function is generating dask items that are not easily cleared from memory within the context of a Flask route, and memory usage continues to grow within my app, at least towards some plateau. This memory growth isn't reproducible outside of a Flask route, so it's a bit of a niche problem. First proposal would be to simply align the default chunking of `open_zarr` with that of `open_dataset` (`chunks=None`).

What you expected to happen:

Memory usage should not grow when opening a zarr dataset within a Flask route.

Minimal Complete Verifiable Example:

```python
from flask import Flask
import xarray as xr
import gc
import dask.array as da

# save a test dataset to zarr locally
ds_test = xr.Dataset({"foo": (["x", "y", "z"], da.random.random(size=(300, 300, 300)))})
ds_test.to_zarr('test.zarr', mode='w')

app = Flask(__name__)

# ping this route repeatedly to see memory increase
@app.route('/open_zarr')
def open_zarr():
    # with default chunks='auto', memory grows, with chunks=None, memory is ok
    ds = xr.open_zarr('test.zarr', chunks='auto').compute()

    # Try to explicitly clear memory but this doesn't help
    del ds
    gc.collect()
    return 'check memory'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080, debug=True)
```

Anything else we need to know?:

Environment:

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-40-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.19.5
scipy: 1.7.2
netCDF4: 1.5.8
pydap: None
h5netcdf: 0.11.0
h5py: 3.1.0
Nio: None
zarr: 2.10.1
cftime: 1.5.1.1
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: 0.9.9.1
iris: None
bottleneck: 1.3.2
dask: 2021.11.1
distributed: 2021.11.1
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: None
numbagg: None
fsspec: 2021.11.0
cupy: None
pint: 0.18
sparse: None
setuptools: 58.5.3
pip: 21.3.1
conda: None
pytest: None
IPython: 7.29.0
sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6013/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1875857414 | I_kwDOAMm_X85vz1AG | 8129 | Sort the values of an nD array | slevang 39069044 | open | 0 | 11 | 2023-08-31T16:20:40Z | 2023-09-01T15:37:34Z | CONTRIBUTOR | Is your feature request related to a problem?

As far as I know, there is no straightforward API in xarray to do what `np.sort` does: sort the values of an nD array along a dimension.

Describe the solution you'd like

Would there be interest in implementing a `sort_values` method? Note: this 1D example is not really relevant, see the 2D version and more obvious implementation in comments below for what I really want.
The goal is to handle arrays that we want to monotonize like so:

```python
da = xr.DataArray([1, 3, 2, 4], coords={"x": [1, 2, 3, 4]})
da.sort_values("x")

<xarray.DataArray (x: 4)>
array([1, 2, 3, 4])
Coordinates:
  * x        (x) int64 1 2 3 4
```
In addition to the existing `sortby`, which reorders by coordinate labels:

```python
da = xr.DataArray([1, 3, 2, 4], coords={"x": [1, 3, 2, 4]})
da.sortby("x")

<xarray.DataArray (x: 4)>
array([1, 2, 3, 4])
Coordinates:
  * x        (x) int64 1 2 3 4
```

Describe alternatives you've considered

I don't know if […]

Additional context

Some past related threads on this topic:
https://github.com/pydata/xarray/issues/3957
https://stackoverflow.com/questions/64518239/sorting-dataset-along-axis-with-dask |
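For a rough idea of what such a method might look like, a minimal sketch (an assumed implementation, wrapping `np.sort` with `apply_ufunc`):

```python
import numpy as np
import xarray as xr

def sort_values(da, dim):
    # Sort the values along `dim`, independent of any coordinate labels
    return xr.apply_ufunc(
        np.sort,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        kwargs={"axis": -1},
        dask="parallelized",
        output_dtypes=[da.dtype],
    )

da = xr.DataArray([[3, 1, 2], [6, 5, 4]], dims=["y", "x"])
sort_values(da, "x")  # values sorted along x within each y
```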
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8129/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1689655334 | I_kwDOAMm_X85kthgm | 7797 | More `groupby` indexing problems | slevang 39069044 | closed | 0 | 1 | 2023-04-29T18:58:11Z | 2023-05-02T14:48:43Z | 2023-05-02T14:48:43Z | CONTRIBUTOR | What happened?

There is still something wrong with the groupby indexing changes from the 2023.4 releases:

```python
import numpy as np
import xarray as xr

# monthly timeseries that should return "zero anomalies" everywhere
time = xr.date_range("2023-01-01", "2023-12-31", freq="MS")
data = np.linspace(-1, 1, 12)

x = xr.DataArray(data, coords={"time": time})
clim = xr.DataArray(data, coords={"month": np.arange(1, 13, 1)})

# seems to give the correct result if we use the full x, but not with a slice
x_slice = x.sel(time=["2023-04-01"])

# two typical ways of computing anomalies
anom_gb = x_slice.groupby("time.month") - clim
anom_sel = x_slice - clim.sel(month=x_slice.time.dt.month)

# passes on 2023.3.0, fails on 2023.4.2
# the groupby version is aligning the indexes wrong, giving us something other than 0
assert anom_sel.equals(anom_gb)
```

Related: #7759 #7766

cc @dcherian

What did you expect to happen?
No response

Minimal Complete Verifiable Example
No response

MVCE confirmation
Relevant log output
No response

Anything else we need to know?
No response

Environment |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7797/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1397104515 | I_kwDOAMm_X85TRh-D | 7130 | Passing keyword arguments to external functions | slevang 39069044 | open | 0 | 3 | 2022-10-05T02:51:35Z | 2023-03-26T19:15:00Z | CONTRIBUTOR | What is your issue?

Follow on from #6891 and #6978 to discuss how we could homogenize the passing of keyword arguments to wrapped external functions across xarray methods. There are quite a few methods like this where we are ultimately passing data to numpy, scipy, or some other library and want the option to send variable length kwargs to that underlying function. There are two different ways of doing this today:

1. a dedicated dict argument (e.g. `kwargs={...}`) that gets unpacked into the wrapped function, or
2. `**kwargs` accepted directly in the xarray method's signature.

I could only find a few examples of the latter: […]

Allowing direct passage with `**kwargs` […] |
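For illustration, the two patterns side by side (hypothetical function names, not actual xarray methods):

```python
from typing import Any, Callable

# Pattern 1: a dedicated dict argument forwarded to the wrapped function
def fit_with_dict(data, func: Callable, func_kwargs: dict | None = None):
    func_kwargs = func_kwargs or {}
    return func(data, **func_kwargs)

# Pattern 2: variable keyword arguments passed straight through
def fit_with_kwargs(data, func: Callable, **kwargs: Any):
    return func(data, **kwargs)
```

Pattern 1 keeps the xarray method's own signature free to grow; pattern 2 is terser for users but risks collisions with future keyword arguments.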
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7130/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1581046647 | I_kwDOAMm_X85ePNt3 | 7522 | Differences in `to_netcdf` for dask and numpy backed arrays | slevang 39069044 | open | 0 | 7 | 2023-02-11T23:06:37Z | 2023-03-01T23:12:11Z | CONTRIBUTOR | What is your issue?

I make use of `fsspec` to open netcdf files in cloud storage directly as xarray datasets. This works great, in that a many GB file can be lazy-loaded as a dataset in a few hundred milliseconds, by only parsing the netcdf headers with under-the-hood byte range requests. But, only if the netcdf is written from dask-backed arrays. Somehow, writing from numpy-backed arrays produces a different netcdf that requires reading deeper into the file to parse as a dataset. I spent some time digging into the backends and see xarray is ultimately passing off the store write to different code paths depending on whether the arrays are dask- or numpy-backed.

This should work as an MCVE:

```python
import os
import string

import fsspec
import numpy as np
import xarray as xr

fs = fsspec.filesystem("gs")
bucket = "gs://<your-bucket>"

# create a ~160MB dataset with 20 variables
variables = {v: (["x", "y"], np.random.random(size=(1000, 1000))) for v in string.ascii_letters[:20]}
ds = xr.Dataset(variables)

# Save one version from numpy backed arrays and one from dask backed arrays
ds.compute().to_netcdf("numpy.nc")
ds.chunk().to_netcdf("dask.nc")

# Copy these to a bucket of your choice
fs.put("numpy.nc", bucket)
fs.put("dask.nc", bucket)
```

Then time reading in these files as datasets with fsspec:

```python
%timeit xr.open_dataset(fs.open(os.path.join(bucket, "numpy.nc")))
# 2.15 s ± 40.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit xr.open_dataset(fs.open(os.path.join(bucket, "dask.nc")))
# 187 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7522/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue | ||||||||
1483235066 | PR_kwDOAMm_X85Eti0b | 7364 | Handle numpy-only attrs in `xr.where` | slevang 39069044 | closed | 0 | 1 | 2022-12-08T00:52:43Z | 2022-12-10T21:52:49Z | 2022-12-10T21:52:37Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/7364 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7364/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1423114234 | I_kwDOAMm_X85U0v_6 | 7220 | `xr.where(..., keep_attrs=True)` overwrites coordinate attributes | slevang 39069044 | closed | 0 | 3 | 2022-10-25T21:17:17Z | 2022-11-30T23:35:30Z | 2022-11-30T23:35:30Z | CONTRIBUTOR | What happened?

#6461 had some unintended consequences for `xr.where(..., keep_attrs=True)`: coordinate attributes now come back overwritten by the attrs of the first argument.
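A minimal sketch of the reported symptom (reconstructed, not the original report's example):

```python
import xarray as xr

da = xr.DataArray([1, 2], dims="x", coords={"x": [10, 20]})
da.x.attrs = {"units": "m"}
da.attrs = {"long_name": "foo"}

out = xr.where(da > 1, 0, 1, keep_attrs=True)
# Reported behavior: the coordinate attrs come back replaced by the
# variable's attrs, i.e. {'long_name': 'foo'} instead of {'units': 'm'}
print(out.x.attrs)
```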
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7220/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1424732975 | PR_kwDOAMm_X85Bnoaj | 7229 | Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` | slevang 39069044 | closed | 0 | 5 | 2022-10-26T21:45:01Z | 2022-11-30T23:35:29Z | 2022-11-30T23:35:29Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/7229 |
Reverts the coordinate attribute handling introduced by #6461. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7229/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1198058137 | PR_kwDOAMm_X8416DPB | 6461 | Fix `xr.where(..., keep_attrs=True)` bug | slevang 39069044 | closed | 0 | 4 | 2022-04-09T03:02:40Z | 2022-10-25T22:40:15Z | 2022-04-12T02:12:39Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/6461 |
Fixes a bug introduced by #4687 where passing a non-xarray object to `xr.where(..., keep_attrs=True)` would raise an error. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6461/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1359368857 | PR_kwDOAMm_X84-PSvu | 6978 | fix passing of curvefit kwargs | slevang 39069044 | open | 0 | 5 | 2022-09-01T20:26:01Z | 2022-10-11T18:50:45Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/6978 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6978/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
1043746973 | PR_kwDOAMm_X84uC1vs | 5933 | Reimplement `.polyfit()` with `apply_ufunc` | slevang 39069044 | open | 0 | 6 | 2021-11-03T15:29:58Z | 2022-10-06T21:42:09Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/5933 |
Reimplement `.polyfit()` with `apply_ufunc`. There is a bunch of fiddly code here for handling the differing outputs from […]. A few minor departures from the previous implementation:

1. The […]

No new tests have been added since the previous suite was fairly comprehensive. Would be great to get some performance reports on real-world data such as the climate model detrending application in #5629. |
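For orientation, a much-simplified sketch of the `apply_ufunc` approach (assumed names; the actual PR also handles skipna, weights, rank deficiency, and covariance outputs):

```python
import numpy as np
import xarray as xr

def simple_polyfit(da, dim, deg):
    x = da[dim].values

    def _fit(y):
        # y arrives with the fit dimension last; polyfit wants it first
        yt = np.moveaxis(y, -1, 0).reshape(len(x), -1)
        coeffs = np.polynomial.polynomial.polyfit(x, yt, deg)
        return np.moveaxis(coeffs, 0, -1).reshape(y.shape[:-1] + (deg + 1,))

    return xr.apply_ufunc(
        _fit,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[["degree"]],
        dask="parallelized",
        output_dtypes=[da.dtype],
        dask_gufunc_kwargs={"output_sizes": {"degree": deg + 1}},
    )
```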
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5933/reactions", "total_count": 2, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
1381294181 | I_kwDOAMm_X85SVOBl | 7062 | Rolling mean on dask array does not preserve dtype | slevang 39069044 | closed | 0 | 2 | 2022-09-21T17:55:30Z | 2022-09-22T22:06:09Z | 2022-09-22T22:06:09Z | CONTRIBUTOR | What happened?

Calling `rolling(...).mean()` on a dask-backed array does not preserve the input dtype.

What did you expect to happen?

This is a simple enough operation that if you start with `float32` you should get `float32` back.

Minimal Complete Verifiable Example:
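A reconstructed sketch of the symptom (the original snippet was elided; shapes assumed):

```python
import dask.array as da
import numpy as np
import xarray as xr

x_np = xr.DataArray(np.ones(10, dtype="float32"), dims="t")
x_dk = xr.DataArray(da.ones(10, chunks=5, dtype="float32"), dims="t")

print(x_np.rolling(t=3).mean().dtype)  # float32: the numpy path preserves dtype
print(x_dk.rolling(t=3).mean().dtype)  # float64 reported on the dask path
```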
MVCE confirmation
Relevant log output
No response

Anything else we need to know?
#5877 is somewhat related.

Environment
INSTALLED VERSIONS
------------------
commit: e6791852aa7ec0b126048b0986e205e158ab9601
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 2022.6.1.dev63+ge6791852.d20220921
pandas: 1.4.2
numpy: 1.21.6
scipy: 1.8.1
netCDF4: 1.5.8
pydap: installed
h5netcdf: 1.0.2
h5py: 3.6.0
Nio: None
zarr: 2.12.0
cftime: 1.6.0
nc_time_axis: 1.4.1
PseudoNetCDF: 3.2.2
rasterio: 1.2.10
cfgrib: 0.9.10.1
iris: 3.2.1
bottleneck: 1.3.4
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: 3.5.2
cartopy: 0.20.2
seaborn: 0.11.2
numbagg: 0.2.1
fsspec: 2022.8.2
cupy: None
pint: 0.19.2
sparse: 0.13.0
flox: 0.5.9
numpy_groupies: 0.9.19
setuptools: 62.0.0
pip: 22.2.2
conda: None
pytest: 7.1.3
IPython: None
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7062/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1381297782 | PR_kwDOAMm_X84_XseG | 7063 | Better dtype preservation for rolling mean on dask array | slevang 39069044 | closed | 0 | 1 | 2022-09-21T17:59:07Z | 2022-09-22T22:06:08Z | 2022-09-22T22:06:08Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/7063 |
This just tests to make sure we at least get the same dtype whether we have a numpy or dask array. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7063/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1380016376 | PR_kwDOAMm_X84_TlHf | 7060 | More informative error for non-existent zarr store | slevang 39069044 | closed | 0 | 2 | 2022-09-20T21:27:35Z | 2022-09-20T22:38:45Z | 2022-09-20T22:38:45Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/7060 |
I've often been tripped up by the stack trace noted in #6484. This PR changes two things: […]
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7060/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
859218255 | MDU6SXNzdWU4NTkyMTgyNTU= | 5165 | Poor memory management with dask=2021.4.0 | slevang 39069044 | closed | 0 | 4 | 2021-04-15T20:19:05Z | 2021-04-21T12:16:31Z | 2021-04-21T10:17:40Z | CONTRIBUTOR | What happened:
With the latest dask release, 2021.4.0, the example below fills up memory.

What you expected to happen:

Dask would intelligently manage chunks and not fill up memory. This works fine in dask 2021.3.0.

Minimal Complete Verifiable Example:

Generate a synthetic dataset with time/lat/lon variable and associated climatology stored to disk, then calculate the anomaly:

```python
import xarray as xr
import pandas as pd
import numpy as np
import dask.array as da

dates = pd.date_range('1980-01-01', '2019-12-31', freq='D')

ds = xr.Dataset(
    data_vars = {
        'x': (('time', 'lat', 'lon'), da.random.random(size=(dates.size, 360, 720), chunks=(1, -1, -1))),
        'clim': (('dayofyear', 'lat', 'lon'), da.random.random(size=(366, 360, 720), chunks=(1, -1, -1))),
    },
    coords = {
        'time': dates,
        'dayofyear': np.arange(1, 367, 1),
        'lat': np.arange(-90, 90, .5),
        'lon': np.arange(-180, 180, .5),
    }
)

# My original use case was pulling this data from disk, but it doesn't actually seem to matter
ds.to_zarr('test-data', mode='w')
ds = xr.open_zarr('test-data')

ds['anom'] = ds.x.groupby('time.dayofyear') - ds.clim
ds[['anom']].to_zarr('test-anom', mode='w')
```

Anything else we need to know?:

Distributed vs local scheduler and file backend e.g. zarr vs netcdf don't seem to affect this. Dask graphs look the same for both 2021.3.0 and 2021.4.0.

Environment:

Output of <tt>xr.show_versions()</tt>

INSTALLED VERSIONS
------------------
commit: None
python: 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16) [GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-48-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.17.1.dev52+ge5690588
pandas: 1.2.1
numpy: 1.19.5
scipy: 1.6.0
netCDF4: 1.5.5.1
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.6.1
cftime: 1.3.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.1.8
cfgrib: 0.9.8.5
iris: None
bottleneck: 1.3.2
dask: 2021.04.0
distributed: 2021.04.0
matplotlib: 3.3.3
cartopy: 0.18.0
seaborn: None
numbagg: None
pint: 0.16.1
setuptools: 49.6.0.post20210108
pip: 20.3.3
conda: None
pytest: None
IPython: 7.20.0
sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5165/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
797302408 | MDExOlB1bGxSZXF1ZXN0NTY0MzM0ODQ1 | 4849 | Basic curvefit implementation | slevang 39069044 | closed | 0 | 12 | 2021-01-30T01:28:16Z | 2021-03-31T16:55:53Z | 2021-03-31T16:55:53Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/4849 |
This is a simple implementation of a more general curve-fitting API as discussed in #4300, using the existing scipy `curve_fit` under the hood. |
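For illustration, a small usage sketch of the API this PR added (example data assumed, not from the PR):

```python
import numpy as np
import xarray as xr

def exponential(x, a, b):
    return a * np.exp(b * x)

x = np.linspace(0, 1, 100)
da = xr.DataArray(2 * np.exp(0.5 * x) + 0.05 * np.random.randn(100),
                  dims="x", coords={"x": x})

fit = da.curvefit(coords="x", func=exponential)
print(fit.curvefit_coefficients)  # least-squares estimates of a and b
```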
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4849/reactions", "total_count": 5, "+1": 4, "-1": 0, "laugh": 0, "hooray": 1, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull |
```sql
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
```