issues
11 rows where repo = 13221727, type = "issue" and user = 13662783, sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1987770706 | I_kwDOAMm_X852evlS | 8440 | nD integer indexing on dask data is very slow | Huite 13662783 | closed | 0 | 2 | 2023-11-10T14:47:08Z | 2023-11-12T04:56:23Z | 2023-11-12T04:56:22Z | CONTRIBUTOR |

**What happened?**

I ran into a situation where I was indexing with a 2D integer array into some chunked netCDF data. This indexing operation is extremely slow. Using a flat 1D index instead is as fast as expected.

**What did you expect to happen?**

I would expect indexing on dask data to be very quick since the work is delayed, and indeed it is so in the 1D case. However, the 2D case is very slow -- slower than actually doing all the work with numpy arrays!

**Minimal Complete Verifiable Example**

```python
import dask.array
import numpy as np
import xarray as xr

da = xr.DataArray(
    data=np.random.rand(100, 1_000_000),
    dims=("time", "x"),
)
dask_da = xr.DataArray(
    data=dask.array.from_array(da.to_numpy(), chunks=(1, 1_000_000)),
    dims=("time", "x"),
)
indexer = np.random.randint(0, 1_000_000, size=100_000)
indexer2d = xr.DataArray(
    data=indexer.reshape((4, -1)),
    dims=("a", "b"),
)

%timeit da.isel(x=indexer)         # 162 ms
%timeit da.isel(x=indexer2d)       # 164 ms
%timeit dask_da.isel(x=indexer)    # 5.3 ms
%timeit dask_da.isel(x=indexer2d)  # 860 ms according to timeit, but 6 to 14 (!) seconds in interactive use
```

**MVCE confirmation**
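Not part of the original report: a possible workaround sketch, assuming the slow path is only triggered by the 2D indexer -- flatten the indexer, take the fast 1D route, and reshape the underlying dask array afterwards (labels on the reshaped dimensions are lost):

```python
# Hypothetical workaround: do the cheap 1D isel, then restore the 2D shape
# of the indexer on the raw dask array.
flat = dask_da.isel(x=indexer2d.values.ravel())
reshaped = flat.data.reshape((flat.sizes["time"],) + indexer2d.shape)
```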
**Relevant log output**

No response

**Anything else we need to know?**

No response

**Environment**
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:34:57) [MSC v.1936 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 151 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_Netherlands', '1252')
libhdf5: 1.14.2
libnetcdf: 4.9.2
xarray: 2023.10.2.dev31+ge5d163a8.d20231110
pandas: 2.1.2
numpy: 1.26.0
scipy: 1.11.3
netCDF4: 1.6.5
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.6.3
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.10.1
distributed: 2023.10.1
matplotlib: 3.8.1
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 68.1.2
pip: 23.2.1
conda: 23.3.1
pytest: None
mypy: None
IPython: 8.17.2
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8440/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1173497454 | I_kwDOAMm_X85F8iZu | 6377 | [FEATURE]: Add a replace method | Huite 13662783 | open | 0 | 8 | 2022-03-18T11:46:37Z | 2023-06-25T07:52:46Z | CONTRIBUTOR |

**Is your feature request related to a problem?**

If I have a DataArray of values:
There's no easy way, like pandas' `DataFrame.replace` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html), to do this. (Apologies if I've missed related issues; searching for "replace" gives many hits, as the word is obviously used quite often.)

**Describe the solution you'd like**
I've had a try at a relatively efficient implementation below. I'm wondering whether it's a worthwhile addition to xarray?

**Describe alternatives you've considered**

Ignoring issues such as dealing with NaNs, chunks, etc., a simple dict lookup:
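The snippet itself was lost in the export; a minimal sketch of the idea (the `dict_replace` name is taken from the benchmarks below, the body is my assumption):

```python
import numpy as np
import xarray as xr

def dict_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Map every element through a Python dict; unmatched values pass through.
    mapping = dict(zip(to_replace, value))
    flat = np.array([mapping.get(v, v) for v in da.values.ravel()])
    return da.copy(data=flat.reshape(da.shape))
```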
Alternatively, leveraging pandas:
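This snippet was likewise lost; a plausible sketch under the same caveat (`pandas_replace` is the name used in the benchmarks):

```python
import pandas as pd
import xarray as xr

def pandas_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Round-trip through a pandas Series and its replace() method.
    series = pd.Series(da.values.ravel())
    out = series.replace(dict(zip(to_replace, value))).to_numpy()
    return da.copy(data=out.reshape(da.shape))
```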
But I also tried my hand at a custom implementation, letting ...
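The implementation did not survive the export either; below is a hedged sketch of a sort-based `custom_replace` consistent with the description and timings (assuming unique, sortable `to_replace`):

```python
import numpy as np
import xarray as xr

def custom_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Locate every element of `da` in the lookup table via searchsorted.
    flat = da.values.ravel()
    sorter = np.argsort(to_replace)
    insertion = np.searchsorted(to_replace, flat, sorter=sorter)
    # Clip so out-of-table values can be indexed, then keep exact matches only.
    insertion = np.clip(insertion, 0, len(to_replace) - 1)
    candidates = sorter[insertion]
    found = np.asarray(to_replace)[candidates] == flat
    out = flat.copy()
    out[found] = np.asarray(value)[candidates[found]]
    return da.copy(data=out.reshape(da.shape))
```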
Such an approach seems like it's consistently the fastest:

```python
da = xr.DataArray(np.random.randint(0, 100, 100_000))
to_replace = np.random.choice(np.arange(100), 10, replace=False)
value = to_replace * 200

test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)

%timeit custom_replace(da, to_replace, value)  # 6.93 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pandas_replace(da, to_replace, value)  # 9.37 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit dict_replace(da, to_replace, value)    # 26.8 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

With the advantage growing with the number of values involved:

```python
da = xr.DataArray(np.random.randint(0, 10_000, 100_000))
to_replace = np.random.choice(np.arange(10_000), 10_000, replace=False)
value = to_replace * 200

test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)

%timeit custom_replace(da, to_replace, value)  # 21.6 ms ± 990 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pandas_replace(da, to_replace, value)  # 3.12 s ± 574 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit dict_replace(da, to_replace, value)    # 42.7 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

In my real-life example, with a DataArray of approx. 110,000 elements and 60,000 values to replace, the custom one takes 33 ms and the dict one takes 135 ms, while pandas takes 26 s (!).

**Additional context**

In all cases, we need to deal with NaNs, check the input, etc.:

```python
def replace(da: xr.DataArray, to_replace: Any, value: Any):
    from xarray.core.utils import is_scalar
    ...
```

I think it should be easy to extend, e.g. by letting it operate on the numpy arrays so that e.g. `apply_ufunc` will work. The primary issue is whether values can be sorted; in such a case the dict lookup might be an okay fallback? I've had a peek at the pandas implementation, but didn't become much wiser. Anyway, for your consideration! I'd be happy to submit a PR. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6377/reactions", "total_count": 9, "+1": 9, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
793245791 | MDU6SXNzdWU3OTMyNDU3OTE= | 4842 | nbytes does not return the true size for sparse variables | Huite 13662783 | closed | 0 | 2 | 2021-01-25T10:17:56Z | 2022-07-22T17:25:33Z | 2022-07-22T17:25:33Z | CONTRIBUTOR |

This wasn't entirely surprising to me, but `nbytes` does not return the true size for sparse variables. Since it computes the size from the dense shape and the itemsize, rather than something like the size of the actually stored data, the reported value is far larger than the memory really in use.

**Minimal Complete Verifiable Example:**

```python
import pandas as pd
import numpy as np
import xarray as xr

df = pd.DataFrame()
df["x"] = np.repeat(np.random.rand(10_000), 10)
df["y"] = np.repeat(np.random.rand(10_000), 10)
df["time"] = np.tile(pd.date_range("2000-01-01", "2000-03-10", freq="W"), 10_000)
df["rate"] = 10.0
df = df.set_index(["time", "y", "x"])
sparse_ds = xr.Dataset.from_dataframe(df, sparse=True)

print(sparse_ds["rate"].nbytes)
```
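Not part of the original report: for contrast, the backing `sparse.COO` array does know its real footprint (data values plus coordinate indices), so a quick way to see the discrepancy -- assuming the `sparse` backend -- is:

```python
print(sparse_ds["rate"].nbytes)       # dense-equivalent size reported by xarray
print(sparse_ds["rate"].data.nbytes)  # bytes actually stored by the COO array
```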
**Environment**: Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.9 (default, Aug 31 2020, 17:10:11) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.16.1
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: None
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.30.1
matplotlib: 3.3.1
cartopy: None
seaborn: 0.11.0
numbagg: None
pint: None
setuptools: 49.6.0.post20201009
pip: 20.3.3
conda: None
pytest: 6.1.0
IPython: 7.19.0
sphinx: 3.2.1
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4842/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
620468256 | MDU6SXNzdWU2MjA0NjgyNTY= | 4076 | Zarr ZipStore versus DirectoryStore: ZipStore requires .close() | Huite 13662783 | open | 0 | 4 | 2020-05-18T19:58:21Z | 2022-04-28T22:37:48Z | CONTRIBUTOR |

I was saving my dataset into a ZipStore -- apparently successfully -- but then I couldn't reopen it. The issue appears to be that a regular DirectoryStore behaves a little differently: it doesn't need to be closed, while a ZipStore does. (I'm not sure how this relates to #2586; the remarks there don't appear to be applicable anymore.)

**MCVE Code Sample**

This errors:

```python
import xarray as xr
import zarr

# works as expected
ds = xr.Dataset({'foo': [2, 3, 4], 'bar': ('x', [1, 2]), 'baz': 3.14})
ds.to_zarr(zarr.DirectoryStore("test.zarr"))
print(xr.open_zarr(zarr.DirectoryStore("test.zarr")))

# errors with ValueError "group not found at path ''"
ds.to_zarr(zarr.ZipStore("test.zip"))
print(xr.open_zarr(zarr.ZipStore("test.zip")))
```

Calling close, or using a `with` block, works as expected:

```python
store = zarr.ZipStore("test2.zip")
ds.to_zarr(store)
store.close()
print(xr.open_zarr(zarr.ZipStore("test2.zip")))

with zarr.ZipStore("test3.zip") as store:
    ds.to_zarr(store)
print(xr.open_zarr(zarr.ZipStore("test3.zip")))
```

**Expected Output**

I think it would be preferable to close the ZipStore in this case. But I might be missing something?

**Problem Description**

Because writing appears to succeed, the missing close only surfaces later, when reopening the store fails.

**Versions**

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.2.dev41+g8415eefa.d20200419
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.14.0+23.gbea4c9a2
distributed: 2.14.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4076/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
656165548 | MDU6SXNzdWU2NTYxNjU1NDg= | 4222 | Extension proposal: extension for UGRID and unstructured mesh utils | Huite 13662783 | closed | 0 | 5 | 2020-07-13T21:44:12Z | 2022-04-22T18:19:53Z | 2022-04-18T15:37:35Z | CONTRIBUTOR |

Disclaimer: I'm not 100% sure this is necessarily the right place to discuss this. I think some of the things I propose could be broadly useful. At any rate, I appreciate feedback!

EDIT: to clarify, I don't think my original wording was entirely clear: I'm not suggesting adding this to xarray, but I'm thinking about a dataset accessor or a wrapping object in a separate package.

**Introduction**

Xarray is a great tool for structured data, but there isn't anything quite like it for unstructured meshes of convex cells. I'm coming from a hydrological background, and I believe many of my colleagues working with unstructured meshes could greatly benefit from xarray.

In terms of data models and formats, there are many unstructured mesh formats; I think UGRID is the most widely shared (outside of specific model formats), and it supports many aspects (e.g. data on faces, nodes, edges). Although Gridded exists and provides a number of features, I really need something that consumes and produces xarray Datasets (I don't want to go without e.g. dask, or deal with the netcdf4 library myself).

My most immediate need is some utilities for 2D unstructured meshes (y, x) and layered meshes (layer, y, x). This is also what UGRID is mostly used for now, as there don't seem to be many examples of the 3D unstructured mesh definitions in the wild. Sticking to 2D simplifies things somewhat, so I'll ignore the third dimension for now -- but I don't see any reason why one couldn't generalize many ideas to 3D.

I currently use xarray a great deal with structured data, in dealing with "regular" netCDFs. In my thinking, it's possible to "mirror" a few methods which by and large behave similarly to xarray, using the UGRID data model for the mesh topology.

**A simple UGRID example**

To clarify what I'm talking about, this is what UGRID looks like in xarray: there's a Dataset with a dummy variable which describes in which variables the mesh topology is stored, in this case for a triangle and a quad.
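The Dataset repr itself was lost in the export; what follows is a hypothetical minimal construction of such a topology (variable names follow the UGRID conventions, the coordinates are illustrative):

```python
import xarray as xr

ds = xr.Dataset()
# Dummy variable whose attributes point at the topology variables.
ds["mesh2d"] = xr.DataArray(
    0,
    attrs={
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "node_x node_y",
        "face_node_connectivity": "face_nodes",
    },
)
ds["node_x"] = xr.DataArray([0.0, 1.0, 1.0, 0.0, 2.0], dims=("node",))
ds["node_y"] = xr.DataArray([0.0, 0.0, 1.0, 1.0, 0.5], dims=("node",))
# The quad uses four nodes; the triangle pads its unused slot with a fill value.
ds["face_nodes"] = xr.DataArray(
    [[0, 1, 2, 3], [1, 4, 2, -1]],
    dims=("face", "nmax_face"),
    attrs={"cf_role": "face_node_connectivity", "_FillValue": -1},
)
```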
It's only the topology that is special-cased. A time dependent variable looks the same as it would otherwise, except it would have a face (or node/edge) dimension rather than y and x. UGRID netCDFs can in principle be read and written without issue. I would prefer to work with standard xarray Datasets like this if possible, because it means there's no need to touch any IO methods, as well as maintaining the full flexibility of xarray.

**Some methods**

I'm thinking of a few methods which behave similarly to the xarray methods, except that they know about the mesh topology (in x and y) and the dependencies between the different variables. So, you could do:

```python
import xugrid  # name of extension

selection_ds = ds.xugrid.sel(node_x=slice(-30.0, 30.0))
```

And it would slice based on the x coordinate of the nodes, but update all the other variables to produce a consistent mesh (and renumber the face-node connectivity accordingly).
**Regridding**

I think one of the major needs for dealing with unstructured grids is being able to map data from one mesh to another, go from (GIS) raster data to unstructured, or go from unstructured model output to GIS raster data. For my use case, I often require area weighted methods, which require computing the overlap between source and destination meshes.

I haven't really found a suitable package for this. E.g. xESMF looks very useful, but doesn't have quite the methods I'd want; also troubling is that I can't really get it installed on Windows. Existing GIS routines for vector data are far too slow (they have to take too many special cases into account, I think), see some benchmarks here.

I think numba provides an interesting opportunity here, not just for ease of development and distribution, but also because of JIT-compilation: the user can provide their own regridding methods. E.g.:

```python
import numba as nb
import numpy as np

def median(values, weights):
    return np.nanpercentile(values, 50)

class Regridder:
    def __init__(self, source, destination):
        # compute and store weights
        # ...
```
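A hedged sketch of that idea (everything below is hypothetical, not an existing API): JIT-compile the user-supplied aggregation and loop in parallel over a CSR-style weight structure that the Regridder would have precomputed:

```python
import numba as nb

def make_regrid(method):
    # Compile the user's aggregation function, then close over it in the loop.
    jitted = nb.njit(method)

    @nb.njit(parallel=True)
    def regrid(values, weights, indices, indptr, out):
        # indptr/indices/weights describe, per destination cell, which source
        # cells overlap it and by how much (CSR layout).
        for i in nb.prange(len(out)):
            chunk = slice(indptr[i], indptr[i + 1])
            out[i] = jitted(values[indices[chunk]], weights[chunk])
        return out

    return regrid
```

A user-defined function such as the `median` above could then be compiled once with `regrid = make_regrid(median)` and reused for every variable on the same mesh pair.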
The best way to find overlapping cells seems to be via a cell tree, as implemented here, and requires searching for bounding boxes rather than points. I've ported the CellTree2D to numba and added a bounding box search. The serial version is a little bit slower than C++ for point searches, but with numba's ...

Interpolation on unstructured grids can probably be handled by pyinterp, and can be made available via the regridder. (Scipy interpolate takes too much memory in my experience for large meshes.)

**API proposal**

```python
class Regridder:
    # Basically the same API as xESMF
    def __init__(self, source: xr.Dataset, destination: xr.Dataset) -> None:
        pass
xugrid.triangulate(self) -> xr.Dataset:
xugrid.rasterize(self, resolution: Mapping) -> xr.Dataset:
xugrid.reproject(self, dst_crs, src_crs) -> xr.Dataset:  # just offload all coordinates to pyproj?
xugrid.sel(self, indexers: Mapping) -> xr.Dataset:
xugrid.isel(self, indexers: Mapping) -> xr.Dataset:
xugrid.plot(self) -> None:  # calls triangulate if needed
xugrid.plot.imshow(self) -> None:  # maybe calls rasterize first, provides a quicker peek
xugrid.slice_vertical(self, linestring) -> xr.Dataset:  # traces one or more rays through the mesh
xugrid.check_mesh(self) -> None:  # check for convexity, all indices present, etc.
xugrid.connectivity_matrix(self) -> scipy.sparse?
xugrid.reverse_cuthill_mckee(self) -> xr.Dataset:  # use scipy.sparse.csgraph
xugrid.binary_erosion(self, iterations: int) -> xr.DataArray:
xugrid.binary_dilation(self, iterations: int) -> xr.DataArray:
xugrid.meshfix(self)  # call pymeshfix
xugrid.to_pyvista(self) -> pyvista.UnstructuredGrid:  # create a pyvista (vtk) mesh
```

Most of these methods require dealing with numpy arrays (or numpy arrays representing sparse arrays), so I think numba is very suitable for the algorithms (if they don't already exist). For most methods, 3D meshes would be possible too. It requires some changes in e.g. the CellTree, and plotting would first require a ...

Some methods might not be very much needed; I'm basing this on my impression of what I currently use xarray for, in combination with some other tools that assume structured data.

**Resources**
(Have to check for non-permissive / copyleft licenses...)

**Thoughts?**

Does this sound useful? Does it belong somewhere else / does it already exist? (I'm aware a number of these, e.g. the tree structures, are available in vtk as well, but the Python API unfortunately doesn't expose them; having a numba version also makes changes a lot easier.)

Something that popped up: some of the algorithms benefit from persistent data structures (e.g. a cell tree or connectivity matrix); can these be maintained via an xarray extension? I'm assuming mesh topologies that fit within memory; that seems challenging enough for now... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4222/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
386596872 | MDU6SXNzdWUzODY1OTY4NzI= | 2587 | DataArray constructor still coerces to np.datetime64[ns], not cftime in 0.11.0 | Huite 13662783 | open | 0 | 3 | 2018-12-02T20:34:36Z | 2022-04-18T16:06:12Z | CONTRIBUTOR |

**Code Sample**

```python
import xarray as xr
import numpy as np
from datetime import datetime

time = [np.datetime64(datetime.strptime("10000101", "%Y%m%d"))]
print(time[0])
print(np.dtype(time[0]))

da = xr.DataArray(time, ("time",), {"time": time})
print(da)
```

```
<xarray.DataArray (time: 1)>
array(['2169-02-08T23:09:07.419103232'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2169-02-08T23:09:07.419103232
```

**Problem description**

I was happy to see cftime support arrive in 0.11.0.
However, it seems that the DataArray constructor does not use it: a date outside the `datetime64[ns]` range (the year 1000 above) is still coerced, silently overflowing into the year 2169.

**Expected Output**

I think I'd expect a cftime-backed time coordinate instead of a silently overflowed `datetime64[ns]` value. (But perhaps this was already on your radar, and am I just a little too eager!)

Output of `xr.show_versions()`:
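As an aside, not part of the original report: a hedged workaround sketch, building the coordinate from cftime objects directly so that xarray keeps them on a CFTimeIndex instead of coercing to `datetime64[ns]`:

```python
import cftime
import xarray as xr

# Assumption for illustration: construct the out-of-range timestamp as a
# cftime object, which xarray stores as dtype=object on a CFTimeIndex.
time = [cftime.DatetimeProlepticGregorian(1000, 1, 1)]
da = xr.DataArray([1.0], coords={"time": time}, dims=("time",))
print(da.indexes["time"])  # CFTimeIndex, no silent overflow
```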
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2587/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
601224688 | MDU6SXNzdWU2MDEyMjQ2ODg= | 3977 | xr.full_like (often) fails when other is chunked and fill_value is non-scalar | Huite 13662783 | closed | 0 | 1 | 2020-04-16T16:23:59Z | 2020-04-24T07:15:44Z | 2020-04-24T07:15:44Z | CONTRIBUTOR |

I've been running into some issues when using `xr.full_like` with a chunked `other` and a non-scalar `fill_value`. Now, I just checked, ...

**MCVE Code Sample**
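The code block did not survive the export; a hypothetical reconstruction consistent with the title (chunked `other`, non-scalar `fill_value`):

```python
import dask.array
import numpy as np
import xarray as xr

other = xr.DataArray(dask.array.ones((2, 2), chunks=(1, 1)), dims=("x", "y"))
# Non-scalar fill_value: np.full is applied per (1, 1) chunk, where a
# length-2 fill_value cannot broadcast.
xr.full_like(other, fill_value=np.array([1, 2]))
```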
This results in an error:
**Expected Output**

Expected is a DataArray with the dimensions and coords of `other`, filled with `fill_value`.

**Problem Description**

The issue lies here:

https://github.com/pydata/xarray/blob/2c77eb531b6689f9f1d2adbde0d8bf852f1f7362/xarray/core/common.py#L1420-L1436

Calling ... As one would expect, if I set ... It does fail on a similarly chunked dask array (since it's applying it for every chunk):
The most obvious solution would be to force it down the ... However, in all cases, they still broadcast automatically...

```python
a = np.full((2, 2), [1, 2])
```
So it's kind of undefined behavior of a blocked ...

**Versions**

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3
xarray: 0.15.1
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.9.2
distributed: 2.10.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
sphinx: 2.3.1
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3977/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
545762025 | MDU6SXNzdWU1NDU3NjIwMjU= | 3664 | Regression 0.14.0 -> 0.14.1: __setattr__ no longer updates matching index | Huite 13662783 | closed | 0 | 2 | 2020-01-06T14:42:50Z | 2020-01-06T16:03:16Z | 2020-01-06T15:37:17Z | CONTRIBUTOR |

**MCVE Code Sample**

This came up in the following (simplified) case:

```python
import numpy as np
import xarray as xr

x = [1, 2, 3, 4]
data = [1., 2., 3., 4.]
dims = ["x"]
coords = {"x": x}
da = xr.DataArray(data, coords, dims)

da1 = da.sel(x=[1, 3])
da2 = da.sel(x=[2, 4])
da2["x"].values = da1["x"].values
da3 = da2 - da1
print(da3)
```

In 0.14.1 this results in an empty DataArray; in 0.14.0 (and before), there are no issues.
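Not part of the original report -- a hedged workaround sketch: replace the coordinate through the public API so the index is rebuilt along with the values:

```python
# assign_coords recreates the "x" index, so the two arrays align again and
# the subtraction is no longer empty.
da2 = da2.assign_coords(x=da1["x"].values)
da3 = da2 - da1
```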
Interestingly, the values of `da2["x"]` have been updated by the assignment. However, the index has not been updated; it still holds the old values.

**Expected Output**

I'm expecting output as in 0.14.0 and before. I've briefly stepped through the code for ...

Output of `xr.show_versions()`:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3664/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
354298235 | MDU6SXNzdWUzNTQyOTgyMzU= | 2383 | groupby().apply() on variable with NaNs raises IndexError | Huite 13662783 | closed | 0 | 1 | 2018-08-27T12:27:06Z | 2019-10-28T23:46:41Z | 2019-10-28T23:46:41Z | CONTRIBUTOR |

**Code Sample**

```python
import xarray as xr
import numpy as np

def standardize(x):
    return (x - x.mean()) / x.std()

ds = xr.Dataset()
ds["variable"] = xr.DataArray(
    np.random.rand(4, 3, 5),
    {"lat": np.arange(4), "lon": np.arange(3), "time": np.arange(5)},
    ("lat", "lon", "time"),
)
ds["id"] = xr.DataArray(
    np.arange(12.0).reshape((4, 3)),
    {"lat": np.arange(4), "lon": np.arange(3)},
    ("lat", "lon"),
)
ds["id"].values[0, 0] = np.nan

ds.groupby("id").apply(standardize)
```

**Problem description**

This results in an IndexError. This is mildly confusing; it took me a little while to figure out the NaNs were to blame. I'm guessing the NaN doesn't get filtered out everywhere. The traceback:

```
IndexError                                Traceback (most recent call last)
<ipython-input-2-267ba57bc264> in <module>()
     15 ds["id"].values[0,0] = np.nan
     16
---> 17 ds.groupby("id").apply(standardize)

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in apply(self, func, **kwargs)
    607         kwargs.pop('shortcut', None)  # ignore shortcut if set (for now)
    608         applied = (func(ds, **kwargs) for ds in self._iter_grouped())
--> 609         return self._combine(applied)
    610
    611     def _combine(self, applied):

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _combine(self, applied)
    614         coord, dim, positions = self._infer_concat_args(applied_example)
    615         combined = concat(applied, dim)
--> 616         combined = _maybe_reorder(combined, dim, positions)
    617         if coord is not None:
    618             combined[coord.name] = coord

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _maybe_reorder(xarray_obj, dim, positions)
    428
    429 def _maybe_reorder(xarray_obj, dim, positions):
--> 430     order = _inverse_permutation_indices(positions)
    431
    432     if order is None:

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\groupby.py in _inverse_permutation_indices(positions)
    109         positions = [np.arange(sl.start, sl.stop, sl.step) for sl in positions]
    110
--> 111     indices = nputils.inverse_permutation(np.concatenate(positions))
    112     return indices
    113

C:\Miniconda3\envs\main\lib\site-packages\xarray\core\nputils.py in inverse_permutation(indices)
     52     # use intp instead of int64 because of windows :(
     53     inverse_permutation = np.empty(len(indices), dtype=np.intp)
---> 54     inverse_permutation[indices] = np.arange(len(indices), dtype=np.intp)
     55     return inverse_permutation
     56

IndexError: index 11 is out of bounds for axis 0 with size 11
```

**Expected Output**

My assumption was that it would throw out the values that fall within the NaN group, like pandas does:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame()
df["var"] = np.random.rand(10)
df["id"] = np.arange(10)
df["id"].iloc[0:2] = np.nan
df.groupby("id").mean()
```

Out:
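Not part of the original report: a hedged workaround sketch -- give the NaN cells a sentinel label so every group key is defined, and discard that group afterwards if it is unwanted:

```python
# fillna(-1.0) puts the NaN cells in their own group instead of tripping up
# the inverse permutation.
labeled = ds.assign(id=ds["id"].fillna(-1.0))
result = labeled.groupby("id").apply(standardize)
```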
Output of `xr.show_versions()`:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2383/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
441341340 | MDU6SXNzdWU0NDEzNDEzNDA= | 2947 | xr.merge always sorts indexes ascending | Huite 13662783 | open | 0 | 2 | 2019-05-07T17:06:06Z | 2019-05-07T21:07:26Z | CONTRIBUTOR |

**Code Sample, a copy-pastable example if possible**

```python
import xarray as xr
import numpy as np

nrow, ncol = (4, 5)
dx, dy = (1.0, -1.0)
xmins = (0.0, 3.0, 3.0, 0.0)
xmaxs = (5.0, 8.0, 8.0, 5.0)
ymins = (0.0, 2.0, 0.0, 2.0)
ymaxs = (4.0, 6.0, 4.0, 6.0)
data = np.ones((nrow, ncol), dtype=np.float64)

das = []
for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
    kwargs = dict(
        name="example",
        dims=("y", "x"),
        coords={"y": np.arange(ymax, ymin, dy), "x": np.arange(xmin, xmax, dx)},
    )
    das.append(xr.DataArray(data, **kwargs))

xr.merge(das)

# This won't flip the coordinate:
xr.merge([das[0]])
```

**Problem description**

Let's say I have a number of geospatial grids that I'd like to merge (for example, loaded with ...
**Expected Output**

I think the expected output for these geospatial grids is that, if you provide only DataArrays with positive dx and negative dy, the merged result comes out with a positive dx and a negative dy as well. When the DataArrays to merge are mixed in coordinate direction (some with ascending, some with descending coordinate values), defaulting to an ascending sort seems sensible.

**A suggestion**

I saw that the sort is occurring here, in pandas; and that there's a ...
For reference this is what it looks like now:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2947/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
365854029 | MDU6SXNzdWUzNjU4NTQwMjk= | 2454 | Regression from 0.10.8 to 0.10.9, "shape mismatch" IndexError in reindex_like() after open_rasterio().squeeze() | Huite 13662783 | closed | 0 | 1 | 2018-10-02T11:28:16Z | 2018-10-08T18:17:02Z | 2018-10-08T18:17:02Z | CONTRIBUTOR |

**Code Sample**

```python
import xarray as xr
import numpy as np
import rasterio
from affine import Affine

def write_tif(path, epsg):
    nrow = 5
    ncol = 8
    values = 10.0 * np.random.rand(nrow, ncol)
    ...

write_tif("basic.tif", epsg=28992)
da = xr.open_rasterio("basic.tif").squeeze("band", drop=True)

likevalues = np.empty((10, 16))
likecoords = {"y": np.arange(0.25, 5.0, 0.5), "x": np.arange(0.25, 8.0, 0.5)}
likedims = ("y", "x")
like = xr.DataArray(likevalues, likecoords, likedims)
newda = da.reindex_like(like)
```

**Problem description**

The code above executes without issues in xarray version 0.10.8. However, it results in a "shape mismatch" error in 0.10.9:

```
Traceback (most recent call last):
  File "<ipython-input-57-f099072e9750>", line 8, in <module>
    newda = da.reindex_like(like)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataarray.py", line 919, in reindex_like
    **indexers)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataarray.py", line 969, in reindex
    indexers=indexers, method=method, tolerance=tolerance, copy=copy)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\dataset.py", line 1924, in reindex
    tolerance, copy=copy)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\alignment.py", line 377, in reindex_variables
    new_var = var._getitem_with_mask(key)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\variable.py", line 655, in _getitem_with_mask
    data = duck_array_ops.where(mask, fill_value, data)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py", line 183, in where
    return _where(condition, *as_shared_dtype([x, y]))
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py", line 115, in as_shared_dtype
    arrays = [asarray(x) for x in scalars_or_arrays]
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py", line 115, in <listcomp>
    arrays = [asarray(x) for x in scalars_or_arrays]
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\duck_array_ops.py", line 110, in asarray
    return data if isinstance(data, dask_array_type) else np.asarray(data)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py", line 624, in __array__
    self._ensure_cached()
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py", line 621, in _ensure_cached
    self.array = NumpyIndexingAdapter(np.asarray(self.array))
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py", line 602, in __array__
    return np.asarray(self.array, dtype=dtype)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\numpy\core\numeric.py", line 492, in asarray
    return array(a, dtype, copy=False, order=order)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py", line 508, in __array__
    return np.asarray(array[self.key], dtype=None)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\backends\rasterio_.py", line 119, in __getitem__
    key, self.shape, indexing.IndexingSupport.OUTER, self._getitem)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\core\indexing.py", line 776, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "d:\bootsma\miniconda3\envs\main\lib\site-packages\xarray\backends\rasterio_.py", line 115, in _getitem
    return out[np_inds]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (10,) (16,)
```
The "shape mismatch" makes me think that maybe the newda = da.reindex_like(like) ``` So it looks to me like some lazy evaluation is going awry when ```Python write_tif("basic.tif", epsg=28992) da = xr.open_rasterio("basic.tif").compute().squeeze("band", drop=True) likevalues = np.empty((10, 16)) likecoords = {"y": np.arange(0.25, 5.0, 0.5), "x": np.arange(0.25, 8.0, 0.5)} likedims = ("y", "x") like = xr.DataArray(likevalues, likecoords, likedims) newda = da.reindex_like(like) ``` Interestingly, I can't easily inspect it, because inspecting makes the problem go away! Expected OutputA succesfully reindexed DataArray. Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2454/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |