issues
5 rows where repo = 13221727, state = "open" and user = 13662783 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1173497454 | I_kwDOAMm_X85F8iZu | 6377 | [FEATURE]: Add a replace method | Huite 13662783 | open | 0 | 8 | 2022-03-18T11:46:37Z | 2023-06-25T07:52:46Z | CONTRIBUTOR |

**Is your feature request related to a problem?**

If I have a DataArray of values:
There's no easy way, like pandas' `DataFrame.replace` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.replace.html), to do this. (Apologies if I've missed related issues; searching for "replace" gives many hits, as the word is obviously used quite often.)

**Describe the solution you'd like**
I've had a try at a relatively efficient implementation below. I'm wondering whether it's a worthwhile addition to xarray?

**Describe alternatives you've considered**

Ignoring issues such as dealing with NaNs, chunks, etc., a simple dict lookup:
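The dict-based snippet itself isn't shown above; the following is a minimal sketch of what such a lookup could look like (the helper name `dict_replace` matches the benchmarks below, but the body is my assumption):

```python
import numpy as np
import xarray as xr

def dict_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Element-wise lookup through a plain dict; values absent from the
    # mapping pass through unchanged. NaNs and dask chunks are ignored.
    lookup = dict(zip(to_replace, value))
    data = np.array([lookup.get(v, v) for v in da.values.ravel()])
    return da.copy(data=data.reshape(da.shape))
```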
Alternatively, leveraging pandas:
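Again only a sketch, assuming a round-trip through `pandas.Series.replace` (the name `pandas_replace` matches the benchmarks below):

```python
import pandas as pd
import xarray as xr

def pandas_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Delegate the replacement to pandas, then restore the original shape.
    data = pd.Series(da.values.ravel()).replace(list(to_replace), list(value)).to_numpy()
    return da.copy(data=data.reshape(da.shape))
```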
But I also tried my hand at a custom implementation:
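The implementation is truncated above; here is a plausible reconstruction based on `np.searchsorted` (the details are my assumption, not necessarily the author's code; `to_replace` and `value` are taken to be aligned 1-D arrays):

```python
import numpy as np
import xarray as xr

def custom_replace(da: xr.DataArray, to_replace, value) -> xr.DataArray:
    # Locate every element in the sorted order of `to_replace` with a
    # single searchsorted pass, then substitute where an exact match exists.
    to_replace = np.asarray(to_replace)
    value = np.asarray(value)
    flat = da.values.ravel()
    sorter = np.argsort(to_replace)
    insertion = np.searchsorted(to_replace, flat, sorter=sorter)
    # Insertion points past the end can never be exact matches: clip them.
    insertion = np.clip(insertion, 0, to_replace.size - 1)
    found = to_replace[sorter[insertion]] == flat
    data = flat.copy()
    data[found] = value[sorter[insertion[found]]]
    return da.copy(data=data.reshape(da.shape))
```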
Such an approach seems like it's consistently the fastest:

```python
da = xr.DataArray(np.random.randint(0, 100, 100_000))
to_replace = np.random.choice(np.arange(100), 10, replace=False)
value = to_replace * 200

test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)

%timeit custom_replace(da, to_replace, value)
# 6.93 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pandas_replace(da, to_replace, value)
# 9.37 ms ± 212 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit dict_replace(da, to_replace, value)
# 26.8 ms ± 1.59 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

With the advantage growing with the number of values involved:

```python
da = xr.DataArray(np.random.randint(0, 10_000, 100_000))
to_replace = np.random.choice(np.arange(10_000), 10_000, replace=False)
value = to_replace * 200

test1 = custom_replace(da, to_replace, value)
test2 = pandas_replace(da, to_replace, value)
test3 = dict_replace(da, to_replace, value)
assert test1.equals(test2)
assert test1.equals(test3)

%timeit custom_replace(da, to_replace, value)
# 21.6 ms ± 990 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit pandas_replace(da, to_replace, value)
# 3.12 s ± 574 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit dict_replace(da, to_replace, value)
# 42.7 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

In my real-life example, with a DataArray of approx 110 000 elements and 60 000 values to replace, the custom one takes 33 ms, the dict one takes 135 ms, while pandas takes 26 s (!).

**Additional context**

In all cases, we still need to deal with NaNs, check the input, etc.:

```python
def replace(da: xr.DataArray, to_replace: Any, value: Any):
    from xarray.core.utils import is_scalar
    ...
```
I think it should be easy to let it operate on the numpy arrays, so that e.g. `apply_ufunc` will work. The primary issue is whether the values can be sorted; when they cannot, the dict lookup might be an okay fallback? I've had a peek at the pandas implementation, but didn't become much wiser. Anyway, for your consideration! I'd be happy to submit a PR.
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6377/reactions", "total_count": 9, "+1": 9, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue
445745470 | MDExOlB1bGxSZXF1ZXN0MjgwMTIwNzIz | 2972 | ENH: Preserve monotonic descending index order when merging | Huite 13662783 | open | 0 | 4 | 2019-05-18T19:12:11Z | 2022-06-09T14:50:17Z | CONTRIBUTOR | 0 | pydata/xarray/pulls/2972 |
**Comments**

I was doing some work and I kept running into the issue described at #2947, so I had a try at a fix. It was somewhat of a hassle to understand the issue, because I kept running into seeming inconsistencies. This is caused by the fact that the joiner doesn't sort with a single index:
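A minimal sketch of that inconsistency, assuming the outer joiner simply reduces the matching indexes with a pandas union (a simplification of what xarray did at the time):

```python
import functools
import pandas as pd

def joiner(indexes):
    # Roughly xarray's outer join: reduce all matching indexes to one.
    return functools.reduce(lambda x, y: x.union(y), indexes)

# A single index is returned by reduce untouched: descending order survives.
print(joiner([pd.Index([3.0, 2.0, 1.0])]))
# Two distinct indexes go through the union, which sorts ascending.
print(joiner([pd.Index([3.0, 2.0, 1.0]), pd.Index([4.0, 3.0, 2.0])]))
```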
I also noticed that an outer join gets called with e.g. an … It's just checking for the specific case now, but it feels like a very specific issue anyway. The merge behavior is slightly different now, which is reflected in the updated test outcomes in …
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2972/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull
620468256 | MDU6SXNzdWU2MjA0NjgyNTY= | 4076 | Zarr ZipStore versus DirectoryStore: ZipStore requires .close() | Huite 13662783 | open | 0 | 4 | 2020-05-18T19:58:21Z | 2022-04-28T22:37:48Z | CONTRIBUTOR |

I was saving my dataset into a ZipStore -- apparently successfully -- but then I couldn't reopen it. The issue appears to be that a regular DirectoryStore behaves a little differently: it doesn't need to be closed, while a ZipStore does. (I'm not sure how this relates to #2586; the remarks there don't appear to be applicable anymore.)

**MCVE Code Sample**

This errors:

```python
import xarray as xr
import zarr

# works as expected
ds = xr.Dataset({'foo': [2, 3, 4], 'bar': ('x', [1, 2]), 'baz': 3.14})
ds.to_zarr(zarr.DirectoryStore("test.zarr"))
print(xr.open_zarr(zarr.DirectoryStore("test.zarr")))

# errors with ValueError "group not found at path ''"
ds.to_zarr(zarr.ZipStore("test.zip"))
print(xr.open_zarr(zarr.ZipStore("test.zip")))
```

Calling close, or using a `with` block, works as expected:

```python
store = zarr.ZipStore("test2.zip")
ds.to_zarr(store)
store.close()
print(xr.open_zarr(zarr.ZipStore("test2.zip")))

with zarr.ZipStore("test3.zip") as store:
    ds.to_zarr(store)
print(xr.open_zarr(zarr.ZipStore("test3.zip")))
```

**Expected Output**

I think it would be preferable to close the ZipStore in this case. But I might be missing something?

**Problem Description**

Because …

**Versions**

Output of `xr.show_versions()`:

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 21:48:41) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
libhdf5: 1.10.5
libnetcdf: 4.7.3

xarray: 0.15.2.dev41+g8415eefa.d20200419
pandas: 0.25.3
numpy: 1.17.5
scipy: 1.3.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.2
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.14.0+23.gbea4c9a2
distributed: 2.14.0
matplotlib: 3.1.2
cartopy: None
seaborn: 0.10.0
numbagg: None
pint: None
setuptools: 46.1.3.post20200325
pip: 20.0.2
conda: None
pytest: 5.3.4
IPython: 7.13.0
```
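A plausible explanation for the truncated "Because …" above (my assumption): a zip file's central directory is only written out on close, so a reader finds no zarr group until the store is closed. A sketch at the zarr level, assuming the zarr 2.x API:

```python
import zarr

# Write a group through a ZipStore; entries are streamed to the zip,
# but the central directory is only flushed to disk by close().
store = zarr.ZipStore("demo.zip", mode="w")
root = zarr.group(store=store)
root.zeros("foo", shape=(3,))
store.close()

# Reading before close() would fail; after close() it works.
print(zarr.open_group(zarr.ZipStore("demo.zip", mode="r"), mode="r")["foo"][:])
```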
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4076/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue
386596872 | MDU6SXNzdWUzODY1OTY4NzI= | 2587 | DataArray constructor still coerces to np.datetime64[ns], not cftime in 0.11.0 | Huite 13662783 | open | 0 | 3 | 2018-12-02T20:34:36Z | 2022-04-18T16:06:12Z | CONTRIBUTOR |

**Code Sample**

```python
import xarray as xr
import numpy as np
from datetime import datetime

time = [np.datetime64(datetime.strptime("10000101", "%Y%m%d"))]
print(time[0])
print(np.dtype(time[0]))

da = xr.DataArray(time, ("time",), {"time": time})
print(da)
```

Note the silent overflow: the year-1000 timestamp comes back as year 2169.

```
<xarray.DataArray (time: 1)>
array(['2169-02-08T23:09:07.419103232'], dtype='datetime64[ns]')
Coordinates:
  * time     (time) datetime64[ns] 2169-02-08T23:09:07.419103232
```

**Problem description**

I was happy to see …
However, it seems that the DataArray constructor does not use …

**Expected Output**

I think I'd expect … (But perhaps this was already on your radar, and I am just a little too eager!)
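As a hedged illustration of the behaviour one might expect (not part of the original report, and assuming a reasonably recent xarray), building the coordinate from `cftime` objects directly avoids the datetime64[ns] coercion:

```python
import cftime
import xarray as xr

# cftime objects are kept as an object-dtype array and indexed with a
# CFTimeIndex, so out-of-range years like 1000 survive intact.
time = [cftime.DatetimeProlepticGregorian(1000, 1, 1)]
da = xr.DataArray(time, coords={"time": time}, dims=("time",))
print(da.indexes["time"])
```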
**Output of `xr.show_versions()`**

…
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2587/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue
441341340 | MDU6SXNzdWU0NDEzNDEzNDA= | 2947 | xr.merge always sorts indexes ascending | Huite 13662783 | open | 0 | 2 | 2019-05-07T17:06:06Z | 2019-05-07T21:07:26Z | CONTRIBUTOR |

**Code Sample, a copy-pastable example if possible**

```python
import xarray as xr
import numpy as np

nrow, ncol = (4, 5)
dx, dy = (1.0, -1.0)
xmins = (0.0, 3.0, 3.0, 0.0)
xmaxs = (5.0, 8.0, 8.0, 5.0)
ymins = (0.0, 2.0, 0.0, 2.0)
ymaxs = (4.0, 6.0, 4.0, 6.0)
data = np.ones((nrow, ncol), dtype=np.float64)

das = []
for xmin, xmax, ymin, ymax in zip(xmins, xmaxs, ymins, ymaxs):
    kwargs = dict(
        name="example",
        dims=("y", "x"),
        coords={"y": np.arange(ymax, ymin, dy), "x": np.arange(xmin, xmax, dx)},
    )
    das.append(xr.DataArray(data, **kwargs))

xr.merge(das)

# This won't flip the coordinate:
xr.merge([das[0]])
```

**Problem description**

Let's say I have a number of geospatial grids that I'd like to merge (for example, loaded with …)
**Expected Output**

I think the expected output for these geospatial grids is that, if you provide only DataArrays with a positive dx and a negative dy, the merged result comes out with a positive dx and a negative dy as well. When the DataArrays to merge are mixed in coordinate direction (some with ascending, some with descending coordinate values), defaulting to an ascending sort seems sensible.

**A suggestion**

I saw that the sort is occurring here, in pandas; and that there's a … I think this could work (it solves my issue at least), in xarray.core.alignment:
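The proposed snippet itself isn't preserved above; a runnable sketch of the idea (the function name and its placement are assumptions, not the actual patch):

```python
import functools
import pandas as pd

def join_preserving_descending(indexes, how="outer"):
    # Join as before, but restore descending order when every input
    # index was monotonic decreasing to begin with.
    joined = functools.reduce(lambda x, y: x.join(y, how=how), indexes)
    if all(idx.is_monotonic_decreasing for idx in indexes):
        joined = joined.sort_values(ascending=False)
    return joined

print(join_preserving_descending([pd.Index([4.0, 3.0, 2.0]), pd.Index([3.0, 2.0, 1.0])]))
# Index([4.0, 3.0, 2.0, 1.0], dtype='float64')
```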
For reference, this is what it looks like now: …
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2947/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);