issues
896 rows where user = 1217238 sorted by updated_at descending
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2266174558 | I_kwDOAMm_X86HExRe | 8975 | Xarray sponsorship guidelines | shoyer 1217238 | open | 0 | 3 | 2024-04-26T17:05:01Z | 2024-04-30T20:52:33Z | MEMBER | At what level of support should Xarray acknowledge sponsors on our website? I would like to surface this for open discussion because there are potential sponsoring organizations with conflicts of interest with members of Xarray's leadership team (e.g., Earthmover, which employs @jhamman, @rabernat and @dcherian). My suggestion is to use NumPy's guidelines, with an adjustment down to 1/3 of the thresholds to account for the smaller size of the project:
The NumPy guidelines also include a grace period of a minimum of 6 months for acknowledging support. I would suggest increasing this to a minimum of 1 year for Xarray. I would greatly appreciate any feedback from members of the community, either in this issue or at the next team meeting. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8975/reactions", "total_count": 6, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 1, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
271043420 | MDU6SXNzdWUyNzEwNDM0MjA= | 1689 | Roundtrip serialization of coordinate variables with spaces in their names | shoyer 1217238 | open | 0 | 5 | 2017-11-03T16:43:20Z | 2024-03-22T14:02:48Z | MEMBER | If coordinates have spaces in their names, they get restored from netCDF files as data variables instead (a sketch reproducing this follows this row).
This happens because the CF convention is to indicate coordinates as a single space-separated string (the `coordinates` attribute), even though names with spaces aren't CF-compliant variable names (which cannot contain spaces). It would be nice to have an ad-hoc convention for xarray that allows us to serialize/deserialize coordinates in all/most cases. Maybe we could use escape characters for spaces (e.g., …). At the very least, we should issue a warning in these cases. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1689/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
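A minimal sketch reproducing the round-trip problem described in #1689 above. The coordinate name and file path are made up for illustration; the prints reflect the behavior reported in the issue.

```python
import numpy as np
import xarray as xr

ds = xr.Dataset(
    {"data": ("x", np.arange(3))},
    coords={"my coord": ("x", [10, 20, 30])},  # name contains a space
)
ds.to_netcdf("example.nc")

# The CF "coordinates" attribute is space-separated, so "my coord" cannot
# round-trip as a coordinate and comes back as a data variable instead.
roundtripped = xr.open_dataset("example.nc")
print("my coord" in roundtripped.coords)     # False
print("my coord" in roundtripped.data_vars)  # True
```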
267542085 | MDU6SXNzdWUyNjc1NDIwODU= | 1647 | Representing missing values in string arrays on disk | shoyer 1217238 | closed | 0 | 3 | 2017-10-23T05:01:10Z | 2024-02-06T13:03:40Z | 2024-02-06T13:03:40Z | MEMBER | This came up as part of my clean-up of serializing unicode strings in https://github.com/pydata/xarray/pull/1648. There are two ways to represent strings in netCDF files.
Currently, by default (if no …), … For character arrays, we could use the normal … :
```
In [11]: ds
Out[11]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan

In [12]: ds.to_netcdf('foobar.nc')

In [13]: xr.open_dataset('foobar.nc').load()
Out[13]:
<xarray.Dataset>
Dimensions:  (x: 2)
Dimensions without coordinates: x
Data variables:
    foo      (x) object b'bar' nan
```
For variable-length strings, it currently isn't possible to set a fill value, so there's no good way to indicate missing values, though this may change in the future depending on the resolution of the netCDF4-python issue. It would obviously be nice to always automatically round-trip missing values, both for strings and bytes. I see two possible ways to do this:
1. Require setting an explicit … The default option is to adopt neither of these and keep the current behavior, where missing values are written as empty strings and not decoded at all. Any opinions? I am leaning towards option (2). (A decoding sketch for option (2) follows this row.) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1647/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
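For concreteness, a sketch of what option (2) from #1647 above could look like at the decoding step, using plain NumPy; this is an illustration of the idea, not the convention that was ultimately adopted.

```python
import numpy as np

# As stored on disk today: missing values written as empty strings.
encoded = np.array([b"bar", b""], dtype=object)

# Option (2) sketch: decode empty strings back into missing values (NaN).
decoded = np.where(encoded == b"", np.nan, encoded)
print(decoded)  # [b'bar' nan]
```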
842436143 | MDU6SXNzdWU4NDI0MzYxNDM= | 5081 | Lazy indexing arrays as a stand-alone package | shoyer 1217238 | open | 0 | 6 | 2021-03-27T07:06:03Z | 2023-12-15T13:20:03Z | MEMBER | From @rabernat on Twitter:
The idea here is to create a first-class "duck array" library for lazy indexing that could replace xarray's internal classes for lazy indexing. This would be in some ways similar to dask.array, but much simpler, because it doesn't have to worry about parallel computing. Desired features:
A common feature of these operations is that they can (and almost always should) be fused with indexing: if N elements are selected via indexing, only O(N) compute and memory is required to produce them, regardless of the size of the original arrays, as long as the number of applied operations can be treated as a constant. Memory access is significantly slower than compute on modern hardware, so recomputing these operations on the fly is almost always a good idea. (A toy sketch of this indexing fusion follows this row.) Out of scope: lazy computation when indexing could require access to many more elements to compute the desired value than are returned. For example, … This is valuable functionality for Xarray for two reasons:
Related issues:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5081/reactions", "total_count": 6, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 6, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
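A toy sketch of the indexing-fusion idea from #5081 above: an element-wise operation that defers computation and forwards indexers to its operands, so selecting N elements costs O(N). The class name is made up, and it assumes same-shape operands so indexers can be forwarded directly.

```python
import numpy as np

class LazyElementwise:
    """Lazily-applied element-wise function; indexing is fused with compute."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.shape = np.broadcast_shapes(*(a.shape for a in args))

    def __getitem__(self, key):
        # Forward the indexer to each operand, then compute only the
        # selected elements (assumes operands share a single shape).
        return self.func(*(a[key] for a in self.args))

x = np.arange(1_000_000, dtype=float)
y = np.ones(1_000_000)
lazy = LazyElementwise(np.add, x, y)
print(lazy[:3])  # computes just 3 elements: [1. 2. 3.]
```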
197939448 | MDU6SXNzdWUxOTc5Mzk0NDg= | 1189 | Document using a spawning multiprocessing pool for multiprocessing with dask | shoyer 1217238 | closed | 0 | 3 | 2016-12-29T01:21:50Z | 2023-12-05T21:51:04Z | 2023-12-05T21:51:04Z | MEMBER | This is a nice option for working with in-file HDF5/netCDF4 compression: https://github.com/pydata/xarray/pull/1128#issuecomment-261936849 Mixed multi-threading/multi-processing could also be interesting, if anyone wants to revive that: https://github.com/dask/dask/pull/457 (I think it would work now that xarray data stores are pickle-able). A sketch with the modern dask configuration follows this row. CC @mrocklin |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1189/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
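A sketch of what the documentation could show with a recent dask version (the config-based API, rather than the dask.set_options API that existed when this issue was filed); the "multiprocessing.context" key is from the dask docs, assuming a current release.

```python
import dask
import dask.array as da

# Use a spawning (rather than forking) multiprocessing context, which
# avoids fork-related problems with HDF5/netCDF4 library state.
dask.config.set({"multiprocessing.context": "spawn"})

x = da.ones((1000, 1000), chunks=(100, 100))
print(x.sum().compute(scheduler="processes"))
```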
430188626 | MDU6SXNzdWU0MzAxODg2MjY= | 2873 | Dask distributed tests fail locally | shoyer 1217238 | closed | 0 | 3 | 2019-04-07T20:26:53Z | 2023-12-05T21:43:02Z | 2023-12-05T21:43:02Z | MEMBER | I'm not sure why, but when I run the integration tests with dask-distributed locally (on my MacBook pro), they fail: ``` $ pytest xarray/tests/test_distributed.py --maxfail 1 ================================================ test session starts ================================================= platform darwin -- Python 3.7.2, pytest-4.0.1, py-1.7.0, pluggy-0.8.0 rootdir: /Users/shoyer/dev/xarray, inifile: setup.cfg plugins: repeat-0.7.0 collected 19 items xarray/tests/test_distributed.py F ====================================================== FAILURES ====================================================== ____ test_dask_distributed_netcdf_roundtrip[netcdf4-NETCDF3_CLASSIC] _______ loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x1c182da1d0> tmp_netcdf_filename = '/private/var/folders/15/qdcz0wqj1t9dg40m_ld0fjkh00b4kd/T/pytest-of-shoyer/pytest-3/test_dask_distributed_netcdf_r0/testfile.nc' engine = 'netcdf4', nc_format = 'NETCDF3_CLASSIC'
xarray/tests/test_distributed.py:87: ../../miniconda3/envs/xarray-py37/lib/python3.7/contextlib.py:119: in exit next(self.gen) nworkers = 2, nanny = False, worker_kwargs = {}, active_rpc_timeout = 1, scheduler_kwargs = {}
../../miniconda3/envs/xarray-py37/lib/python3.7/site-packages/distributed/utils_test.py:721: AssertionError ------------------------------------------------ Captured stderr call ------------------------------------------------ distributed.scheduler - INFO - Clear task state distributed.scheduler - INFO - Scheduler at: tcp://127.0.0.1:51715 distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51718 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51718 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-5cabd1b7-4d9c-49eb-a79e-205c588f5dae/worker-n8uv72yx distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Start worker at: tcp://127.0.0.1:51720 distributed.worker - INFO - Listening to: tcp://127.0.0.1:51720 distributed.worker - INFO - Waiting to connect to: tcp://127.0.0.1:51715 distributed.scheduler - INFO - Register tcp://127.0.0.1:51718 distributed.worker - INFO - ------------------------------------------------- distributed.worker - INFO - Threads: 1 distributed.worker - INFO - Memory: 17.18 GB distributed.worker - INFO - Local Directory: /Users/shoyer/dev/xarray/_test_worker-71a426d4-bd34-4808-9d33-79cac2bb4801/worker-a70rlf4r distributed.worker - INFO - ------------------------------------------------- distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51718 distributed.core - INFO - Starting established connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Register tcp://127.0.0.1:51720 distributed.scheduler - INFO - Starting worker compute stream, tcp://127.0.0.1:51720 distributed.core - INFO - Starting established connection distributed.worker - INFO - Registered to: tcp://127.0.0.1:51715 distributed.worker - INFO - ------------------------------------------------- distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Receive client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.05s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. distributed.scheduler - INFO - Receive client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.core - INFO - Starting established connection distributed.core - INFO - Event loop was unresponsive in Worker for 1.33s. This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability. 
distributed.scheduler - INFO - Receive client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.core - INFO - Starting established connection distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Remove client Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-59a7918c-5972-11e9-912a-8c85907bce57 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51720 distributed.worker - INFO - Stopping worker at tcp://127.0.0.1:51718 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51720 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51720 distributed.scheduler - INFO - Remove worker tcp://127.0.0.1:51718 distributed.core - INFO - Removing comms to tcp://127.0.0.1:51718 distributed.scheduler - INFO - Lost all workers distributed.scheduler - INFO - Remove client Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Remove client Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5b2496d8-5972-11e9-9137-8c85907bce57 distributed.scheduler - INFO - Close client connection: Client-worker-5a5c81de-5972-11e9-9136-8c85907bce57 distributed.scheduler - INFO - Scheduler closing... distributed.scheduler - INFO - Scheduler closing all comms ``` Version info: ``` In [2]: xarray.show_versions() INSTALLED VERSIONScommit: 2ce0639ee2ba9c7b1503356965f77d847d6cfcdf python: 3.7.2 (default, Dec 29 2018, 00:00:04) [Clang 4.0.1 (tags/RELEASE_401/final)] python-bits: 64 OS: Darwin OS-release: 18.2.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: en_US.UTF-8 libhdf5: 1.10.4 libnetcdf: 4.6.2 xarray: 0.12.1+4.g2ce0639e pandas: 0.24.0 numpy: 1.15.4 scipy: 1.1.0 netCDF4: 1.4.3.2 pydap: None h5netcdf: 0.7.0 h5py: 2.9.0 Nio: None zarr: 2.2.0 cftime: 1.0.3.4 nc_time_axis: None PseudonetCDF: None rasterio: None cfgrib: None iris: None bottleneck: 1.2.1 dask: 1.1.5 distributed: 1.26.1 matplotlib: 3.0.2 cartopy: 0.17.0 seaborn: 0.9.0 setuptools: 40.0.0 pip: 18.0 conda: None pytest: 4.0.1 IPython: 6.5.0 sphinx: 1.8.2 ``` @mrocklin does this sort of error look familiar to you? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2873/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
not_planned | xarray 13221727 | issue | ||||||
707647715 | MDExOlB1bGxSZXF1ZXN0NDkyMDEzODg4 | 4453 | Simplify and restore old behavior for deep-copies | shoyer 1217238 | closed | 0 | 3 | 2020-09-23T20:10:33Z | 2023-09-14T03:06:34Z | 2023-09-14T03:06:33Z | MEMBER | 1 | pydata/xarray/pulls/4453 | Intended to fix https://github.com/pydata/xarray/issues/4449 The goal is to restore behavior to match what we had prior to https://github.com/pydata/xarray/pull/4379 for all types of … Needs tests!
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4453/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
588105641 | MDU6SXNzdWU1ODgxMDU2NDE= | 3893 | HTML repr in the online docs | shoyer 1217238 | open | 0 | 3 | 2020-03-26T02:17:51Z | 2023-09-11T17:41:59Z | MEMBER | I noticed two minor issues in our online docs, now that we've switched to the hip new HTML repr by default.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3893/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1376109308 | I_kwDOAMm_X85SBcL8 | 7045 | Should Xarray stop doing automatic index-based alignment? | shoyer 1217238 | open | 0 | 13 | 2022-09-16T15:31:03Z | 2023-08-23T07:42:34Z | MEMBER | What is your issue? I am increasingly thinking that automatic index-based alignment in Xarray (copied from pandas) may have been a design mistake. Almost every time I work with datasets with different indexes, I find myself writing code to explicitly align them (an example contrasting automatic and explicit alignment follows this row):
Would it be insane to consider changing Xarray's behavior to stop doing automatic alignment? I imagine we could roll this out slowly, first with warnings and then with an option for disabling it. If you think this is a good or bad idea, consider responding to this issue with a 👍 or 👎 reaction. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7045/reactions", "total_count": 13, "+1": 9, "-1": 2, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 2 } |
xarray 13221727 | issue | ||||||||
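An example contrasting the automatic behavior questioned in #7045 above with the explicit alternatives that already exist today:

```python
import xarray as xr

a = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
b = xr.DataArray([10, 20, 30], dims="x", coords={"x": [1, 2, 3]})

# Automatic alignment: binary ops silently take the intersection of
# indexes, so the result only has x=1 and x=2.
print((a + b).sizes)  # Frozen({'x': 2})

# Explicit alternatives:
a2, b2 = xr.align(a, b, join="outer")  # union of indexes, NaN-padded
try:
    xr.align(a, b, join="exact")       # raise instead of aligning silently
except ValueError as e:
    print(e)
```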
342928718 | MDExOlB1bGxSZXF1ZXN0MjAyNzE0MjUx | 2302 | WIP: lazy=True in apply_ufunc() | shoyer 1217238 | open | 0 | 1 | 2018-07-20T00:01:21Z | 2023-07-18T04:19:17Z | MEMBER | 0 | pydata/xarray/pulls/2302 |
Still needs more tests and documentation. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2302/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
1767947798 | PR_kwDOAMm_X85TkPzV | 7933 | Update calendar for developers meeting | shoyer 1217238 | closed | 0 | 0 | 2023-06-21T16:09:44Z | 2023-06-21T17:56:22Z | 2023-06-21T17:56:22Z | MEMBER | 0 | pydata/xarray/pulls/7933 | The old calendar was on @jhamman's UCAR account, which he no longer has access to! |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7933/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
479942077 | MDU6SXNzdWU0Nzk5NDIwNzc= | 3213 | How should xarray use/support sparse arrays? | shoyer 1217238 | open | 0 | 55 | 2019-08-13T03:29:42Z | 2023-06-07T15:43:55Z | MEMBER | I'm looking forward to being able to easily create sparse xarray objects from pandas: https://github.com/pydata/xarray/issues/3206 Are there other xarray APIs that could make good use of sparse arrays, or could make sparse arrays easier to use? (A minimal example of wrapping a sparse array follows this row.) Some ideas:
- |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3213/reactions", "total_count": 14, "+1": 14, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
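A minimal example of what already works via the duck-array interface, assuming the `sparse` package is installed:

```python
import numpy as np
import sparse
import xarray as xr

dense = np.zeros((4, 4))
dense[0, 0] = 1.0

# Wrap a sparse.COO array directly; many xarray operations dispatch to it.
arr = xr.DataArray(sparse.COO.from_numpy(dense), dims=["x", "y"])
print(arr.sum().data)    # reduction stays sparse
print(arr.data.density)  # fraction of explicitly stored values
```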
1465287257 | I_kwDOAMm_X85XVoJZ | 7325 | Support reading Zarr data via TensorStore | shoyer 1217238 | open | 0 | 1 | 2022-11-27T00:12:17Z | 2023-05-11T01:24:27Z | MEMBER | What is your issue? TensorStore is another high-performance API for reading distributed arrays in formats such as Zarr, written in C++. It could be interesting to write an Xarray storage backend using TensorStore as an alternative way to read Zarr files. As an exercise, I made a little demo of doing this: https://gist.github.com/shoyer/5b0c485979cc9c36a9685d8cf8e94565 I have not tested it for performance. The main annoyance is that TensorStore doesn't understand Zarr groups or Zarr array attributes, so I needed to write my own helpers for reading this metadata. Also, there's a bit of an impedance mismatch between TensorStore (where everything returns futures) and Xarray (which assumes that indexing results in numpy arrays). This could likely be improved with some amount of effort -- in particular https://github.com/pydata/xarray/pull/6874/files should help. (A small TensorStore read sketch follows this row.) CC @jbms who may have better ideas about how to use the TensorStore API. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7325/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
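The basic TensorStore read pattern the demo builds on, as far as I understand the API; the path is hypothetical, and note how every step returns a future that must be resolved with .result():

```python
import tensorstore as ts

store = ts.open({
    "driver": "zarr",
    "kvstore": {"driver": "file", "path": "/tmp/example.zarr"},
}).result()

# Indexing produces a view; .read() returns a future; .result() blocks.
# This future-based flow is the impedance mismatch with xarray's eager
# indexing mentioned above.
chunk = store[:10].read().result()
print(chunk.shape)
```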
253395960 | MDU6SXNzdWUyNTMzOTU5NjA= | 1533 | Index variables loaded from dask can be computed twice | shoyer 1217238 | closed | 0 | 6 | 2017-08-28T17:18:27Z | 2023-04-06T04:15:46Z | 2023-04-06T04:15:46Z | MEMBER | as reported by @crusaderky in #1522 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1533/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
209653741 | MDU6SXNzdWUyMDk2NTM3NDE= | 1285 | FAQ page could use some updating | shoyer 1217238 | open | 0 | 1 | 2017-02-23T03:29:16Z | 2023-03-26T16:32:44Z | MEMBER | Along the same lines as https://github.com/pydata/xarray/issues/1282, we haven't done much updating for frequently asked questions -- it's mostly still the original handful of FAQ entries I wrote in the first version of the docs. Topics worth addressing:
(please add suggestions for this list!) StackOverflow may be a helpful reference here: http://stackoverflow.com/questions/tagged/python-xarray?sort=votes&pageSize=50 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1285/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
176805500 | MDU6SXNzdWUxNzY4MDU1MDA= | 1004 | Remove IndexVariable.name | shoyer 1217238 | open | 0 | 3 | 2016-09-14T03:27:43Z | 2023-03-11T19:57:40Z | MEMBER | As discussed in #947, we should remove the … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1004/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
98587746 | MDU6SXNzdWU5ODU4Nzc0Ng== | 508 | Ignore missing variables when concatenating datasets? | shoyer 1217238 | closed | 0 | 8 | 2015-08-02T06:03:57Z | 2023-01-20T16:04:28Z | 2023-01-20T16:04:28Z | MEMBER | Several users (@raj-kesavan, @richardotis, now myself) have wondered about how to concatenate xray Datasets with different variables. With the current … This would also be more consistent with … (A manual NaN-padding sketch follows this row.) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/508/reactions", "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
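A manual NaN-padding sketch for the workaround discussed in #508 above (the helper name is made up; if I recall correctly, newer xarray releases fill missing variables during concat, which is roughly how this issue was eventually closed):

```python
import numpy as np
import xarray as xr

ds1 = xr.Dataset({"a": ("x", [1.0, 2.0]), "b": ("x", [3.0, 4.0])})
ds2 = xr.Dataset({"a": ("x", [5.0, 6.0])})  # missing "b"

def pad_missing(ds, templates):
    # Add NaN-filled placeholders for variables the dataset lacks.
    for name, template in templates.items():
        if name not in ds:
            ds = ds.assign({name: xr.full_like(template, np.nan)})
    return ds

templates = {k: v for ds in (ds1, ds2) for k, v in ds.data_vars.items()}
combined = xr.concat([pad_missing(ds, templates) for ds in (ds1, ds2)], dim="t")
print(combined["b"].values)  # [[ 3.  4.] [nan nan]]
```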
895983112 | MDExOlB1bGxSZXF1ZXN0NjQ4MTM1NTcy | 5351 | Add xarray.backends.NoMatchingEngineError | shoyer 1217238 | open | 0 | 4 | 2021-05-19T22:09:21Z | 2022-11-16T15:19:54Z | MEMBER | 0 | pydata/xarray/pulls/5351 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5351/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
803068773 | MDExOlB1bGxSZXF1ZXN0NTY5MDU5MTEz | 4879 | Cache files for different CachingFileManager objects separately | shoyer 1217238 | closed | 0 | 10 | 2021-02-07T21:48:06Z | 2022-10-18T16:40:41Z | 2022-10-18T16:40:40Z | MEMBER | 0 | pydata/xarray/pulls/4879 | This means that explicitly opening a file multiple times with …
If users want to reuse the cached file, they can reuse the same xarray object. We don't need this for handling many files in Dask (the original motivation for caching), because in those cases only a single CachingFileManager is created. I think this should fix some long-standing usability issues: #4240, #4862 Conveniently, this also obviates the need for some messy reference counting logic.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4879/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
623804131 | MDU6SXNzdWU2MjM4MDQxMzE= | 4090 | Error with indexing 2D lat/lon coordinates | shoyer 1217238 | closed | 0 | 2 | 2020-05-24T06:19:45Z | 2022-09-28T12:06:03Z | 2022-09-28T12:06:03Z | MEMBER | ```
filslp = "ChonghuaYinData/prmsl.mon.mean.nc"
filtmp = "ChonghuaYinData/air.sig995.mon.mean.nc"
filprc = "ChonghuaYinData/precip.mon.mean.nc"

yrStrt = 1950     # manually specify for convenience
yrLast = 2018     # 20th century ends 2018
clStrt = 1950     # reference climatology for SOI
clLast = 1979
yrStrtP = 1979    # 1st year GPCP
yrLastP = yrLast  # match 20th century

latT = -17.6      # Tahiti
lonT = 210.75

ds_slp = xr.open_dataset(filslp).sel(time=slice(str(yrStrt)+'-01-01', str(yrLast)+'-12-31'))
ds_slp

# select grids of T and D
T = ds_slp.sel(lat=latT, lon=lonT, method='nearest')
D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')
```
```
ValueError                                Traceback (most recent call last)
<ipython-input-27-6702b30f473f> in <module>
      1 # select grids of T and D
----> 2 T = ds_slp.sel(lat=latT, lon=lonT, method='nearest')
      3 D = ds_slp.sel(lat=latD, lon=lonD, method='nearest')

~\Anaconda3\lib\site-packages\xarray\core\dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2004         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
   2005         pos_indexers, new_indexes = remap_label_indexers(
-> 2006             self, indexers=indexers, method=method, tolerance=tolerance
   2007         )
   2008         result = self.isel(indexers=pos_indexers, drop=drop)

~\Anaconda3\lib\site-packages\xarray\core\coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
    378
    379     pos_indexers, new_indexes = indexing.remap_label_indexers(
--> 380         obj, v_indexers, method=method, tolerance=tolerance
    381     )
    382     # attach indexer's coordinate to pos_indexers

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    257     new_indexes = {}
    258
--> 259     dim_indexers = get_dim_indexers(data_obj, indexers)
    260     for dim, label in dim_indexers.items():
    261         try:

~\Anaconda3\lib\site-packages\xarray\core\indexing.py in get_dim_indexers(data_obj, indexers)
    223     ]
    224     if invalid:
--> 225         raise ValueError("dimensions or multi-index levels %r do not exist" % invalid)
    226
    227     level_indexers = defaultdict(dict)

ValueError: dimensions or multi-index levels ['lat', 'lon'] do not exist
```
Does anyone know how to fix this problem? (A manual nearest-point workaround sketch follows this row.) Thank you very much. Originally posted by @JimmyGao0204 in https://github.com/pydata/xarray/issues/475#issuecomment-633172787 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4090/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
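A hedged workaround sketch for the situation above: when lat/lon are not dimension coordinates, .sel() cannot use them, so find the nearest grid point by hand (the function name is made up; this ignores longitude wrap-around and uses a simple squared-degree distance):

```python
import numpy as np

def sel_nearest_2d(ds, lat0, lon0, lat_name="lat", lon_name="lon"):
    # Squared distance over the 2D coordinate arrays.
    dist2 = (ds[lat_name] - lat0) ** 2 + (ds[lon_name] - lon0) ** 2
    # Position of the minimum in the flattened array -> 2D indices.
    iy, ix = np.unravel_index(int(dist2.argmin()), dist2.shape)
    ydim, xdim = dist2.dims
    return ds.isel({ydim: iy, xdim: ix})

# T = sel_nearest_2d(ds_slp, latT, lonT)
```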
1210147360 | I_kwDOAMm_X85IIWIg | 6504 | test_weighted.test_weighted_operations_nonequal_coords should avoid depending on random number seed | shoyer 1217238 | closed | 0 | shoyer 1217238 | 0 | 2022-04-20T19:56:19Z | 2022-08-29T20:42:30Z | 2022-08-29T20:42:30Z | MEMBER | What happened? In testing an upgrade to the latest version of xarray in our systems, I noticed this test failing:
```
def test_weighted_operations_nonequal_coords():
    # There are no weights for a == 4, so that data point is ignored.
    weights = DataArray(np.random.randn(4), dims=("a",), coords=dict(a=[0, 1, 2, 3]))
    data = DataArray(np.random.randn(4), dims=("a",), coords=dict(a=[1, 2, 3, 4]))
    check_weighted_operations(data, weights, dim="a", skipna=None)
```
It appears that this test is hard-coded to match a particular random number seed, which in turn would fix the results of …

What did you expect to happen? Whenever possible, Xarray's own tests should avoid relying on particular random number generators; e.g., in this case we could hard-code the input numbers instead. A back-up option would be to explicitly set the random seed locally inside the tests, e.g., by creating a … (A sketch of the locally-seeded approach follows this row.)

Minimal Complete Verifiable Example: No response. Relevant log output: No response. Anything else we need to know?: No response. Environment: ... |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6504/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||
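A sketch of the locally-seeded back-up option mentioned above, using numpy's Generator API (the seed value is arbitrary):

```python
import numpy as np
from xarray import DataArray

def test_weighted_operations_nonequal_coords():
    # Local, explicitly-seeded generator: results no longer depend on
    # global random state or on the order in which tests run.
    rng = np.random.default_rng(seed=1234)
    weights = DataArray(rng.standard_normal(4), dims=("a",), coords=dict(a=[0, 1, 2, 3]))
    data = DataArray(rng.standard_normal(4), dims=("a",), coords=dict(a=[1, 2, 3, 4]))
    # check_weighted_operations(data, weights, dim="a", skipna=None)
```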
1210267320 | I_kwDOAMm_X85IIza4 | 6505 | Dropping a MultiIndex variable raises an error after explicit indexes refactor | shoyer 1217238 | closed | 0 | 3 | 2022-04-20T22:07:26Z | 2022-07-21T14:46:58Z | 2022-07-21T14:46:58Z | MEMBER | What happened?

With the latest released version of Xarray, it is possible to delete all variables corresponding to a MultiIndex by simply deleting the name of the MultiIndex. After the explicit indexes refactor (i.e., using the "main" development branch) this now raises an error about how this would "corrupt" index state. This comes up when using …

This is not hard to work around, but we may want to consider this bug a blocker for the next Xarray release. I found the issue surfaced in several projects when attempting to use the new version of Xarray inside Google's codebase.

CC @benbovy in case you have any thoughts to share.

What did you expect to happen?

For now, we should preserve the behavior of deleting the variables corresponding to MultiIndex levels, but should issue a deprecation warning encouraging users to explicitly delete everything.

Minimal Complete Verifiable Example

```Python
import xarray

array = xarray.DataArray(
    [[1, 2], [3, 4]],
    dims=['x', 'y'],
    coords={'x': ['a', 'b']},
)
stacked = array.stack(z=['x', 'y'])
print(stacked.drop('z'))
print()
print(stacked.assign_coords(z=[1, 2, 3, 4]))
```

Relevant log output

```Python
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 9>()
      3 array = xarray.DataArray(
      4     [[1, 2], [3, 4]],
      5     dims=['x', 'y'],
      6     coords={'x': ['a', 'b']},
      7 )
      8 stacked = array.stack(z=['x', 'y'])
----> 9 print(stacked.drop('z'))
     10 print()
     11 print(stacked.assign_coords(z=[1, 2, 3, 4]))

File ~/dev/xarray/xarray/core/dataarray.py:2425, in DataArray.drop(self, labels, dim, errors, **labels_kwargs)
   2408 def drop(
   2409     self,
   2410     labels: Mapping = None,
   (...)
   2414     **labels_kwargs,
   2415 ) -> DataArray:
   2416     """Backward compatible method based on …

File ~/dev/xarray/xarray/core/dataset.py:4590, in Dataset.drop(self, labels, dim, errors, **labels_kwargs)
   4584 if dim is None and (is_scalar(labels) or isinstance(labels, Iterable)):
   4585     warnings.warn(
   4586         "dropping variables using …

File ~/dev/xarray/xarray/core/dataset.py:4549, in Dataset.drop_vars(self, names, errors)
   4546 if errors == "raise":
   4547     self._assert_all_in_dataset(names)
-> 4549 assert_no_index_corrupted(self.xindexes, names)
   4551 variables = {k: v for k, v in self._variables.items() if k not in names}
   4552 coord_names = {k for k in self._coord_names if k in variables}

File ~/dev/xarray/xarray/core/indexes.py:1394, in assert_no_index_corrupted(indexes, coord_names)
   1392 common_names_str = ", ".join(f"{k!r}" for k in common_names)
   1393 index_names_str = ", ".join(f"{k!r}" for k in index_coords)
-> 1394 raise ValueError(
   1395     f"cannot remove coordinate(s) {common_names_str}, which would corrupt "
   1396     f"the following index built from coordinates {index_names_str}:\n"
   1397     f"{index}"
   1398 )

ValueError: cannot remove coordinate(s) 'z', which would corrupt the following index built from coordinates 'z', 'x', 'y':
<xarray.core.indexes.PandasMultiIndex object at 0x148c95150>
```

Anything else we need to know? No response

Environment
INSTALLED VERSIONS
------------------
commit: 33cdabd261b5725ac357c2823bd0f33684d3a954
python: 3.10.4 | packaged by conda-forge | (main, Mar 24 2022, 17:42:03) [Clang 12.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 21.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.18.3.dev137+g96c56836
pandas: 1.4.2
numpy: 1.22.3
scipy: 1.8.0
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.11.3
cftime: 1.6.0
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.04.1
distributed: 2022.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.3.0
cupy: None
pint: None
sparse: None
setuptools: 62.1.0
pip: 22.0.4
conda: None
pytest: 7.1.1
IPython: 8.2.0
sphinx: None
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6505/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
168272291 | MDExOlB1bGxSZXF1ZXN0NzkzMjE2NTc= | 924 | WIP: progress toward making groupby work with multiple arguments | shoyer 1217238 | open | 0 | 16 | 2016-07-29T08:07:57Z | 2022-06-09T14:50:17Z | MEMBER | 0 | pydata/xarray/pulls/924 | Fixes #324. It definitely doesn't work properly yet, totally mixing up coordinates, data variables and multi-indexes (as shown by the failing tests). A simple example:
```
In [4]: coords = {'a': ('x', [0, 0, 1, 1]), 'b': ('y', [0, 0, 1, 1])}

In [5]: square = xr.DataArray(np.arange(16).reshape(4, 4), coords=coords, dims=['x', 'y'])

In [6]: square
Out[6]:
<xarray.DataArray (x: 4, y: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
    b        (y) int64 0 0 1 1
    a        (x) int64 0 0 1 1
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3

In [7]: square.groupby(['a', 'b']).mean()
Out[7]:
<xarray.DataArray (a: 2, b: 2)>
array([[  2.5,   4.5],
       [ 10.5,  12.5]])
Coordinates:
  * a        (a) int64 0 1
  * b        (b) int64 0 1

In [8]: square.groupby(['x', 'y']).mean()
Out[8]:
<xarray.DataArray (x: 4, y: 4)>
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.]])
Coordinates:
  * x        (x) int64 0 1 2 3
  * y        (y) int64 0 1 2 3
```
More examples: https://gist.github.com/shoyer/5cfa4d5751e8a78a14af25f8442ad8d5 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/924/reactions", "total_count": 4, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
711626733 | MDU6SXNzdWU3MTE2MjY3MzM= | 4473 | Wrap numpy-groupies to speed up Xarray's groupby aggregations | shoyer 1217238 | closed | 0 | 8 | 2020-09-30T04:43:04Z | 2022-05-15T02:38:29Z | 2022-05-15T02:38:29Z | MEMBER | Is your feature request related to a problem? Please describe. Xarray's groupby aggregations (e.g., …)

Describe the solution you'd like. We could speed things up considerably (easily 100x) by wrapping the numpy-groupies package (a usage sketch follows this row).

Additional context. One challenge is how to handle dask arrays (and other duck arrays). In some cases it might make sense to apply the numpy-groupies function (using apply_ufunc), but in other cases it might be better to stick with the current indexing + concatenate solution. We could either pick some simple heuristics for choosing the algorithm to use on dask arrays, or could just stick with the current algorithm for now. In particular, it might make sense to stick with the current algorithm if there are many chunks in the arrays to be aggregated along the "grouped" dimension (depending on the size of the unique group values). |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4473/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
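For reference, the core numpy-groupies call the proposal is about; a small self-contained example (values are made up):

```python
import numpy as np
import numpy_groupies as npg  # pip install numpy-groupies

values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
group_idx = np.array([0, 0, 1, 1, 1])  # integer group codes

# One vectorized pass computes all group means, instead of xarray's
# current indexing + concatenate loop over groups.
print(npg.aggregate(group_idx, values, func="mean"))  # [1.5 4. ]
```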
326205036 | MDU6SXNzdWUzMjYyMDUwMzY= | 2180 | How should Dataset.update() handle conflicting coordinates? | shoyer 1217238 | open | 0 | 16 | 2018-05-24T16:46:23Z | 2022-04-30T13:40:28Z | MEMBER | Recently, we updated … In v0.10.3, both … In v0.10.4, both … I'm not sure this is the right behavior. In particular, in the case of … Note that one advantage of the current logic (which is violated by my current fix in https://github.com/pydata/xarray/pull/2162) is that we maintain the invariant that … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2180/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
612918997 | MDU6SXNzdWU2MTI5MTg5OTc= | 4034 | Fix tight_layout warning on cartopy facetgrid docs example | shoyer 1217238 | open | 0 | 1 | 2020-05-05T21:54:46Z | 2022-04-30T12:37:50Z | MEMBER | Per the fix in https://github.com/pydata/xarray/pull/4032, I'm pretty sure we will soon start seeing a warning message printed on ReadTheDocs in the Cartopy FacetGrid example: http://xarray.pydata.org/en/stable/plotting.html#maps This would be nice to fix for users, especially because it's likely users will see this warning when running code outside of our documentation, too. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4034/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
621123222 | MDU6SXNzdWU2MjExMjMyMjI= | 4081 | Wrap "Dimensions" onto multiple lines in xarray.Dataset repr? | shoyer 1217238 | closed | 0 | 4 | 2020-05-19T16:31:59Z | 2022-04-29T19:59:24Z | 2022-04-29T19:59:24Z | MEMBER | Here's an example of a large dataset from @alimanfoo:
https://nbviewer.jupyter.org/gist/alimanfoo/b74b08465727894538d5b161b3ced764
I know similarly large datasets with lots of dimensions come up in other contexts as well, e.g., with geophysical model output. That's a very long first line! This would be easier to read as:
or maybe:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4081/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
205455788 | MDU6SXNzdWUyMDU0NTU3ODg= | 1251 | Consistent naming for xarray's methods that apply functions | shoyer 1217238 | closed | 0 | 13 | 2017-02-05T21:27:24Z | 2022-04-27T20:06:25Z | 2022-04-27T20:06:25Z | MEMBER | We currently have two types of methods that take a function to apply to xarray objects:
- … And one more method that we want to add but isn't finalized yet -- currently named … I'd like to have three distinct names that make it clear what these methods do and how they differ. This has come up a few times recently, e.g., https://github.com/pydata/xarray/issues/1130 One proposal: rename … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1251/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
342180429 | MDU6SXNzdWUzNDIxODA0Mjk= | 2298 | Making xarray math lazy | shoyer 1217238 | open | 0 | 7 | 2018-07-18T05:18:53Z | 2022-04-19T15:38:59Z | MEMBER | At SciPy, I had the realization that it would be relatively straightforward to make element-wise math between xarray objects lazy. This would let us support lazy coordinate arrays, a feature that has quite a few use-cases, e.g., for both geoscience and astronomy. The trick would be to write a lazy array class that holds an element-wise vectorized function and passes indexers on to its arguments. I haven't thought too hard about this yet for vectorized indexing, but it could be quite efficient for outer indexing. I have some prototype code but no tests yet. The question is how to hook this into xarray operations. In particular, supposing that the inputs to a function do not hold dask arrays:
- Should we try to make every element-wise operation with vectorized functions (ufuncs) lazy by default? This might have negative performance implications and would be a little tricky to implement with xarray's current code, since we still implement binary operations like … I am leaning towards the last option for now but would welcome other opinions. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2298/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
902622057 | MDU6SXNzdWU5MDI2MjIwNTc= | 5381 | concat() with compat='no_conflicts' on dask arrays has accidentally quadratic runtime | shoyer 1217238 | open | 0 | 0 | 2021-05-26T16:12:06Z | 2022-04-19T03:48:27Z | MEMBER | This ends up calling … This has quadratic behavior if the variables are stored in dask arrays (the dask graph gets one element larger after each loop iteration). This is OK for … I encountered this because … I guess there's also the related issue that even if we produced the output dask graph by hand without a loop, it still wouldn't be easy to evaluate for a large number of elements. Ideally we would use some sort of tree reduction to ensure the operation can be parallelized. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5381/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
325439138 | MDU6SXNzdWUzMjU0MzkxMzg= | 2171 | Support alignment/broadcasting with unlabeled dimensions of size 1 | shoyer 1217238 | open | 0 | 5 | 2018-05-22T19:52:21Z | 2022-04-19T03:15:24Z | MEMBER | Sometimes, it's convenient to include placeholder dimensions of size 1, which allows for removing any ambiguity related to the order of output dimensions. Currently, this is not supported with xarray: …
However, these operations aren't really ambiguous. With size-1 dimensions, we could logically do broadcasting like NumPy arrays, e.g., …
This would be particularly convenient if we add … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2171/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
665488672 | MDU6SXNzdWU2NjU0ODg2NzI= | 4267 | CachingFileManager should not use __del__ | shoyer 1217238 | open | 0 | 2 | 2020-07-25T01:20:52Z | 2022-04-17T21:42:39Z | MEMBER |
Per https://github.com/shoyer/h5netcdf/issues/50#issuecomment-572191867, the right solution is probably to use … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4267/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
469440752 | MDU6SXNzdWU0Njk0NDA3NTI= | 3139 | Change the signature of DataArray to DataArray(data, dims, coords, ...)? | shoyer 1217238 | open | 0 | 1 | 2019-07-17T20:54:57Z | 2022-04-09T15:28:51Z | MEMBER | Currently, the signature of DataArray is … In the long term, I think … My original reasoning for this argument order was that … The challenge in making any change here would be to have a smooth deprecation process, one that ideally avoids requiring users to rewrite all of their code and avoids loads of pointless/extraneous warnings. I'm not entirely sure this is possible. We could likely use heuristics to distinguish between … An alternative that might achieve some of the convenience of this change would be to allow for passing lists of strings in the … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3139/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
327166000 | MDExOlB1bGxSZXF1ZXN0MTkxMDMwMjA4 | 2195 | WIP: explicit indexes | shoyer 1217238 | closed | 0 | 3 | 2018-05-29T04:25:15Z | 2022-03-21T14:59:52Z | 2022-03-21T14:59:52Z | MEMBER | 0 | pydata/xarray/pulls/2195 | Some utility functions that should be useful for https://github.com/pydata/xarray/issues/1603 Still very much a work in progress -- it would be great if someone has time to finish writing any of these in another PR! |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2195/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
864249974 | MDU6SXNzdWU4NjQyNDk5NzQ= | 5202 | Make creating a MultiIndex in stack optional | shoyer 1217238 | closed | 0 | 7 | 2021-04-21T20:21:03Z | 2022-03-17T17:11:42Z | 2022-03-17T17:11:42Z | MEMBER | As @Hoeze notes in https://github.com/pydata/xarray/issues/5179, calling … This is true with how … Regardless of how we define the semantics for boolean indexing (https://github.com/pydata/xarray/issues/1887), it seems like it could be a good idea to allow stack to skip creating a MultiIndex for the new dimension, via a new keyword argument such as … (A sketch of the keyword that eventually landed follows this row.) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5202/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
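For the record, a sketch of the keyword that (as far as I can tell) eventually landed for this issue, create_index; with False, stack reshapes without building a MultiIndex:

```python
import xarray as xr

array = xr.DataArray([[1, 2], [3, 4]], dims=["x", "y"], coords={"x": ["a", "b"]})

stacked = array.stack(z=["x", "y"], create_index=False)
print(stacked.indexes)  # no index along "z"
```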
237008177 | MDU6SXNzdWUyMzcwMDgxNzc= | 1460 | groupby should still squeeze for non-monotonic inputs | shoyer 1217238 | open | 0 | 5 | 2017-06-19T20:05:14Z | 2022-03-04T21:31:41Z | MEMBER | We can simply use … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1460/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
58117200 | MDU6SXNzdWU1ODExNzIwMA== | 324 | Support multi-dimensional grouped operations and group_over | shoyer 1217238 | open | 0 | 1.0 741199 | 12 | 2015-02-18T19:42:20Z | 2022-02-28T19:03:17Z | MEMBER | Multi-dimensional grouped operations should be relatively straightforward -- the main complexity will be writing an N-dimensional concat that doesn't involve repetitively copying data. The idea with … Roughly speaking (it's a little more complex for the case of non-dimension variables), … Related: #266 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/324/reactions", "total_count": 18, "+1": 18, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | |||||||
1090700695 | I_kwDOAMm_X85BAsWX | 6125 | [Bug]: HTML repr does not display well in notebooks hosted on GitHub | shoyer 1217238 | open | 0 | 0 | 2021-12-29T19:05:49Z | 2021-12-29T19:36:25Z | MEMBER | What happened? We see both the raw text and a malformed version of the HTML (without CSS formatting). What did you expect to happen? Either:
nbviewer gets this right:
Minimal Complete Verifiable Example: No response. Relevant log output: No response. Anything else we need to know?: No response. Environment: NA |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6125/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1062709354 | PR_kwDOAMm_X84u-sO9 | 6025 | Simplify missing value handling in xarray.corr | shoyer 1217238 | closed | 0 | 1 | 2021-11-24T17:48:03Z | 2021-11-28T04:39:22Z | 2021-11-28T04:39:22Z | MEMBER | 0 | pydata/xarray/pulls/6025 | This PR simplifies the fix from https://github.com/pydata/xarray/pull/5731, specifically for the benefit of xarray.corr. There is no need to use … It is basically an alternative version of https://github.com/pydata/xarray/pull/5284. It is potentially slightly less efficient to do this masking step when unnecessary, but I doubt this makes a noticeable performance difference in practice (and I doubt this optimization is useful inside … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6025/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
1044151556 | PR_kwDOAMm_X84uELYB | 5935 | Docs: fix URL for PTSA | shoyer 1217238 | closed | 0 | 1 | 2021-11-03T21:56:44Z | 2021-11-05T09:36:04Z | 2021-11-05T09:36:04Z | MEMBER | 0 | pydata/xarray/pulls/5935 | One of the PTSA authors told me about the new URL by email. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5935/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
874292512 | MDU6SXNzdWU4NzQyOTI1MTI= | 5251 | Switch default for Zarr reading/writing to consolidated=True? | shoyer 1217238 | closed | 0 | 4 | 2021-05-03T06:59:42Z | 2021-08-30T15:21:11Z | 2021-08-30T15:21:11Z | MEMBER | Consolidated metadata was a new feature in Zarr v2.3, which was released over two years ago (March 22, 2019). Since then, I have used … I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big "gotcha" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set … (an example of the explicit flags follows this row). I would suggest doing this in a way that is almost entirely backwards compatible, with only a minor performance cost for reading non-consolidated datasets:
- CC @rabernat |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5251/reactions", "total_count": 11, "+1": 11, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
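The explicit flags the issue says users currently have to know about:

```python
import xarray as xr

ds = xr.Dataset({"x": [1, 2, 3]})

# Writing consolidated metadata stores all group/array metadata in one
# key, so opening the store needs a single read instead of many.
ds.to_zarr("example.zarr", mode="w", consolidated=True)

reopened = xr.open_zarr("example.zarr", consolidated=True)
```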
928402742 | MDU6SXNzdWU5Mjg0MDI3NDI= | 5516 | Rename master branch -> main | shoyer 1217238 | closed | 0 | 4 | 2021-06-23T15:45:57Z | 2021-07-23T21:58:39Z | 2021-07-23T21:58:39Z | MEMBER | This is a best practice for inclusive projects. See https://github.com/github/renaming for guidance. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5516/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
948890466 | MDExOlB1bGxSZXF1ZXN0NjkzNjY1NDEy | 5624 | Make typing-extensions optional | shoyer 1217238 | closed | 0 | 6 | 2021-07-20T17:43:22Z | 2021-07-22T23:30:49Z | 2021-07-22T23:02:03Z | MEMBER | 0 | pydata/xarray/pulls/5624 | Type checking may be a little worse if typing-extensions is not installed, but I don't think it's worth the trouble of adding another hard dependency just for one use of TypeGuard. Note: sadly this doesn't work yet. Mypy (and pylance) don't like the type alias defined with try/except. Any ideas? In the worst case, we could revert the TypeGuard entirely, but that would be a shame...
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5624/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
890534794 | MDU6SXNzdWU4OTA1MzQ3OTQ= | 5295 | Engine is no longer inferred for filenames not ending in ".nc" | shoyer 1217238 | closed | 0 | 0 | 2021-05-12T22:28:46Z | 2021-07-15T14:57:54Z | 2021-05-14T22:40:14Z | MEMBER | This works with xarray=0.17.0:
On xarray 0.18.0, it fails:
```
ValueError                                Traceback (most recent call last)
<ipython-input-1-20e128a730aa> in <module>()
      2
      3 xarray.Dataset({'x': [1, 2, 3]}).to_netcdf('tmp')
----> 4 xarray.open_dataset('tmp')

/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    483
    484     if engine is None:
--> 485         engine = plugins.guess_engine(filename_or_obj)
    486
    487     backend = plugins.get_backend(engine)

/usr/local/lib/python3.7/dist-packages/xarray/backends/plugins.py in guess_engine(store_spec)
    110             warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
    111
--> 112     raise ValueError("cannot guess the engine, try passing one explicitly")
    113
    114

ValueError: cannot guess the engine, try passing one explicitly
```
I'm not entirely sure what changed. My guess is that we used to fall back to trying to use SciPy, but don't do that anymore. A potential fix would be reading strings as filenames in … (An explicit-engine workaround follows this row.) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5295/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
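The immediate workaround, passing the engine explicitly:

```python
import xarray as xr

xr.Dataset({"x": [1, 2, 3]}).to_netcdf("tmp")

# Guessing fails for extension-less filenames on 0.18.0, but an explicit
# engine bypasses guess_engine() entirely.
ds = xr.open_dataset("tmp", engine="netcdf4")
```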
252707680 | MDU6SXNzdWUyNTI3MDc2ODA= | 1525 | Consider setting name=False in Variable.chunk() | shoyer 1217238 | open | 0 | 4 | 2017-08-24T19:34:28Z | 2021-07-13T01:50:16Z | MEMBER | @mrocklin writes:
See here for discussion: https://github.com/pydata/xarray/pull/1517#issuecomment-324722153 Whether this is worth doing really depends on what people would find most useful -- and what is the most intuitive behavior. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1525/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
254888879 | MDU6SXNzdWUyNTQ4ODg4Nzk= | 1552 | Flow chart for choosing indexing operations | shoyer 1217238 | open | 0 | 2 | 2017-09-03T17:33:30Z | 2021-07-11T22:26:17Z | MEMBER | We have a lot of indexing operations, even though … A flow chart / decision tree to help users pick the right indexing operation might be helpful (e.g., like this skimage FlowChart). It would ask various questions (e.g., do you have labels or integer positions? do you want to select or impose coordinates?) and then suggest the appropriate indexer methods. cc @fujiisoup |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1552/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
891281614 | MDU6SXNzdWU4OTEyODE2MTQ= | 5302 | Suggesting specific IO backends to install when open_dataset() fails | shoyer 1217238 | closed | 0 | 3 | 2021-05-13T18:45:28Z | 2021-06-23T08:18:07Z | 2021-06-23T08:18:07Z | MEMBER | Currently, Xarray's internal backends don't get registered unless the necessary dependencies are installed: https://github.com/pydata/xarray/blob/1305d9b624723b86050ca5b2d854e5326bbaa8e6/xarray/backends/netCDF4_.py#L567-L568 In order to facilitate suggesting a specific backend to install (e.g., to improve error messages from opening tutorial datasets https://github.com/pydata/xarray/issues/5291), I would suggest that Xarray always registers its own backend entrypoints. Then we make the following changes to the plugin protocol:
This will let us leverage the existing … Does this seem reasonable and worthwhile? CC @aurghs @alexamici |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5302/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
874331538 | MDExOlB1bGxSZXF1ZXN0NjI4OTE0NDQz | 5252 | Add mode="r+" for to_zarr and use consolidated writes/reads by default | shoyer 1217238 | closed | 0 | 14 | 2021-05-03T07:57:16Z | 2021-06-22T06:51:35Z | 2021-06-17T17:19:26Z | MEMBER | 0 | pydata/xarray/pulls/5252 |
This PR includes several related changes to …
These changes gave me a ~5x boost in write performance in a large parallel job making use of … (A region-write sketch follows this row.)
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5252/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
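A sketch of the workflow this PR enables; paths and sizes are made up:

```python
import xarray as xr

ds = xr.Dataset({"data": ("x", list(range(100)))})
ds.to_zarr("big.zarr", mode="w")

# mode="r+" only modifies values of pre-existing arrays, which pairs
# naturally with region= for parallel writers filling disjoint slices.
part = ds.isel(x=slice(0, 10))
part.to_zarr("big.zarr", mode="r+", region={"x": slice(0, 10)})
```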
340733448 | MDU6SXNzdWUzNDA3MzM0NDg= | 2283 | Exact alignment should allow missing dimension coordinates | shoyer 1217238 | open | 0 | 2 | 2018-07-12T17:40:24Z | 2021-06-15T09:52:29Z | MEMBER | Code Sample, a copy-pastable example if possible
Problem description

This currently results in an error, but a missing index of size 3 does not actually conflict:

```python-traceback
ValueError                                Traceback (most recent call last)
<ipython-input-15-1d63d3512fb6> in <module>()
      1 xr.align(xr.DataArray([1, 2, 3], dims='x'),
      2          xr.DataArray([1, 2, 3], dims='x', coords=[[0, 1, 2]]),
----> 3          join='exact')

/usr/local/lib/python3.6/dist-packages/xarray/core/alignment.py in align(*objects, **kwargs)
    129                 raise ValueError(
    130                     'indexes along dimension {!r} are not equal'
--> 131                     .format(dim))
    132             index = joiner(matching_indexes)
    133             joined_indexes[dim] = index

ValueError: indexes along dimension 'x' are not equal
```

This surfaced as an issue on StackOverflow: https://stackoverflow.com/questions/51308962/computing-matrix-vector-multiplication-for-each-time-point-in-two-dataarrays

Expected Output

Both output arrays should end up with the … Output of …
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2283/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
842438533 | MDU6SXNzdWU4NDI0Mzg1MzM= | 5082 | Move encoding from xarray.Variable to duck arrays? | shoyer 1217238 | open | 0 | 2 | 2021-03-27T07:21:55Z | 2021-06-13T01:34:00Z | MEMBER | The … I think a cleaner way to handle … |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5082/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
416554477 | MDU6SXNzdWU0MTY1NTQ0Nzc= | 2797 | Stalebot is being overly aggressive | shoyer 1217238 | closed | 0 | 7 | 2019-03-03T19:37:37Z | 2021-06-03T21:31:46Z | 2021-06-03T21:22:48Z | MEMBER | E.g., see https://github.com/pydata/xarray/issues/1151 where stalebot closed an issue even after another comment. Is this something we need to reconfigure or just a bug? cc @pydata/xarray |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2797/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
276241764 | MDU6SXNzdWUyNzYyNDE3NjQ= | 1739 | Utility to restore original dimension order after apply_ufunc | shoyer 1217238 | open | 0 | 11 | 2017-11-23T00:47:57Z | 2021-05-29T07:39:33Z | MEMBER | This seems to be coming up quite a bit for wrapping functions that apply an operation along an axis, e.g., for … We should either write a utility function to do this or consider adding an option to … (A utility sketch follows this row.) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1739/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
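A sketch of what such a utility could look like (the name and exact semantics are my own; dimensions new to the result are appended at the end):

```python
import xarray as xr

def restore_dim_order(result: xr.DataArray, like: xr.DataArray) -> xr.DataArray:
    # Put dimensions back in the order they had on the original object.
    ordered = [d for d in like.dims if d in result.dims]
    ordered += [d for d in result.dims if d not in like.dims]
    return result.transpose(*ordered)
```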
901047466 | MDU6SXNzdWU5MDEwNDc0NjY= | 5372 | Consider revising the _repr_inline_ protocol | shoyer 1217238 | open | 0 | 0 | 2021-05-25T16:18:31Z | 2021-05-25T16:18:31Z | MEMBER |
As I wrote in https://github.com/pydata/xarray/pull/5352, I would suggest revising it in one of two ways:
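For reference, a duck array currently opts into this protocol by defining a single method along these lines (a sketch; `max_width` is the available width in characters):

```python
class MyDuckArray:
    def _repr_inline_(self, max_width: int) -> str:
        # return a one-line summary of the array, at most
        # max_width characters wide
        return "MyDuckArray[...]"[:max_width]
```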
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5372/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
891253662 | MDExOlB1bGxSZXF1ZXN0NjQ0MTQ5Mzc2 | 5300 | Better error message when no backend engine is found. | shoyer 1217238 | closed | 0 | 4 | 2021-05-13T18:10:04Z | 2021-05-18T21:23:00Z | 2021-05-18T21:23:00Z | MEMBER | 0 | pydata/xarray/pulls/5300 | Also includes a better error message for when a tutorial dataset is loaded but an underlying IO dependency is not found.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5300/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
890573049 | MDExOlB1bGxSZXF1ZXN0NjQzNTc1Mjc5 | 5296 | More robust guess_can_open for netCDF4/scipy/h5netcdf entrypoints | shoyer 1217238 | closed | 0 | 1 | 2021-05-12T23:53:32Z | 2021-05-14T22:40:14Z | 2021-05-14T22:40:14Z | MEMBER | 0 | pydata/xarray/pulls/5296 | The new version checks magic numbers in files on disk, not just already open file objects. I've also added a bunch of unit-tests. Fixes GH5295
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5296/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
46049691 | MDU6SXNzdWU0NjA0OTY5MQ== | 255 | Add Dataset.to_pandas() method | shoyer 1217238 | closed | 0 | 0.5 987654 | 2 | 2014-10-17T00:01:36Z | 2021-05-04T13:56:00Z | 2021-05-04T13:56:00Z | MEMBER | This would be the complement of the DataArray constructor, converting an xray.DataArray into a 1D series, 2D DataFrame or 3D panel, whichever is appropriate.
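As a quick illustration of the requested behavior, here is how it works in current xarray for DataArray (the pandas type is picked by dimensionality):

```python
import xarray as xr

xr.DataArray([1, 2, 3], dims='x').to_pandas()                # -> pandas.Series
xr.DataArray([[1, 2], [3, 4]], dims=('x', 'y')).to_pandas()  # -> pandas.DataFrame
```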
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/255/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||
294241734 | MDU6SXNzdWUyOTQyNDE3MzQ= | 1887 | Boolean indexing with multi-dimensional key arrays | shoyer 1217238 | open | 0 | 13 | 2018-02-04T23:28:45Z | 2021-04-22T21:06:47Z | MEMBER | Originally from https://github.com/pydata/xarray/issues/974 For boolean indexing:
- |
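A sketch of the kind of behavior under discussion; the proposed indexing call and its flattening semantics are assumptions, so that call is shown commented out:

```python
import xarray as xr

da = xr.DataArray([[1, 2], [3, 4]], dims=('x', 'y'))
mask = da > 2  # a multi-dimensional boolean key

# proposed: select the masked elements, collapsing x/y into a new
# flat dimension, e.g. da[mask] -> array([3, 4])
# result = da[mask]
```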
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1887/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
346822633 | MDU6SXNzdWUzNDY4MjI2MzM= | 2336 | test_88_character_filename_segmentation_fault should not try to write to the current working directory | shoyer 1217238 | closed | 0 | 2 | 2018-08-02T01:06:41Z | 2021-04-20T23:38:53Z | 2021-04-20T23:38:53Z | MEMBER | This fails in cases where the current working directory does not support writes, e.g., as seen here

```
def test_88_character_filename_segmentation_fault(self):
    # should be fixed in netcdf4 v1.3.1
    with mock.patch('netCDF4.__version__', '1.2.4'):
        with warnings.catch_warnings():
            message = ('A segmentation fault may occur when the '
                       'file path has exactly 88 characters')
            warnings.filterwarnings('error', message)
            with pytest.raises(Warning):
                # Need to construct 88 character filepath

tests/test_backends.py:1234:
core/dataset.py:1150: in to_netcdf
    compute=compute)
backends/api.py:715: in to_netcdf
    autoclose=autoclose, lock=lock)
backends/netCDF4_.py:332: in open
    ds = opener()
backends/netCDF4_.py:231: in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
third_party/py/netCDF4/_netCDF4.pyx:2111: in netCDF4._netCDF4.Dataset.__init__
    ???
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2336/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
843996137 | MDU6SXNzdWU4NDM5OTYxMzc= | 5092 | Concurrent loading of coordinate arrays from Zarr | shoyer 1217238 | open | 0 | 0 | 2021-03-30T02:19:50Z | 2021-04-19T02:43:31Z | MEMBER | When you open a dataset with Zarr, xarray loads coordinate arrays corresponding to indexes in serial. This can be slow (multiple seconds) even with only a handful of such arrays if they are stored in a remote filesystem (e.g., cloud object stores). This is similar to the use-cases for consolidated metadata. In principle, we could speed up loading datasets from Zarr into Xarray significantly by reading the data corresponding to these arrays in parallel (e.g., in multiple threads). |
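A minimal sketch of the idea using a thread pool; the helper name is hypothetical, and `ds` is assumed to come from `xr.open_zarr`:

```python
from concurrent.futures import ThreadPoolExecutor

def load_coords_concurrently(ds, max_workers=8):
    # eagerly load every coordinate variable, one thread per
    # variable, instead of xarray's current serial reads
    variables = [ds[name].variable for name in ds.coords]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(lambda var: var.load(), variables))
    return ds
```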
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5092/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
621082480 | MDU6SXNzdWU2MjEwODI0ODA= | 4080 | Most arguments to open_dataset should be keyword only | shoyer 1217238 | closed | 0 | 1 | 2020-05-19T15:38:51Z | 2021-03-16T10:56:09Z | 2021-03-16T10:56:09Z | MEMBER |
Similarly to the case for pandas (https://github.com/pandas-dev/pandas/issues/27544), it would be nice to make most of these arguments keyword-only, e.g.,

This would encourage writing readable code when calling

To make this change, we could make use of the
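A sketch of what such a signature could look like, using Python's bare `*` marker (parameter list abridged):

```python
def open_dataset(filename_or_obj, *, group=None, decode_cf=True,
                 mask_and_scale=None, decode_times=True, engine=None,
                 chunks=None, **kwargs):
    # everything after the * must be passed by keyword:
    # open_dataset('example.nc', engine='netcdf4') is fine, while
    # open_dataset('example.nc', None, True) raises a TypeError
    ...
```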
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4080/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
645062817 | MDExOlB1bGxSZXF1ZXN0NDM5NTg4OTU1 | 4178 | Fix min_deps_check; revert to support numpy=1.14 and pandas=0.24 | shoyer 1217238 | closed | 0 | 5 | 2020-06-25T00:37:19Z | 2021-02-27T21:46:43Z | 2021-02-27T21:46:42Z | MEMBER | 1 | pydata/xarray/pulls/4178 | Fixes the issue noticed in: https://github.com/pydata/xarray/pull/4175#issuecomment-649135372 Let's see if this passes CI...
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4178/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
645154872 | MDU6SXNzdWU2NDUxNTQ4NzI= | 4179 | Consider revising our minimum dependency version policy | shoyer 1217238 | closed | 0 | 7 | 2020-06-25T05:04:38Z | 2021-02-22T05:02:25Z | 2021-02-22T05:02:25Z | MEMBER | Our current policy is that xarray supports "the minor version (X.Y) initially published no more than N months ago" where N is:
I think this policy is too aggressive, particularly for pandas, SciPy and other libraries. Some of these projects can go 6+ months between minor releases. For example, version 2.3 of zarr is currently more than 6 months old. So if zarr released 2.4 today and xarray issued a new release tomorrow, then our policy would dictate that we could ask users to upgrade to the new version. In https://github.com/pydata/xarray/pull/4178, I misinterpreted our policy as supporting "the most recent minor version (X.Y) initially published more than N months ago". This version makes a bit more sense to me: users only need to upgrade dependencies at least every N months to use the latest xarray release. I understand that NEP-29 chose its language intentionally, so that distributors know ahead of time when they can drop support for a Python or NumPy version. But this seems like a (very) poor fit for projects without regular releases. At the very least we should adjust the specific time windows. I'll see if I can gain some understanding of the motivation for this particular language over on the NumPy tracker...
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4179/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
267927402 | MDU6SXNzdWUyNjc5Mjc0MDI= | 1652 | Resolve warnings issued in the xarray test suite | shoyer 1217238 | closed | 0 | 10 | 2017-10-24T07:36:55Z | 2021-02-21T23:06:35Z | 2021-02-21T23:06:34Z | MEMBER | 82 warnings are currently issued in the process of running our test suite: https://gist.github.com/shoyer/db0b2c82efd76b254453216e957c4345 Some of these can probably be safely ignored, but others are likely noticed by users, e.g., https://stackoverflow.com/questions/41130138/why-is-invalid-value-encountered-in-greater-warning-thrown-in-python-xarray-fo/41147570#41147570 It would be nice to clean up all of these, either by catching the appropriate upstream warning (if irrelevant) or changing our usage to avoid the warning. There may very well be a lurking FutureWarning in there somewhere that could cause issues when another library updates. Probably the easiest way to get started here is to get the test suite running locally, and use
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1652/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
777327298 | MDU6SXNzdWU3NzczMjcyOTg= | 4749 | Option for combine_attrs with conflicting values silently dropped | shoyer 1217238 | closed | 0 | 0 | 2021-01-01T18:04:49Z | 2021-02-10T19:50:17Z | 2021-02-10T19:50:17Z | MEMBER |
It would be nice to have an option to combine attrs from all objects like "no_conflicts", but that drops attributes with conflicting values rather than raising an error. We might call this

This is similar to how xarray currently handles conflicting values for

cc @keewis
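A sketch of the desired behavior, assuming the option ends up spelled `combine_attrs='drop_conflicts'`:

```python
import xarray as xr

ds1 = xr.Dataset(attrs={'units': 'm', 'source': 'model'})
ds2 = xr.Dataset(attrs={'units': 'cm', 'source': 'model'})

merged = xr.merge([ds1, ds2], combine_attrs='drop_conflicts')
# merged.attrs == {'source': 'model'}: the conflicting 'units'
# attribute is silently dropped instead of raising an error,
# as 'no_conflicts' would
```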
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4749/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
264098632 | MDU6SXNzdWUyNjQwOTg2MzI= | 1618 | apply_raw() for a simpler version of apply_ufunc() | shoyer 1217238 | open | 0 | 4 | 2017-10-10T04:51:38Z | 2021-01-01T17:14:43Z | MEMBER |
The rule for

Output dimensions would be determined from a simple rule of some sort (see the sketch after this list):
- Default output dimensions would either be copied from the first argument, or would take on the ordered union of all input dimensions.
- Custom dimensions could either be set by adding a

This also could be suitable for defining as a method instead of a separate function. See https://github.com/pydata/xarray/issues/1251 and https://github.com/pydata/xarray/issues/1130 for related issues.
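To make the contrast concrete, here is what wrapping an axis-based function takes with `apply_ufunc` today; the commented-out `apply_raw` call is hypothetical:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(3, 4), dims=('time', 'x'))

# with apply_ufunc, core dimensions must be spelled out explicitly
demeaned = xr.apply_ufunc(
    lambda arr: arr - arr.mean(axis=-1, keepdims=True), da,
    input_core_dims=[['x']], output_core_dims=[['x']],
)

# the proposal: infer output dimensions from the inputs instead
# demeaned = xr.apply_raw(lambda arr: arr - arr.mean(axis=-1, keepdims=True), da)
```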
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1618/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
269700511 | MDU6SXNzdWUyNjk3MDA1MTE= | 1672 | Append along an unlimited dimension to an existing netCDF file | shoyer 1217238 | open | 0 | 8 | 2017-10-30T18:09:54Z | 2020-11-29T17:35:04Z | MEMBER | This would be a nice feature to have for some use cases, e.g., for writing simulation time-steps: https://stackoverflow.com/questions/46951981/create-and-write-xarray-dataarray-to-netcdf-in-chunks It should be relatively straightforward to add, too, building on support for writing files with unlimited dimensions. User facing API would probably be a new keyword argument to |
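A sketch of what the user-facing API might look like; the append keyword shown is purely hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range('2000-01-01', periods=2)
ds = xr.Dataset({'u': (('time',), np.arange(2.0))}, coords={'time': times})

# write the first time step, marking 'time' as unlimited
ds.isel(time=[0]).to_netcdf('simulation.nc', unlimited_dims=['time'])

# hypothetical: append each later step along the unlimited dimension
# ds.isel(time=[1]).to_netcdf('simulation.nc', mode='a', append_dim='time')
```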
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1672/reactions", "total_count": 21, "+1": 21, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
314444743 | MDU6SXNzdWUzMTQ0NDQ3NDM= | 2059 | How should xarray serialize bytes/unicode strings across Python/netCDF versions? | shoyer 1217238 | open | 0 | 5 | 2018-04-15T19:36:55Z | 2020-11-19T10:08:16Z | MEMBER | netCDF string types

We have several options for storing strings in netCDF files:

-

NumPy/Python string types

On the Python side, our options are perhaps even more confusing:

- NumPy's

Like pandas, we are pretty liberal with converting back and forth between fixed-length (

Current behavior of xarray

Currently, xarray uses the same behavior on Python 2/3. The priority was faithfully round-tripping data from a particular version of Python to netCDF and back, which the current serialization behavior achieves:

| Python version | NetCDF version | NumPy datatype | NetCDF datatype |
| -------------- | -------------- | --------------------- | --------------------------- |
| Python 2 | NETCDF3 | np.string_ / str | NC_CHAR |
| Python 2 | NETCDF4 | np.string_ / str | NC_CHAR |
| Python 3 | NETCDF3 | np.string_ / bytes | NC_CHAR |
| Python 3 | NETCDF4 | np.string_ / bytes | NC_CHAR |
| Python 2 | NETCDF3 | np.unicode_ / unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | np.unicode_ / unicode | NC_STRING |
| Python 3 | NETCDF3 | np.unicode_ / str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | np.unicode_ / str | NC_STRING |
| Python 2 | NETCDF3 | object bytes/str | NC_CHAR |
| Python 2 | NETCDF4 | object bytes/str | NC_CHAR |
| Python 3 | NETCDF3 | object bytes | NC_CHAR |
| Python 3 | NETCDF4 | object bytes | NC_CHAR |
| Python 2 | NETCDF3 | object unicode | NC_CHAR with UTF-8 encoding |
| Python 2 | NETCDF4 | object unicode | NC_STRING |
| Python 3 | NETCDF3 | object unicode/str | NC_CHAR with UTF-8 encoding |
| Python 3 | NETCDF4 | object unicode/str | NC_STRING |

This can also be selected explicitly for most data-types by setting dtype in encoding:

-

Script for generating table:
```python
from __future__ import print_function
import xarray as xr
import uuid
import netCDF4
import numpy as np
import sys

for dtype_name, value in [
        ('np.string_ / ' + type(b'').__name__, np.array([b'abc'])),
        ('np.unicode_ / ' + type(u'').__name__, np.array([u'abc'])),
        ('object bytes/' + type(b'').__name__, np.array([b'abc'], dtype=object)),
        ('object unicode/' + type(u'').__name__, np.array([u'abc'], dtype=object)),
]:
    for format in ['NETCDF3_64BIT', 'NETCDF4']:
        filename = str(uuid.uuid4()) + '.nc'
        xr.Dataset({'data': value}).to_netcdf(filename, format=format)
        with netCDF4.Dataset(filename) as f:
            var = f.variables['data']
            disk_dtype = var.dtype
            has_encoding = hasattr(var, '_Encoding')
            disk_dtype_name = (('NC_CHAR' if disk_dtype == 'S1' else 'NC_STRING') +
                               (' with UTF-8 encoding' if has_encoding else ''))
            print('|', 'Python %i' % sys.version_info[0],
                  '|', format[:7],
                  '|', dtype_name,
                  '|', disk_dtype_name,
                  '|')
```
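Selecting the on-disk representation explicitly, as mentioned above, looks like this (a small sketch):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({'data': ('x', np.array([b'abc']))})
# force a fixed-width NC_CHAR representation rather than NC_STRING
ds.to_netcdf('chars.nc', encoding={'data': {'dtype': 'S1'}})
```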
Potential alternatives

The main option I'm considering is switching to default to

This would imply two changes:
1. Attempting to serialize arbitrary bytes (on Python 2) would start raising an error -- anything that isn't ASCII would require explicitly disabling

This implicit conversion would be consistent with Python 2's general handling of bytes/unicode, and facilitate reading netCDF files on Python 3 that were written with Python 2. The counter-argument is that it may not be worth changing this at this late point, given that we will be sunsetting Python 2 support by year's end.
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2059/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
613012939 | MDExOlB1bGxSZXF1ZXN0NDEzODQ3NzU0 | 4035 | Support parallel writes to regions of zarr stores | shoyer 1217238 | closed | 0 | 17 | 2020-05-06T02:40:19Z | 2020-11-04T06:19:01Z | 2020-11-04T06:19:01Z | MEMBER | 0 | pydata/xarray/pulls/4035 | This PR adds support for a

This is useful for creating large Zarr datasets without requiring dask. For example, the separate workers in a simulation job might each write a single non-overlapping chunk of a Zarr file. The standard way to handle such datasets today is to first write netCDF files in each process, and then consolidate them afterwards with dask (see #3096).

Creating empty Zarr stores

In order to do so, the Zarr file must already exist with the desired variables in the right shapes/chunks. It is desirable to be able to create such stores without actually writing data, because datasets that we want to write in parallel may be very large. In the example below, I achieve this by filling a
I think (1) is maybe the cleanest option (no extra API endpoints).

Unchunked variables

One potential gotcha concerns coordinate arrays that are not chunked, e.g., consider parallel writing of a dataset divided along time with 2D

If a Zarr store does not have atomic writes, then conceivably this could result in corrupted data. The default DirectoryStore has atomic writes and cloud-based object stores should also be atomic, so perhaps this doesn't matter in practice, but at the very least it's inefficient and could cause issues for large-scale jobs due to resource contention. Options include:
I think (4) would be my preferred option. Some users would undoubtedly find this annoying, but the power-users for whom we are adding this feature would likely appreciate it.

Usage example

```python
import xarray
import dask.array as da

ds = xarray.Dataset({'u': (('x',), da.arange(1000, chunks=100))})

# create the new zarr store, but don't write data
path = 'my-data.zarr'
ds.to_zarr(path, compute=False)

# look at the unwritten data
ds_opened = xarray.open_zarr(path)
print('Data before writing:', ds_opened.u.data[::100].compute())
# Data before writing: [  1 100   1 100 100   1   1   1   1   1]

# write out each slice (could be in separate processes)
for start in range(0, 1000, 100):
    selection = {'x': slice(start, start + 100)}
    ds.isel(selection).to_zarr(path, region=selection)

print('Data after writing:', ds_opened.u.data[::100].compute())
# Data after writing: [  0 100 200 300 400 500 600 700 800 900]
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4035/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
124809636 | MDU6SXNzdWUxMjQ4MDk2MzY= | 703 | Document xray internals / advanced API | shoyer 1217238 | closed | 0 | 2 | 2016-01-04T18:12:30Z | 2020-11-03T17:33:32Z | 2020-11-03T17:33:32Z | MEMBER | It would be useful to document the internal I had some notes in an earlier version of the docs that could be adapted. Note, however, that the internal structure of |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/703/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
715374721 | MDU6SXNzdWU3MTUzNzQ3MjE= | 4490 | Group together decoding options into a single argument | shoyer 1217238 | open | 0 | 6 | 2020-10-06T06:15:18Z | 2020-10-29T04:07:46Z | MEMBER | Is your feature request related to a problem? Please describe.
Describe the solution you'd like

To simplify the interface, I propose to group together all the decoding options into a new dataclass:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class DecodingOptions:
    mask: Optional[bool] = None
    scale: Optional[bool] = None
    datetime: Optional[bool] = None
    timedelta: Optional[bool] = None
    use_cftime: Optional[bool] = None
    concat_characters: Optional[bool] = None
    coords: Optional[bool] = None
    drop_variables: Optional[List[str]] = None
```

The signature of

Question: are

Note: the current signature is

Usage with the new interface would look like

This requires a little bit more typing than what we currently have, but it has a few advantages:
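For illustration, a sketch of how usage under this proposal could look; the `decode` keyword (and passing a `DecodingOptions` instance to it) is part of the proposal, not existing API:

```python
import xarray as xr

options = DecodingOptions(mask=False, scale=False)
ds = xr.open_dataset('example.nc', decode=options)  # hypothetical keyword
```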
Describe alternatives you've considered For the overall approach:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4490/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
718492237 | MDExOlB1bGxSZXF1ZXN0NTAwODc5MTY3 | 4500 | Add variable/attribute names to netCDF validation errors | shoyer 1217238 | closed | 0 | 1 | 2020-10-10T00:47:18Z | 2020-10-10T05:28:08Z | 2020-10-10T05:28:08Z | MEMBER | 0 | pydata/xarray/pulls/4500 | This should result in a better user experience, e.g., specifically pointing out the attribute with an invalid value.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4500/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
169274464 | MDU6SXNzdWUxNjkyNzQ0NjQ= | 939 | Consider how to deal with the proliferation of decoder options on open_dataset | shoyer 1217238 | closed | 0 | 8 | 2016-08-04T01:57:26Z | 2020-10-06T15:39:11Z | 2020-10-06T15:39:11Z | MEMBER | There are already lots of keyword arguments, and users want even more! (#843) Maybe we should use some sort of object to encapsulate desired options? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/939/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
253107677 | MDU6SXNzdWUyNTMxMDc2Nzc= | 1527 | Binary operations with ds.groupby('time.dayofyear') errors out, but ds.groupby('time.month') works | shoyer 1217238 | open | 0 | 10 | 2017-08-26T16:54:53Z | 2020-09-29T10:05:42Z | MEMBER | Reported on the mailing list: Original datasets: ```
Issue: Grouping by month works and outputs this: ```
Grouping by dayofyear doesn't work and gives this traceback: ```
/data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in func(self, other) 316 g = f if not reflexive else lambda x, y: f(y, x) 317 applied = self._yield_binary_applied(g, other) --> 318 combined = self._combine(applied) 319 return combined 320 return func /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/groupby.py in _combine(self, applied, shortcut) 532 combined = self._concat_shortcut(applied, dim, positions) 533 else: --> 534 combined = concat(applied, dim) 535 combined = _maybe_reorder(combined, dim, positions) 536 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over) 118 raise TypeError('can only concatenate xarray Dataset and DataArray ' 119 'objects, got %s' % type(first_obj)) --> 120 return f(objs, dim, data_vars, coords, compat, positions) 121 122 /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions) 210 datasets = align(*datasets, join='outer', copy=False, exclude=[dim]) 211 --> 212 concat_over = _calc_concat_over(datasets, dim, data_vars, coords) 213 214 def insert_result_variable(k, v): /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in _calc_concat_over(datasets, dim, data_vars, coords) 190 if dim in v.dims) 191 concat_over.update(process_subset_opt(data_vars, 'data_vars')) --> 192 concat_over.update(process_subset_opt(coords, 'coords')) 193 if dim in datasets[0]: 194 concat_over.add(dim) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in process_subset_opt(opt, subset) 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset --> 167 concat_new = set(k for k in getattr(datasets[0], subset) 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) --> 168 if k not in concat_over and differs(k)) 169 elif opt == 'all': 170 concat_new = (set(getattr(datasets[0], subset)) - /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in differs(vname) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/combine.py in <genexpr>(.0) 163 v = datasets[0].variables[vname] 164 return any(not ds.variables[vname].equals(v) --> 165 for ds in datasets[1:]) 166 # all nonindexes that are not the same in each dataset 167 concat_new = set(k for k in getattr(datasets[0], subset) /data/keeling/a/ahuang11/anaconda3/lib/python3.6/site-packages/xarray/core/utils.py in getitem(self, key) 288 289 def getitem(self, key): --> 290 return self.mapping[key] 291 292 def iter(self): KeyError: 'lon' ``` |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1527/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
644821435 | MDU6SXNzdWU2NDQ4MjE0MzU= | 4176 | Pre-expand data and attributes in DataArray/Variable HTML repr? | shoyer 1217238 | closed | 0 | 7 | 2020-06-24T18:22:35Z | 2020-09-21T20:10:26Z | 2020-06-28T17:03:40Z | MEMBER | Proposal

Given that a major purpose for plotting an array is to look at data or attributes, I wonder if we should expand these sections by default?

- I worry that clicking on icons to expand sections may not be easy to discover
- This would also be consistent with the text repr, which shows these sections by default (the Dataset repr is already consistent between text and HTML)

Context

Currently the HTML repr for DataArray/Variable looks like this:
To see array data, you have to click on the (thanks to @max-sixty for making this a little bit more manageably sized in https://github.com/pydata/xarray/pull/3905!) There's also a really nice repr for nested dask arrays:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4176/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
702372014 | MDExOlB1bGxSZXF1ZXN0NDg3NjYxMzIz | 4426 | Fix for h5py deepcopy issues | shoyer 1217238 | closed | 0 | 6 | 2020-09-16T01:11:00Z | 2020-09-18T22:31:13Z | 2020-09-18T22:31:09Z | MEMBER | 0 | pydata/xarray/pulls/4426 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4426/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
669307837 | MDExOlB1bGxSZXF1ZXN0NDU5Njk1NDA5 | 4292 | Fix indexing with datetime64[ns] with pandas=1.1 | shoyer 1217238 | closed | 0 | 11 | 2020-07-31T00:48:50Z | 2020-09-16T03:11:48Z | 2020-09-16T01:33:30Z | MEMBER | 0 | pydata/xarray/pulls/4292 | Fixes #4283 The underlying issue is that calling
We can fix this by using

I've added a crude regression test. There may well be a better way to test this but I haven't figured it out yet.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4292/reactions", "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
417542619 | MDU6SXNzdWU0MTc1NDI2MTk= | 2803 | Test failure with TestValidateAttrs.test_validating_attrs | shoyer 1217238 | closed | 0 | 6 | 2019-03-05T23:03:02Z | 2020-08-25T14:29:19Z | 2019-03-14T15:59:13Z | MEMBER | This is due to setting multi-dimensional attributes being an error, as of the latest netCDF4-Python release: https://github.com/Unidata/netcdf4-python/blob/master/Changelog

E.g., as seen on Appveyor: https://ci.appveyor.com/project/shoyer/xray/builds/22834250/job/9q0ip6i3cchlbkw2

```
================================== FAILURES ===================================
_________________ TestValidateAttrs.test_validating_attrs _____________________
self = <xarray.tests.test_backends.TestValidateAttrs object at 0x00000096BE5FAFD0>

    def test_validating_attrs(self):
        def new_dataset():
            return Dataset({'data': ('y', np.arange(10.0))},
                           {'y': np.arange(10)})

xarray\core\dataset.py:1323: in to_netcdf
    compute=compute)
xarray\backends\api.py:767: in to_netcdf
    unlimited_dims=unlimited_dims)
xarray\backends\api.py:810: in dump_to_store
    unlimited_dims=unlimited_dims)
xarray\backends\common.py:262: in store
    self.set_attributes(attributes)
xarray\backends\common.py:278: in set_attributes
    self.set_attribute(k, v)
xarray\backends\netCDF4_.py:418: in set_attribute
    _set_nc_attribute(self.ds, key, value)
xarray\backends\netCDF4_.py:294: in _set_nc_attribute
    obj.setncattr(key, value)
netCDF4\_netCDF4.pyx:2781: in netCDF4._netCDF4.Dataset.setncattr
    ???
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2803/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
676306518 | MDU6SXNzdWU2NzYzMDY1MTg= | 4331 | Support explicitly setting a dimension order with to_dataframe() | shoyer 1217238 | closed | 0 | 0 | 2020-08-10T17:45:17Z | 2020-08-14T18:28:26Z | 2020-08-14T18:28:26Z | MEMBER | As discussed in https://github.com/pydata/xarray/issues/2346, it would be nice to support explicitly setting the desired order of dimensions when calling

There is nice precedent for this in the

I imagine we could copy the exact same API for `to_dataframe()`.
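A sketch of what this could look like for `Dataset.to_dataframe`, assuming a `dim_order` keyword:

```python
# ds: an xarray.Dataset with dimensions 'time', 'x' and 'y';
# rows of the resulting MultiIndex vary fastest along 'y'
df = ds.to_dataframe(dim_order=['time', 'x', 'y'])
```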
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4331/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
671019427 | MDU6SXNzdWU2NzEwMTk0Mjc= | 4295 | We shouldn't require a recent version of setuptools to install xarray | shoyer 1217238 | closed | 0 | 33 | 2020-08-01T16:49:57Z | 2020-08-14T09:52:42Z | 2020-08-14T09:52:42Z | MEMBER | @canol reports on our mailing list that our setuptools 41.2 (released 21 August 2019) install requirement is making it hard to install recent versions of xarray at his company: https://groups.google.com/g/xarray/c/HS_xcZDEEtA/m/GGmW-3eMCAAJ
I was surprised to see this in our

Given that setuptools may be challenging to upgrade, would it be possible to relax this version requirement?
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4295/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
638597800 | MDExOlB1bGxSZXF1ZXN0NDM0MzMxNzQ3 | 4154 | Update issue templates inspired/based on dask | shoyer 1217238 | closed | 0 | 1 | 2020-06-15T07:00:53Z | 2020-08-05T13:05:33Z | 2020-06-17T16:50:57Z | MEMBER | 0 | pydata/xarray/pulls/4154 | See https://github.com/dask/dask/issues/new/choose for an approximate example of what this looks like. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4154/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
290593053 | MDU6SXNzdWUyOTA1OTMwNTM= | 1850 | xarray contrib module | shoyer 1217238 | closed | 0 | 25 | 2018-01-22T19:50:08Z | 2020-07-23T16:34:10Z | 2020-07-23T16:34:10Z | MEMBER | Over in #1288 @nbren12 wrote:
Yes, I agree that we should explore this. There are a lot of interesting projects building on xarray now but not great ways to discover them. Are there other open source projects with a good model we should copy here?
- Scikit-Learn has a separate GitHub org/repositories for contrib projects: https://github.com/scikit-learn-contrib.
- TensorFlow has a contrib module within the TensorFlow namespace:

This gives us two different models to consider. The first "separate repository" model might be easier/more flexible from a maintenance perspective. Any preferences/thoughts?

There's also some nice overlap with the Pangeo project.
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1850/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
646073396 | MDExOlB1bGxSZXF1ZXN0NDQwNDMxNjk5 | 4184 | Improve the speed of from_dataframe with a MultiIndex (by 40x!) | shoyer 1217238 | closed | 0 | 1 | 2020-06-26T07:39:14Z | 2020-07-02T20:39:02Z | 2020-07-02T20:39:02Z | MEMBER | 0 | pydata/xarray/pulls/4184 | Before:
After:
~~There are still some cases where we have to fall back to the existing slow implementation, but hopefully they should now be relatively rare.~~ Edit: now we always use the new implementation
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4184/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 1, "eyes": 0 } |
xarray 13221727 | pull | |||||
645961347 | MDExOlB1bGxSZXF1ZXN0NDQwMzQ2NTQz | 4182 | Show data by default in HTML repr for DataArray | shoyer 1217238 | closed | 0 | 0 | 2020-06-26T02:25:08Z | 2020-06-28T17:03:41Z | 2020-06-28T17:03:41Z | MEMBER | 0 | pydata/xarray/pulls/4182 |
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4182/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
644170008 | MDExOlB1bGxSZXF1ZXN0NDM4ODQxMjk2 | 4171 | Remove <pre> from nested HTML repr | shoyer 1217238 | closed | 0 | 0 | 2020-06-23T21:51:14Z | 2020-06-24T15:45:20Z | 2020-06-24T15:45:00Z | MEMBER | 0 | pydata/xarray/pulls/4171 | Using

Before (Jupyter notebook):
After:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4171/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
613546626 | MDExOlB1bGxSZXF1ZXN0NDE0MjgwMDEz | 4039 | Revise pull request template | shoyer 1217238 | closed | 0 | 5 | 2020-05-06T19:08:19Z | 2020-06-18T05:45:11Z | 2020-06-18T05:45:10Z | MEMBER | 0 | pydata/xarray/pulls/4039 | See below for the new language, to clarify that documentation is only necessary for "user visible changes." I added "including notable bug fixes" to indicate that minor bug fixes may not be worth noting (I was thinking of test-suite only fixes in this category) but perhaps that is too confusing. cc @pydata/xarray for opinions!
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4039/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
639334065 | MDExOlB1bGxSZXF1ZXN0NDM0OTQ0NTc4 | 4159 | Test RTD's new pull request builder | shoyer 1217238 | closed | 0 | 1 | 2020-06-16T03:06:32Z | 2020-06-17T16:54:02Z | 2020-06-17T16:54:02Z | MEMBER | 1 | pydata/xarray/pulls/4159 | { "url": "https://api.github.com/repos/pydata/xarray/issues/4159/reactions", "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 3, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | ||||||
639397110 | MDExOlB1bGxSZXF1ZXN0NDM0OTk1NzQz | 4160 | Fix failing upstream-dev build & remove docs build | shoyer 1217238 | closed | 0 | 0 | 2020-06-16T06:08:55Z | 2020-06-16T06:35:49Z | 2020-06-16T06:35:44Z | MEMBER | 0 | pydata/xarray/pulls/4160 | We'll use RTD's new doc builder instead. For an example, click on "docs/readthedocs.org:xray" below or look at GH4159
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4160/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
35682274 | MDU6SXNzdWUzNTY4MjI3NA== | 158 | groupby should work with name=None | shoyer 1217238 | closed | 0 | 2 | 2014-06-13T15:38:00Z | 2020-05-30T13:15:56Z | 2020-05-30T13:15:56Z | MEMBER | { "url": "https://api.github.com/repos/pydata/xarray/issues/158/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||||
612214951 | MDExOlB1bGxSZXF1ZXN0NDEzMjIyOTEx | 4028 | Remove broken test for Panel with to_pandas() | shoyer 1217238 | closed | 0 | 5 | 2020-05-04T22:41:42Z | 2020-05-06T01:50:21Z | 2020-05-06T01:50:21Z | MEMBER | 0 | pydata/xarray/pulls/4028 | We don't support creating a Panel with to_pandas() with any version of pandas at present, so this test was previously broken if pandas < 0.25 was installed.
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4028/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
612772669 | MDU6SXNzdWU2MTI3NzI2Njk= | 4030 | Doc build on Azure is timing out on master | shoyer 1217238 | closed | 0 | 1 | 2020-05-05T17:30:16Z | 2020-05-05T21:49:26Z | 2020-05-05T21:49:26Z | MEMBER | I don't know what's going on, but it currently times out after 1 hour: https://dev.azure.com/xarray/xarray/_build/results?buildId=2767&view=logs&j=7e620c85-24a8-5ffa-8b1f-642bc9b1fc36&t=68484831-0a19-5145-bfe9-6309e5f7691d Is it possible to login to Azure to debug this stuff? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4030/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
612838635 | MDExOlB1bGxSZXF1ZXN0NDEzNzA3Mzgy | 4032 | Allow warning with cartopy in docs plotting build | shoyer 1217238 | closed | 0 | 1 | 2020-05-05T19:25:11Z | 2020-05-05T21:49:26Z | 2020-05-05T21:49:26Z | MEMBER | 0 | pydata/xarray/pulls/4032 | Fixes https://github.com/pydata/xarray/issues/4030 It looks like this is triggered by the new cartopy version now being installed on RTD (version 0.17.0 -> 0.18.0). Long term we should fix this, but for now it's better just to disable the warning. Here's the message from RTD:
```
/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/latest/xarray/plot/facetgrid.py:373: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations.
  self.fig.tight_layout() <<<-------------------------------------------------------------------------
```

https://readthedocs.org/projects/xray/builds/10969146/
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4032/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
612262200 | MDExOlB1bGxSZXF1ZXN0NDEzMjYwNTY2 | 4029 | Support overriding existing variables in to_zarr() without appending | shoyer 1217238 | closed | 0 | 2 | 2020-05-05T01:06:40Z | 2020-05-05T19:28:02Z | 2020-05-05T19:28:02Z | MEMBER | 0 | pydata/xarray/pulls/4029 | This is nice for consistency with
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4029/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
187625917 | MDExOlB1bGxSZXF1ZXN0OTI1MjQzMjg= | 1087 | WIP: New DataStore / Encoder / Decoder API for review | shoyer 1217238 | closed | 0 | 8 | 2016-11-07T05:02:04Z | 2020-04-17T18:37:45Z | 2020-04-17T18:37:45Z | MEMBER | 0 | pydata/xarray/pulls/1087 | The goal here is to make something extensible that we can live with for quite some time, and to clean up the internals of xarray's backend interface. Most of these are analogues of existing xarray classes with a cleaned up interface. I have not yet worried about backwards compatibility or tests -- I would appreciate feedback on the approach here. Several parts of the logic exist for the sake of dask. I've included the word "dask" in comments to facilitate inspection by mrocklin. CC @rabernat, @pwolfram, @jhamman, @mrocklin -- for review CC @mcgibbon, @JoyMonteiro -- this is relevant to our discussion today about adding support for appending to netCDF files. Don't let this stop you from getting started on that with the existing interface, though. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1087/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | pull | |||||
598567792 | MDU6SXNzdWU1OTg1Njc3OTI= | 3966 | HTML repr is slightly broken in Google Colab | shoyer 1217238 | closed | 0 | 1 | 2020-04-12T20:44:51Z | 2020-04-16T20:14:37Z | 2020-04-16T20:14:32Z | MEMBER | The "data" toggles are pre-expanded and don't work. See https://github.com/googlecolab/colabtools/issues/1145 for a full description. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3966/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
479434052 | MDU6SXNzdWU0Nzk0MzQwNTI= | 3206 | DataFrame with MultiIndex -> xarray with sparse array | shoyer 1217238 | closed | 0 | 1 | 2019-08-12T00:46:16Z | 2020-04-06T20:41:26Z | 2019-08-27T08:54:26Z | MEMBER | Now that we have preliminary support for sparse arrays in xarray, one really cool feature we could explore is creating sparse arrays from MultiIndexed pandas DataFrames. Right now, xarray's methods for creating objects from pandas always create dense arrays, but the size of these dense arrays can get big really quickly if the MultiIndex is sparsely populated, e.g.,
We can imagine

Once sparse arrays work pretty well, this could actually obviate most of the use cases for
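A sketch of the conversion, assuming a `sparse=True` flag on `from_dataframe` and the optional `sparse` package installed:

```python
import pandas as pd
import xarray as xr

index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b'], [0, 1, 2]], names=['letter', 'num'])
df = pd.DataFrame({'var': [1.0, 2.0, 3.0]}, index=index)

ds = xr.Dataset.from_dataframe(df, sparse=True)
# only the 3 populated cells of the 2 x 3 (letter, num) grid are
# stored; the remaining cells are the fill value (NaN)
```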
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3206/reactions", "total_count": 3, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 3, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
479940669 | MDU6SXNzdWU0Nzk5NDA2Njk= | 3212 | Custom fill_value for from_dataframe/from_series | shoyer 1217238 | open | 0 | 0 | 2019-08-13T03:22:46Z | 2020-04-06T20:40:26Z | MEMBER | It would be nice to have the option to customize the fill value when creating xarray objects from pandas, instead of requiring it to always be NaN. This would probably be especially useful when creating sparse arrays (https://github.com/pydata/xarray/issues/3206), for which it often makes sense to use a fill value of zero. If your data has integer values (e.g., it represents counts), you probably don't want to let it be cast to float first.
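A sketch of the proposed knob; no `fill_value` argument exists today, so the call using it is commented out:

```python
import pandas as pd
import xarray as xr

index = pd.MultiIndex.from_tuples([('a', 0), ('b', 1)],
                                  names=['letter', 'num'])
counts = pd.DataFrame({'counts': [1, 2]}, index=index)

# today: the missing (letter, num) cells become NaN, casting the
# integer counts to float
ds = xr.Dataset.from_dataframe(counts)

# proposed (hypothetical keyword): fill holes with 0, stay integer
# ds = xr.Dataset.from_dataframe(counts, fill_value=0)
```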
{ "url": "https://api.github.com/repos/pydata/xarray/issues/3212/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
314482923 | MDU6SXNzdWUzMTQ0ODI5MjM= | 2061 | Backend specific conventions decoding | shoyer 1217238 | open | 0 | 1 | 2018-04-16T02:45:46Z | 2020-04-05T23:42:34Z | MEMBER | Currently, we have a single function This is appropriate for netCDF data, but it's not appropriate for backends with different implementations. For example, it doesn't work for zarr (which is why we have the separate Instead, we should declare default decoders as part of the backend API, and use those decoders as the defaults for This should probably be tackled as part of the broader backends refactor: https://github.com/pydata/xarray/issues/1970 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/2061/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
28376794 | MDU6SXNzdWUyODM3Njc5NA== | 25 | Consistent rules for handling merges between variables with different attributes | shoyer 1217238 | closed | 0 | 13 | 2014-02-26T22:37:01Z | 2020-04-05T19:13:13Z | 2014-09-04T06:50:49Z | MEMBER | Currently, variable attributes are checked for equality before allowing for a merge via a call to

The right design of this feature should probably include some optional argument to

We can argue about which of these should be the default option. My inclination is to be as flexible as possible by using 1 or 2 in most cases.
{ "url": "https://api.github.com/repos/pydata/xarray/issues/25/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
173612265 | MDU6SXNzdWUxNzM2MTIyNjU= | 988 | Hooks for custom attribute handling in xarray operations | shoyer 1217238 | open | 0 | 24 | 2016-08-27T19:48:22Z | 2020-04-05T18:19:11Z | MEMBER | Over in #964, I am working on a rewrite/unification of the guts of xarray's logic for computation with labelled data. The goal is to get all of xarray's internal logic for working with labelled data going through a minimal set of flexible functions which we can also expose as part of the API. Because we will finally have all (or at least nearly all) xarray operations using the same code path, I think it will also finally become feasible to open up hooks allowing extensions to how xarray handles metadata. Two obvious use cases here are units (#525) and automatic maintenance of metadata (e.g.,

I like the idea of supporting something like NumPy's

Feedback would be greatly appreciated.

CC @darothen @rabernat @jhamman @pwolfram
{ "url": "https://api.github.com/repos/pydata/xarray/issues/988/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
29136905 | MDU6SXNzdWUyOTEzNjkwNQ== | 60 | Implement DataArray.idxmax() | shoyer 1217238 | closed | 0 | 1.0 741199 | 14 | 2014-03-10T22:03:06Z | 2020-03-29T01:54:25Z | 2020-03-29T01:54:25Z | MEMBER | Should match the pandas function: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.idxmax.html |
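A minimal illustration of the requested semantics, mirroring pandas by returning the coordinate label of the maximum:

```python
import xarray as xr

da = xr.DataArray([3, 1, 7], dims='x', coords={'x': [10, 20, 30]})
da.idxmax('x')  # -> 0-d DataArray containing the label 30
```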
{ "url": "https://api.github.com/repos/pydata/xarray/issues/60/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);