issues
42 rows where type = "issue" and user = 4160723 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: comments, closed_at, created_at (date), updated_at (date), closed_at (date)
id | node_id | number | title | user | state | locked | assignee | milestone | comments | created_at | updated_at ▲ | closed_at | author_association | active_lock_reason | draft | pull_request | body | reactions | performed_via_github_app | state_reason | repo | type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1389295853 | I_kwDOAMm_X85Szvjt | 7099 | Pass arbitrary options to sel() | benbovy 4160723 | open | 0 | 4 | 2022-09-28T12:44:52Z | 2024-04-30T00:44:18Z | MEMBER | Is your feature request related to a problem? Currently It would also be useful for custom indexes to expose their own selection options, e.g.,
From #3223, it would be nice if we could also pass distinct options values per index. What would be a good API for that? Describe the solution you'd likeSome ideas: A. Allow passing a tuple
B. Expose an
Option A does not look very readable. Option B is slightly better, although the nested dictionary is not great. Any other ideas? Some sort of context manager? Describe alternatives you've considered: The API proposed in #3223 would look great if Additional context: No response |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7099/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
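To make "option B" above concrete, here is a minimal plain-Python sketch of routing a nested `options` mapping to the `sel` method of each index. The class names (`PandasIndexStub`, `KDTreeIndexStub`) and the dispatching `sel` function are hypothetical stand-ins, not xarray API:

```python
# Hypothetical sketch: per-index selection options routed via a nested dict.

class PandasIndexStub:
    """Stands in for an index accepting the usual method/tolerance options."""

    def sel(self, labels, method=None, tolerance=None):
        return {"labels": labels, "method": method, "tolerance": tolerance}


class KDTreeIndexStub:
    """Stands in for a custom index exposing its own option (here ``k``)."""

    def sel(self, labels, k=1):
        return {"labels": labels, "k": k}


def sel(indexes, labels, options=None):
    """Dispatch labels plus per-index options (the 'option B' idea)."""
    options = options or {}
    results = {}
    for name, index_labels in labels.items():
        results[name] = indexes[name].sel(index_labels, **options.get(name, {}))
    return results


indexes = {"x": PandasIndexStub(), "y": KDTreeIndexStub()}
result = sel(
    indexes,
    labels={"x": 0.5, "y": (1.0, 2.0)},
    options={"x": {"method": "nearest"}, "y": {"k": 3}},
)
```

Each index only ever sees the options addressed to it, which avoids the readability problem of packing options into a tuple per coordinate (option A).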
915057433 | MDU6SXNzdWU5MTUwNTc0MzM= | 5452 | [community] Flexible indexes meeting | benbovy 4160723 | closed | 0 | 7 | 2021-06-08T13:32:16Z | 2024-02-15T01:39:08Z | 2024-02-15T01:39:08Z | MEMBER | In addition to the bi-weekly community developers meeting, we plan to have 30min meetings on a weekly basis -- every Tue 8:30-9:00 PDT (17:30-18:00 CEST) -- to discuss the flexible indexes refactor. Anyone from @pydata/xarray feel free to join! The first meeting is in a couple of hours. Zoom link (subject to change). |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5452/reactions", "total_count": 5, "+1": 5, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1861543091 | I_kwDOAMm_X85u9OSz | 8097 | Documentation rendering issues (dark mode) | benbovy 4160723 | open | 0 | 2 | 2023-08-22T14:06:03Z | 2024-02-13T02:31:10Z | MEMBER | What is your issue? There are a couple of rendering issues in Xarray's documentation landing page, especially with the dark mode.
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8097/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
213004586 | MDU6SXNzdWUyMTMwMDQ1ODY= | 1303 | `xarray.core.variable.as_variable()` part of the public API? | benbovy 4160723 | closed | 0 | 5 | 2017-03-09T11:07:52Z | 2024-02-06T17:57:21Z | 2017-06-02T17:55:12Z | MEMBER | Is it safe to use I have a specific use case where this would be very useful. I'm working on a package that heavily uses and extends xarray for landscape evolution modeling, and inside a custom class for model parameters I want to be able to create Although I know that

```python
import xarray as xr

class Parameter(object):
```

I don't think it is a viable option to copy A workaround using only public API would be something like:

```python
class Parameter(object):
```

but it feels a bit hacky. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1303/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
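To illustrate the kind of input normalization `as_variable()` performs, here is a plain-Python sketch, with no xarray dependency; the function name and the `(dims, data)` output shape are illustrative only, not xarray's actual implementation:

```python
# Illustrative sketch: normalize heterogeneous inputs to one (dims, data)
# representation, similar in spirit to what as_variable() does internally.

def coerce_to_variable(obj, name=None):
    if isinstance(obj, tuple):                       # explicit (dims, data) pair
        dims, data = obj
        if isinstance(dims, str):
            dims = (dims,)
        return tuple(dims), list(data)
    if not isinstance(obj, str) and hasattr(obj, "__len__"):
        return (name,), list(obj)                    # bare sequence -> 1-d, named
    return (), obj                                   # scalar -> 0-d


assert coerce_to_variable(("x", [1, 2, 3])) == (("x",), [1, 2, 3])
assert coerce_to_variable([1, 2], name="y") == (("y",), [1, 2])
assert coerce_to_variable(5) == ((), 5)
```

The "hacky" public-API workaround mentioned in the issue amounts to letting a `DataArray` constructor do this coercion and then extracting the underlying variable, rather than calling the private helper directly.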
667864088 | MDU6SXNzdWU2Njc4NjQwODg= | 4285 | Awkward array backend? | benbovy 4160723 | open | 0 | 38 | 2020-07-29T13:53:45Z | 2023-12-30T18:47:48Z | MEMBER | Just curious if anyone here has thoughts on this. For more context: Awkward is like numpy but for arrays of very arbitrary (dynamic) structure. I don't know much yet about that library (I've just seen this SciPy 2020 presentation), but now I could imagine using xarray for dealing with labelled collections of geometrical / geospatial objects like polylines or polygons. At this stage, any integration between xarray and awkward arrays would be something highly experimental, but I think this might be an interesting case for flexible arrays (and possibly flexible indexes) mentioned in the roadmap. There is some discussion here: https://github.com/scikit-hep/awkward-1.0/issues/27. Does anyone see any other potential use case? cc @pydata/xarray |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4285/reactions", "total_count": 6, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1989356758 | I_kwDOAMm_X852kyzW | 8447 | Improve discoverability of backend engine options | benbovy 4160723 | open | 0 | 5 | 2023-11-12T11:14:56Z | 2023-12-12T20:30:28Z | MEMBER | Is your feature request related to a problem? Backend engine options are not easily discoverable and we need to know or figure them out before passing them as kwargs to Describe the solution you'd like: The solution is similar to the one proposed in #8002 for setting a new index. The API could look like this:

```python
import xarray as xr

ds = xr.open_dataset(
    file_or_obj,
    engine=xr.backends.engine("myengine").with_options(
        option1=True,
        option2=100,
    ),
)
```

where We would need to extend the API for

```python
class BackendEntrypoint:
    _open_dataset_options: dict[str, Any]
```

Such that

```python
class MyEngineBackendEntryPoint(BackendEntrypoint):
    open_dataset_parameters = ("option1", "option2")
```

Pros:
Cons:
Describe alternatives you've consideredA Additional contextcc @jsignell https://github.com/stac-utils/pystac/issues/846#issuecomment-1405758442 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8447/reactions", "total_count": 4, "+1": 4, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
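A runnable sketch of the `with_options` pattern proposed above, using stand-in classes (this `BackendEntrypoint` and the validation logic are assumptions for illustration, not xarray's actual backend API):

```python
# Sketch: a classmethod that validates options against the declared
# open_dataset_parameters and returns a configured entrypoint class.

class BackendEntrypoint:
    open_dataset_parameters: tuple = ()
    _open_dataset_options: dict = {}

    @classmethod
    def with_options(cls, **options):
        unknown = set(options) - set(cls.open_dataset_parameters)
        if unknown:
            # early, discoverable failure instead of a late kwargs error
            raise TypeError(f"unknown options: {sorted(unknown)}")
        return type(cls.__name__, (cls,), {"_open_dataset_options": options})


class MyEngineBackendEntrypoint(BackendEntrypoint):
    open_dataset_parameters = ("option1", "option2")


engine = MyEngineBackendEntrypoint.with_options(option1=True, option2=100)
```

One appeal of this design is that a typo in an option name fails immediately with a clear message, rather than being silently ignored or raising deep inside `open_dataset`.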
1148021907 | I_kwDOAMm_X85EbWyT | 6293 | Explicit indexes: next steps | benbovy 4160723 | open | 0 | 3 | 2022-02-23T12:19:38Z | 2023-12-01T09:34:28Z | MEMBER | #5692 is ~~not merged yet~~ now merged ~~but~~ and we can ~~already~~ start thinking about the next steps. I’m opening this issue to list and track the remaining tasks. @pydata/xarray, do not hesitate to add a comment below if you think about something that is missing here. Continue the refactoring of the internals: Although in #5692 everything seems to work with the current pandas index wrappers for dimension coordinates, not all of Xarray's internals have been refactored yet to fully support (or at least be compatible with) custom indexes. Here is a list of
I ended up following a common pattern in #5692 when adding explicit / flexible index support for various features (it is quite generic, though, the actual procedure may vary from one case to another and many steps may be skipped):
Relax all constraints related to “dimension (index) coordinates” in Xarray
Indexes repr
Public API for assigning and (re)setting indexesThere is no public API yet for creating and/or assigning existing indexes to Dataset and DataArray objects.
We still need to figure out how best we can (1) assign existing indexes (possibly with their coordinates) and (2) pass index build options. Other public API for index-based operationsTo fully leverage the power and flexibility of custom indexes, we might want to update some parts of Xarray’s public API in order to allow passing arbitrary options per index. For example:
Also:
Documentation
Index types and helper classes built in Xarray
3rd party indexes
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6293/reactions", "total_count": 12, "+1": 6, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 6, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
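To make the refactoring target concrete, here is a toy sketch of the custom-index protocol the tasks above converge on: a base class with hooks (`from_variables`, `sel`) that Dataset operations dispatch to. The base class and the `RangeIndex` example are invented for illustration and are not `xarray.core.indexes.Index`:

```python
# Toy sketch of the custom-index protocol (names are illustrative).

class Index:
    @classmethod
    def from_variables(cls, variables):
        raise NotImplementedError

    def sel(self, labels):
        raise NotImplementedError


class RangeIndex(Index):
    """Index over an evenly spaced 1-d coordinate, stored as start/step/size."""

    def __init__(self, start, step, size):
        self.start, self.step, self.size = start, step, size

    @classmethod
    def from_variables(cls, variables):
        (values,) = variables.values()
        return cls(values[0], values[1] - values[0], len(values))

    def sel(self, labels):
        # map each label back to an integer position along the dimension
        return {k: round((v - self.start) / self.step) for k, v in labels.items()}


idx = RangeIndex.from_variables({"x": [10.0, 12.0, 14.0, 16.0]})
pos = idx.sel({"x": 14.0})
```

Such an index never materializes its coordinate values, which is exactly the kind of behavior a pandas-centric internal code path cannot accommodate without the refactoring listed above.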
1890893841 | I_kwDOAMm_X85wtMAR | 8171 | Fancy reprs | benbovy 4160723 | open | 0 | 10 | 2023-09-11T16:46:43Z | 2023-09-15T21:07:52Z | MEMBER | What is your issue? In Xarray we already have the plain-text and html reprs, which is great. Recently, I've tried anywidget and I think that it has potential to overcome some of the limitations of the current repr and possibly go well beyond it. The main advantages of anywidget:
I don't think we should replace the current html repr (it is still useful to have a basic, pure HTML/CSS version), but having a new widget could improve some aspects like not including the whole CSS each time an object repr is displayed, removing some HTML/CSS hacks... and actually has much more potential since we would have the whole javascript ecosystem at our fingertips (quick plots, etc.). Also bi-directional communication with Python is possible. I'm opening this issue to brainstorm about what would be nice to have in widget-based Xarray reprs:
cc @pydata/xarray |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8171/reactions", "total_count": 5, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 2, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1889195671 | I_kwDOAMm_X85wmtaX | 8166 | Dataset.from_dataframe: deprecate expanding the multi-index | benbovy 4160723 | open | 0 | 3 | 2023-09-10T15:54:31Z | 2023-09-11T06:20:50Z | MEMBER | What is your issue? Let's continue here the discussion about changing the behavior of Dataset.from_dataframe (see https://github.com/pydata/xarray/pull/8140#issuecomment-1712485626).
If we no longer unstack the multi-index in

```python
ds = xr.Dataset(
    {"foo": (("x", "y"), [[1, 2], [3, 4]])},
    coords={"x": ["a", "b"], "y": [1, 2]},
)
df = ds.to_dataframe()

ds2 = xr.Dataset.from_dataframe(df, dim="z")
ds2.identical(ds)  # False
ds2.unstack("z").identical(ds)  # True
```

cc @max-sixty @dcherian |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8166/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
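The two behaviors under discussion can be illustrated without xarray at all. Below, plain Python stands in for a MultiIndex DataFrame: "keep" preserves the rows as a single `z` dimension with level coordinates, while "expand" unstacks them into the outer product of the levels (the dict layout here is purely illustrative):

```python
# MultiIndex rows: ((level_x, level_y), value)
rows = [(("a", 1), 1), (("a", 2), 2), (("b", 1), 3), (("b", 2), 4)]

# Proposed behavior: keep one "z" dimension of length len(rows),
# plus one coordinate per multi-index level.
kept = {
    "dims": {"z": len(rows)},
    "coords": {"x": [r[0][0] for r in rows], "y": [r[0][1] for r in rows]},
    "foo": [r[1] for r in rows],
}

# Current behavior: expand ("unstack") into a dense (x, y) grid.
xs = sorted({r[0][0] for r in rows})
ys = sorted({r[0][1] for r in rows})
lookup = dict(rows)
expanded = [[lookup[(x, y)] for y in ys] for x in xs]
```

Note that the expanded form must fill missing (x, y) combinations with NaN when the multi-index is sparse, which is one motivation for deprecating the automatic expansion.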
1364388790 | I_kwDOAMm_X85RUuu2 | 7002 | Custom indexes and coordinate (re)ordering | benbovy 4160723 | open | 0 | 2 | 2022-09-07T09:44:12Z | 2023-08-23T14:35:32Z | MEMBER | What is your issue? (From https://github.com/pydata/xarray/issues/5647#issuecomment-946546464). The current alignment logic (as refactored in #5692) requires that two compatible indexes (i.e., of the same type) must relate to one or more coordinates with matching names but also in a matching order. For some multi-coordinate indexes like Possible options:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7002/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
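The constraint being discussed reduces to how two indexes' coordinate names are compared. A minimal sketch of the strict (current) rule versus a relaxed rule that ignores ordering, assuming the index itself can reorder internally:

```python
# Two indexes over the same coordinates, declared in different orders.
idx1_coords = ("lat", "lon")
idx2_coords = ("lon", "lat")

# Current behavior: ordered comparison, so these indexes don't match.
strict_match = idx1_coords == idx2_coords

# Possible relaxation: compare as sets and let the index handle ordering.
relaxed_match = set(idx1_coords) == set(idx2_coords)
```

The trade-off is that set comparison silently accepts reordered coordinates, so an index whose semantics do depend on order (unlike, say, a KD-tree over symmetric spatial coordinates) would need to opt back into the strict check.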
979316661 | MDU6SXNzdWU5NzkzMTY2NjE= | 5738 | Flexible indexes: how to handle possible dimension vs. coordinate name conflicts? | benbovy 4160723 | closed | 0 | 4 | 2021-08-25T15:31:39Z | 2023-08-23T13:28:41Z | 2023-08-23T13:28:40Z | MEMBER | Another thing that I've noticed while working on #5692. Currently it is not possible to have a Dataset with the same name used for both a dimension and a multi-index level. I guess the reason is to prevent some errors like unmatched dimension sizes when eventually the multi-index is dropped with renamed dimension(s) according to the level names (e.g., with I'm wondering how we should handle this in the context of flexible / custom indexes: A. Keep this current behavior as a special case for (pandas) multi-indexes. This would avoid breaking changes but how to support custom indexes that could eventually be used like pandas multi-indexes in B. Introduce some tag in C. Do not allow any dimension name matching the name of a coordinate attached to a multi-coordinate index. This seems silly? D. Eventually revert #2353 and let users take care of potential conflicts. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5738/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1175329407 | I_kwDOAMm_X85GDhp_ | 6392 | Pass indexes to the Dataset and DataArray constructors | benbovy 4160723 | closed | 0 | 6 | 2022-03-21T12:41:51Z | 2023-07-21T20:40:05Z | 2023-07-21T20:40:04Z | MEMBER | Is your feature request related to a problem? This is part of #6293 (explicit indexes next steps). Describe the solution you'd like: A pros:
cons:
An example with a pandas multi-index: Currently a pandas multi-index may be passed directly as one (dimension) coordinate; it is then "unpacked" into one dimension (tuple values) coordinate and one or more level coordinates. I would suggest deprecating this behavior in favor of a more explicit (although more verbose) way to pass an existing pandas multi-index:

```python
import pandas as pd
import xarray as xr

pd_idx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
idx = xr.PandasMultiIndex(pd_idx, "x")
indexes = {"x": idx, "foo": idx, "bar": idx}
coords = idx.create_variables()

ds = xr.Dataset(coords=coords, indexes=indexes)
```

The cases below should raise an error:

```python
ds = xr.Dataset(indexes=indexes)
# ValueError: missing coordinate(s) for index(es): 'x', 'foo', 'bar'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx},
)
# ValueError: missing index(es) for coordinate(s): 'bar'

ds = xr.Dataset(
    coords={"x": coords["x"], "foo": [0, 1, 2, 3], "bar": coords["bar"]},
    indexes=indexes,
)
# ValueError: conflict between coordinate(s) and index(es): 'foo'

ds = xr.Dataset(
    coords=coords,
    indexes={"x": idx, "foo": idx, "bar": xr.PandasIndex([0, 1, 2], "y")},
)
# ValueError: conflict between coordinate(s) and index(es): 'bar'
```

Should we raise an error or simply ignore the index in the case below?

```python
ds = xr.Dataset(coords=coords)
# ValueError: missing index(es) for coordinate(s): 'x', 'foo', 'bar'
# or
# create unindexed coordinates 'foo' and 'bar' and a 'x' coordinate with a single pandas index
```

Should we silently reorder the coordinates and/or indexes when the levels are not passed in the right order? It seems odd requiring that mapping elements be passed in a given order.

```python
ds = xr.Dataset(coords=coords, indexes={"bar": idx, "x": idx, "foo": idx})
list(ds.xindexes.keys())
# ["x", "foo", "bar"]
```

How to generalize to any (custom) index? With the case of multi-index, it is pretty easy to check whether the coordinates and indexes are consistent because we ensure consistent However, this may not be easy for other indexes. Some Xarray custom indexes (like a KD-Tree index) likely won't return anything from How could we solve this?
I think I prefer the second option. Describe alternatives you've considered: Also allow passing index types (and build options) via
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6392/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1812008663 | I_kwDOAMm_X85sAQ7X | 8002 | Improve discoverability of index build options | benbovy 4160723 | open | 0 | 2 | 2023-07-19T13:54:09Z | 2023-07-19T17:48:51Z | MEMBER | Is your feature request related to a problem? Currently Describe the solution you'd like: What about something like this?

```python
ds.set_xindex("x", MyCustomIndex.with_options(foo=1, bar=True))

# or

ds.set_xindex("x", *MyCustomIndex.with_options(foo=1, bar=True))
```

This would require adding a

```python
# xarray.core.indexes

class Index:
    @classmethod
    def with_options(cls) -> tuple[type[Self], dict[str, Any]]:
        return cls, {}
```

```python
# third-party code

from xarray.indexes import Index

class MyCustomIndex(Index):
```

Thoughts? Describe alternatives you've considered: Build options are also likely defined in the Index constructor, e.g.,

```python
# third-party code

from xarray.indexes import Index

class MyCustomIndex(Index):
```

However, the Index constructor is not public API (only used internally and indirectly in Xarray when setting a new index from existing coordinates). Any other idea? Additional context: No response |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/8002/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
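A self-contained sketch of the `(cls, options)` tuple idea above, with a toy `set_xindex`-style consumer. All names here are stand-ins, and the base/override split from the proposal is collapsed into a single classmethod for brevity:

```python
# Sketch: with_options() returns a (class, options) pair that a
# set_xindex-like function unpacks when instantiating the index.

class Index:
    @classmethod
    def with_options(cls, **options):
        return cls, options


class MyCustomIndex(Index):
    def __init__(self, coord, *, foo=0, bar=False):
        self.coord, self.foo, self.bar = coord, foo, bar


def set_xindex(coord, index_spec):
    """Accept either a bare Index subclass or a (subclass, options) tuple."""
    cls, options = index_spec if isinstance(index_spec, tuple) else (index_spec, {})
    return cls(coord, **options)


idx = set_xindex("x", MyCustomIndex.with_options(foo=1, bar=True))
plain = set_xindex("x", MyCustomIndex)
```

Because `with_options` is a classmethod on the index type, build options stay discoverable from the class itself (e.g., via its signature or docstring) without making the constructor public API.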
1472470718 | I_kwDOAMm_X85XxB6- | 7346 | assign_coords reset all dimension coords to default (pandas) index | benbovy 4160723 | closed | 0 | 0 | 2022-12-02T08:07:55Z | 2022-12-02T16:32:41Z | 2022-12-02T16:32:41Z | MEMBER | What happened? See https://github.com/martinfleis/xvec/issues/13#issue-1472023524 What did you expect to happen?
Minimal Complete Verifiable Example: See https://github.com/martinfleis/xvec/issues/13#issue-1472023524 MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment:
Xarray version 2022.11.0
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/7346/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1151751524 | I_kwDOAMm_X85EplVk | 6308 | xr.doctor(): diagnostics on a Dataset / DataArray ? | benbovy 4160723 | open | 0 | 4 | 2022-02-26T12:10:07Z | 2022-11-07T15:28:35Z | MEMBER | Is your feature request related to a problem? Recently I've been reading through various issue reports here and there (GH issues and discussions, forums, etc.) and I'm wondering if it wouldn't be useful to have some function in Xarray that inspects a Dataset or DataArray and reports a bunch of diagnostics, so that the community could better help troubleshooting performance or other issues faced by users. It's not always obvious where to look (e.g., number of chunks of a dask array, number of tasks of a dask graph, etc.) to diagnose issues, sometimes even for experienced users. Describe the solution you'd like: A
Describe alternatives you've considered: None. Additional context: No response |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6308/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
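As a toy sketch of what such an `xr.doctor()`-style report might collect, the function below inspects chunk information via duck typing; the attribute names checked and the warning threshold are illustrative assumptions, not a proposed xarray API:

```python
# Toy diagnostics collector: gathers chunk counts from anything exposing a
# dask-like ``chunks`` attribute (either directly or via ``.data``).

def doctor(obj):
    report = {}
    data = getattr(obj, "data", obj)          # unwrap a DataArray-like object
    chunks = getattr(data, "chunks", None)
    if chunks is not None:
        nchunks = 1
        for dim_chunks in chunks:
            nchunks *= len(dim_chunks)        # chunks per dimension
        report["n_chunks"] = nchunks
        if nchunks > 100_000:                 # arbitrary illustrative threshold
            report["warnings"] = ["very large task graph: consider rechunking"]
    return report


class FakeDaskArray:
    chunks = ((100, 100), (50, 50, 50))       # 2 x 3 = 6 chunks
```

A real implementation would add many more probes (index types, dtype/fill-value pitfalls, encoding issues), but the duck-typed shape above shows why this is cheap to compute and easy to paste into an issue report.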
1322198907 | I_kwDOAMm_X85Ozyd7 | 6849 | Public API for setting new indexes: add a set_xindex method? | benbovy 4160723 | closed | 0 | 5 | 2022-07-29T12:38:34Z | 2022-09-28T07:25:16Z | 2022-09-28T07:25:16Z | MEMBER | What is your issue? xref https://github.com/pydata/xarray/pull/6795#discussion_r932665544 and #6293 (Public API section). The
Thoughts @pydata/xarray? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6849/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1361896826 | I_kwDOAMm_X85RLOV6 | 6989 | reset multi-index to single index (level): coordinate not renamed | benbovy 4160723 | closed | 0 | benbovy 4160723 | 0 | 2022-09-05T12:45:22Z | 2022-09-27T10:35:39Z | 2022-09-27T10:35:39Z | MEMBER | What happened? Resetting a multi-index to a single level (i.e., a single index) does not rename the remaining level coordinate to the dimension name. What did you expect to happen? While it is certainly more consistent not to rename the level coordinate here (since an index can be assigned to a non-dimension coordinate now), it breaks from the old behavior. I think it's better not to introduce any breaking change. As discussed elsewhere, we might eventually want to deprecate Minimal Complete Verifiable Example:

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=("foo", "bar"))
ds = xr.Dataset(coords={"x": midx})

# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) object MultiIndex
#   * foo      (x) object 'a' 'a' 'b' 'b'
#   * bar      (x) int64 1 2 1 2
# Data variables:
#     *empty*

rds = ds.reset_index("foo")

# v2022.03.0
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#   * x        (x) int64 1 2 1 2
#     foo      (x) object 'a' 'a' 'b' 'b'
# Data variables:
#     *empty*

# v2022.06.0
# <xarray.Dataset>
# Dimensions:  (x: 4)
# Coordinates:
#     foo      (x) object 'a' 'a' 'b' 'b'
#   * bar      (x) int64 1 2 1 2
# Dimensions without coordinates: x
# Data variables:
#     *empty*
```

MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6989/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||
1361626450 | I_kwDOAMm_X85RKMVS | 6987 | Indexes.get_unique() TypeError with pandas indexes | benbovy 4160723 | closed | 0 | benbovy 4160723 | 0 | 2022-09-05T09:02:50Z | 2022-09-23T07:30:39Z | 2022-09-23T07:30:39Z | MEMBER | @benbovy I also just tested the Taking the above dataset ```python
TypeError: unhashable type: 'MultiIndex' ``` However, for
[<xarray.core.indexes.PandasMultiIndex at 0x7f105bf1df20>] ``` Originally posted by @lukasbindreiter in https://github.com/pydata/xarray/issues/6752#issuecomment-1236717180 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6987/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||
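The `TypeError` above comes from putting unhashable index objects into a set. A sketch of the fix pattern, deduplicating by object identity instead of by hash (the function below is an illustration of the approach, not xarray's actual `Indexes.get_unique` implementation):

```python
# Deduplicate index objects without hashing them, so unhashable wrappers
# (like one around a pd.MultiIndex) don't raise TypeError.

def get_unique(indexes):
    """Return unique index objects from a {coord_name: index} mapping,
    preserving first-seen order."""
    unique = []
    seen_ids = set()
    for idx in indexes.values():
        if id(idx) not in seen_ids:   # identity check, never hashes idx
            seen_ids.add(id(idx))
            unique.append(idx)
    return unique


class UnhashableIndex:
    __hash__ = None  # simulates an unhashable multi-index wrapper


idx = UnhashableIndex()
other = UnhashableIndex()
```

Deduplication by `id()` works here because a multi-coordinate index is the *same object* shared by all of its coordinates in the mapping, which is exactly the case the set-based version choked on.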
302077805 | MDU6SXNzdWUzMDIwNzc4MDU= | 1961 | Extend xarray with custom "coordinate wrappers" | benbovy 4160723 | closed | 0 | 10 | 2018-03-04T11:26:15Z | 2022-09-19T08:47:45Z | 2022-09-19T08:47:44Z | MEMBER | Recent and ongoing developments in xarray turn DataArray and Dataset more and more into data wrappers that are extensible at (almost) every level:
Regarding the latter, I’m thinking about the idea of extending xarray at an even more abstract level, i.e., the possibility of adding / registering "coordinate wrappers" to (EDIT: "coordinate agents" may not be quite right here, I changed that to "coordinate wrappers") Indexes are a specific case of coordinate wrappers that serve the purpose of indexing. This is built into xarray. While indexing is enough in 80% of cases, I see a couple of use cases where other coordinate wrappers (built outside of xarray) would be nice to have:
In those examples we usually rely on coordinate attributes and/or classes that encapsulate xarray objects to implement the specific features that we need. While it works, it has limitations and I think it can be improved. Custom coordinate wrappers would be a way of extending xarray that is very consistent with other current (or considered) extension mechanisms. This is still a very vague idea and I’m sure that there are lots of details that can be discussed (serialization, etc.). But before going further, I’d like to know your thoughts @pydata/xarray. Do you think it is a silly idea? Do you have in mind other use cases where custom coordinate wrappers would be useful? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1961/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
955936490 | MDU6SXNzdWU5NTU5MzY0OTA= | 5647 | Flexible indexes: review the implementation of alignment and merge | benbovy 4160723 | closed | 0 | 12 | 2021-07-29T15:03:23Z | 2022-09-07T09:47:13Z | 2022-09-07T09:47:13Z | MEMBER | The current implementation of the
This currently works well since a pd.Index can be directly treated as a 1-d array, but this won’t always be the case with custom indexes. I'm opening this issue to gather ideas on how best to handle alignment in a more flexible way (I haven't thought much about this problem yet). |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5647/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1325016510 | I_kwDOAMm_X85O-iW- | 6860 | Align with join='override' may update index coordinate metadata | benbovy 4160723 | open | 0 | 0 | 2022-08-01T21:45:13Z | 2022-08-01T21:49:41Z | MEMBER | What happened? It seems that cf. @keewis' original https://github.com/pydata/xarray/pull/6857#discussion_r934425142. What did you expect to happen? Index coordinate metadata unaffected by alignment (i.e., metadata is passed through object -> aligned object for each object), like for align with other join methods. Minimal Complete Verifiable Example:

```python
import xarray as xr

ds1 = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"foo": 1})})
ds2 = xr.Dataset(coords={"x": ("x", [1, 2, 3], {"bar": 2})})

aligned1, aligned2 = xr.align(ds1, ds2, join="override")

aligned1.x.attrs
# v2022.03.0 -> {'foo': 1}
# v2022.06.0 -> {'foo': 1, 'bar': 2}
# PR #6857   -> {'foo': 1}
# expected   -> {'foo': 1}

aligned2.x.attrs
# v2022.03.0 -> {}
# v2022.06.0 -> {'foo': 1, 'bar': 2}
# PR #6857   -> {'foo': 1, 'bar': 2}
# expected   -> {'bar': 2}

aligned11, aligned22 = xr.align(ds1, ds2, join="inner")

aligned11.x.attrs
# {'foo': 1}
aligned22.x.attrs
# {'bar': 2}
```

MVCE confirmation
Relevant log output: No response. Anything else we need to know? No response. Environment:
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15)
[Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 20.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.21.2.dev137+g30023a484
pandas: 1.4.0
numpy: 1.22.2
scipy: 1.7.1
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.4.0
Nio: None
zarr: 2.6.1
cftime: 1.5.2
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.2.10
cfgrib: 0.9.8.5
iris: 3.0.4
bottleneck: 1.3.2
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: 3.4.3
cartopy: 0.20.1
seaborn: 0.11.1
numbagg: 0.2.1
fsspec: 0.8.5
cupy: None
pint: 0.16.1
sparse: 0.13.0
flox: None
numpy_groupies: None
setuptools: 57.4.0
pip: 20.2.4
conda: None
pytest: 6.2.5
IPython: 7.27.0
sphinx: 3.3.1
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/6860/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1322190255 | I_kwDOAMm_X85OzwWv | 6848 | Update API | benbovy 4160723 | closed | 0 | 0 | 2022-07-29T12:30:08Z | 2022-07-29T12:30:23Z | 2022-07-29T12:30:23Z | MEMBER | { "url": "https://api.github.com/repos/pydata/xarray/issues/6848/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | |||||||
968796847 | MDU6SXNzdWU5Njg3OTY4NDc= | 5697 | Coerce the labels passed to Index.query to array-like objects | benbovy 4160723 | closed | 0 | 3 | 2021-08-12T13:09:40Z | 2022-03-17T17:11:43Z | 2022-03-17T17:11:43Z | MEMBER | When looking at #5691 I noticed that the labels are sometimes coerced to arrays (i.e., #3153) but not always. Later in Shouldn't we therefore make things easier and ensure that the labels given to |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5697/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
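The proposal above amounts to coercing labels once, up front, so every `Index.query` implementation receives the same array-like form. A minimal plain-Python sketch of that normalization step (plain lists stand in for numpy arrays; the function name is hypothetical):

```python
# Coerce a sel() label to a uniform 1-d form, remembering whether the
# caller passed a scalar so the result can be unwrapped afterwards.

def coerce_label(label):
    if isinstance(label, (list, tuple)):
        return list(label), True      # already array-like
    return [label], False             # scalar: wrap, and flag for unwrapping


values, was_array = coerce_label(3)
```

Doing this once at the dispatch boundary means index implementations never need their own scalar-vs-array special cases, which is the inconsistency the issue points out.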
968990058 | MDU6SXNzdWU5Njg5OTAwNTg= | 5700 | Selection with multi-index and float32 values | benbovy 4160723 | closed | 0 | 0 | 2021-08-12T14:55:11Z | 2022-03-17T17:11:43Z | 2022-03-17T17:11:43Z | MEMBER | I guess it's rather an edge case, but a similar issue than the one fixed in #3153 may occur with multi-indexes: ```python
```python
```python
(xarray version: 0.18.2 as there's a regression introduced in 0.19.0 #5691) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5700/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
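The failure mode behind this issue can be reproduced with the standard library alone: a value stored as float32 is not equal to the float64 literal later used to select it, so an exact label lookup misses. A small sketch (the helper name is ours; xarray/pandas internals differ):

```python
import struct

def to_float32(x):
    """Round-trip a Python float through IEEE-754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

stored = to_float32(0.1)     # what a float32 coordinate actually holds
exact_hit = stored == 0.1    # False: exact match against the float64 label fails
close = abs(stored - 0.1) < 1e-7  # True: yet the values are "the same" to the eye
```

This is why #3153 re-casts the label to the index's dtype before comparing; the multi-index code path discussed here skipped that cast.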
955605233 | MDU6SXNzdWU5NTU2MDUyMzM= | 5645 | Flexible indexes: handle renaming coordinate variables | benbovy 4160723 | closed | 0 | 0 | 2021-07-29T08:42:00Z | 2022-03-17T17:11:42Z | 2022-03-17T17:11:42Z | MEMBER | We should have some API in This currently implemented here where the underlying This logic should be moved into Other, custom indexes might also have internal attributes to update, so we might need formal API for that. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5645/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
1005623261 | I_kwDOAMm_X8478Jfd | 5812 | Check explicit indexes when comparing two xarray objects | benbovy 4160723 | open | 0 | 2 | 2021-09-23T16:19:32Z | 2021-09-24T15:59:02Z | MEMBER | Is your feature request related to a problem? Please describe.
With the explicit index refactor, two Dataset or DataArray objects Describe the solution you'd like
I'd suggest that One drawback is when we want to check either the attributes or the indexes but not both. Should we add options like suggested in #5733 then? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5812/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
1006335177 | I_kwDOAMm_X847-3TJ | 5814 | Confusing assertion message when comparing datasets with differing coordinates | benbovy 4160723 | open | 0 | 1 | 2021-09-24T10:50:11Z | 2021-09-24T15:17:00Z | MEMBER | What happened:
When two datasets What you expected to happen: An output assertion error message that shows only the differing coordinates. Minimal Complete Verifiable Example: ```python
Differing coordinates: L * x (x) int64 0 1 R * x (x) int64 2 3 Differing data variables: L var (x) float64 10.0 11.0 R var (x) float64 10.0 11.0 ``` I would rather expect: ```python
Differing coordinates: L * x (x) int64 0 1 R * x (x) int64 2 3 ``` Anything else we need to know?: Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15) [Clang 11.1.0 ] python-bits: 64 OS: Darwin OS-release: 20.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.1.dev72+ga8d84c703.d20210901 pandas: 1.3.2 numpy: 1.21.2 scipy: 1.7.1 netCDF4: 1.5.6 pydap: installed h5netcdf: 0.8.1 h5py: 3.3.0 Nio: None zarr: 2.6.1 cftime: 1.5.0 nc_time_axis: 1.2.0 PseudoNetCDF: installed rasterio: 1.2.1 cfgrib: 0.9.8.5 iris: 3.0.4 bottleneck: 1.3.2 dask: 2021.01.1 distributed: 2021.01.1 matplotlib: 3.4.3 cartopy: 0.18.0 seaborn: 0.11.1 numbagg: None fsspec: 0.8.5 cupy: None pint: 0.16.1 sparse: 0.11.2 setuptools: 57.4.0 pip: 20.2.4 conda: None pytest: 6.2.5 IPython: 7.27.0 sphinx: 3.3.1 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5814/reactions", "total_count": 2, "+1": 2, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
985162305 | MDU6SXNzdWU5ODUxNjIzMDU= | 5755 | Mypy errors with the last version of _typed_ops.pyi | benbovy 4160723 | closed | 0 | 5 | 2021-09-01T13:34:52Z | 2021-09-13T10:53:16Z | 2021-09-13T00:04:54Z | MEMBER | What happened: Since #5569 I get a lot of mypy errors from
I also tried @max-sixty @Illviljan Any idea on what's happening? What you expected to happen: No mypy error in all cases. Anything else we need to know?:
Environment: mypy 0.910 python 3.9.6 (also tested with 3.8) |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5755/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
977149831 | MDU6SXNzdWU5NzcxNDk4MzE= | 5732 | Coordinates implicitly created when passing a DataArray as coord to Dataset constructor | benbovy 4160723 | open | 0 | 3 | 2021-08-23T15:20:37Z | 2021-08-24T14:18:09Z | MEMBER | I stumbled on this while working on #5692. Is this intended behavior or unwanted side effect? What happened: Creating a new Dataset by passing a DataArray object as a coordinate also adds the DataArray's own coordinates to the dataset: ```python
What you expected to happen: The behavior above seems a bit counter-intuitive to me. I would rather expect no additional coordinates auto-magically added to the dataset, i.e. only one ```python
Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.6 | packaged by conda-forge | (default, Nov 27 2020, 19:17:44) [Clang 11.0.0 ] python-bits: 64 OS: Darwin OS-release: 20.3.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: None LOCALE: (None, 'UTF-8') libhdf5: 1.10.6 libnetcdf: 4.7.4 xarray: 0.19.0 pandas: 1.1.5 numpy: 1.21.1 scipy: 1.7.0 netCDF4: 1.5.5.1 pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: 1.3.1 nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2021.07.2 distributed: 2021.07.2 matplotlib: 3.3.3 cartopy: 0.19.0.post1 seaborn: None numbagg: None pint: None setuptools: 49.6.0.post20201009 pip: 20.3.1 conda: None pytest: 6.1.2 IPython: 7.25.0 sphinx: 3.3.1 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5732/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
933551030 | MDU6SXNzdWU5MzM1NTEwMzA= | 5553 | Flexible indexes: how best to implement the new data model? | benbovy 4160723 | closed | 0 | 2 | 2021-06-30T10:38:13Z | 2021-08-09T07:56:56Z | 2021-08-09T07:56:56Z | MEMBER | Yesterday during the flexible indexes weekly meeting we have discussed with @shoyer and @jhamman on what would be the best approach to implement the new data model described here. In this issue I summarize the implementation of the current data model as well as some suggestions for the new data model along with their pros / cons (I might still be missing important ones!). I don't think there's an easy or ideal solution unfortunately, so @pydata/xarray any feedback would be very welcome! Current data model implementationCurrently any (pandas) index is wrapped into an Proposed alternativesOption 1: independent (coordinate) variables and indexesIndexes and coordinates are loosely coupled, i.e., a Pros:
Cons:
Option 2: indexes hold coordinate variablesThis is the opposite approach of the current one. Here, a Pros:
Cons:
Option 3: intermediate solutionWhen an index is set (or unset), it returns a new set of coordinate variables to replace the existing ones. Pros:
Cons:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5553/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
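Option 3 above ("setting an index returns a new set of coordinate variables to replace the existing ones") can be sketched with plain Python stand-ins. The class names `Variable` and `SortedListIndex` and the `from_variables` hook are all hypothetical mocks, not xarray's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    """Hypothetical stand-in for xarray.Variable (1-d only)."""
    dims: tuple
    data: list
    attrs: dict = field(default_factory=dict)

class SortedListIndex:
    """Toy index illustrating option 3: setting the index returns *new*
    coordinate variables that replace the existing ones."""
    def __init__(self, values):
        self.values = sorted(values)

    @classmethod
    def from_variables(cls, name, var):
        index = cls(var.data)
        # the index hands back fresh variables; the Dataset would swap
        # them in, so index and coordinates never share mutable state
        new_var = Variable(var.dims, list(index.values), dict(var.attrs))
        return index, {name: new_var}

coord = Variable(("x",), [3, 1, 2], {"units": "m"})
idx, new_coords = SortedListIndex.from_variables("x", coord)
print(new_coords["x"].data)  # [1, 2, 3]
print(coord.data)            # [3, 1, 2]  (original left untouched)
```

The design point this illustrates is the "index and coordinate variables stay consistent without sharing state" property discussed under the pros of option 3.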
187859705 | MDU6SXNzdWUxODc4NTk3MDU= | 1092 | Dataset groups | benbovy 4160723 | closed | 0 | 20 | 2016-11-07T23:28:36Z | 2021-07-02T19:56:50Z | 2021-07-02T19:56:49Z | MEMBER | EDIT: see https://github.com/pydata/xarray/issues/4118 for ongoing discussion Probably it has been already suggested, but similarly to netCDF4 groups it would be nice if we could access Currently xarray allows loading a specific netCDF4 group into a I think about an implementation of
Questions:
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1092/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
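The group-access idea from #1092 can be mocked with a flat mapping of '/'-separated paths to datasets. `GroupedDatasets` and its behavior are purely illustrative, not a proposed xarray API:

```python
class GroupedDatasets:
    """Toy container mapping netCDF4-like group paths to objects.
    Purely illustrative of path-based group access."""
    def __init__(self):
        self._groups = {}

    def __setitem__(self, path, dataset):
        self._groups[path.strip("/")] = dataset

    def __getitem__(self, path):
        path = path.strip("/")
        if path in self._groups:
            return self._groups[path]
        # accessing an intermediate group returns its children
        prefix = path + "/"
        children = {p: ds for p, ds in self._groups.items()
                    if p.startswith(prefix)}
        if not children:
            raise KeyError(path)
        return children

root = GroupedDatasets()
root["model/ocean"] = {"temp": [1, 2]}
root["model/atmos"] = {"wind": [3, 4]}
print(sorted(root["model"]))        # ['model/atmos', 'model/ocean']
print(root["model/ocean"]["temp"])  # [1, 2]
```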
902009258 | MDU6SXNzdWU5MDIwMDkyNTg= | 5376 | Multi-scale datasets and custom indexes | benbovy 4160723 | open | 0 | 6 | 2021-05-26T08:38:00Z | 2021-06-02T08:07:38Z | MEMBER | I've been wondering if:
I'm thinking of an API that would look like this:

```python
# lazily load a big n-d image (full resolution) as a xarray.Dataset
xyz_dataset = ...

# set a new index for the x/y/z coordinates
(
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5376/reactions", "total_count": 1, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 1 } |
xarray 13221727 | issue | ||||||||
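One small piece of the multi-scale idea in #5376 — choosing which pyramid level to load for a requested resolution tolerance — could look like this. `pick_level` is a made-up helper, not part of any real index:

```python
def pick_level(scale_factors, requested_tolerance, base_resolution=1.0):
    """Return the index of the coarsest pyramid level whose cell size
    (base_resolution * factor) does not exceed the tolerance.
    Hypothetical helper, not an xarray API."""
    best = 0
    for i, factor in enumerate(scale_factors):
        if base_resolution * factor <= requested_tolerance:
            best = i
    return best

# levels downsampled by powers of two
factors = [1, 2, 4, 8, 16]
print(pick_level(factors, requested_tolerance=5))  # 2  (factor 4)
print(pick_level(factors, requested_tolerance=1))  # 0  (full resolution)
```

A real multi-scale index would additionally forward the selection to the chosen level's coordinates, which is the part the issue proposes delegating to a custom `Index`.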
869721207 | MDU6SXNzdWU4Njk3MjEyMDc= | 5226 | Attributes encoding compatibility between backends | benbovy 4160723 | open | 0 | 1 | 2021-04-28T09:11:19Z | 2021-04-28T15:42:42Z | MEMBER | What happened: Let's create a Zarr dataset with some "less common" dtype and fill value, open it with Xarray and save the dataset as NetCDF:

```python
import xarray as xr
import zarr

g = zarr.group()
g.create('arr', shape=3, fill_value='z', dtype='<U1')
g['arr'].attrs['_ARRAY_DIMENSIONS'] = ('dim_1')

# -- without masking fill values
ds = xr.open_zarr(g.store, mask_and_scale=False)
ds.arr.attrs  # returns {'_FillValue': 'z'}

# error: netCDF4 does not yet support setting a fill value for variable-length strings
ds.to_netcdf('test.nc')

# -- with masking fill values
ds2 = xr.open_zarr(g.store, mask_and_scale=True)

# returns a dict that includes item _FillValue': 'z'
ds2.arr.encoding

# same error as above
ds2.to_netcdf('out2.nc')
```

What you expected to happen: Seamless conversion (read/write) from one backend to another. Is there anything we could do to improve the case shown here above, and maybe other cases like the one described in #5223? Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None libhdf5: None libnetcdf: None xarray: 0.17.0 pandas: 1.0.3 numpy: 1.18.1 scipy: 1.3.1 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: 2.8.1 cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.11.0 distributed: 2.14.0 matplotlib: 3.1.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 46.1.3.post20200325 pip: 19.2.3 conda: None pytest: 5.4.1 IPython: 7.13.0 sphinx: None |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/5226/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
733077617 | MDU6SXNzdWU3MzMwNzc2MTc= | 4555 | Vectorized indexing (isel) of chunked data with 1D indices gives weird chunks | benbovy 4160723 | open | 0 | 1 | 2020-10-30T10:55:33Z | 2021-03-02T17:36:48Z | MEMBER | What happened: Applying What you expected to happen: More consistent chunk sizes. Minimal Complete Verifiable Example: Let's create a chunked DataArray

```python
In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: da = xr.DataArray(np.random.rand(100), dims='points').chunk(50)

In [4]: da
Out[4]:
<xarray.DataArray (points: 100)>
dask.array<xarray-<this-array>, shape=(100,), dtype=float64, chunksize=(50,), chunktype=numpy.ndarray>
Dimensions without coordinates: points
```

Selecting random indices results in a lot of small chunks

```python
In [5]: indices = xr.Variable('nodes', np.random.choice(np.arange(100, dtype='int'), size=10))

In [6]: da_sel = da.isel(points=indices)

In [7]: da_sel.chunks
Out[7]: ((1, 1, 3, 1, 1, 3),)
```

What I would expect
This works fine with 2+ dimensional indexers, e.g.,

```python
In [9]: indices_2d = xr.Variable(('x', 'y'), np.random.choice(np.arange(100), size=(10, 10)))

In [10]: da_sel_2d = da.isel(points=indices_2d)

In [11]: da_sel_2d.chunks
Out[11]: ((10,), (10,))
```

Anything else we need to know?: I suspect the issue is here: In the example above I think we still want vectorized indexing (i.e., call Environment: Output of <tt>xr.show_versions()</tt>INSTALLED VERSIONS ------------------ commit: None python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:21:09) [Clang 9.0.1 ] python-bits: 64 OS: Darwin OS-release: 18.7.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: None LANG: en_US.UTF-8 LOCALE: None.UTF-8 libhdf5: None libnetcdf: None xarray: 0.16.1 pandas: 1.1.3 numpy: 1.19.1 scipy: 1.5.2 netCDF4: None pydap: None h5netcdf: None h5py: None Nio: None zarr: None cftime: None nc_time_axis: None PseudoNetCDF: None rasterio: None cfgrib: None iris: None bottleneck: None dask: 2.19.0 distributed: 2.25.0 matplotlib: 3.3.1 cartopy: None seaborn: None numbagg: None pint: None setuptools: 47.3.1.post20200616 pip: 20.1.1 conda: None pytest: 5.4.3 IPython: 7.16.1 sphinx: 3.2.1 |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/4555/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
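The fragmentation described in #4555 comes from treating each out-of-order index separately; the grouping one would hope for can be illustrated by coalescing sorted indices into contiguous runs. This is a plain-Python sketch, not dask's actual chunk planning:

```python
from itertools import groupby

def contiguous_runs(indices):
    """Group integer indices into (start, stop) runs after sorting.
    Illustrative only -- not how dask actually plans output chunks."""
    runs = []
    sorted_idx = sorted(indices)
    # value - position is constant within a consecutive run
    for _, grp in groupby(enumerate(sorted_idx), key=lambda t: t[1] - t[0]):
        members = [v for _, v in grp]
        runs.append((members[0], members[-1] + 1))
    return runs

print(contiguous_runs([7, 3, 4, 5, 12]))  # [(3, 6), (7, 8), (12, 13)]
```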
187873247 | MDU6SXNzdWUxODc4NzMyNDc= | 1094 | Supporting out-of-core computation/indexing for very large indexes | benbovy 4160723 | open | 0 | 5 | 2016-11-08T00:56:56Z | 2021-01-26T20:09:12Z | MEMBER | (Follow-up of discussion here https://github.com/pydata/xarray/pull/1024#issuecomment-258524115). xarray + dask.array successfully enable out-of-core computation for very large variables that don't fit in memory. One current limitation is that the indexes of a However, this may be problematic in some specific cases where we have to deal with very large indexes. As an example, big unstructured meshes often have coordinates (x, y, z) arranged as 1-d arrays whose length equals the number of nodes, which can be very large!! (See, e.g., ugrid conventions). It would be very nice if xarray could also help for these use cases. Therefore I'm wondering if (and how) out-of-core support can be extended to indexes and indexing. I've briefly looked at the documentation on My knowledge of dask is very limited, though. So I've no doubt that this suggestion is very simplistic and not very efficient, or that there are better approaches. I'm also certainly missing other issues not directly related to indexing. Any thoughts? cc @shoyer @mrocklin |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1094/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xarray 13221727 | issue | ||||||||
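The per-chunk lookup idea in #1094 can be illustrated with plain sorted lists and `bisect`: each chunk keeps its own small index, and a label lookup only touches the chunk whose value range may contain the label. `ChunkedIndex` is a toy stand-in, not a real dask or xarray class:

```python
import bisect

class ChunkedIndex:
    """Toy out-of-core-style index: each chunk is a sorted list, and a
    label lookup only 'loads' (touches) the chunk that may contain it."""
    def __init__(self, chunks):
        self.chunks = chunks          # list of sorted lists
        self.offsets = []             # global start position of each chunk
        total = 0
        for c in chunks:
            self.offsets.append(total)
            total += len(c)

    def get_loc(self, label):
        for chunk, offset in zip(self.chunks, self.offsets):
            # range check avoids touching chunks that cannot match
            if chunk and chunk[0] <= label <= chunk[-1]:
                i = bisect.bisect_left(chunk, label)
                if i < len(chunk) and chunk[i] == label:
                    return offset + i
        raise KeyError(label)

idx = ChunkedIndex([[0, 2, 4], [10, 11, 12], [20, 30]])
print(idx.get_loc(11))  # 4
print(idx.get_loc(30))  # 7
```

In a real dask-backed version, the range check would come from cheap per-chunk min/max metadata so that only one chunk is actually read from disk.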
264747372 | MDU6SXNzdWUyNjQ3NDczNzI= | 1627 | html repr of xarray object (for the notebook) | benbovy 4160723 | closed | 0 | 39 | 2017-10-11T21:49:20Z | 2019-10-24T16:56:15Z | 2019-10-24T16:48:47Z | MEMBER | Edit: preview for
I started to think a bit more deeply about what a richer, html-based representation of xarray objects could look like, e.g., in jupyter notebooks. Here are some ideas for Some notes:
- The html repr looks pretty similar to the plain-text repr. I think it's better if they don't differ too much from each other.
- For the sake of consistency, I've stolen some style from These are still, of course, only preliminary thoughts. Any feedback/suggestion is welcome, even opinions about whether an html repr is really needed or not! |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1627/reactions", "total_count": 11, "+1": 7, "-1": 0, "laugh": 0, "hooray": 4, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
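The notebook hook behind #1627 is Jupyter's `_repr_html_` display protocol; a minimal, xarray-independent sketch shows how a plain-text and an html repr coexist on one object (the layout here is invented, not the design xarray eventually shipped):

```python
import html

class FakeDataset:
    """Tiny stand-in object showing the notebook repr hooks."""
    def __init__(self, data_vars):
        self.data_vars = data_vars

    def __repr__(self):
        # plain-text repr, used in terminals
        lines = ["<FakeDataset>", "Data variables:"]
        lines += [f"    {k}  {v!r}" for k, v in self.data_vars.items()]
        return "\n".join(lines)

    def _repr_html_(self):
        # rich repr, picked up automatically by Jupyter/IPython
        rows = "".join(
            f"<tr><td><code>{html.escape(k)}</code></td>"
            f"<td>{html.escape(repr(v))}</td></tr>"
            for k, v in self.data_vars.items()
        )
        return f"<table>{rows}</table>"

ds = FakeDataset({"var": [10.0, 11.0]})
print(repr(ds))
print(ds._repr_html_())
```

Keeping `__repr__` and `_repr_html_` derived from the same underlying data is one way to honor the "they shouldn't differ too much" note above.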
234658224 | MDU6SXNzdWUyMzQ2NTgyMjQ= | 1447 | Package naming "conventions" for xarray extensions | benbovy 4160723 | closed | 0 | 5 | 2017-06-08T21:14:24Z | 2019-06-28T22:58:33Z | 2019-06-28T21:58:33Z | MEMBER | I'm wondering what would be a good name for a package that primarily aims at providing an xarray extension (in the form of a I'm currently thinking about using a prefix like the For example, for an xarray extension for signal processing we would have: package full name: ```python
The main advantage is that we directly have an idea of what the package is about. It may also be good for the overall visibility of both xarray and its 3rd-party extensions. The downside is that there are three name variations: one for getting and installing the package, another for importing it, and yet another for using the accessor. This may be annoying, especially for new users who are not accustomed to this kind of naming convention. Conversely, choosing a different, unrelated name like salem or pangaea has the advantage of using the same name everywhere and perhaps providing multiple accessors in the same package, but given that the number of xarray extensions is likely to grow in the near future (see, e.g., the pangeo-data project) it would become difficult to have a clear view of the whole xarray package ecosystem. Any thoughts? |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1447/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
180676935 | MDU6SXNzdWUxODA2NzY5MzU= | 1030 | Concatenate multiple variables into one variable with a multi-index (categories) | benbovy 4160723 | closed | 0 | 3 | 2016-10-03T15:54:23Z | 2019-02-25T07:25:40Z | 2019-02-25T07:25:40Z | MEMBER | I often have to deal with datasets in this form (multiple variables of different sizes, each representing different categories, on the same physical dimension but using different names as they have different labels),
where it would be more convenient to have the data re-arranged into the following form (concatenate the variables into a single variable with a multi-index with the labels of both the categories and the physical coordinate):
The latter would allow using xarray's nice features like Currently, the best way that I've found to transform the data is something like:

```python
data = np.concatenate([ds.data_band1, ds.data_band2, ds.data_band3])
wn = np.concatenate([ds.wn_band1, ds.wn_band2, ds.wn_band3])
band = np.concatenate([np.repeat(1, 4), np.repeat(2, 6), np.repeat(3, 8)])
midx = pd.MultiIndex.from_arrays([band, wn], names=('band', 'wn'))
ds2 = xr.Dataset({'data': ('spectrum', data)}, coords={'spectrum': midx})
```

Maybe I miss a better way to do this? If I don't, it would be nice to have a convenience method for this, unless this use case is too rare to be worth it. Also not sure at all on what would be a good API for such a method. |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1030/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
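The concatenation pattern in #1030 boils down to repeating each band label to match its coordinate length and zipping the two levels together. A pandas-free sketch (`stack_bands` is a hypothetical helper; real code would use `pandas.MultiIndex.from_arrays` as in the snippet above):

```python
def stack_bands(bands):
    """Build (band, wavenumber) multi-index-like tuples plus the
    concatenated data, from {band_label: (wavenumbers, values)}.
    Sketch only, not an xarray convenience method."""
    index, data = [], []
    for label, (wn, values) in bands.items():
        assert len(wn) == len(values)
        index.extend((label, w) for w in wn)
        data.extend(values)
    return index, data

index, data = stack_bands({
    1: ([4050.2, 4050.3], [1.7e-4, 1.4e-4]),
    2: ([4100.1, 4100.3], [1.2e-4, 1.0e-4]),
})
print(index[0], data[0])  # (1, 4050.2) 0.00017
```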
275033174 | MDU6SXNzdWUyNzUwMzMxNzQ= | 1727 | IPython auto-completion triggers data loading | benbovy 4160723 | closed | 0 | 11 | 2017-11-18T00:14:00Z | 2017-11-18T07:09:41Z | 2017-11-18T07:09:40Z | MEMBER | I create a big netcdf file like this:

```python
In [1]: import xarray as xr

In [2]: import numpy as np

In [3]: ds = xr.Dataset({'myvar': np.arange(100000000, dtype='float64')})

In [4]: ds.to_netcdf('test.nc')
```

Then when I open the file in an IPython console and I use auto-completion, it triggers loading the data.

```python
In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('test.nc')

In [3]: ds.my  # <TAB> autocompletion with any character -> triggers loading
```

I don't have that issue using the python console. Auto-completion for dictionary access in IPython (#1632) works fine too. Output of
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1727/reactions", "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
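The distinction at play in #1727 is that tab completion should go through `__dir__` while real attribute access goes through `__getattr__`. A minimal mock (the names `LazyVar` and `LazyDataset` are invented) shows why only the latter should ever trigger loading:

```python
class LazyVar:
    """Stand-in for a lazily loaded variable."""
    def __init__(self):
        self.loaded = False

    def load(self):
        self.loaded = True
        return [1.0, 2.0, 3.0]

class LazyDataset:
    """Toy object exposing variables as attributes."""
    def __init__(self, names):
        self._vars = {n: LazyVar() for n in names}

    def __getattr__(self, name):
        # only called when normal lookup fails; attribute-style access
        # to a variable loads it, which is what completion must avoid
        if name != "_vars" and name in self._vars:
            return self._vars[name].load()
        raise AttributeError(name)

    def __dir__(self):
        # completion should use __dir__, which loads nothing
        return list(self._vars) + list(super().__dir__())

ds = LazyDataset(["myvar"])
print("myvar" in dir(ds))       # True  -- completion path, no load
print(ds._vars["myvar"].loaded)  # False
ds.myvar                         # real access: loads the data
print(ds._vars["myvar"].loaded)  # True
```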
274591962 | MDU6SXNzdWUyNzQ1OTE5NjI= | 1722 | Change in behavior of .set_index() from pandas 0.20.3 to 0.21.0 | benbovy 4160723 | closed | 0 | 1 | 2017-11-16T17:05:20Z | 2017-11-17T00:54:51Z | 2017-11-17T00:54:51Z | MEMBER | I use xarray 0.9.6 for both examples below. With pandas 0.20.3,

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.20.3'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) int64 1 2 3
Data variables:
    empty
```

With pandas 0.21.0, it creates a

```python
In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: pd.__version__
Out[3]: '0.21.0'

In [4]: ds = xr.Dataset({'grid__x': ('x', [1, 2, 3])})

In [5]: ds.set_index(x='grid__x')
Out[5]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
  * x        (x) MultiIndex
  - grid__x  (x) int64 1 2 3
Data variables:
    empty
```
|
{ "url": "https://api.github.com/repos/pydata/xarray/issues/1722/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
134359597 | MDU6SXNzdWUxMzQzNTk1OTc= | 767 | MultiIndex and data selection | benbovy 4160723 | closed | 0 | 9 | 2016-02-17T18:24:00Z | 2016-09-14T14:28:29Z | 2016-09-14T14:28:29Z | MEMBER | [Edited for more clarity] First of all, I find the MultiIndex very useful and I'm looking forward to seeing the TODOs in #719 implemented in the next releases, especially the first three in the list! Apart from these issues, I think that some other aspects may be improved, notably regarding data selection. Or maybe I've not correctly understood how to deal with multi-index and data selection... To illustrate this, I use some fake spectral data with two discontinuous bands of different length / resolution:

```
In [1]: import pandas as pd

In [2]: import xarray as xr

In [3]: band = np.array(['foo', 'foo', 'bar', 'bar', 'bar'])

In [4]: wavenumber = np.array([4050.2, 4050.3, 4100.1, 4100.3, 4100.5])

In [5]: spectrum = np.array([1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5])

In [6]: s = pd.Series(spectrum, index=[band, wavenumber])

In [7]: s.index.names = ('band', 'wavenumber')

In [8]: da = xr.DataArray(s, dims='band_wavenumber')

In [9]: da
Out[9]:
<xarray.DataArray (band_wavenumber: 5)>
array([  1.70000000e-04,   1.40000000e-04,   1.20000000e-04,
         1.00000000e-04,   8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('foo', 4050.2) ...
```

I extract the band 'bar' using

```
In [10]: da_bar = da.sel(band_wavenumber='bar')

In [11]: da_bar
Out[11]:
<xarray.DataArray (band_wavenumber: 3)>
array([  1.20000000e-04,   1.00000000e-04,   8.50000000e-05])
Coordinates:
  * band_wavenumber  (band_wavenumber) object ('bar', 4100.1) ...
```

It selects the data the way I want, although using the dimension name is confusing in this case.
It would be nice if we could also use the Furthermore, Extracting the band 'bar' from the pandas

```
In [12]: s_bar = s.loc['bar']

In [13]: s_bar
Out[13]:
wavenumber
4100.1    0.000120
4100.3    0.000100
4100.5    0.000085
dtype: float64
```

The problem is also that the unstacked

```
In [13]: da.unstack('band_wavenumber')
Out[13]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[            nan,             nan,   1.20000000e-04,
          1.00000000e-04,   8.50000000e-05],
       [  1.70000000e-04,   1.40000000e-04,             nan,
                     nan,             nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03

In [14]: da_bar.unstack('band_wavenumber')
Out[14]:
<xarray.DataArray (band: 2, wavenumber: 5)>
array([[            nan,             nan,   1.20000000e-04,
          1.00000000e-04,   8.50000000e-05],
       [            nan,             nan,             nan,
                     nan,             nan]])
Coordinates:
  * band        (band) object 'bar' 'foo'
  * wavenumber  (wavenumber) float64 4.05e+03 4.05e+03 4.1e+03 4.1e+03 4.1e+03
```
{ "url": "https://api.github.com/repos/pydata/xarray/issues/767/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue | ||||||
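The partial selection that works in pandas in #767 (`s.loc['bar']` selects by the first level and drops it) can be mimicked over a plain list-of-tuples index; `sel_level0` is illustrative only, not a proposed API:

```python
def sel_level0(index, values, label):
    """Select items whose first index level equals `label`, returning the
    remaining level(s) and values -- mimicking pandas' s.loc[label]."""
    out_index, out_values = [], []
    for (first, *rest), v in zip(index, values):
        if first == label:
            # drop the matched level, like pandas partial selection does
            out_index.append(rest[0] if len(rest) == 1 else tuple(rest))
            out_values.append(v)
    return out_index, out_values

band_wn = [('foo', 4050.2), ('foo', 4050.3),
           ('bar', 4100.1), ('bar', 4100.3), ('bar', 4100.5)]
spectrum = [1.7e-4, 1.4e-4, 1.2e-4, 1.0e-4, 8.5e-5]
wn, bar = sel_level0(band_wn, spectrum, 'bar')
print(wn)   # [4100.1, 4100.3, 4100.5]
```

Dropping the selected level is exactly what the issue asks `sel()` to do, instead of keeping the full (band, wavenumber) tuples around.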
169368546 | MDU6SXNzdWUxNjkzNjg1NDY= | 942 | Filtering by data variable name | benbovy 4160723 | closed | 0 | 3 | 2016-08-04T13:01:20Z | 2016-08-04T19:09:07Z | 2016-08-04T19:09:07Z | MEMBER | Given #844 and #916, maybe it might be useful to also have a I currently deal with datasets that have many data variables with names like:
Using |
{ "url": "https://api.github.com/repos/pydata/xarray/issues/942/reactions", "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
completed | xarray 13221727 | issue |
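The name-based filtering wished for in #942 can be sketched with `fnmatch` over a plain dict. `filter_vars` is a made-up helper; the method xarray actually grew later, `Dataset.filter_by_attrs`, filters on attributes rather than names:

```python
import fnmatch

def filter_vars(data_vars, pattern):
    """Return the subset of variables whose name matches a shell-style
    pattern. Toy helper over a plain dict, not a Dataset method."""
    return {name: var for name, var in data_vars.items()
            if fnmatch.fnmatch(name, pattern)}

ds = {"spectrum_band1": [1.0], "spectrum_band2": [2.0], "pressure": [3.0]}
print(sorted(filter_vars(ds, "spectrum_*")))  # ['spectrum_band1', 'spectrum_band2']
```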
CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);