
issues


5 rows where comments = 5, state = "open" and user = 14808389 sorted by updated_at descending




type (2 values)

  • issue 4
  • pull 1

state (1 value)

  • open · 5

repo (1 value)

  • xarray 5
Columns: id · node_id · number · title · user · state · locked · assignee · milestone · comments · created_at · updated_at ▲ · closed_at · author_association · active_lock_reason · draft · pull_request · body · reactions · performed_via_github_app · state_reason · repo · type
2234142680 · PR_kwDOAMm_X85sK0g8 · #8923 · `"source"` encoding for datasets opened from `fsspec` objects · keewis (14808389) · open · 5 comments · created 2024-04-09T19:12:45Z · updated 2024-04-23T16:54:09Z · MEMBER · draft: 0 · pydata/xarray/pulls/8923

When opening files from path-like objects (str, pathlib.Path), the backend machinery (_dataset_from_backend_dataset) sets the "source" encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file or extracting additional metadata. This would be useful as well when using fsspec to open remote files.

In this PR, I'm extracting the path attribute that most fsspec objects have to set that value. I've considered using isinstance checks instead of the getattr-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within fsspec itself).

If this sounds like a good idea, I'll update the documentation of the "source" encoding to mention this feature.

  • [x] Tests added
  • [ ] User visible changes (including notable bug fixes) are documented in whats-new.rst
{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8923/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: pull
683142059 · MDU6SXNzdWU2ODMxNDIwNTk= · #4361 · restructure the contributing guide · keewis (14808389) · open · 5 comments · created 2020-08-20T22:51:39Z · updated 2023-03-31T17:39:00Z · MEMBER

From #4355

@max-sixty:

Stepping back on the contributing doc — I admit I haven't looked at it in a while — I wonder whether we can slim it down a bit, for example by linking to other docs for generic tooling — I imagine we're unlikely to have the best docs on working with GH, for example. Or referencing our PR template rather than the (now out-of-date) PR checklist.

We could also add a docstring guide since the numpydoc guide does not cover every little detail (for example, default notation, type spec vs. type hint, space before the colon separating parameter names from types, no colon for parameters without types, etc.)
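A docstring guide could pin those details down with a worked example. The function below is hypothetical and exists only to illustrate the numpydoc conventions mentioned above (default notation, a space before the colon separating name from type, no colon for untyped parameters):

```python
def clip(values, vmin, vmax=None):
    """Limit values to the interval ``[vmin, vmax]``.

    Parameters
    ----------
    values : list of float
        The values to clip (note the space before the colon that
        separates the parameter name from its type).
    vmin : float
        Lower bound.
    vmax : float, optional
        Upper bound, by default ``None``, which means no upper bound
        (the "default notation" in question).

    Returns
    -------
    list of float
        The clipped values.
    """
    upper = float("inf") if vmax is None else vmax
    return [min(max(v, vmin), upper) for v in values]
```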

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4361/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
789106802 · MDU6SXNzdWU3ODkxMDY4MDI= · #4825 · clean up the API for renaming and changing dimensions / coordinates · keewis (14808389) · open · 5 comments · created 2021-01-19T15:11:55Z · updated 2021-09-10T15:04:14Z · MEMBER

From #4108:

I wonder if it would be better to first "reorganize" all of the existing functions: we currently have rename (and Dataset.rename_dims / Dataset.rename_vars), set_coords, reset_coords, set_index, reset_index and swap_dims, which overlap partially. For example, the code sample from #4417 works if instead of

```python
ds = ds.rename(b='x')
ds = ds.set_coords('x')
```

we use

```python
ds = ds.set_index(x="b")
```

and something similar for the code sample in #4107.

I believe we currently have these use cases (not sure if that list is complete, though):

  • rename a DataArray → rename
  • rename an existing variable to a name that is not yet in the object → rename / Dataset.rename_vars / Dataset.rename_dims
  • convert a data variable to a coordinate (not a dimension coordinate) → set_coords
  • convert a coordinate (not a dimension coordinate) to a data variable → reset_coords
  • swap an existing dimension coordinate with a coordinate (which may not exist) and rename the dimension → swap_dims
  • use an existing coordinate / data variable as a dimension coordinate (do not rename the dimension) → set_index
  • stop using a coordinate as dimension coordinate and append _ to its name (do not rename the dimension) → reset_index
  • use two existing coordinates / data variables as a MultiIndex → set_index
  • stop using a MultiIndex as a dimension coordinate and use its levels as coordinates → reset_index

Sometimes, some of these can be emulated by combinations of others, for example:

```python
# x is a dimension without coordinates
assert_identical(ds.set_index({"x": "b"}), ds.swap_dims({"x": "b"}).rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.set_index({"x": "b"}).rename({"x": "b"}))
```

and, with this PR:

```python
assert_identical(ds.set_index({"x": "b"}), ds.set_coords("b").rename({"b": "x"}))
assert_identical(ds.swap_dims({"x": "b"}), ds.rename({"b": "x"}))
```

which means that it would increase the overlap of `rename`, `set_index`, and `swap_dims`.

In any case I think we should add a guide which explains which method to pick in which situation (or extend howdoi).

Originally posted by @keewis in https://github.com/pydata/xarray/issues/4108#issuecomment-761907785
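Such a guide could start as a plain lookup table from use case to method. The table below is a hypothetical sketch condensing the list above; the key phrasings are mine, not an existing xarray API:

```python
# Hypothetical "which method do I pick?" table, condensed from the list above.
RENAME_HOWDOI = {
    "rename a DataArray": "rename",
    "rename an existing variable": "rename / rename_vars / rename_dims",
    "data variable -> coordinate (non-dimension)": "set_coords",
    "coordinate (non-dimension) -> data variable": "reset_coords",
    "swap a dimension coordinate, renaming the dimension": "swap_dims",
    "promote a variable to dimension coordinate": "set_index",
    "demote a dimension coordinate (name gets a trailing _)": "reset_index",
    "combine coordinates into a MultiIndex": "set_index",
    "expand a MultiIndex into level coordinates": "reset_index",
}
```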

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4825/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
791277757 · MDU6SXNzdWU3OTEyNzc3NTc= · #4837 · expose _to_temp_dataset / _from_temp_dataset as semi-public API? · keewis (14808389) · open · 5 comments · created 2021-01-21T16:11:32Z · updated 2021-01-22T02:07:08Z · MEMBER

When writing accessors which behave the same for both Dataset and DataArray, it would be incredibly useful to be able to use DataArray._to_temp_dataset / DataArray._from_temp_dataset to deduplicate code. Is it safe to use those in external packages (like pint-xarray)?

Otherwise I guess it would be possible to use

```python
name = da.name if da.name is not None else "__temp"
temp_ds = da.to_dataset(name=name)
new_da = temp_ds[name]
if da.name is None:
    new_da = new_da.rename(da.name)
assert_identical(da, new_da)
```

but that seems less efficient.
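If the private methods stay private, the public-API workaround could be wrapped once per accessor. The decorator below is a hypothetical sketch, not part of xarray; it duck-types on `to_dataset`, which `DataArray` defines and `Dataset` does not:

```python
import functools


def accept_dataarray(func):
    """Make a Dataset-only function also accept a DataArray (sketch)."""

    @functools.wraps(func)
    def wrapper(obj, *args, **kwargs):
        if not hasattr(obj, "to_dataset"):
            # already a Dataset (or anything Dataset-like): pass through
            return func(obj, *args, **kwargs)
        # DataArray: round-trip through a single-variable Dataset
        name = obj.name if obj.name is not None else "__temp"
        result = func(obj.to_dataset(name=name), *args, **kwargs)[name]
        return result.rename(obj.name) if obj.name is None else result

    return wrapper
```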

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/4837/reactions",
    "total_count": 2,
    "+1": 2,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue
517195073 · MDU6SXNzdWU1MTcxOTUwNzM= · #3483 · assign_coords with mixed DataArray / array args removes coords · keewis (14808389) · open · 5 comments · created 2019-11-04T14:38:40Z · updated 2019-11-07T15:46:15Z · MEMBER

I'm not sure if using assign_coords to overwrite the data of coords is the best way to do so, but using mixed args (on current master) turns out to have surprising results:

```python
>>> obj = xr.DataArray(
...     data=[6, 3, 4, 6],
...     coords={"x": list("abcd"), "y": ("x", range(4))},
...     dims="x",
... )
>>> obj
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'a' 'b' 'c' 'd'
    y        (x) int64 0 1 2 3

# works as expected
>>> obj.assign_coords(coords={"x": list("efgh"), "y": ("x", [0, 2, 4, 6])})
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6

# works, too (same as .data / .values)
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")).variable,
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
    y        (x) int64 0 2 4 6

# this drops "y"
>>> obj.assign_coords(coords={
...     "x": obj.x.copy(data=list("efgh")),
...     "y": ("x", [0, 2, 4, 6]),
... })
<xarray.DataArray 'obj' (x: 4)>
array([6, 3, 4, 6])
Coordinates:
  * x        (x) <U1 'e' 'f' 'g' 'h'
```

Passing a `DataArray` for `y`, like `obj.y * 2`, while also changing `x` (the type does not matter) always results in a `MergeError`:

```python
>>> obj.assign_coords(x=list("efgh"), y=obj.y * 2)
xarray.core.merge.MergeError: conflicting values for index 'x' on objects to be combined:
first value: Index(['e', 'f', 'g', 'h'], dtype='object', name='x')
second value: Index(['a', 'b', 'c', 'd'], dtype='object', name='x')
```

I would expect the result to be the same regardless of the type of the new coords.
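Until the behavior is unified, one workaround suggested by the examples above (where passing `.variable` preserved `y`) would be to unwrap any DataArray coords before assigning. `strip_dataarrays` is a hypothetical helper, not xarray API:

```python
def strip_dataarrays(coords):
    """Replace DataArray values by their bare ``.variable`` (sketch).

    Non-DataArray values (lists, ``(dims, data)`` tuples, ...) pass through
    unchanged, so ``obj.assign_coords(strip_dataarrays(coords))`` hits the
    code path that preserved ``y`` in the examples above.
    """
    return {
        key: value.variable if hasattr(value, "variable") else value
        for key, value in coords.items()
    }
```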

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/3483/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
repo: xarray (13221727) · type: issue


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
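The page's query ("comments = 5, state = "open" and user = 14808389 sorted by updated_at descending") can be reproduced against an abridged version of this schema with Python's built-in sqlite3 module. The sample rows below are taken from the table above, trimmed to the columns the query touches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# abridged schema: only the columns used by the query below
conn.execute(
    """CREATE TABLE issues (
        [id] INTEGER PRIMARY KEY, [number] INTEGER, [title] TEXT,
        [user] INTEGER, [state] TEXT, [comments] INTEGER, [updated_at] TEXT
    )"""
)
conn.executemany(
    "INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (2234142680, 8923, "source encoding for fsspec objects",
         14808389, "open", 5, "2024-04-23T16:54:09Z"),
        (683142059, 4361, "restructure the contributing guide",
         14808389, "open", 5, "2023-03-31T17:39:00Z"),
    ],
)
rows = conn.execute(
    """SELECT number, title FROM issues
       WHERE comments = 5 AND state = 'open' AND user = 14808389
       ORDER BY updated_at DESC"""
).fetchall()
print(rows)  # most recently updated first
```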
Powered by Datasette · Queries took 45.425ms · About: xarray-datasette