issues

2 rows where state = "closed" and user = 43999641 sorted by updated_at descending

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association active_lock_reason draft pull_request body reactions performed_via_github_app state_reason repo type
2125478394 PR_kwDOAMm_X85mZIzr 8723 (feat): Support for `pandas` `ExtensionArray` ilan-gold 43999641 closed 0     23 2024-02-08T15:38:18Z 2024-04-18T12:52:06Z 2024-04-18T12:52:03Z CONTRIBUTOR   0 pydata/xarray/pulls/8723

Some outstanding points/decisions brought up by this PR:

- [ ] Confirm type promotion rules and write them out. As it stands now, if all arrays share the same extension array type, that type is passed onwards; otherwise everything is converted to numpy. (related: https://github.com/pydata/xarray/pull/8714)
- [ ] ~Acceptance of plum as a dispatch method. Without it, the behavior should fall back to what it was before (cast to numpy types). I am a big fan of dispatching and think it could serve as a model going forward for making support of other data types/arrays more feasible. The other option, I think, would be to just use the underlying array of the ExtensionDuckArray class to decide and then have some central registry that serves as the basis for a decorator (like the API for accessors via _CachedAccessor). That being said, the current defaults are quite good, so this is in all likelihood a marginal feature.~
- [ ] Do we allow just pandas ExtensionArray directly, or can we also allow Series?

Possibly missing something else! Let me know!

Checklist:

- [x] Closes #8463 and Closes #5287
- [x] Tests added
- [x] User visible changes (including notable bug fixes) are documented in whats-new.rst
- [ ] New functions/methods are listed in api.rst

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8723/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
    xarray 13221727 pull
1999657332 I_kwDOAMm_X853MFl0 8463 Categorical Array ilan-gold 43999641 closed 0     19 2023-11-17T17:57:12Z 2024-04-18T12:52:04Z 2024-04-18T12:52:04Z CONTRIBUTOR      

Is your feature request related to a problem?

We are looking to improve compatibility between AnnData and xarray (see https://github.com/scverse/anndata/issues/744), and so categoricals are naturally on our roadmap. Thus, I think some sort of standard-use categorical array would be desirable. It seems something similar has come up with netCDF, although my knowledge is limited, so this issue may be more distinct than I am aware. So what comes of this issue may kill two birds with one stone, or it may work towards some common solution that can at least help both use-cases (AnnData and netCDF ENUM).

Describe the solution you'd like

The goal would be a standard-use categorical data type xarray container of some sort. I'm not sure what form this can take.

We have something functional here that inherits from ExplicitlyIndexedNDArrayMixin and returns pandas.CategoricalDtype. So let's say this implementation would be at least a conceptual starting point to work from (it also seems not dissimilar to what is done here for new CF types).

Some issues:

1. I have no idea what a standard "return type" for an xarray categorical array should be (i.e., numpy with the categories applied, pandas, something custom, etc.). So I'm not sure if using pandas.CategoricalDtype is acceptable, as I do in the linked implementation. Relatedly...
2. I don't think using pandas.CategoricalDtype really helps with the already existing CF Enum need if you want the return type to be some sort of numpy array (although again, not sure about the return type). As I understand it, though, the whole point of categoricals is to use integers as the base type and only show "strings" outwardly (i.e., in printing, the API for equality operations, accessors, etc.), while the internals are based on integers. So I'm not really sure numpy is even an option here. Maybe we roll our own solution?
3. I am not sure this is the right level at which to implement this (maybe it should be a Variable? I don't think so, but I am just a beginner here 😄)
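
To make the "integers inside, strings outside" idea concrete, here is a minimal pure-Python sketch. `TinyCategorical`, `from_labels`, and `equals_label` are hypothetical names invented for illustration; they are not part of xarray, pandas, or the linked implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TinyCategorical:
    """Hypothetical toy container: integer codes plus a tuple of category labels."""

    codes: tuple
    categories: tuple

    @classmethod
    def from_labels(cls, labels):
        # Categories are the sorted unique labels; each code indexes into them.
        categories = tuple(sorted(set(labels)))
        lookup = {label: code for code, label in enumerate(categories)}
        return cls(tuple(lookup[label] for label in labels), categories)

    def __getitem__(self, i):
        # Outwardly, an element looks like a string.
        return self.categories[self.codes[i]]

    def equals_label(self, label):
        # Internally, comparison happens on integers; an unknown label matches nothing.
        if label not in self.categories:
            return [False] * len(self.codes)
        target = self.categories.index(label)
        return [code == target for code in self.codes]


cat = TinyCategorical.from_labels(["a", "b", "a", "b", "c"])
print(cat.codes, cat.categories, cat[4])
```

Whether such a container should hand back numpy, pandas, or something custom is exactly the open question raised in the list above; the sketch takes no position on it.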

It seems you may want, in addition to the array container, some sort of i/o functionality for this feature (so maybe some on-disk specification?).

Describe alternatives you've considered

I think there is some route via VariableCoder as hinted here i.e., using encode/decode. This would probably be more general purpose as we could encode directly to other data types if using pandas is not desirable. Maybe this would be a way to support both netCDF and returning a pandas.CategoricalDtype (again, not sure what the netCDF return type should be for ENUM).
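
As a rough illustration of the encode/decode idea, the sketch below round-trips labels through integer codes plus a metadata dictionary. It deliberately does not use xarray's VariableCoder; the function names and the `"categories"` attribute key are assumptions made up for this example, not an existing on-disk convention.

```python
def encode(labels):
    """Turn labels into integer codes plus metadata describing the categories."""
    categories = sorted(set(labels))
    lookup = {label: code for code, label in enumerate(categories)}
    codes = [lookup[label] for label in labels]
    # Hypothetical attribute name; a real format would define its own.
    attrs = {"categories": categories}
    return codes, attrs


def decode(codes, attrs):
    """Rebuild the original labels from the integer codes and the stored categories."""
    categories = attrs["categories"]
    return [categories[code] for code in codes]


labels = ["a", "b", "a", "b", "c"]
codes, attrs = encode(labels)
roundtrip = decode(codes, attrs)
print(codes, attrs, roundtrip)
```

Because the decoder only needs the codes and the stored category list, the decoded result could in principle be a pandas categorical, a numpy array, or an ENUM-like type, depending on what the target format expects.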

Additional context

So just for reference, the current behavior of to_xarray with pandas.CategoricalDtype is to return a numpy array of object dtype:

```python
import pandas as pd

df = pd.DataFrame({'cat': ['a', 'b', 'a', 'b', 'c']})
df['cat'] = df['cat'].astype('category')
df.to_xarray()['cat']

# <xarray.DataArray 'cat' (index: 5)>
# array(['a', 'b', 'a', 'b', 'c'], dtype=object)
# Coordinates:
#   * index    (index) int64 0 1 2 3 4
```

And as stated in the netCDF issue, for that use-case, the information about ENUM is lost (from what I can read).

Apologies if I'm missing something here! Feedback welcome! Sorry if this is a bit chaotic, just trying to cover my bases.

{
    "url": "https://api.github.com/repos/pydata/xarray/issues/8463/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed xarray 13221727 issue

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [active_lock_reason] TEXT,
   [draft] INTEGER,
   [pull_request] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [state_reason] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
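
The row filter this page applies (state = "closed", user = 43999641, sorted by updated_at descending) can be tried against a cut-down copy of the schema. This is a sketch using Python's built-in `sqlite3` with an in-memory database: only a handful of columns are kept, the foreign-key tables are omitted, and the two rows are copied from the listing above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the [issues] table: just the columns the query touches.
conn.execute(
    """
    CREATE TABLE [issues] (
       [id] INTEGER PRIMARY KEY,
       [number] INTEGER,
       [title] TEXT,
       [user] INTEGER,
       [state] TEXT,
       [updated_at] TEXT,
       [type] TEXT
    )
    """
)
conn.execute("CREATE INDEX [idx_issues_user] ON [issues] ([user])")

rows = [
    (1999657332, 8463, "Categorical Array", 43999641, "closed",
     "2024-04-18T12:52:04Z", "issue"),
    (2125478394, 8723, "(feat): Support for `pandas` `ExtensionArray`", 43999641,
     "closed", "2024-04-18T12:52:06Z", "pull"),
]
conn.executemany("INSERT INTO [issues] VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT [number], [type] FROM [issues] "
    "WHERE [state] = ? AND [user] = ? "
    "ORDER BY [updated_at] DESC",
    ("closed", 43999641),
).fetchall()
print(result)
```

Storing the timestamps as TEXT works for ordering here only because every value uses the same ISO 8601 UTC format.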