home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

10 rows where author_association = "MEMBER" and issue = 485708282 sorted by updated_at descending

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: reactions, created_at (date), updated_at (date)

user 3

  • crusaderky 7
  • fmaussion 2
  • shoyer 1

issue 1

  • Stateful user-defined accessors · 10 ✖

author_association 1

  • MEMBER · 10 ✖
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
539463105 https://github.com/pydata/xarray/issues/3268#issuecomment-539463105 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUzOTQ2MzEwNQ== crusaderky 6213168 2019-10-08T11:04:06Z 2019-10-08T11:04:06Z MEMBER

Thing is, the whole thing is undefined. What does the accessor state contain? As a xarray developer, I don't know. Is it variable names? Is it references to objects that make up the Dataset, e.g. Variables or the attrs dict? Is it objects whose contents rely on the current state of the Dataset, e.g. aggregated measures? Is it objects whose contents rely on historical events (like in your case)?

Dataset.copy() will create a copy of everything up to and excluding the numpy arrays. In order to allow you to retain accessor state, we'd need to plant a hook in it and invoke some agreed duck-type API in your object that basically states, "I called copy(), and this is the new object I created, please create a copy of yourself accordingly making extra sure you don't retain references to components of the previous object". And then there are all the other methods that currently nuke the accessor state - including many in-place ones - because they could potentially invalidate it. What should they do? Invoke a special API on the accessor? If not, why should copy() trigger special accessor API and e.g. roll() shouldn't? Planting accessor-refresher hooks in every single method that currently just wipes it away is out of question as it would need to be almost everywhere and - more importantly - it would be born broken.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
539240950 https://github.com/pydata/xarray/issues/3268#issuecomment-539240950 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUzOTI0MDk1MA== crusaderky 6213168 2019-10-07T23:03:28Z 2019-10-07T23:04:25Z MEMBER

Would that make any sense that the xr.DataSet.copy() method also return a copy of the accessors ?

It's been discussed above in this same thread. It's impossible without breaking the accessor API, as it would require you (the accessor developer) to define a copy method. The more high level discussion is that the statefulness of the accessor is something that is OK to use for caching and performance improvements, and not OK for storing functional information like yours.

Have you considered storing a flag in Variable.attrs instead?

```python

def add(self, da):
    da.attrs["cleanable"] = True
    self.obj[da.name] = da
    return self.obj

def clean(self):
    return self.obj.drop([
        k for k, v in self.obj.variables.items()
        if v.attrs.get("cleanable")
    ])

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
538577126 https://github.com/pydata/xarray/issues/3268#issuecomment-538577126 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUzODU3NzEyNg== crusaderky 6213168 2019-10-04T22:14:38Z 2019-10-04T22:16:03Z MEMBER

@gmaze Dataset.drop does not mutate the state of the original object, so it's conceptually wrong for your clean() method to mutate the accessor state too. It should be: python def clean(self): return self.obj.drop(self.added) The new dataset returned will have no accessor cache, and will recreate an instance on the fly on first access.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525377722 https://github.com/pydata/xarray/issues/3268#issuecomment-525377722 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTM3NzcyMg== shoyer 1217238 2019-08-27T16:22:44Z 2019-08-27T16:22:44Z MEMBER

It isn't just methods that use _to_temp_dataset() that result in losing the accessor state -- any operation that creates a new DataArray will (by design) lose the accessor, which don't get propagated in any operations.

Accessors are also not preserved when indexing a DataArray out of a Dataset (https://github.com/pydata/xarray/issues/3205), cc @djhoese.

I had not contemplated the issue of circular references, which I agree is not ideal. If we had realized that when creating accessors in the first place we might have chosen a different design, but there are a number of users who rely upon it.

The circular reference issue could also be worked around in a user-friendly way by having the decorator automatically add methods to the decorated class

I like the look of this solution. It will still require users to update their code to avoid circular references (e.g., by removing their own __init__ method), but it will make the default behavior more sane.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525268978 https://github.com/pydata/xarray/issues/3268#issuecomment-525268978 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI2ODk3OA== fmaussion 10050469 2019-08-27T12:00:02Z 2019-08-27T12:00:02Z MEMBER

could you change your accessor code to store its state in Dataset.attrs instead?

Yes, although attrs would be bad as well because they are lost in many operations. The current "best practice" would be to use coords for this, as in https://github.com/pydata/xarray/issues/1271#issuecomment-280574680

I never bothered to get this done (my library is quite niche and not mean to become widely used), but if we remove the caching mechanism in xarray, it would be one more incentive to make the switch.

So you work on the assumption that no new potentially useful coords will be added after the first invocation of your accessor?

Yes. In almost all use cases I know of this won't happen. Typically, the users have to create (or open) an xarray object that salem understands before calling the accessor anyway.

Or do you have logic that invalidates your cache every time the state of the coords changes?

No. I didn't think about that before today ;-). But you are right: if users of salem's accessor change the coordinates after calling the accessor, then bad things might happen. Experience shows that this rarely happens (never happened to me), but I can see how this can fire back.

Altogether, these are good arguments to remove the caching mechanism I believe.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525260148 https://github.com/pydata/xarray/issues/3268#issuecomment-525260148 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI2MDE0OA== crusaderky 6213168 2019-08-27T11:30:22Z 2019-08-27T11:32:13Z MEMBER

The circular reference issue could also be worked around in a user-friendly way by having the decorator automatically add methods to the decorated class, copying the design of @dataclass:

```python import weakref

class C: def init(self, owner): self._owner = weakref.ref(owner) if hasattr(self, "post_init"): self.post_init()

@property
def owner(self):
    out = self._owner()
    if out is None:
        raise AttributeError("Orphaned accessor")
    return out

``` This would also allow for shallow copies to change the pointer.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525257435 https://github.com/pydata/xarray/issues/3268#issuecomment-525257435 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI1NzQzNQ== crusaderky 6213168 2019-08-27T11:20:50Z 2019-08-27T11:21:14Z MEMBER

store the names of the coordinate variabless we know are going to be useful to us later. So you work on the assumption that no new potentially useful coords will be added after the first invocation of your accessor? Or do you have logic that invalidates your cache every time the state of the coords changes?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525256306 https://github.com/pydata/xarray/issues/3268#issuecomment-525256306 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI1NjMwNg== crusaderky 6213168 2019-08-27T11:16:48Z 2019-08-27T11:16:48Z MEMBER

@fmaussion could you change your accessor code to store its state in Dataset.attrs instead?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525252528 https://github.com/pydata/xarray/issues/3268#issuecomment-525252528 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI1MjUyOA== fmaussion 10050469 2019-08-27T11:03:47Z 2019-08-27T11:04:10Z MEMBER

Interesting, thanks!

As an accessor maintainer, I can ensure that at least one accessor implementation is storing state ;-). But this state is based on the xarray object itself: for example, we derive georeferencing information and store the names of the coordinate variabless we know are going to be useful to us later. That is, every new call to __init__ based on a modified object will trigger a new parsing, and we don't come into the situation you describe above.

Getting rid of the caching logic would mean some performance loss to us, yes, but I don't know if it's "worse" than the circular reference issue you describe or not.

{
    "total_count": 1,
    "+1": 0,
    "-1": 0,
    "laugh": 1,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282
525240363 https://github.com/pydata/xarray/issues/3268#issuecomment-525240363 https://api.github.com/repos/pydata/xarray/issues/3268 MDEyOklzc3VlQ29tbWVudDUyNTI0MDM2Mw== crusaderky 6213168 2019-08-27T10:24:38Z 2019-08-27T10:24:38Z MEMBER

Demonstation on the circular reference issue: ```python import gc import weakref import xarray

class C: pass

@xarray.register_dataset_accessor('foo') class Foo: def init(self, obj): self.obj = obj

ds = xarray.Dataset() w = weakref.ref(ds) print("No accessor, in scope:", w() is not None) del ds print("No accessor, descoped:", w() is not None)

ds = xarray.Dataset() ds.foo w = weakref.ref(ds) print("with accessor, in scope:", w() is not None) del ds print("with accessor, descoped:", w() is not None) gc.collect() print("with accessor, after gc pass:", w() is not None) Output: No accessor, in scope: True No accessor, descoped: False with accessor, in scope: True with accessor, descoped: True with accessor, after gc pass: False ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Stateful user-defined accessors 485708282

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 15.637ms · About: xarray-datasette