
issue_comments: 538461456


html_url: https://github.com/pydata/xarray/issues/3268#issuecomment-538461456
issue_url: https://api.github.com/repos/pydata/xarray/issues/3268
id: 538461456
node_id: MDEyOklzc3VlQ29tbWVudDUzODQ2MTQ1Ng==
user: 1956032
created_at: 2019-10-04T16:07:21Z
updated_at: 2019-10-04T16:09:00Z
author_association: CONTRIBUTOR

Hi all, I recently encountered an issue with an accessor that looks like this one, but I'm not sure. Here is a piece of code that reproduces the issue.

Starting from a class with the core of the code and an accessor to implement the user API:

```python
import xarray


class BaseEstimator():
    def fit(self, this_ds, x=None):
        # Do something with this_ds:
        x = x**2
        # and create a new array with results:
        da = xarray.DataArray(x).rename('fit_data')
        # Return results:
        return da

    def transform(self, this_ds, **kw):
        # Do something with this_ds:
        val = kw['y'] + this_ds['fit_data']
        # and create a new array with results:
        da = xarray.DataArray(val).rename('trf_data')
        # Return results:
        return da


@xarray.register_dataset_accessor('my_accessor')
class Foo:
    def __init__(self, obj):
        # xarray caches this accessor instance on the Dataset it is accessed
        # from, so the state below persists across calls to ds.my_accessor
        self.obj = obj
        self.added = list()

    def add(self, da):
        # Modifies the accessed Dataset in place
        self.obj[da.name] = da
        self.added.append(da.name)
        return self.obj

    def clean(self):
        # Iterate over a copy so that removing items does not skip elements
        for v in list(self.added):
            # drop_vars returns a NEW Dataset, so self.obj stops
            # referring to the Dataset the accessor was created from
            self.obj = self.obj.drop_vars(v)
            self.added.remove(v)
        return self.obj

    def fit(self, estimator, **kw):
        this_da = estimator.fit(self.obj, **kw)
        return self.add(this_da)

    def transform(self, estimator, **kw):
        this_da = estimator.transform(self.obj, **kw)
        return self.add(this_da)
```

Now consider this workflow:

```python
ds = xarray.Dataset()
ds['ext_data'] = xarray.DataArray(1.)

my_estimator = BaseEstimator()
ds = ds.my_accessor.fit(my_estimator, x=2.)

print("Before clean:")
print("xr.DataSet var :", list(ds.data_vars))
print("accessor.obj var:", list(ds.my_accessor.obj.data_vars))

print("\nAfter clean:")

# Three alternatives; run only one of them:
# ds.my_accessor.clean()  # This does nothing to ds but cleans accessor.obj
# ds = ds.my_accessor.clean()  # Cleaning ok for both ds and accessor.obj
ds_clean = ds.my_accessor.clean()  # Cleaning ok on the new ds_clean; does nothing to ds, as expected, but cleans accessor.obj

print("xr.DataSet var :", list(ds.data_vars))
print("accessor.obj var :", list(ds.my_accessor.obj.data_vars))
print("Cleaned xr.DataSet var:", list(ds_clean.data_vars))
```

We have the following output:

```
Before clean:
xr.DataSet var : ['ext_data', 'fit_data']
accessor.obj var: ['ext_data', 'fit_data']

After clean:
xr.DataSet var : ['ext_data', 'fit_data']
accessor.obj var : ['ext_data']
Cleaned xr.DataSet var: ['ext_data']
```

The issue is clear here: the user's dataset `ds` still has the 'fit_data' variable, but the accessor's object does not. They have been "disconnected", and this is not apparent to users.
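A plausible reading of the mechanism, sketched without xarray: xarray caches a registered accessor on the object it is accessed from, so `ds.my_accessor` keeps returning the same `Foo` instance. `add` mutates that shared Dataset in place, while `clean` rebinds `self.obj` to the new Dataset produced by dropping the variable, after which the cached accessor and `ds` point at different objects. The `CachedAccessor` descriptor and the toy `Host` class (standing in for `xarray.Dataset`) are hypothetical simplifications, not xarray's actual code.

```python
class CachedAccessor:
    """Hypothetical, simplified stand-in for xarray's accessor caching:
    the accessor is built once per host object and then reused."""

    def __init__(self, name, accessor_cls):
        self._name = name
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:  # accessed on the class itself
            return self._accessor_cls
        cache = obj.__dict__.setdefault('_accessor_cache', {})
        if self._name not in cache:
            cache[self._name] = self._accessor_cls(obj)
        return cache[self._name]


class Foo:
    """Same stateful design as the accessor in the comment, reduced to essentials."""

    def __init__(self, obj):
        self.obj = obj  # an alias of the host object, not a copy
        self.added = []

    def add(self, name, value):
        self.obj.data[name] = value  # mutates the shared host in place
        self.added.append(name)
        return self.obj

    def clean(self):
        for name in list(self.added):
            self.obj = self.obj.without(name)  # rebinds to a NEW host object
            self.added.remove(name)
        return self.obj


class Host:
    """Hypothetical toy stand-in for xarray.Dataset."""

    my_accessor = CachedAccessor('my_accessor', Foo)

    def __init__(self, data=None):
        self.data = dict(data or {})

    def without(self, name):
        # Like dropping a variable in xarray: returns a new object, self is untouched
        return Host({k: v for k, v in self.data.items() if k != name})


ds = Host({'ext_data': 1.0})
ds = ds.my_accessor.add('fit_data', 4.0)  # returns the same object
cleaned = ds.my_accessor.clean()

print(sorted(ds.data))                  # ['ext_data', 'fit_data']
print(sorted(ds.my_accessor.obj.data))  # ['ext_data']
print(ds.my_accessor.obj is ds)         # False: accessor and ds are disconnected
```

The toy reproduces the same pattern as the output above: the in-place `add` keeps both names pointing at one object, and the rebinding in `clean` is what separates them.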

So if users later proceed to run the "transform":

```python
ds.my_accessor.transform(my_estimator, y=2.)
```

they get a KeyError, because 'fit_data' is not in the accessor's object, although it still appears in the list of `ds` variables, which is more than confusing.

Sorry for the long post. I'm not sure it's relevant to this issue, but it seems so to me. I don't see a solution to this from the accessor developer's side, other than not "interfering" with the content of the accessed object.
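One way to follow the "don't interfere" route, sketched under assumptions: an accessor that keeps no state of its own, never mutates the accessed Dataset, and records which variables it added in a variable attribute, so that `clean` can recover them from the Dataset itself. The accessor name `stateless_accessor` and the attribute key `added_by_accessor` are hypothetical; `BaseEstimator` repeats the toy estimator from the example above so the block runs on its own.

```python
import xarray


class BaseEstimator:
    # Same toy estimator as in the example above, condensed
    def fit(self, this_ds, x=None):
        return xarray.DataArray(x**2).rename('fit_data')

    def transform(self, this_ds, **kw):
        return xarray.DataArray(kw['y'] + this_ds['fit_data']).rename('trf_data')


# Hypothetical accessor name, chosen so it does not clash with 'my_accessor'
@xarray.register_dataset_accessor('stateless_accessor')
class StatelessFoo:
    # Hypothetical attribute key marking variables added by this accessor
    FLAG = 'added_by_accessor'

    def __init__(self, obj):
        # Only a reference to the accessed Dataset; no other state
        self._obj = obj

    def add(self, da):
        # Tag a copy of the new variable, then return a NEW Dataset
        # instead of modifying self._obj in place
        da = da.copy()
        da.attrs[self.FLAG] = True
        return self._obj.assign({da.name: da})

    def clean(self):
        # Recover the bookkeeping from the Dataset itself, not from the accessor
        added = [name for name, var in self._obj.data_vars.items()
                 if var.attrs.get(self.FLAG)]
        return self._obj.drop_vars(added)

    def fit(self, estimator, **kw):
        return self.add(estimator.fit(self._obj, **kw))

    def transform(self, estimator, **kw):
        return self.add(estimator.transform(self._obj, **kw))


ds = xarray.Dataset({'ext_data': xarray.DataArray(1.)})
my_estimator = BaseEstimator()
ds = ds.stateless_accessor.fit(my_estimator, x=2.)
ds = ds.stateless_accessor.transform(my_estimator, y=2.)
ds_clean = ds.stateless_accessor.clean()

print(sorted(ds.data_vars))        # ['ext_data', 'fit_data', 'trf_data']
print(sorted(ds_clean.data_vars))  # ['ext_data']
```

Because every method returns a new Dataset, users have to reassign (`ds = ds.stateless_accessor.fit(...)`), as with most xarray methods; in exchange, a fresh accessor is created for each new Dataset and there is no cached state that can drift out of sync with `ds`.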

reactions: none (total_count 0)
issue: 485708282