issue_comments: 813168052

html_url issue_url id node_id user created_at updated_at author_association body reactions performed_via_github_app issue
https://github.com/pydata/xarray/pull/5089#issuecomment-813168052 https://api.github.com/repos/pydata/xarray/issues/5089 813168052 MDEyOklzc3VlQ29tbWVudDgxMzE2ODA1Mg== 1217238 2021-04-05T04:00:54Z 2021-04-05T04:05:16Z MEMBER

From an API perspective, I think the name `drop_duplicates()` would be fine. I would guess that handling arbitrary variables in a Dataset would not be any harder than handling only coordinates?

One thing that is a little puzzling to me is how deduplicating across multiple dimensions is handled. It looks like this function preserves existing dimensions, but inserts NA if the arrays would be ragged? This seems a little strange to me. I think it could make more sense to "flatten" all dimensions in the contained variables into a new dimension when dropping duplicates.

This would require specifying the name for the new dimension(s), but perhaps that could work by switching to the de-duplicated variable name? For example, `ds.drop_duplicates('valid')` on the example in the PR description would result in a "valid" coordinate/dimension of length 3. The original 'init' and 'tau' dimensions could be preserved as coordinates, e.g.,

```python
import numpy as np
import xarray as xr

ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    coords={"init": [0, 1], "tau": [1, 2, 3]},
    dims=["init", "tau"],
).to_dataset(name="test")
ds.coords["valid"] = (("init", "tau"), np.array([[8, 6, 6], [7, 7, 7]]))
result = ds.drop_duplicates('valid')
```

would result in:

```
>>> result
<xarray.Dataset>
Dimensions:  (valid: 3)
Coordinates:
    init     (valid) int64 0 0 1
    tau      (valid) int64 1 2 1
  * valid    (valid) int64 8 6 7
Data variables:
    test     (valid) int64 1 2 4
```

i.e., the exact same thing that would be obtained by indexing with the positions of the de-duplicated values: `ds.isel(init=('valid', [0, 0, 1]), tau=('valid', [0, 1, 0]))`.
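
For reference, the positions passed to `isel` above do not have to be written out by hand. Here is a minimal sketch of how they could be computed with NumPy, assuming that "keep the first occurrence in row-major order" is the desired rule (that rule is my assumption, not something settled in the PR). It only uses existing calls (`np.unique`, `np.unravel_index`, `Dataset.isel`) and is not the implementation of `drop_duplicates` itself:

```python
import numpy as np
import xarray as xr

# Same example dataset as in the PR description.
ds = xr.DataArray(
    [[1, 2, 3], [4, 5, 6]],
    coords={"init": [0, 1], "tau": [1, 2, 3]},
    dims=["init", "tau"],
).to_dataset(name="test")
ds.coords["valid"] = (("init", "tau"), np.array([[8, 6, 6], [7, 7, 7]]))

valid = ds["valid"].values

# Flat (row-major) position of the first occurrence of each distinct value.
# np.unique returns them ordered by value, so sort to restore the original order.
_, first_flat = np.unique(valid.ravel(), return_index=True)
first_flat = np.sort(first_flat)

# Convert the flat positions back into per-dimension integer positions.
init_idx, tau_idx = np.unravel_index(first_flat, valid.shape)

# Pointwise (vectorized) indexing along a new "valid" dimension.
result = ds.isel(init=("valid", init_idx), tau=("valid", tau_idx))
```

A different rule (e.g. keeping the last occurrence) would only change how `first_flat` is chosen; the `isel` step would stay the same.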

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  842940980