issue_comments


23 rows where issue = 329575874 sorted by updated_at descending

dcherian (MEMBER) · 2020-11-24T21:54:14Z · https://github.com/pydata/xarray/issues/2217#issuecomment-733254878

reopening since we have a PR to fix this properly.

maschull (NONE) · 2019-10-10T15:30:12Z · https://github.com/pydata/xarray/issues/2217#issuecomment-540642038

ah wonderful! I will update to 0.13.0

dcherian (MEMBER) · 2019-10-10T15:18:28Z · https://github.com/pydata/xarray/issues/2217#issuecomment-540636805

Yes: on xarray>=0.13.0, use xr.open_mfdataset(..., join="override"), assuming that all files are on the same coordinate system.
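A minimal sketch of what that workaround looks like, using xr.align (which takes the same join parameter); the two datasets here are made up to mimic grids that differ only by floating-point noise:

``` python
import numpy as np
import xarray as xr

# Two made-up datasets whose x coordinates differ only by round-off noise
ds1 = xr.Dataset({"t": ("x", [1.0, 2.0])}, coords={"x": [0.1, 0.2]})
ds2 = xr.Dataset({"t": ("x", [3.0, 4.0])}, coords={"x": [0.1 + 1e-12, 0.2 - 1e-12]})

# join="override" keeps ds1's coordinate labels and relabels ds2 in place;
# an outer join would instead produce four x labels and NaN padding
a1, a2 = xr.align(ds1, ds2, join="override")
print(np.array_equal(a1.x.values, a2.x.values))  # True
```

With open_mfdataset, passing join="override" applies the same relabeling across all files being combined.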

Reactions: +1 (1)
maschull (NONE) · 2019-10-10T15:15:36Z · https://github.com/pydata/xarray/issues/2217#issuecomment-540635551

Any workaround for this issue?

WeatherGod (CONTRIBUTOR) · 2018-07-24T20:48:53Z · https://github.com/pydata/xarray/issues/2217#issuecomment-407547050

I have created a PR for my work-in-progress: pandas-dev/pandas#22043

shoyer (MEMBER) · 2018-06-25T20:14:00Z · https://github.com/pydata/xarray/issues/2217#issuecomment-400080478

Both of these sound reasonable to me, but APIs for pandas are really best discussed in a pandas issue. I'm happy to chime in over there, but I haven't been an active pandas dev recently.

WeatherGod (CONTRIBUTOR) · 2018-06-25T18:07:49Z · https://github.com/pydata/xarray/issues/2217#issuecomment-400043753

Do we want to dive straight to that? Or, would it make more sense to first submit some PRs piping the support for a tolerance kwarg through more of the API? Or perhaps we should propose that a "tolerance" attribute should be an optional attribute that methods like get_indexer() and such could always check for? Not being a pandas dev, I am not sure how piecemeal we should approach this.

In addition, we are likely going to have to implement a decent chunk of code ourselves for compatibility's sake, I think.

shoyer (MEMBER) · 2018-06-23T00:26:19Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399615463

OK, I think I'm convinced. Now it's probably a good time to go back to the pandas issues (or open a new one) with a proposal to add tolerance to Float64Index.


WeatherGod (CONTRIBUTOR) · 2018-06-22T23:56:41Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399612490

I am not concerned about the non-commutativeness of the indexer itself. There is no way around that. At some point, you have to choose values, whether it is done by an indexer or done by some particular set operation.

As for the different sizes, that happens when the tolerance is greater than half the smallest delta. I figure a final implementation would enforce such a constraint on the tolerance.


shoyer (MEMBER) · 2018-06-22T21:56:17Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399593224

@WeatherGod One problem with your definition of tolerance is that it isn't commutative, even if both indexes have the same tolerance:

``` python
a = ImpreciseIndex([0.1, 0.2, 0.3, 0.4])
a.tolerance = 0.1
b = ImpreciseIndex([0.301, 0.401, 0.501, 0.601])
b.tolerance = 0.1
print(a.union(b))  # ImpreciseIndex([0.1, 0.2, 0.3, 0.4, 0.501, 0.601], dtype='float64')
print(b.union(a))  # ImpreciseIndex([0.1, 0.2, 0.301, 0.401, 0.501, 0.601], dtype='float64')
```

If you try a little harder, you could even have cases where the result has a different size, e.g.,

``` python
a = ImpreciseIndex([1, 2, 3])
a.tolerance = 0.5
b = ImpreciseIndex([1, 1.9, 2.1, 3])
b.tolerance = 0.5
print(a.union(b))  # ImpreciseIndex([1.0, 2.0, 3.0], dtype='float64')
print(b.union(a))  # ImpreciseIndex([1.0, 1.9, 2.1, 3.0], dtype='float64')
```

Maybe these aren't really problems in practice, but it's at least a little strange/surprising.

WeatherGod (CONTRIBUTOR) · 2018-06-22T21:15:06Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399584169

Actually, I disagree. Pandas's set operations methods are mostly index-based. For union and intersection, they have an optimization that dives down into some c-code when the Indexes are monotonic, but everywhere else, it all works off of results from get_indexer(). I have made a quick toy demo code that seems to work. Note, I didn't know how to properly make a constructor for a subclassed Index, so I added the tolerance attribute after construction just for the purposes of this demo.

``` python
from __future__ import print_function
import warnings

import numpy as np
from pandas import Index
from pandas.indexes.base import is_object_dtype, algos, is_dtype_equal
from pandas.indexes.base import _ensure_index, _concat, _values_from_object, _unsortable_types
from pandas.indexes.numeric import Float64Index


def _choose_tolerance(this, that, tolerance):
    if tolerance is None:
        tolerance = max(this.tolerance, getattr(that, 'tolerance', 0.0))
    return tolerance


class ImpreciseIndex(Float64Index):
    def astype(self, dtype, copy=True):
        return ImpreciseIndex(self.values.astype(dtype=dtype, copy=copy),
                              name=self.name, dtype=dtype)

    @property
    def tolerance(self):
        return self._tolerance

    @tolerance.setter
    def tolerance(self, tolerance):
        self._tolerance = self._convert_tolerance(tolerance)

    def union(self, other, tolerance=None):
        self._assert_can_do_setop(other)
        other = _ensure_index(other)

        if len(other) == 0 or self.equals(other, tolerance=tolerance):
            return self._get_consensus_name(other)

        if len(self) == 0:
            return other._get_consensus_name(self)

        if not is_dtype_equal(self.dtype, other.dtype):
            this = self.astype('O')
            other = other.astype('O')
            return this.union(other, tolerance=tolerance)

        tolerance = _choose_tolerance(self, other, tolerance)

        indexer = self.get_indexer(other, tolerance=tolerance)
        indexer, = (indexer == -1).nonzero()

        if len(indexer) > 0:
            other_diff = algos.take_nd(other._values, indexer,
                                       allow_fill=False)
            result = _concat._concat_compat((self._values, other_diff))

            try:
                self._values[0] < other_diff[0]
            except TypeError as e:
                warnings.warn("%s, sort order is undefined for "
                              "incomparable objects" % e, RuntimeWarning,
                              stacklevel=3)
            else:
                types = frozenset((self.inferred_type,
                                   other.inferred_type))
                if not types & _unsortable_types:
                    result.sort()
        else:
            result = self._values

            try:
                result = np.sort(result)
            except TypeError as e:
                warnings.warn("%s, sort order is undefined for "
                              "incomparable objects" % e, RuntimeWarning,
                              stacklevel=3)

        # for subclasses
        return self._wrap_union_result(other, result)

    def equals(self, other, tolerance=None):
        if self.is_(other):
            return True

        if not isinstance(other, Index):
            return False

        if is_object_dtype(self) and not is_object_dtype(other):
            # if other is not object, use other's logic for coercion
            if isinstance(other, ImpreciseIndex):
                return other.equals(self, tolerance=tolerance)
            else:
                return other.equals(self)

        if len(self) != len(other):
            return False

        tolerance = _choose_tolerance(self, other, tolerance)
        diff = np.abs(_values_from_object(self) -
                      _values_from_object(other))
        return np.all(diff < tolerance)

    def intersection(self, other, tolerance=None):
        self._assert_can_do_setop(other)
        other = _ensure_index(other)

        if self.equals(other, tolerance=tolerance):
            return self._get_consensus_name(other)

        if not is_dtype_equal(self.dtype, other.dtype):
            this = self.astype('O')
            other = other.astype('O')
            return this.intersection(other, tolerance=tolerance)

        tolerance = _choose_tolerance(self, other, tolerance)
        try:
            indexer = self.get_indexer(other._values, tolerance=tolerance)
            indexer = indexer.take((indexer != -1).nonzero()[0])
        except Exception:
            # duplicates
            # FIXME: get_indexer_non_unique() doesn't take a tolerance argument
            indexer = Index(self._values).get_indexer_non_unique(
                other._values)[0].unique()
            indexer = indexer[indexer != -1]

        taken = self.take(indexer)
        if self.name != other.name:
            taken.name = None
        return taken

    # TODO: Do I need to re-implement _get_unique_index()?

    def get_loc(self, key, method=None, tolerance=None):
        if tolerance is None:
            tolerance = self.tolerance
        if tolerance > 0 and method is None:
            method = 'nearest'
        return super(ImpreciseIndex, self).get_loc(key, method, tolerance)

    def get_indexer(self, target, method=None, limit=None, tolerance=None):
        if tolerance is None:
            tolerance = self.tolerance
        if tolerance > 0 and method is None:
            method = 'nearest'
        return super(ImpreciseIndex, self).get_indexer(target, method, limit, tolerance)


if __name__ == '__main__':
    a = ImpreciseIndex([0.1, 0.2, 0.3, 0.4])
    a.tolerance = 0.01
    b = ImpreciseIndex([0.301, 0.401, 0.501, 0.601])
    b.tolerance = 0.025
    print(a, b)
    print("a | b :", a.union(b))
    print("a & b :", a.intersection(b))
    print("a.get_indexer(b):", a.get_indexer(b))
    print("b.get_indexer(a):", b.get_indexer(a))
```

Run this and get the following results:

```
ImpreciseIndex([0.1, 0.2, 0.3, 0.4], dtype='float64') ImpreciseIndex([0.301, 0.401, 0.501, 0.601], dtype='float64')
a | b : ImpreciseIndex([0.1, 0.2, 0.3, 0.4, 0.501, 0.601], dtype='float64')
a & b : ImpreciseIndex([0.3, 0.4], dtype='float64')
a.get_indexer(b): [ 2  3 -1 -1]
b.get_indexer(a): [-1 -1  0  1]
```

This is mostly lifted from the Index base class methods, just with me taking out the monotonic optimization path and supplying the tolerance argument to the respective calls to get_indexer. Unless a tolerance is provided as a keyword argument, an operation uses the larger tolerance of the two objects being compared (falling back to 0.0 when the other object isn't an ImpreciseIndex).

shoyer (MEMBER) · 2018-06-22T18:39:28Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399540641

Again, I think the first big challenge here is writing fast approximate union/intersection algorithms. Then we can figure out how to wire them into the pandas/xarray API :).


WeatherGod (CONTRIBUTOR) · 2018-06-22T17:42:29Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399522595

Ok, I see how you implemented it for pandas's reindex. You essentially inserted an inexact filter within .get_indexer(). And the intersection() and union() uses these methods, so, in theory, one could pipe a tolerance argument through them (as well as for the other set operations). The work needs to be expanded a bit, though, as get_indexer_non_unique() needs the tolerance parameter, too, I think.

For xarray, though, I think we can work around backwards compatibility by having Dataset hold specialized subclasses of Index for floating-point data types that would have the needed changes to the Index class. We can have this specialized class have some default tolerance (say 100*finfo(dtype).resolution?), and it would have its methods use the stored tolerance by default, so it should be completely transparent to the end-user (hopefully). This way, xr.open_mfdataset() would "just work".
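For concreteness, the default floated here could be computed as below; the factor of 100 is the hypothetical value from this comment, not anything pandas or xarray actually uses:

``` python
import numpy as np

def default_tolerance(dtype, factor=100):
    # Hypothetical default: a small multiple of the dtype's floating-point
    # resolution, so pure round-off noise always falls within tolerance
    return factor * np.finfo(dtype).resolution

print(default_tolerance(np.float64))  # ~1e-13
print(default_tolerance(np.float32))  # ~1e-4
```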

shoyer (MEMBER) · 2018-06-22T04:27:30Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399317060

See https://github.com/pandas-dev/pandas/issues/9817 and https://github.com/pandas-dev/pandas/issues/9530 for the relevant pandas issues.

shoyer (MEMBER) · 2018-06-22T01:32:56Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399293141

I think a tolerance argument for set-methods like Index.union would be an easier sell than an epsilon argument for the Index construction. You'd still need to figure out the right algorithmic approach, though.

WeatherGod (CONTRIBUTOR) · 2018-06-22T00:45:19Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399286310

@shoyer, I am thinking your original intuition was right about needing to improve the Index classes, perhaps to work with an optional epsilon argument to the constructor. How receptive do you think pandas would be to that? And even if they would accept such a feature, we would probably need to implement it a bit ourselves in situations where older pandas versions are used.

WeatherGod (CONTRIBUTOR) · 2018-06-22T00:38:34Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399285369

Well, I need this to work for join='outer', so, it is gonna happen one way or another...

One concept I was toying with today was a distinction between aligning coords (which is what it does now) and aligning bounding boxes.

shoyer (MEMBER) · 2018-06-21T22:07:14Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399258602

> To be clear, my use-case would not be solved by join='override' (isn't that just join='left'?). I have moving nests of coordinates that can have some floating-point noise in them, but are otherwise identical.

join='left' will reindex all arguments to match the coordinates of the first object. In practice, that means that if coordinates differ by floating point noise, the second object would end up converted to all NaNs.

join='override' would just relabel coordinates, assuming that the shapes match. The data wouldn't change at all.

I guess another way to do this would be to include method and tolerance arguments from reindex on align, and only allow them when join='left' or join='right'. But this would be a little trickier to pass on through other functions like open_mfdataset().

WeatherGod (CONTRIBUTOR) · 2018-06-21T21:48:28Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399254317

To be clear, my use-case would not be solved by join='override' (isn't that just join='left'?). I have moving nests of coordinates that can have some floating-point noise in them, but are otherwise identical.

WeatherGod (CONTRIBUTOR) · 2018-06-21T21:44:58Z · https://github.com/pydata/xarray/issues/2217#issuecomment-399253493

I was just pointed to this issue yesterday, and I have an immediate need for this feature in xarray for a work project. I'll take responsibility to implement this feature tomorrow.

Reactions: +1 (1)
shoyer (MEMBER) · 2018-06-06T15:49:09Z · https://github.com/pydata/xarray/issues/2217#issuecomment-395117968

> For example xr.align(da1, da2, join='override'). This would just check that the shapes of the different coordinates match and then replace da2's coordinates with those from da1.

I like this idea! This would be certainly be much easier to implement than general purpose approximate alignment.

rabernat (MEMBER) · 2018-06-06T13:20:20Z · https://github.com/pydata/xarray/issues/2217#issuecomment-395065697

An alternative approach to fixing this issue would be the long-discussed idea of a "fast path" for open_mfdataset (#1823). In this case, @naomi-henderson knows a-priori that the coordinates for these files should be the same, numerical noise notwithstanding. There should be a way to just skip the alignment check completely and override the coordinates with the values from the first file.

For example:

``` python
xr.align(da1, da2, join='override')
```

This would just check that the shapes of the different coordinates match and then replace da2's coordinates with those from da1.

Reactions: +1 (7), hooray (1)
shoyer (MEMBER) · 2018-06-06T01:43:33Z · https://github.com/pydata/xarray/issues/2217#issuecomment-394912948

I agree that this would be useful.

One option that works currently would be to determine the proper grid (e.g., from one file) and then use the preprocess argument of open_mfdataset to reindex() each dataset to the desired grid.
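A sketch of that preprocess-based workaround; the grid values, variable names, and file pattern below are invented for illustration:

``` python
import numpy as np
import xarray as xr

# Invented reference grid, e.g. taken from the first file
target_x = np.round(np.linspace(0.0, 1.0, 5), 6)

def snap_to_grid(ds):
    # Reindex each file onto the reference grid, matching labels that
    # differ only by floating-point noise; labels further than
    # `tolerance` from any grid point would become NaN rows instead
    return ds.reindex(x=target_x, method="nearest", tolerance=1e-6)

# With real files this would be:
#   xr.open_mfdataset("files_*.nc", preprocess=snap_to_grid)
noisy = xr.Dataset({"t": ("x", np.arange(5.0))},
                   coords={"x": target_x + 1e-9})
snapped = snap_to_grid(noisy)
print(snapped.x.values)
```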

To do this systematically in xarray, we would want to update xarray.align to be capable of approximate alignment. This would in turn require approximate versions of pandas.Index.union (for join='outer') and pandas.Index.intersection (for join='inner').

Ideally, we would do this work upstream in pandas, and utilize it downstream in xarray. Either way, someone will need to figure out and implement the appropriate algorithm to take an approximate union of two sets of points. This could be somewhat tricky when you start to consider sets where some but not all points are within tolerance of each other (e.g., {0, 1, 2, 3, 4, 5} with tolerance=1.5).
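One possible greedy sketch of such an approximate union — not what pandas does, just to make the problem concrete: sort the combined values, then drop any value within tolerance of the last value kept. It happens to reproduce the a | b result from the toy ImpreciseIndex demo earlier on this page, and shows how the {0, 1, 2, 3, 4, 5} case collapses:

``` python
import numpy as np

def approx_union(a, b, tolerance):
    # Greedy sweep: keep a value only if it lies more than `tolerance`
    # beyond the previously kept value. Chains of points that are each
    # within tolerance of their neighbour collapse onto one survivor.
    values = np.sort(np.concatenate([np.asarray(a, float),
                                     np.asarray(b, float)]))
    kept = [values[0]]
    for v in values[1:]:
        if v - kept[-1] > tolerance:
            kept.append(v)
    return np.array(kept)

u = approx_union([0.1, 0.2, 0.3, 0.4], [0.301, 0.401, 0.501, 0.601], 0.025)
# keeps [0.1, 0.2, 0.3, 0.4, 0.501, 0.601], like the demo's "a | b"
collapsed = approx_union([0, 1, 2, 3, 4, 5], [], 1.5)
# the chain 0..5 collapses to [0.0, 2.0, 4.0]
```

Note the result depends on the sweep direction and starting point, which is exactly the non-uniqueness problem being discussed.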

