
issue_comments

77 rows where user = 291576 sorted by updated_at descending


issue 20

  • Vectorized lazy indexing 17
  • Pointwise indexing -- something like sel_points 11
  • tolerance for alignment 9
  • Slicing DataArray can take longer than not slicing 6
  • Slow performance of isel 6
  • getting a "truth value of an array" error when supplying my own `concat_dim`. 4
  • can't use datetime or pandas datetime to index time dimension 3
  • open_mfdataset() on a single file drops the concat_dim 3
  • Change an `==` to an `is`. Fix tests so that this won't happen again. 3
  • API design for pointwise indexing 2
  • Possible regression with PyNIO data not being lazily loaded 2
  • Pynio tests are being skipped on TravisCI 2
  • concat_dim for auto_combine for a single object is now respected 2
  • Plot methods 1
  • align silently upcasts data arrays when NaNs are inserted 1
  • groupby reduction sometimes collapses variables into scalars 1
  • add pynio backend 1
  • operations with pd.to_timedelta() now fails 1
  • can't do in-place clip() with DataArrays. 1
  • Should we make "rasterio" an engine option? 1

user 1

  • WeatherGod · 77

author_association 1

  • CONTRIBUTOR 77
Columns: id, html_url, issue_url, node_id, user, created_at, updated_at ▲ (sorted descending), author_association, body, reactions, performed_via_github_app, issue
738189796 https://github.com/pydata/xarray/issues/2004#issuecomment-738189796 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDczODE4OTc5Ng== WeatherGod 291576 2020-12-03T18:15:35Z 2020-12-03T18:15:35Z CONTRIBUTOR

I think so, at least in terms of my original problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
642253287 https://github.com/pydata/xarray/issues/4142#issuecomment-642253287 https://api.github.com/repos/pydata/xarray/issues/4142 MDEyOklzc3VlQ29tbWVudDY0MjI1MzI4Nw== WeatherGod 291576 2020-06-10T20:55:32Z 2020-06-10T20:55:32Z CONTRIBUTOR

So, one important difference I see off the bat is that zarr already had a DataStore implementation, while rasterio does not. I take it that implementing one would be the preferred approach?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Should we make "rasterio" an engine option? 636493109
451626366 https://github.com/pydata/xarray/pull/2648#issuecomment-451626366 https://api.github.com/repos/pydata/xarray/issues/2648 MDEyOklzc3VlQ29tbWVudDQ1MTYyNjM2Ng== WeatherGod 291576 2019-01-05T04:18:50Z 2019-01-05T04:18:50Z CONTRIBUTOR

I had completely forgotten about that little quirk of CPython. I try to ignore implementation details like that. Heck, I still don't fully trust dictionaries to be ordered!

I removed the WIP. We can deal with the concat dim default object separately, including turning it into a ReprObject (not exactly sure what the advantage of it is over just using the string, but, meh).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change an `==` to an `is`. Fix tests so that this won't happen again. 396008054
451583970 https://github.com/pydata/xarray/pull/2648#issuecomment-451583970 https://api.github.com/repos/pydata/xarray/issues/2648 MDEyOklzc3VlQ29tbWVudDQ1MTU4Mzk3MA== WeatherGod 291576 2019-01-04T22:12:44Z 2019-01-04T22:12:44Z CONTRIBUTOR

Is the following statement True or False: "The user should be allowed to explicitly declare that they want the concatenation dimension to be inferred by passing a keyword argument". If this is True, then you need to test equivalence. If it is False, then there is nothing more I need to do for the PR, as changing this to use a ReprObject is orthogonal to these changes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change an `==` to an `is`. Fix tests so that this won't happen again. 396008054
451581103 https://github.com/pydata/xarray/pull/2648#issuecomment-451581103 https://api.github.com/repos/pydata/xarray/issues/2648 MDEyOklzc3VlQ29tbWVudDQ1MTU4MTEwMw== WeatherGod 291576 2019-01-04T22:00:10Z 2019-01-04T22:00:10Z CONTRIBUTOR

ok, so we use the ReprObject for the default, and then test if concat_dim is of type `ReprObject`, and then test its equivalence?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Change an `==` to an `is`. Fix tests so that this won't happen again. 396008054
451504997 https://github.com/pydata/xarray/issues/2647#issuecomment-451504997 https://api.github.com/repos/pydata/xarray/issues/2647 MDEyOklzc3VlQ29tbWVudDQ1MTUwNDk5Nw== WeatherGod 291576 2019-01-04T17:06:50Z 2019-01-04T17:06:50Z CONTRIBUTOR

scratch that... the test was an `or`, not an `and`.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  getting a "truth value of an array" error when supplying my own `concat_dim`. 395994055
451504462 https://github.com/pydata/xarray/issues/2647#issuecomment-451504462 https://api.github.com/repos/pydata/xarray/issues/2647 MDEyOklzc3VlQ29tbWVudDQ1MTUwNDQ2Mg== WeatherGod 291576 2019-01-04T17:05:00Z 2019-01-04T17:05:00Z CONTRIBUTOR

actually, we could simplify the conditional to be just `concat_dim is _CONCAT_DIM_DEFAULT` and not bother with the None test.
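For illustration only, here is a standalone sketch of the identity check being proposed; the sentinel value and helper are made up, not xarray's actual `open_mfdataset`/`auto_combine` code:

```python
_CONCAT_DIM_DEFAULT = '__infer_concat_dim__'  # illustrative sentinel, not xarray's

def combine_kwargs(concat_dim=_CONCAT_DIM_DEFAULT):
    """Return the kwargs to forward for a possibly-defaulted concat_dim."""
    # Identity check against the sentinel: no `==`, so there is no separate None
    # test and no element-wise comparison when concat_dim is array-like.
    if concat_dim is _CONCAT_DIM_DEFAULT:
        return {}                      # let the concatenation dimension be inferred
    return {'concat_dim': concat_dim}  # pass the user's value through untouched

print(combine_kwargs())        # {}
print(combine_kwargs('time'))  # {'concat_dim': 'time'}
```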

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  getting a "truth value of an array" error when supplying my own `concat_dim`. 395994055
451504141 https://github.com/pydata/xarray/issues/2647#issuecomment-451504141 https://api.github.com/repos/pydata/xarray/issues/2647 MDEyOklzc3VlQ29tbWVudDQ1MTUwNDE0MQ== WeatherGod 291576 2019-01-04T17:03:54Z 2019-01-04T17:03:54Z CONTRIBUTOR

ah! that's why it snuck through! I have been racking my brain on this for the past hour! Shall I go ahead and make a PR?

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  getting a "truth value of an array" error when supplying my own `concat_dim`. 395994055
451501740 https://github.com/pydata/xarray/issues/2647#issuecomment-451501740 https://api.github.com/repos/pydata/xarray/issues/2647 MDEyOklzc3VlQ29tbWVudDQ1MTUwMTc0MA== WeatherGod 291576 2019-01-04T16:55:40Z 2019-01-04T16:55:40Z CONTRIBUTOR

To be more explicit, the issue is that `concat_dim == _CONCAT_DIM_DEFAULT` is ill-advised because the type of `concat_dim` is not guaranteed to be a scalar. In fact, the elif of that area of code in api.py explicitly tests if `concat_dim` is or is not a list.
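A minimal reproduction of that failure mode, with illustrative values (the sentinel string here is made up):

```python
import numpy as np

_CONCAT_DIM_DEFAULT = '__infer_concat_dim__'         # illustrative sentinel
concat_dim = np.array(['2018-01-01', '2018-01-02'])  # an array-like, user-supplied concat_dim

eq = concat_dim == _CONCAT_DIM_DEFAULT  # broadcasts element-wise -> array([False, False])
try:
    if eq:  # bool() of a multi-element array is ambiguous
        pass
except ValueError as err:
    print(err)  # "The truth value of an array with more than one element is ambiguous..."

print(concat_dim is _CONCAT_DIM_DEFAULT)  # False -- identity check stays a plain bool
```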

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  getting a "truth value of an array" error when supplying my own `concat_dim`. 395994055
425224969 https://github.com/pydata/xarray/issues/2227#issuecomment-425224969 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNTIyNDk2OQ== WeatherGod 291576 2018-09-27T20:05:05Z 2018-09-27T20:05:05Z CONTRIBUTOR

It would be ten files opened via xr.open_mfdataset() concatenated across a time dimension, each one looking like:

```
netcdf convect_gust_20180301_0000 {
dimensions:
    latitude = 3502 ;
    longitude = 7002 ;
variables:
    double latitude(latitude) ;
        latitude:_FillValue = NaN ;
        latitude:_Storage = "contiguous" ;
        latitude:_Endianness = "little" ;
    double longitude(longitude) ;
        longitude:_FillValue = NaN ;
        longitude:_Storage = "contiguous" ;
        longitude:_Endianness = "little" ;
    float gust(latitude, longitude) ;
        gust:_FillValue = NaNf ;
        gust:units = "m/s" ;
        gust:description = "gust winds" ;
        gust:_Storage = "chunked" ;
        gust:_ChunkSizes = 701, 1401 ;
        gust:_DeflateLevel = 8 ;
        gust:_Shuffle = "true" ;
        gust:_Endianness = "little" ;

// global attributes:
        :start_date = "03/01/2018 00:00" ;
        :end_date = "03/01/2018 01:00" ;
        :interval = "half-open" ;
        :init_date = "02/28/2018 22:00" ;
        :history = "Created 2018-09-12 15:53:44.468144" ;
        :description = "Convective Downscaling, format V2.0" ;
        :_NCProperties = "version=1|netcdflibversion=4.6.1|hdf5libversion=1.10.1" ;
        :_SuperblockVersion = 0 ;
        :_IsNetcdf4 = 1 ;
        :_Format = "netCDF-4" ;
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424795330 https://github.com/pydata/xarray/issues/2227#issuecomment-424795330 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDc5NTMzMA== WeatherGod 291576 2018-09-26T17:06:44Z 2018-09-26T17:06:44Z CONTRIBUTOR

No, it does not make a difference. The example above peaks at around 5GB of memory (a bit much, but manageable). And it peaks similarly if we chunk it like you suggested.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424485235 https://github.com/pydata/xarray/issues/2227#issuecomment-424485235 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDQ4NTIzNQ== WeatherGod 291576 2018-09-25T20:14:02Z 2018-09-25T20:14:02Z CONTRIBUTOR

Yeah, it looks like if `da` is backed by a dask array, and you do a `.isel(win=window.compute())` (because otherwise isel barfs on dask indexers, it seems), then the memory usage shoots through the roof. Note that in my case, the dask chunks are (1, 3000, 7000). If I do a `window.load()` prior to `window.isel()`, then the memory usage is perfectly reasonable.
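A compact sketch of the two paths being compared, shrunk to toy sizes so it runs quickly; in the report the array was a lazily loaded, (1, 3000, 7000)-chunked dask array, so only the call pattern carries over, not the memory behaviour:

```python
import numpy as np
import xarray as xr

# Toy stand-in for the dask-backed DataArray described above.
da = xr.DataArray(np.random.randn(4, 50, 60),
                  dims=('time', 'latitude', 'longitude')).chunk({'time': 1})

window = da.rolling(time=2).construct('win')
indexes = window.argmax(dim='win')

# Reported to blow up: index the still-lazy window with a computed (NumPy) indexer.
result_lazy = window.isel(win=indexes.compute())

# Reported to behave: load the window first, then do the same vectorized isel.
result_loaded = window.load().isel(win=indexes.compute())
```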

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424479421 https://github.com/pydata/xarray/issues/2227#issuecomment-424479421 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDQ3OTQyMQ== WeatherGod 291576 2018-09-25T19:54:59Z 2018-09-25T19:54:59Z CONTRIBUTOR

Just for posterity, though, here is my simplified (working!) example:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.randn(10, 3000, 7000),
                  dims=('time', 'latitude', 'longitude'))
window = da.rolling(time=2).construct('win')
indexes = window.argmax(dim='win')
result = window.isel(win=indexes)
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424477465 https://github.com/pydata/xarray/issues/2227#issuecomment-424477465 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDQ3NzQ2NQ== WeatherGod 291576 2018-09-25T19:48:20Z 2018-09-25T19:48:20Z CONTRIBUTOR

Huh, strange... I just tried a simplified version of what I was doing (particularly, no dask arrays), and everything worked fine. I'll have to investigate further.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
424470752 https://github.com/pydata/xarray/issues/2227#issuecomment-424470752 https://api.github.com/repos/pydata/xarray/issues/2227 MDEyOklzc3VlQ29tbWVudDQyNDQ3MDc1Mg== WeatherGod 291576 2018-09-25T19:27:28Z 2018-09-25T19:27:28Z CONTRIBUTOR

I am looking into a similar performance issue with isel, but it seems that the issue is that it is creating arrays that are much bigger than needed. For my multidimensional case (time/x/y/window), what should end up taking only a few hundred MB spikes up to tens of GB of used RAM. I don't know if this might be a possible source of the performance issues.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slow performance of isel 331668890
407547050 https://github.com/pydata/xarray/issues/2217#issuecomment-407547050 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDQwNzU0NzA1MA== WeatherGod 291576 2018-07-24T20:48:53Z 2018-07-24T20:48:53Z CONTRIBUTOR

I have created a PR for my work-in-progress: pandas-dev/pandas#22043

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
400043753 https://github.com/pydata/xarray/issues/2217#issuecomment-400043753 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDQwMDA0Mzc1Mw== WeatherGod 291576 2018-06-25T18:07:49Z 2018-06-25T18:07:49Z CONTRIBUTOR

Do we want to dive straight to that? Or, would it make more sense to first submit some PRs piping the support for a tolerance kwarg through more of the API? Or perhaps we should propose that a "tolerance" attribute should be an optional attribute that methods like get_indexer() and such could always check for? Not being a pandas dev, I am not sure how piecemeal we should approach this.

In addition, we are likely going to have to implement a decent chunk of code ourselves for compatibility's sake, I think.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399612490 https://github.com/pydata/xarray/issues/2217#issuecomment-399612490 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTYxMjQ5MA== WeatherGod 291576 2018-06-22T23:56:41Z 2018-06-22T23:56:41Z CONTRIBUTOR

I am not concerned about the non-commutativeness of the indexer itself. There is no way around that. At some point, you have to choose values, whether it is done by an indexer or done by some particular set operation.

As for the different sizes, that happens when the tolerance is greater than half the smallest delta. I figure a final implementation would enforce such a constraint on the tolerance.
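A small, hypothetical sketch of that constraint (the function name is made up):

```python
import numpy as np

def validate_tolerance(values, tolerance):
    """Hypothetical guard for the constraint suggested above: the tolerance must stay
    below half of the smallest spacing between index values, otherwise nearest-neighbour
    matches become ambiguous and set operations can change size."""
    smallest_delta = np.diff(np.sort(np.asarray(values, dtype=float))).min()
    if tolerance >= 0.5 * smallest_delta:
        raise ValueError("tolerance %g must be < half the smallest spacing %g"
                         % (tolerance, smallest_delta))
    return tolerance

validate_tolerance([1, 1.9, 2.1, 3], 0.05)   # fine
# validate_tolerance([1, 1.9, 2.1, 3], 0.5)  # would raise: 0.5 >= 0.1
```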

On Fri, Jun 22, 2018 at 5:56 PM, Stephan Hoyer notifications@github.com wrote:

@WeatherGod https://github.com/WeatherGod One problem with your definition of tolerance is that it isn't commutative, even if both indexes have the same tolerance:

```python
a = ImpreciseIndex([0.1, 0.2, 0.3, 0.4])
a.tolerance = 0.1
b = ImpreciseIndex([0.301, 0.401, 0.501, 0.601])
b.tolerance = 0.1
print(a.union(b))  # ImpreciseIndex([0.1, 0.2, 0.3, 0.4, 0.501, 0.601], dtype='float64')
print(b.union(a))  # ImpreciseIndex([0.1, 0.2, 0.301, 0.401, 0.501, 0.601], dtype='float64')
```

If you try a little harder, you could even have cases where the result has a different size, e.g.,

```python
a = ImpreciseIndex([1, 2, 3])
a.tolerance = 0.5
b = ImpreciseIndex([1, 1.9, 2.1, 3])
b.tolerance = 0.5
print(a.union(b))  # ImpreciseIndex([1.0, 2.0, 3.0], dtype='float64')
print(b.union(a))  # ImpreciseIndex([1.0, 1.9, 2.1, 3.0], dtype='float64')
```

Maybe these aren't really problems in practice, but it's at least a little strange/surprising.


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399584169 https://github.com/pydata/xarray/issues/2217#issuecomment-399584169 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTU4NDE2OQ== WeatherGod 291576 2018-06-22T21:15:06Z 2018-06-22T21:15:06Z CONTRIBUTOR

Actually, I disagree. Pandas's set operations methods are mostly index-based. For union and intersection, they have an optimization that dives down into some c-code when the Indexes are monotonic, but everywhere else, it all works off of results from get_indexer(). I have made a quick toy demo code that seems to work. Note, I didn't know how to properly make a constructor for a subclassed Index, so I added the tolerance attribute after construction just for the purposes of this demo.

```python
from __future__ import print_function
import warnings
from pandas import Index
import numpy as np

from pandas.indexes.base import is_object_dtype, algos, is_dtype_equal
from pandas.indexes.base import _ensure_index, _concat, _values_from_object, _unsortable_types
from pandas.indexes.numeric import Float64Index


def _choose_tolerance(this, that, tolerance):
    if tolerance is None:
        tolerance = max(this.tolerance, getattr(that, 'tolerance', 0.0))
    return tolerance


class ImpreciseIndex(Float64Index):
    def astype(self, dtype, copy=True):
        return ImpreciseIndex(self.values.astype(dtype=dtype, copy=copy),
                              name=self.name, dtype=dtype)

    @property
    def tolerance(self):
        return self._tolerance

    @tolerance.setter
    def tolerance(self, tolerance):
        self._tolerance = self._convert_tolerance(tolerance)

    def union(self, other, tolerance=None):
        self._assert_can_do_setop(other)
        other = _ensure_index(other)

        if len(other) == 0 or self.equals(other, tolerance=tolerance):
            return self._get_consensus_name(other)

        if len(self) == 0:
            return other._get_consensus_name(self)

        if not is_dtype_equal(self.dtype, other.dtype):
            this = self.astype('O')
            other = other.astype('O')
            return this.union(other, tolerance=tolerance)

        tolerance = _choose_tolerance(self, other, tolerance)

        indexer = self.get_indexer(other, tolerance=tolerance)
        indexer, = (indexer == -1).nonzero()

        if len(indexer) > 0:
            other_diff = algos.take_nd(other._values, indexer,
                                       allow_fill=False)
            result = _concat._concat_compat((self._values, other_diff))

            try:
                self._values[0] < other_diff[0]
            except TypeError as e:
                warnings.warn("%s, sort order is undefined for "
                              "incomparable objects" % e, RuntimeWarning,
                              stacklevel=3)
            else:
                types = frozenset((self.inferred_type,
                                   other.inferred_type))
                if not types & _unsortable_types:
                    result.sort()
        else:
            result = self._values

            try:
                result = np.sort(result)
            except TypeError as e:
                warnings.warn("%s, sort order is undefined for "
                              "incomparable objects" % e, RuntimeWarning,
                              stacklevel=3)

        # for subclasses
        return self._wrap_union_result(other, result)

    def equals(self, other, tolerance=None):
        if self.is_(other):
            return True

        if not isinstance(other, Index):
            return False

        if is_object_dtype(self) and not is_object_dtype(other):
            # if other is not object, use other's logic for coercion
            if isinstance(other, ImpreciseIndex):
                return other.equals(self, tolerance=tolerance)
            else:
                return other.equals(self)

        if len(self) != len(other):
            return False

        tolerance = _choose_tolerance(self, other, tolerance)
        diff = np.abs(_values_from_object(self) -
                      _values_from_object(other))
        return np.all(diff < tolerance)

    def intersection(self, other, tolerance=None):
        self._assert_can_do_setop(other)
        other = _ensure_index(other)

        if self.equals(other, tolerance=tolerance):
            return self._get_consensus_name(other)

        if not is_dtype_equal(self.dtype, other.dtype):
            this = self.astype('O')
            other = other.astype('O')
            return this.intersection(other, tolerance=tolerance)

        tolerance = _choose_tolerance(self, other, tolerance)
        try:
            indexer = self.get_indexer(other._values, tolerance=tolerance)
            indexer = indexer.take((indexer != -1).nonzero()[0])
        except:
            # duplicates
            # FIXME: get_indexer_non_unique() doesn't take a tolerance argument
            indexer = Index(self._values).get_indexer_non_unique(
                other._values)[0].unique()
            indexer = indexer[indexer != -1]

        taken = self.take(indexer)
        if self.name != other.name:
            taken.name = None
        return taken

    # TODO: Do I need to re-implement _get_unique_index()?

    def get_loc(self, key, method=None, tolerance=None):
        if tolerance is None:
            tolerance = self.tolerance
        if tolerance > 0 and method is None:
            method = 'nearest'
        return super(ImpreciseIndex, self).get_loc(key, method, tolerance)

    def get_indexer(self, target, method=None, limit=None, tolerance=None):
        if tolerance is None:
            tolerance = self.tolerance
        if tolerance > 0 and method is None:
            method = 'nearest'
        return super(ImpreciseIndex, self).get_indexer(target, method, limit, tolerance)


if __name__ == '__main__':
    a = ImpreciseIndex([0.1, 0.2, 0.3, 0.4])
    a.tolerance = 0.01
    b = ImpreciseIndex([0.301, 0.401, 0.501, 0.601])
    b.tolerance = 0.025
    print(a, b)
    print("a | b :", a.union(b))
    print("a & b :", a.intersection(b))
    print("a.get_indexer(b):", a.get_indexer(b))
    print("b.get_indexer(a):", b.get_indexer(a))
```

Run this and you get the following results:

```
ImpreciseIndex([0.1, 0.2, 0.3, 0.4], dtype='float64') ImpreciseIndex([0.301, 0.401, 0.501, 0.601], dtype='float64')
a | b : ImpreciseIndex([0.1, 0.2, 0.3, 0.4, 0.501, 0.601], dtype='float64')
a & b : ImpreciseIndex([0.3, 0.4], dtype='float64')
a.get_indexer(b): [ 2  3 -1 -1]
b.get_indexer(a): [-1 -1  0  1]
```

This is mostly lifted from the Index base class methods, just with me taking out the monotonic optimization path and supplying the tolerance argument to the respective calls to get_indexer. The tolerance for a given operation, unless provided as a keyword argument, is the larger of the tolerances of the two objects being compared (with a fallback if the other isn't an ImpreciseIndex).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399522595 https://github.com/pydata/xarray/issues/2217#issuecomment-399522595 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTUyMjU5NQ== WeatherGod 291576 2018-06-22T17:42:29Z 2018-06-22T17:42:29Z CONTRIBUTOR

Ok, I see how you implemented it for pandas's reindex. You essentially inserted an inexact filter within .get_indexer(). And the intersection() and union() uses these methods, so, in theory, one could pipe a tolerance argument through them (as well as for the other set operations). The work needs to be expanded a bit, though, as get_indexer_non_unique() needs the tolerance parameter, too, I think.

For xarray, though, I think we can work around backwards compatibility by having Dataset hold specialized subclasses of Index for floating-point data types that would have the needed changes to the Index class. We can have this specialized class have some default tolerance (say 100*finfo(dtype).resolution?), and it would have its methods use the stored tolerance by default, so it should be completely transparent to the end-user (hopefully). This way, xr.open_mfdataset() would "just work".
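For concreteness, a tiny sketch of the suggested default-tolerance rule (the function name is made up):

```python
import numpy as np

def default_tolerance(dtype):
    """Hypothetical default alignment tolerance for a floating-point coordinate
    dtype, following the 100 * finfo(dtype).resolution suggestion above."""
    return 100 * np.finfo(dtype).resolution

print(default_tolerance(np.float64))  # 1e-13
print(default_tolerance(np.float32))  # ~1e-04
```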

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399286310 https://github.com/pydata/xarray/issues/2217#issuecomment-399286310 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTI4NjMxMA== WeatherGod 291576 2018-06-22T00:45:19Z 2018-06-22T00:45:19Z CONTRIBUTOR

@shoyer, I am thinking your original intuition was right about needing to improve the Index classes, perhaps to accept an optional epsilon argument in the constructor. How receptive do you think pandas would be to that? And even if they would accept such a feature, we would probably need to implement it a bit ourselves for situations where older pandas versions are used.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399285369 https://github.com/pydata/xarray/issues/2217#issuecomment-399285369 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTI4NTM2OQ== WeatherGod 291576 2018-06-22T00:38:34Z 2018-06-22T00:38:34Z CONTRIBUTOR

Well, I need this to work for join='outer', so, it is gonna happen one way or another...

One concept I was toying with today was a distinction between aligning coords (which is what it does now) and aligning bounding boxes.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399254317 https://github.com/pydata/xarray/issues/2217#issuecomment-399254317 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTI1NDMxNw== WeatherGod 291576 2018-06-21T21:48:28Z 2018-06-21T21:48:28Z CONTRIBUTOR

To be clear, my use-case would not be solved by join='override' (isn't that just join='left'?). I have moving nests of coordinates that can have some floating-point noise in them, but are otherwise identical.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
399253493 https://github.com/pydata/xarray/issues/2217#issuecomment-399253493 https://api.github.com/repos/pydata/xarray/issues/2217 MDEyOklzc3VlQ29tbWVudDM5OTI1MzQ5Mw== WeatherGod 291576 2018-06-21T21:44:58Z 2018-06-21T21:44:58Z CONTRIBUTOR

I was just pointed to this issue yesterday, and I have an immediate need for this feature in xarray for a work project. I'll take responsibility to implement this feature tomorrow.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  tolerance for alignment 329575874
380241636 https://github.com/pydata/xarray/pull/2048#issuecomment-380241636 https://api.github.com/repos/pydata/xarray/issues/2048 MDEyOklzc3VlQ29tbWVudDM4MDI0MTYzNg== WeatherGod 291576 2018-04-10T20:48:25Z 2018-04-10T20:48:25Z CONTRIBUTOR

What's new entry added.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat_dim for auto_combine for a single object is now respected 312998259
380203653 https://github.com/pydata/xarray/pull/2048#issuecomment-380203653 https://api.github.com/repos/pydata/xarray/issues/2048 MDEyOklzc3VlQ29tbWVudDM4MDIwMzY1Mw== WeatherGod 291576 2018-04-10T18:34:32Z 2018-04-10T18:34:32Z CONTRIBUTOR

Travis failures seem to be unrelated?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  concat_dim for auto_combine for a single object is now respected 312998259
380137124 https://github.com/pydata/xarray/issues/1988#issuecomment-380137124 https://api.github.com/repos/pydata/xarray/issues/1988 MDEyOklzc3VlQ29tbWVudDM4MDEzNzEyNA== WeatherGod 291576 2018-04-10T15:12:05Z 2018-04-10T15:12:05Z CONTRIBUTOR

Yup... looks like that did the trick (for auto_combine and open_mfdataset). I even have a simple test to demonstrate it. PR coming shortly.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() on a single file drops the concat_dim 305327479
379939574 https://github.com/pydata/xarray/issues/1988#issuecomment-379939574 https://api.github.com/repos/pydata/xarray/issues/1988 MDEyOklzc3VlQ29tbWVudDM3OTkzOTU3NA== WeatherGod 291576 2018-04-10T00:55:48Z 2018-04-10T00:55:48Z CONTRIBUTOR

I'll give it a go tomorrow. My work has gotten to this point now, and I have some unit tests that happen to exercise this edge case.

On a somewhat related note, would an allow_missing feature be welcomed in open_mfdataset()? I have written up some code that expects a concat_dim and a list of filenames. It then passes to open_mfdataset() only the files (and corresponding concat_dim values) that exist, and then calls reindex() with the original concat_dim to get a NaN-filled slab wherever there was a missing file (see the sketch below).

Any interest?
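A rough, hypothetical sketch of that allow_missing idea; the wrapper name is made up and the `combine='nested'` keyword assumes a newer xarray API, so this is not the code referred to above:

```python
import os
import xarray as xr

def open_mfdataset_allow_missing(paths, dim_name, dim_values):
    """Open only the files that exist, then reindex so missing files become NaN slabs."""
    existing = [(p, v) for p, v in zip(paths, dim_values) if os.path.exists(p)]
    kept_paths = [p for p, _ in existing]
    kept_values = [v for _, v in existing]

    ds = xr.open_mfdataset(
        kept_paths,
        concat_dim=xr.DataArray(kept_values, dims=dim_name, name=dim_name),
        combine='nested')
    # NaN-filled slab wherever a file was missing:
    return ds.reindex({dim_name: dim_values})
```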

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() on a single file drops the concat_dim 305327479
379901414 https://github.com/pydata/xarray/issues/1988#issuecomment-379901414 https://api.github.com/repos/pydata/xarray/issues/1988 MDEyOklzc3VlQ29tbWVudDM3OTkwMTQxNA== WeatherGod 291576 2018-04-09T21:35:11Z 2018-04-09T21:35:11Z CONTRIBUTOR

Could the fix be as simple as `if len(datasets) == 1 and dim is None:`?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset() on a single file drops the concat_dim 305327479
375056363 https://github.com/pydata/xarray/issues/2004#issuecomment-375056363 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA1NjM2Mw== WeatherGod 291576 2018-03-21T18:50:58Z 2018-03-21T18:50:58Z CONTRIBUTOR

Ah, never mind, I see that our examples only had one greater-than-one stride.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375056077 https://github.com/pydata/xarray/issues/2004#issuecomment-375056077 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTA1NjA3Nw== WeatherGod 291576 2018-03-21T18:50:01Z 2018-03-21T18:50:01Z CONTRIBUTOR

Dunno. I can't seem to get that engine working on my system.

Reading through that thread, I wonder if the optimization they added only applies if there is only one stride greater than one?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375036951 https://github.com/pydata/xarray/issues/2004#issuecomment-375036951 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAzNjk1MQ== WeatherGod 291576 2018-03-21T17:51:54Z 2018-03-21T17:51:54Z CONTRIBUTOR

This might be relevant: https://github.com/Unidata/netcdf4-python/issues/680

Still reading through the thread.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375034973 https://github.com/pydata/xarray/issues/2004#issuecomment-375034973 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAzNDk3Mw== WeatherGod 291576 2018-03-21T17:46:09Z 2018-03-21T17:46:09Z CONTRIBUTOR

my bet is probably netCDF4-python. Don't want to write up the C code though to confirm it. Sigh... this isn't going to be a fun one to track down. Shall I open a bug report over there?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
375014480 https://github.com/pydata/xarray/issues/2004#issuecomment-375014480 https://api.github.com/repos/pydata/xarray/issues/2004 MDEyOklzc3VlQ29tbWVudDM3NTAxNDQ4MA== WeatherGod 291576 2018-03-21T16:50:59Z 2018-03-21T16:56:13Z CONTRIBUTOR

Yeah, good example. Eliminates a lot of possible variables such as problems with netcdf4 compression and such. Probably should see if it happens in v0.10.0 to see if the changes to the indexing system caused this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Slicing DataArray can take longer than not slicing 307318224
373840044 https://github.com/pydata/xarray/issues/1997#issuecomment-373840044 https://api.github.com/repos/pydata/xarray/issues/1997 MDEyOklzc3VlQ29tbWVudDM3Mzg0MDA0NA== WeatherGod 291576 2018-03-16T20:45:39Z 2018-03-16T20:45:39Z CONTRIBUTOR

MaskedArrays had a similar problem, IIRC, because it was blindly copying the NDArray docstrings. Not going to be easy to do, though.

"we don't support out": Is that a general rule for xarray? Any notes on how to do what I want for clip? The function this was in was supposed to be general use (ndarrays and xarrays).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  can't do in-place clip() with DataArrays. 306067267
370986433 https://github.com/pydata/xarray/pull/1899#issuecomment-370986433 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDk4NjQzMw== WeatherGod 291576 2018-03-07T01:08:36Z 2018-03-07T01:08:36Z CONTRIBUTOR

:tada:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
367077311 https://github.com/pydata/xarray/pull/1899#issuecomment-367077311 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NzA3NzMxMQ== WeatherGod 291576 2018-02-20T18:43:56Z 2018-02-20T18:43:56Z CONTRIBUTOR

I did some more investigation into the memory usage problem I was having. I had assumed that the vectorized indexed result of a lazily indexed data array would be an in-memory array. So, when I then started to use the result, it was then doing a read of all the data at once, resulting in a near-complete load of the data into memory.

I have adjusted my code to chunk out the indexing in order to keep the memory usage under control, at a reasonable performance penalty. I haven't looked into trying to identify the ideal chunking scheme to follow for an arbitrary data array and indexer. Perhaps we can make that a task for another day. At this point, I am satisfied with the features (negative step-sizes aside, of course).
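A hedged sketch of that blockwise workaround; the helper name and the block scheme are illustrative, not the author's actual code, and the toy data is in-memory:

```python
import numpy as np
import xarray as xr

def chunked_pointwise_isel(arr, indexer, dim, block_dim, block_size):
    """Apply a vectorized isel in blocks along block_dim so only one slab of the
    (possibly lazily loaded) array is realised at a time."""
    pieces = []
    for start in range(0, indexer.sizes[block_dim], block_size):
        sl = slice(start, start + block_size)
        block = indexer.isel({block_dim: sl})
        # Restrict the source array to the same slab, index pointwise, then load it.
        pieces.append(arr.isel({block_dim: sl, dim: block}).load())
    return xr.concat(pieces, dim=block_dim)

# Toy usage mirroring the (scales, latitude, longitude, wind_direction) case:
da = xr.DataArray(np.random.randn(5, 8, 10),
                  dims=('wind_direction', 'latitude', 'longitude'))
idx = xr.DataArray(np.random.randint(0, 5, size=(8, 10)),
                   dims=('latitude', 'longitude'))
out = chunked_pointwise_isel(da, idx, dim='wind_direction',
                             block_dim='latitude', block_size=3)
print(dict(out.sizes))  # {'latitude': 8, 'longitude': 10}
```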

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366379465 https://github.com/pydata/xarray/pull/1899#issuecomment-366379465 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3OTQ2NQ== WeatherGod 291576 2018-02-16T22:40:06Z 2018-02-16T22:40:06Z CONTRIBUTOR

Ah-hah! Ok, so, the problem isn't some weird difference between the two examples I gave. The issue is that calling np.asarray(foo) triggered a full loading of the data!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366376400 https://github.com/pydata/xarray/pull/1899#issuecomment-366376400 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NjQwMA== WeatherGod 291576 2018-02-16T22:25:59Z 2018-02-16T22:25:59Z CONTRIBUTOR

huh... now I am not so sure about that... must be something else triggering the load.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366374917 https://github.com/pydata/xarray/pull/1899#issuecomment-366374917 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NDkxNw== WeatherGod 291576 2018-02-16T22:19:08Z 2018-02-16T22:19:08Z CONTRIBUTOR

also, at this point, I don't know if this is limited to the netcdf4 backend, as this type of indexing was only done on a variable I have in a netcdf file. I don't have 4-D variables in other file types.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366374041 https://github.com/pydata/xarray/pull/1899#issuecomment-366374041 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NDA0MQ== WeatherGod 291576 2018-02-16T22:14:49Z 2018-02-16T22:14:49Z CONTRIBUTOR

`CD`, by the way, has dimensions of scales, latitude, longitude, wind_direction.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366373479 https://github.com/pydata/xarray/pull/1899#issuecomment-366373479 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3MzQ3OQ== WeatherGod 291576 2018-02-16T22:12:18Z 2018-02-16T22:12:18Z CONTRIBUTOR

Ah, not a change in behavior, but a possible bug exposed by a tiny change on my part. So, I have a 4D data array, `CD`, and a data array for indexing, `wind_inds`. The following does not trigger a full loading: `CD[0][wind_direction=wind_inds]`, which is good! But this does: `CD[scales=0, wind_direction=wind_inds]`, which is bad.

So, somehow, the indexing system is effectively treating these two things as different.
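For clarity, a toy restatement of the two call patterns, assuming the bracket shorthand above stands for `.isel()`; the data here is in-memory, so it only mirrors the shapes, not the lazy-loading difference being reported:

```python
import numpy as np
import xarray as xr

CD = xr.DataArray(np.random.randn(2, 6, 8, 5),
                  dims=('scales', 'latitude', 'longitude', 'wind_direction'))
wind_inds = xr.DataArray(np.random.randint(0, 5, size=(6, 8)),
                         dims=('latitude', 'longitude'))

ok = CD.isel(scales=0).isel(wind_direction=wind_inds)   # two steps: reported as staying lazy
bad = CD.isel(scales=0, wind_direction=wind_inds)       # one step: reported as loading everything

print(ok.dims, bad.dims)  # both ('latitude', 'longitude')
```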

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366363419 https://github.com/pydata/xarray/pull/1899#issuecomment-366363419 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM2MzQxOQ== WeatherGod 291576 2018-02-16T21:28:09Z 2018-02-16T21:28:09Z CONTRIBUTOR

correction... the problem isn't with pynio... it is in the netcdf4 backend

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366360382 https://github.com/pydata/xarray/pull/1899#issuecomment-366360382 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM2MDM4Mg== WeatherGod 291576 2018-02-16T21:15:17Z 2018-02-16T21:15:17Z CONTRIBUTOR

Something changed. Now the indexing for pynio is forcing a full loading of the data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366059694 https://github.com/pydata/xarray/pull/1899#issuecomment-366059694 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjA1OTY5NA== WeatherGod 291576 2018-02-15T20:59:20Z 2018-02-15T20:59:20Z CONTRIBUTOR

I can confirm that with the latest changes, the pynio tests now pass locally for me. Now, as to whether or not the tests in there are actually exercising anything useful is a different question.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365734783 https://github.com/pydata/xarray/issues/1910#issuecomment-365734783 https://api.github.com/repos/pydata/xarray/issues/1910 MDEyOklzc3VlQ29tbWVudDM2NTczNDc4Mw== WeatherGod 291576 2018-02-14T20:27:38Z 2018-02-14T20:27:38Z CONTRIBUTOR

Looking through the travis logs, I do see that pynio is getting installed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pynio tests are being skipped on TravisCI 297227247
365734285 https://github.com/pydata/xarray/issues/1910#issuecomment-365734285 https://api.github.com/repos/pydata/xarray/issues/1910 MDEyOklzc3VlQ29tbWVudDM2NTczNDI4NQ== WeatherGod 291576 2018-02-14T20:25:52Z 2018-02-14T20:25:52Z CONTRIBUTOR

Zarr tests and pydap tests are also being skipped

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pynio tests are being skipped on TravisCI 297227247
365729433 https://github.com/pydata/xarray/pull/1899#issuecomment-365729433 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcyOTQzMw== WeatherGod 291576 2018-02-14T20:07:55Z 2018-02-14T20:07:55Z CONTRIBUTOR

I am working on re-activating those tests. I think PyNio is now available for python3, too.

On Wed, Feb 14, 2018 at 2:59 PM, Joe Hamman notifications@github.com wrote:

@WeatherGod https://github.com/weathergod - you are right, all the pynio tests are being skipped on travis. I'll open a separate issue for that. Yikes!


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365722413 https://github.com/pydata/xarray/pull/1899#issuecomment-365722413 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcyMjQxMw== WeatherGod 291576 2018-02-14T19:43:07Z 2018-02-14T19:43:07Z CONTRIBUTOR

It looks like the pynio backend isn't regularly tested, as several of them currently fail when I run the tests locally. Some of them are failing because they are asserting NotImplementedErrors that are now implemented.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365708385 https://github.com/pydata/xarray/pull/1899#issuecomment-365708385 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcwODM4NQ== WeatherGod 291576 2018-02-14T18:55:43Z 2018-02-14T18:55:43Z CONTRIBUTOR

Just did some more debugging, putting in some debug statements within NioArrayWrapper.__getitem__():

```diff
diff --git a/xarray/backends/pynio_.py b/xarray/backends/pynio_.py
index c7e0ddf..b9f7151 100644
--- a/xarray/backends/pynio_.py
+++ b/xarray/backends/pynio_.py
@@ -27,16 +27,24 @@ class NioArrayWrapper(BackendArray):
         return self.datastore.ds.variables[self.variable_name]
 
     def __getitem__(self, key):
+        import logging
+        logger = logging.getLogger(__name__)
+        logger.addHandler(logging.NullHandler())
+        logger.debug("initial key: %s", key)
         key, np_inds = indexing.decompose_indexer(key, self.shape, mode='outer')
+        logger.debug("Decomposed indexers:\n%s\n%s", key, np_inds)
 
         with self.datastore.ensure_open(autoclose=True):
             array = self.get_array()
+            logger.debug("initial array: %r", array)
             if key == () and self.ndim == 0:
                 return array.get_value()
 
             for ind in np_inds:
+                logger.debug("indexer: %s", ind)
                 array = indexing.NumpyIndexingAdapter(array)[ind]
+                logger.debug("intermediate array: %r", array)
 
             return array
```

And here is the test script (data not included):

```python
import logging
import xarray as xr

logging.basicConfig(level=logging.DEBUG)
fname1 = '../hrrr.t12z.wrfnatf02.grib2'
ds = xr.open_dataset(fname1, engine='pynio')
subset_isel = ds.isel(lv_HYBL0=7)
sp = subset_isel['UGRD_P0_L105_GLC0'].values.shape
```

And here is the relevant output:

```
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((slice(None, None, None),))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339210>
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((slice(None, None, None),))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339b90>
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((slice(None, None, None),))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339d50>
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((slice(None, None, None),))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339d90>
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339190>
DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
DEBUG:xarray.backends.pynio_:Decomposed indexers:
BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
()
DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339190>
(50, 1059, 1799)
```

So, the BasicIndexer((7, slice(None, None, None), slice(None, None, None))) isn't getting decomposed correctly, it looks like?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365692868 https://github.com/pydata/xarray/pull/1899#issuecomment-365692868 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY5Mjg2OA== WeatherGod 291576 2018-02-14T18:02:17Z 2018-02-14T18:06:24Z CONTRIBUTOR

Ah, interesting... so, this dataset was created by doing an isel() on the original:

```
>>> ds['UGRD_P0_L105_GLC0']
<xarray.DataArray 'UGRD_P0_L105_GLC0' (lv_HYBL0: 50, ygrid_0: 1059, xgrid_0: 1799)>
[95257050 values with dtype=float32]
Coordinates:
  * lv_HYBL0   (lv_HYBL0) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
    gridlat_0  (ygrid_0, xgrid_0) float32 ...
    gridlon_0  (ygrid_0, xgrid_0) float32 ...
Dimensions without coordinates: ygrid_0, xgrid_0
```

So, the original data has a 50x1059x1799 grid, and the new indexer isn't properly composing the indexer so that it fetches `[7, slice(None), slice(None)]` when I grab its `.values`.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365689883 https://github.com/pydata/xarray/pull/1899#issuecomment-365689883 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY4OTg4Mw== WeatherGod 291576 2018-02-14T17:52:24Z 2018-02-14T17:52:24Z CONTRIBUTOR

I can also confirm that the shape comes out correctly using master, so this is definitely isolated to this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365689003 https://github.com/pydata/xarray/pull/1899#issuecomment-365689003 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY4OTAwMw== WeatherGod 291576 2018-02-14T17:49:20Z 2018-02-14T17:49:20Z CONTRIBUTOR

Hmm, came across a bug with the pynio backend. Working on making a reproducible example, but just for your own inspection, here is some logging output:

```
<xarray.Dataset>
Dimensions:    (xgrid_0: 1799, ygrid_0: 1059)
Coordinates:
    lv_HYBL0   float32 8.0
    longitude  (ygrid_0, xgrid_0) float32 ...
    latitude   (ygrid_0, xgrid_0) float32 ...
Dimensions without coordinates: xgrid_0, ygrid_0
Data variables:
    UGRD       (ygrid_0, xgrid_0) float32 ...
    VGRD       (ygrid_0, xgrid_0) float32 ...
DEBUG:hiresWind.downscale:shape of a data: (50, 1059, 1799)
```

The first bit is the repr of my Dataset. The last line is the output of `ds['UGRD'].values.shape`; it comes back 3D when it is supposed to be 2D.

If I revert back to v0.10.0, then the shape is (1059, 1799), just as expected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365657502 https://github.com/pydata/xarray/pull/1899#issuecomment-365657502 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY1NzUwMg== WeatherGod 291576 2018-02-14T16:13:16Z 2018-02-14T16:13:16Z CONTRIBUTOR

Oh, wow... this worked like a charm for the netcdf4 backend! I have a ~13GB (uncompressed) 4-D netcdf4 variable that was giving me trouble for slicing a 2D surface out of. Here is a snippet where I am grabbing data at random indices in the last dimension. First for a specific latitude, then for the entire domain.

```
>>> CD_subset = rough['CD'][0]
>>> wind_inds_decorated
<xarray.DataArray (latitude: 3501, longitude: 7001)>
array([[33, 15, 25, ..., 52, 66, 35],
       [ 6,  8, 55, ..., 59,  6, 50],
       [54,  2, 40, ..., 32, 19,  9],
       ...,
       [53, 18, 23, ..., 19,  3, 43],
       [ 9, 11, 66, ..., 51, 39, 58],
       [21, 54, 37, ...,  3,  0, 65]])
Dimensions without coordinates: latitude, longitude
>>> foo = CD_subset.isel(latitude=0, wind_direction=wind_inds_decorated[0])
>>> foo
<xarray.DataArray 'CD' (longitude: 7001)>
array([ 0.004052,  0.005915,  0.002771, ...,  0.005604,  0.004715,  0.002756], dtype=float32)
Coordinates:
    scales          int16 60
    latitude        float64 54.99
  * longitude       (longitude) float64 -130.0 -130.0 -130.0 -130.0 -130.0 ...
    wind_direction  (longitude) int16 165 75 125 5 235 345 315 175 85 35 290 ...
>>> foo = CD_subset.isel(wind_direction=wind_inds_decorated)
>>> foo
<xarray.DataArray 'CD' (latitude: 3501, longitude: 7001)>
[24510501 values with dtype=float32]
Coordinates:
    scales          int16 60
  * latitude        (latitude) float64 54.99 54.98 54.97 54.96 54.95 54.95 ...
  * longitude       (longitude) float64 -130.0 -130.0 -130.0 -130.0 -130.0 ...
    wind_direction  (latitude, longitude) int64 165 75 125 5 235 345 315 175 ...
```

All previous attempts at this would result in having to load the entire 13GB array into memory just to get 93.5 MB out. Or, I would try to fetch each individual point, which took way too long. This worked faster than loading the entire thing into memory, and it used less memory, too (I think I maxed out at about 1.2GB of total usage, which is totally acceptable for my use case).

I will try out similar things with the pynio and rasterio backends, and get back to you. Thanks for this work!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
345310488 https://github.com/pydata/xarray/issues/1720#issuecomment-345310488 https://api.github.com/repos/pydata/xarray/issues/1720 MDEyOklzc3VlQ29tbWVudDM0NTMxMDQ4OA== WeatherGod 291576 2017-11-17T17:33:13Z 2017-11-17T17:33:13Z CONTRIBUTOR

Awesome! Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Possible regression with PyNIO data not being lazily loaded 274308380
345124033 https://github.com/pydata/xarray/issues/1720#issuecomment-345124033 https://api.github.com/repos/pydata/xarray/issues/1720 MDEyOklzc3VlQ29tbWVudDM0NTEyNDAzMw== WeatherGod 291576 2017-11-17T02:08:50Z 2017-11-17T02:08:50Z CONTRIBUTOR

Is there a convenient sentinel I can check for loaded-ness? The only reason I noticed this was I was debugging another problem with my processing of HRRR files (~600mb each) and the memory usage shot up (did you know that top will report memory usage as fractions of terabytes when you get high enough?). I could test this with some smaller netcdf4 files if I could just loop through the variables and assert some sentinel.

On Thu, Nov 16, 2017 at 8:57 PM, Stephan Hoyer notifications@github.com wrote:

@WeatherGod https://github.com/weathergod can you verify that you don't get immediate loading when loading netCDF files, e.g., with scipy or netCDF4-python?

We did change how loading of data works with printing in this release (#1532, https://github.com/pydata/xarray/pull/1532), but if anything the changes should go the other way, to do less loading of data.

I'm having trouble debugging this locally because I can't seem to get a working version of pynio installed from conda-forge on OS X (running into various ABI incompatibility issues when I try this in a new conda environment).

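For what it's worth, one possible loaded-ness check, assuming xarray's private `Variable._in_memory` flag (an implementation detail, not a public sentinel):

```python
def assert_all_lazy(ds):
    """Assert that no data variable in the Dataset has been loaded into memory yet."""
    # Index coordinates are always realised, so only data variables are checked;
    # _in_memory is a private xarray attribute and may change between versions.
    for name, da in ds.data_vars.items():
        assert not da.variable._in_memory, "%s was eagerly loaded" % name

# e.g. assert_all_lazy(xr.open_dataset('some_file.nc', engine='pynio'))
```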

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Possible regression with PyNIO data not being lazily loaded 274308380
342576941 https://github.com/pydata/xarray/issues/475#issuecomment-342576941 https://api.github.com/repos/pydata/xarray/issues/475 MDEyOklzc3VlQ29tbWVudDM0MjU3Njk0MQ== WeatherGod 291576 2017-11-07T18:29:12Z 2017-11-07T18:29:12Z CONTRIBUTOR

Yeah, we need to move something forward, because the main benefit of xarray is the ability to manage datasets from multiple sources in a consistent way. And data from different sources will almost always be in different projections.

My current problem that I need to solve right now is that I am ingesting model data that is in a LCC projection and ingesting radar data that is in a simple regular lat/lon grid. Both dataset objects have latitude and longitude coordinate arrays, I just need to get both datasets to have the same lat/lon grid.

I guess I could continue using my old scipy-based solution (using map_coordinates() or RectBivariateSpline), but at the very least, it would make sense to have some documentation demonstrating how one might go about this very common problem, even if it is showing how to use the scipy-based tools with xarrays. If that is of interest, I can see what I can write up after I am done my immediate task.
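Along the lines of the scipy-based approach mentioned, a rough sketch of regridding one regular lat/lon grid onto another with RectBivariateSpline (toy grids, not the LCC-projection workflow described above):

```python
import numpy as np
import xarray as xr
from scipy.interpolate import RectBivariateSpline

# Source field on one regular lat/lon grid.
src = xr.DataArray(
    np.random.randn(50, 80),
    dims=('lat', 'lon'),
    coords={'lat': np.linspace(30, 50, 50), 'lon': np.linspace(-110, -90, 80)},
)
# Target grid to interpolate onto.
target_lat = np.linspace(32, 48, 25)
target_lon = np.linspace(-108, -92, 40)

spline = RectBivariateSpline(src['lat'].values, src['lon'].values, src.values)
regridded = xr.DataArray(
    spline(target_lat, target_lon),  # evaluate on the target grid
    dims=('lat', 'lon'),
    coords={'lat': target_lat, 'lon': target_lon},
)
print(regridded.shape)  # (25, 40)
```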

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API design for pointwise indexing 95114700
342553465 https://github.com/pydata/xarray/issues/475#issuecomment-342553465 https://api.github.com/repos/pydata/xarray/issues/475 MDEyOklzc3VlQ29tbWVudDM0MjU1MzQ2NQ== WeatherGod 291576 2017-11-07T17:11:49Z 2017-11-07T17:11:49Z CONTRIBUTOR

So, what has become the consensus for performing regridding/resampling? I see a lot of suggestions, but I have no sense of what is mature enough to use in production-level code. I also haven't seen anything in the documentation about this topic, even if it just refers people to another project.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  API design for pointwise indexing 95114700
147797539 https://github.com/pydata/xarray/pull/459#issuecomment-147797539 https://api.github.com/repos/pydata/xarray/issues/459 MDEyOklzc3VlQ29tbWVudDE0Nzc5NzUzOQ== WeatherGod 291576 2015-10-13T18:03:56Z 2015-10-13T18:03:56Z CONTRIBUTOR

That's all the time I have at the moment. I do have some more notes from my old, incomplete implementation, though. I'll try to finish the review tomorrow.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  add pynio backend 94100328
146976549 https://github.com/pydata/xarray/issues/615#issuecomment-146976549 https://api.github.com/repos/pydata/xarray/issues/615 MDEyOklzc3VlQ29tbWVudDE0Njk3NjU0OQ== WeatherGod 291576 2015-10-09T20:15:49Z 2015-10-09T20:15:49Z CONTRIBUTOR

hmm, good point. I wish I knew why I ended up using pd.to_timedelta() in the first place. Did numpy not support converting timedelta objects at one point?
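For reference, a quick sanity check with current numpy/pandas (not necessarily the versions I was using at the time) suggests the plain-numpy conversion works fine:

```
import datetime
import numpy as np
import pandas as pd

delta = datetime.timedelta(hours=6)
print(np.timedelta64(delta))                  # 21600000000 microseconds
print(np.timedelta64(delta).astype("m8[h]"))  # 6 hours
print(pd.to_timedelta(delta))                 # 0 days 06:00:00
```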

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  operations with pd.to_timedelta() now fails 110726841
60429213 https://github.com/pydata/xarray/issues/268#issuecomment-60429213 https://api.github.com/repos/pydata/xarray/issues/268 MDEyOklzc3VlQ29tbWVudDYwNDI5MjEz WeatherGod 291576 2014-10-24T18:27:30Z 2014-10-24T18:27:30Z CONTRIBUTOR

Note, I mean that I at first thought that collapsing variables into scalars was a useful feature, not that it would happen only for datasets and not data arrays.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  groupby reduction sometimes collapses variables into scalars 46768521
60425242 https://github.com/pydata/xarray/issues/267#issuecomment-60425242 https://api.github.com/repos/pydata/xarray/issues/267 MDEyOklzc3VlQ29tbWVudDYwNDI1MjQy WeatherGod 291576 2014-10-24T17:58:37Z 2014-10-24T17:58:37Z CONTRIBUTOR

So, is the string approach I used above to grab a single day's data a bug or a feature? It is a nice short-hand, but I don't want to rely on it if it isn't intended to be a feature. Similarly, if I supply a Year-Month string, I get data for that month.
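For context, the short-hand I'm referring to looks like this (a toy example with made-up data, written against the modern xarray import name):

```
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(48),
    coords={"time": pd.date_range("2013-01-01", periods=48, freq="H")},
    dims="time",
)
day = da.sel(time="2013-01-01")   # all 24 hours of Jan 1
month = da.sel(time="2013-01")    # everything in January
```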

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  can't use datetime or pandas datetime to index time dimension 46756880
60413505 https://github.com/pydata/xarray/issues/267#issuecomment-60413505 https://api.github.com/repos/pydata/xarray/issues/267 MDEyOklzc3VlQ29tbWVudDYwNDEzNTA1 WeatherGod 291576 2014-10-24T16:37:26Z 2014-10-24T16:37:26Z CONTRIBUTOR

Gah, I am sorry, please disregard my last comment. I can't add/subtract...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  can't use datetime or pandas datetime to index time dimension 46756880
60413356 https://github.com/pydata/xarray/issues/267#issuecomment-60413356 https://api.github.com/repos/pydata/xarray/issues/267 MDEyOklzc3VlQ29tbWVudDYwNDEzMzU2 WeatherGod 291576 2014-10-24T16:36:18Z 2014-10-24T16:36:18Z CONTRIBUTOR

A further wrinkle is that, because of this, date selection appears to work only in local time. Consider the following:

```
>>> c['time'][:25]
<xray.DataArray 'time' (time: 25)>
array(['2013-01-01T06:15:00.000000000-0500', '2013-01-01T07:00:00.000000000-0500',
       '2013-01-01T08:00:00.000000000-0500', '2013-01-01T09:00:00.000000000-0500',
       '2013-01-01T10:00:00.000000000-0500', '2013-01-01T11:00:00.000000000-0500',
       '2013-01-01T12:00:00.000000000-0500', '2013-01-01T13:00:00.000000000-0500',
       '2013-01-01T14:00:00.000000000-0500', '2013-01-01T15:00:00.000000000-0500',
       '2013-01-01T16:00:00.000000000-0500', '2013-01-01T17:00:00.000000000-0500',
       '2013-01-01T18:00:00.000000000-0500', '2013-01-01T19:00:00.000000000-0500',
       '2013-01-01T20:00:00.000000000-0500', '2013-01-01T21:00:00.000000000-0500',
       '2013-01-01T22:00:00.000000000-0500', '2013-01-01T23:00:00.000000000-0500',
       '2013-01-02T00:00:00.000000000-0500', '2013-01-02T01:00:00.000000000-0500',
       '2013-01-02T02:00:00.000000000-0500', '2013-01-02T03:00:00.000000000-0500',
       '2013-01-02T04:00:00.000000000-0500', '2013-01-02T05:00:00.000000000-0500',
       '2013-01-02T06:00:00.000000000-0500'], dtype='datetime64[ns]')
Coordinates:
  * time       (time) datetime64[ns] 2013-01-01T11:15:00 ...
    latitude   float32 64.833
    elevation  float32 137.5
    longitude  float32 -147.6

>>> c.sel(time='2013-01-01')['time']
<xray.DataArray 'time' (time: 13)>
array(['2013-01-01T06:15:00.000000000-0500', '2013-01-01T07:00:00.000000000-0500',
       '2013-01-01T08:00:00.000000000-0500', '2013-01-01T09:00:00.000000000-0500',
       '2013-01-01T10:00:00.000000000-0500', '2013-01-01T11:00:00.000000000-0500',
       '2013-01-01T12:00:00.000000000-0500', '2013-01-01T13:00:00.000000000-0500',
       '2013-01-01T14:00:00.000000000-0500', '2013-01-01T15:00:00.000000000-0500',
       '2013-01-01T16:00:00.000000000-0500', '2013-01-01T17:00:00.000000000-0500',
       '2013-01-01T18:00:00.000000000-0500'], dtype='datetime64[ns]')
Coordinates:
  * time       (time) datetime64[ns] 2013-01-01T11:15:00 ...
    latitude   float32 64.833
    elevation  float32 137.5
    longitude  float32 -147.6
```

I don't know how I would (easily) slice this data array so as to grab only the data for a UTC day.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  can't use datetime or pandas datetime to index time dimension 46756880
60404650 https://github.com/pydata/xarray/issues/185#issuecomment-60404650 https://api.github.com/repos/pydata/xarray/issues/185 MDEyOklzc3VlQ29tbWVudDYwNDA0NjUw WeatherGod 291576 2014-10-24T15:37:00Z 2014-10-24T15:37:00Z CONTRIBUTOR

May I propose a name? xray.glasses

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Plot methods 38109425
60399616 https://github.com/pydata/xarray/issues/264#issuecomment-60399616 https://api.github.com/repos/pydata/xarray/issues/264 MDEyOklzc3VlQ29tbWVudDYwMzk5NjE2 WeatherGod 291576 2014-10-24T15:04:23Z 2014-10-24T15:04:23Z CONTRIBUTOR

I should note that if an inner join is performed, then no NaNs are inserted and the arrays remain float32.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  align silently upcasts data arrays when NaNs are inserted 46745063
58570858 https://github.com/pydata/xarray/issues/214#issuecomment-58570858 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTcwODU4 WeatherGod 291576 2014-10-09T20:19:12Z 2014-10-09T20:19:12Z CONTRIBUTOR

Ok, I think I got it (for reals this time...)

```
import numpy as np
import xray
from scipy.spatial import cKDTree as KDTree


def bcast(spat_only, coord_names):
    # Broadcast the named coordinate arrays against each other, inserting
    # np.newaxis where a coordinate is 1-D (independent of the others).
    coords = []
    for i, n in enumerate(coord_names):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        coords.append(spat_only[n].values[slices])
    return np.broadcast_arrays(*coords)


def grid_to_points2(grid, points, coord_names):
    if not coord_names:
        raise ValueError("No coordinate names provided")
    spat_dims = {d for n in coord_names for d in grid[n].dims}
    not_spatial = set(grid.dims) - spat_dims
    spatial_selection = {n: 0 for n in not_spatial}
    spat_only = grid.isel(**spatial_selection)

    coords = bcast(spat_only, coord_names)

    # Nearest-neighbor lookup of the requested points in the flattened grid
    # (list() around zip() so this also works under Python 3).
    kd = KDTree(list(zip(*[c.ravel() for c in coords])))
    _, indx = kd.query(list(zip(*[points[n].values for n in coord_names])))
    indx = np.unravel_index(indx, coords[0].shape)

    return xray.concat(
        (grid.isel(**{n: j for n, j in zip(spat_only.dims, i)})
         for i in zip(*indx)),
        dim='station')
```

Needs a lot more tests and comments and such, but I think this works. Best part is that it seems to do a very decent job of keeping memory usage low, and only operates upon the coordinates that I specify. Everything else is left alone. So, I have used this on 4-D data, picking out grid points at specified lat/lon positions, and get back a 3D result (time, level, station). And I have used this on just 2D data, getting back just a 1D result (dimension='station').
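To make the usage concrete, here is roughly how I'm calling it (the names `model` and `obs` are placeholders for the two kinds of datasets described in this thread):

```
# 'model': Dataset with 2-D latitude/longitude over (ygrid_0, xgrid_0);
# 'obs':   Dataset with 1-D latitude/longitude along a 'station' dimension.
station_data = grid_to_points2(model, obs, ['latitude', 'longitude'])

# A (time, level, ygrid_0, xgrid_0) variable comes back as (time, level, station),
# and a purely 2-D variable comes back as 1-D over 'station'.
```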

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58568933 https://github.com/pydata/xarray/issues/214#issuecomment-58568933 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTY4OTMz WeatherGod 291576 2014-10-09T20:05:01Z 2014-10-09T20:05:01Z CONTRIBUTOR

Consider the following Dataset:

```
<xray.Dataset>
Dimensions:           (lv_HTGL1: 2, lv_HTGL3: 2, lv_HTGL5: 2, lv_HTGL6: 2, lv_ISBL0: 37, lv_SPDL2: 6, lv_SPDL4: 3, time: 9, xgrid_0: 451, ygrid_0: 337)
Coordinates:
  * xgrid_0           (xgrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * ygrid_0           (ygrid_0) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * lv_ISBL0          (lv_ISBL0) float32 10000.0 12500.0 15000.0 17500.0 20000.0 ...
  * lv_HTGL6          (lv_HTGL6) float32 1000.0 4000.0
  * lv_HTGL1          (lv_HTGL1) float32 2.0 80.0
  * lv_HTGL3          (lv_HTGL3) float32 10.0 80.0
    latitude          (ygrid_0, xgrid_0) float32 16.281 16.3084 16.3356 16.3628 16.3898 ...
    longitude         (ygrid_0, xgrid_0) float32 233.862 233.984 234.106 234.229 ...
  * lv_HTGL5          (lv_HTGL5) int64 0 1
  * lv_SPDL2          (lv_SPDL2) int64 0 1 2 3 4 5
  * lv_SPDL4          (lv_SPDL4) int64 0 1 2
  * time              (time) datetime64[ns] 2014-09-25T01:00:00 ...
Variables:
    gridrot_0         (ygrid_0, xgrid_0) float32 -0.229676 -0.228775 -0.227873 ...
    TMP_P0_L103_GLC0  (time, lv_HTGL1, ygrid_0, xgrid_0) float64 295.8 295.7 295.7 295.7 ...
```

The latitude and longitude variables are both dependent upon xgrid_0 and ygrid_0. Meanwhile...

```
<xray.Dataset>
Dimensions:    (station: 120, time: 4)
Coordinates:
    latitude   (station) float32 34.805 34.795 34.585 36.705 34.245 34.915 34.195 36.075 ...
  * station    (station) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ...
    sixhourly  (time) int64 0 1 2 3
    longitude  (station) float32 -98.025 -96.665 -99.335 -98.705 -95.665 -98.295 ...
  * time       (time) datetime64[ns] 2014-10-07 2014-10-07T06:00:00 ...
Variables:
    MaxGust    (station, time) float64 7.794 7.47 8.675 4.788 7.071 7.903 8.641 5.533 ...
```

the latitude and longitude variables are independent of each other (they are 1-D).

The variable in the first one cannot be accessed directly by lat/lon values, while the MaxGust variable in the second one can. This poses some difficulties.
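A toy version of the two layouts, in case it helps (modern xarray, fake values):

```
import numpy as np
import xarray as xr

# Curvilinear grid: latitude/longitude are 2-D auxiliary coordinates, so they
# cannot be used directly for label-based selection.
grid = xr.Dataset(
    {"TMP": (("ygrid_0", "xgrid_0"), np.zeros((3, 4)))},
    coords={"latitude": (("ygrid_0", "xgrid_0"), np.random.uniform(16, 17, (3, 4))),
            "longitude": (("ygrid_0", "xgrid_0"), np.random.uniform(233, 235, (3, 4)))},
)

# Station data: latitude/longitude are 1-D along 'station', so the nearest
# station can be found with ordinary operations on those coordinates.
stations = xr.Dataset(
    {"MaxGust": (("station",), np.random.rand(5))},
    coords={"station": np.arange(5),
            "latitude": ("station", np.linspace(34.0, 36.0, 5)),
            "longitude": ("station", np.linspace(-99.0, -95.0, 5))},
)
closest = stations.isel(station=int(abs(stations["latitude"] - 34.8).argmin()))
```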

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58565934 https://github.com/pydata/xarray/issues/214#issuecomment-58565934 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTY1OTM0 WeatherGod 291576 2014-10-09T19:43:08Z 2014-10-09T19:43:08Z CONTRIBUTOR

Hmmm, limitation that I just encountered. When there are dependent coordinates, the variables representing those coordinates are not the index arrays (and thus, are not "dimensions" either), so my solution is completely broken for dependent coordinates. If I were to go back to my DataArray-only solution, then I still need to correct the code to use the dimension names of the coordinate variables, and still need to fix the coordinates != dimensions issue.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58562506 https://github.com/pydata/xarray/issues/214#issuecomment-58562506 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTYyNTA2 WeatherGod 291576 2014-10-09T19:16:52Z 2014-10-09T19:16:52Z CONTRIBUTOR

to/from_dataframe just ate up all my memory. I think I am going to stick with my broadcasting approach...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58558069 https://github.com/pydata/xarray/issues/214#issuecomment-58558069 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTU4MDY5 WeatherGod 291576 2014-10-09T18:47:22Z 2014-10-09T18:47:22Z CONTRIBUTOR

Oooh, I didn't realize that dims is different for Dataset and DataArray... gonna have to fix that, too. I am checking out the broadcasting functions you pointed out. The one limitation I see right away with xray.core.variable.broadcast_variables is that it is limited to two variables (presumably I would be broadcasting N coordinates, because the variables may or may not have extraneous dimensions that I don't care to broadcast).
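A toy illustration of what I mean by broadcasting N coordinates at once with plain numpy (made-up coordinate values):

```
import numpy as np

lat = np.linspace(30.0, 40.0, 3)
lon = np.linspace(-100.0, -90.0, 4)
lev = np.array([850.0, 500.0])

# Each independent 1-D coordinate gets np.newaxis in the other dimensions,
# then all of them are broadcast against each other in a single call.
lat2, lon2, lev2 = np.broadcast_arrays(
    lat[:, np.newaxis, np.newaxis],
    lon[np.newaxis, :, np.newaxis],
    lev[np.newaxis, np.newaxis, :],
)
print(lat2.shape, lon2.shape, lev2.shape)   # (3, 4, 2) for each
```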

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58553935 https://github.com/pydata/xarray/issues/214#issuecomment-58553935 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUzOTM1 WeatherGod 291576 2014-10-09T18:21:16Z 2014-10-09T18:21:16Z CONTRIBUTOR

And, actually, the example I gave above has a bug in the dependent dimension case. This one should be much better (not fully tested yet, though):

```
def grid_to_points2(grid, points, coord_names):
    if not coord_names:
        raise ValueError("No coordinate names provided")
    not_spatial = set(grid.dims) - set(coord_names)
    spatial_selection = {n: 0 for n in not_spatial}
    spat_only = grid.isel(**spatial_selection)
    coords = []
    for i, n in enumerate(spat_only.dims):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        coords.append(spat_only[n].values[slices])
    coords = np.broadcast_arrays(*coords)

    kd = KDTree(zip(*[c.flatten() for c in coords]))
    _, indx = kd.query(zip(*[points[n].values for n in spat_only.dims]))
    indx = np.unravel_index(indx, coords[0].shape)

    return xray.concat(
        (grid.sel(**{n: c[i] for n, c in zip(spat_only.dims, coords)})
         for i in zip(*indx)),
        dim='station')
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58551759 https://github.com/pydata/xarray/issues/214#issuecomment-58551759 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUxNzU5 WeatherGod 291576 2014-10-09T18:06:56Z 2014-10-09T18:06:56Z CONTRIBUTOR

And, I think I just realized how I could generalize it even more. Right now, grid can only be a DataArray, but I would like this to work for a DataSet as well. I bet if I use .sel() instead of .isel() and access the elements of the broadcasted arrays, I could make this work very nicely for both DataArray and DataSet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58550741 https://github.com/pydata/xarray/issues/214#issuecomment-58550741 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUwNzQx WeatherGod 291576 2014-10-09T18:00:33Z 2014-10-09T18:00:33Z CONTRIBUTOR

Oh, and it does take advantage of a bunch of Python 2.7 features such as dictionary comprehensions and generator expressions, so...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
58550403 https://github.com/pydata/xarray/issues/214#issuecomment-58550403 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU4NTUwNDAz WeatherGod 291576 2014-10-09T17:58:25Z 2014-10-09T17:58:25Z CONTRIBUTOR

Started using the above snippet for more datasets, some with interdependent coordinates and some without (so the coordinates would be 1-D). I think I have generalized it significantly...

```
def grid_to_points(grid, points, coord_names):
    not_spatial = set(grid.dims) - set(coord_names)
    spatial_selection = {n: 0 for n in not_spatial}
    spat_only = grid.isel(**spatial_selection)
    coords = []
    for i, n in enumerate(spat_only.dims):
        if spat_only[n].ndim != len(spat_only.dims):
            # Needs new axes
            slices = [np.newaxis] * len(spat_only.dims)
            slices[i] = slice(None)
        else:
            slices = [slice(None)] * len(spat_only.dims)
        coords.append(spat_only[n].values[slices])
    coords = [c.flatten() for c in np.broadcast_arrays(*coords)]

    kd = KDTree(zip(*coords))
    _, indx = kd.query(zip(*[points[n].values for n in spat_only.dims]))
    indx = np.unravel_index(indx, spat_only.shape)

    return xray.concat((grid.isel(**{n: j for n, j in zip(spat_only.dims, i)})
                        for i in zip(*indx)), dim='station')
```

I can still imagine some situations where this won't work, such as a requested set of dimensions that are a mix of dependent and independent variables. Currently, if the dimensions are independent, then the number of dimensions of each one is assumed to be 1 and np.newaxis is used for the others. Meanwhile, if the dimensions are dependent, then the number of dimensions for each one is assumed to be the same as the number of dependent variables and is merely flattened (the broadcast is essentially no-op).

I should also note that this is technically not restricted to spatial coordinates even though the code says so; it works for anything that can be represented in Euclidean space.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
57857522 https://github.com/pydata/xarray/issues/214#issuecomment-57857522 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU3ODU3NTIy WeatherGod 291576 2014-10-03T20:48:35Z 2014-10-03T20:48:35Z CONTRIBUTOR

Just managed to implement this using your suggestion for my data:

```
from scipy.spatial import cKDTree as KDTree
import numpy as np
import xray

kd = KDTree(zip(model['longitude'].values.ravel(),
                model['latitude'].values.ravel()))
dists, indx = kd.query(zip(obs['longitude'], obs['latitude']))
indx = np.unravel_index(indx, model['longitude'].shape)
mod_points = xray.concat([model.isel(x=x, y=y) for y, x in zip(*indx)],
                         dim='station')
```

Not entirely certain why I needed to reverse y and x in that last part, but, oh well...
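(Thinking about it some more, it is probably just that np.unravel_index returns row-major indices, i.e. the y index first for a (ny, nx) shape; a quick toy check:)

```
import numpy as np

flat = np.array([0, 5, 7])
ys, xs = np.unravel_index(flat, (3, 4))
# Pairs come out as (y, x) -- hence the reversal when calling isel(x=x, y=y).
print([(int(y), int(x)) for y, x in zip(ys, xs)])   # [(0, 0), (1, 1), (1, 3)]
```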

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257
57847940 https://github.com/pydata/xarray/issues/214#issuecomment-57847940 https://api.github.com/repos/pydata/xarray/issues/214 MDEyOklzc3VlQ29tbWVudDU3ODQ3OTQw WeatherGod 291576 2014-10-03T19:56:16Z 2014-10-03T19:56:16Z CONTRIBUTOR

Unless I am missing something about xray, that selection operation could only work if pts had values that exactly matched coordinate values in ds. In most scenarios, that would not be the case. One would have to first build pts from a computation of nearest-neighbor indices between the stations and the model grid.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Pointwise indexing -- something like sel_points 40395257

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);