issue_comments


17 rows where issue = 295838143 and user = 291576 sorted by updated_at descending


user 1

  • WeatherGod · 17 ✖

issue 1

  • Vectorized lazy indexing · 17 ✖

author_association 1

  • CONTRIBUTOR 17
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions performed_via_github_app issue
370986433 https://github.com/pydata/xarray/pull/1899#issuecomment-370986433 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM3MDk4NjQzMw== WeatherGod 291576 2018-03-07T01:08:36Z 2018-03-07T01:08:36Z CONTRIBUTOR

:tada:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
367077311 https://github.com/pydata/xarray/pull/1899#issuecomment-367077311 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NzA3NzMxMQ== WeatherGod 291576 2018-02-20T18:43:56Z 2018-02-20T18:43:56Z CONTRIBUTOR

I did some more investigation into the memory usage problem I was having. I had assumed that the vectorized-indexed result of a lazily indexed data array would be an in-memory array. It is still lazy, so when I started to use the result, it read all the data at once, resulting in a near-complete load of the data into memory.

I have adjusted my code to chunk out the indexing in order to keep the memory usage under control at a reasonable performance penalty. I haven't looked into identifying the ideal chunking scheme for an arbitrary data array and indexing. Perhaps we can make that a task for another day. At this point, I am satisfied with the features (negative step-sizes aside, of course).
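A minimal sketch of what "chunking out the indexing" could look like, using plain NumPy rather than the code from the comment (which is not shown). The function name, the `rows_per_chunk` parameter, and the toy array sizes are all assumptions made for illustration; the idea is only that each block of rows is read and indexed separately, so peak memory is bounded by the block size.

```python
import numpy as np

def pick_along_last_axis_chunked(data, inds, rows_per_chunk=2):
    """Pick data[i, j, inds[i, j]] for every (i, j), one block of rows at a time.

    `data` is any array-like of shape (n_lat, n_lon, n_dir) that supports
    slicing along its first axis (for example a lazily loaded variable);
    `inds` is an integer array of shape (n_lat, n_lon).
    """
    n_lat = inds.shape[0]
    out = np.empty(inds.shape, dtype=data.dtype)
    for start in range(0, n_lat, rows_per_chunk):
        stop = min(start + rows_per_chunk, n_lat)
        # Only this block of rows is materialized in memory.
        block = np.asarray(data[start:stop])
        # take_along_axis needs indices with the same number of dimensions.
        picked = np.take_along_axis(block, inds[start:stop, :, np.newaxis], axis=-1)
        out[start:stop] = picked[..., 0]
    return out

# Toy stand-ins for the real variable and index array (hypothetical sizes).
rng = np.random.default_rng(0)
data = rng.random((5, 4, 7)).astype("float32")
inds = rng.integers(0, 7, size=(5, 4))
result = pick_along_last_axis_chunked(data, inds, rows_per_chunk=2)
```

The block size trades memory for the number of separate reads; picking a good value for an arbitrary array is the open question mentioned above.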

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366379465 https://github.com/pydata/xarray/pull/1899#issuecomment-366379465 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3OTQ2NQ== WeatherGod 291576 2018-02-16T22:40:06Z 2018-02-16T22:40:06Z CONTRIBUTOR

Ah-hah! Ok, so, the problem isn't some weird difference between the two examples I gave. The issue is that calling np.asarray(foo) triggered a full loading of the data!
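To illustrate why a conversion call can be the point where data is read, here is a toy lazy wrapper. It is a hypothetical stand-in, not xarray's implementation: the class name `LazyIndexed` and its `log` attribute are invented for this sketch. Indexing only records the key; `np.asarray()` calls `__array__`, and that is where the read happens.

```python
import numpy as np

class LazyIndexed:
    """Toy lazy array: records an index key and reads only on conversion."""

    def __init__(self, source, key=None, log=None):
        self.source = source   # stands in for data on disk
        self.key = key         # pending index, not yet applied
        self.log = [] if log is None else log  # sizes of reads performed

    def __getitem__(self, key):
        # Indexing is deferred: return another lazy object, read nothing.
        # (This toy keeps a single pending key; it does not compose keys.)
        return LazyIndexed(self.source, key, self.log)

    def __array__(self, dtype=None, copy=None):
        # np.asarray() lands here; this is where the read happens.
        data = self.source if self.key is None else self.source[self.key]
        self.log.append(data.size)
        return np.asarray(data, dtype=dtype)

source = np.arange(24).reshape(2, 3, 4)
lazy = LazyIndexed(source)

subset = lazy[0, :, 1]          # no read yet
reads_before = list(lazy.log)   # still empty
values = np.asarray(subset)     # the read happens now, for the subset only
everything = np.asarray(lazy)   # converting the unindexed object reads it all
```

How much gets read at conversion time depends on what the pending key can be translated into for the storage backend, which is the part this toy does not model.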

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366376400 https://github.com/pydata/xarray/pull/1899#issuecomment-366376400 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NjQwMA== WeatherGod 291576 2018-02-16T22:25:59Z 2018-02-16T22:25:59Z CONTRIBUTOR

huh... now I am not so sure about that... must be something else triggering the load.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366374917 https://github.com/pydata/xarray/pull/1899#issuecomment-366374917 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NDkxNw== WeatherGod 291576 2018-02-16T22:19:08Z 2018-02-16T22:19:08Z CONTRIBUTOR

also, at this point, I don't know if this is limited to the netcdf4 backend, as this type of indexing was only done on a variable I have in a netcdf file. I don't have 4-D variables in other file types.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366374041 https://github.com/pydata/xarray/pull/1899#issuecomment-366374041 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3NDA0MQ== WeatherGod 291576 2018-02-16T22:14:49Z 2018-02-16T22:14:49Z CONTRIBUTOR

CD, by the way, has dimensions of (scales, latitude, longitude, wind_direction).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366373479 https://github.com/pydata/xarray/pull/1899#issuecomment-366373479 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM3MzQ3OQ== WeatherGod 291576 2018-02-16T22:12:18Z 2018-02-16T22:12:18Z CONTRIBUTOR

Ah, not a change in behavior, but a possible bug exposed by a tiny change on my part. So, I have a 4D data array, CD, and a data array for indexing, wind_inds. The following does not trigger a full loading: CD[0][wind_direction=wind_inds], which is good! But, this does: CD[scales=0, wind_direction=wind_inds], which is bad.

So, somehow, the indexing system is effectively treating these two things as different.
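The two expressions above are shorthand rather than valid Python. One way to spell them with xarray's `isel` is sketched below, on a small in-memory array whose sizes are made up for the example. Because the toy data is already in memory, this only shows that the two forms select the same values; it does not reproduce the difference in loading behavior.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
# Small in-memory stand-in for the 4D variable (hypothetical sizes).
CD = xr.DataArray(
    rng.random((2, 3, 4, 5)),
    dims=("scales", "latitude", "longitude", "wind_direction"),
)
# One wind_direction index per (latitude, longitude) point.
wind_inds = xr.DataArray(
    rng.integers(0, 5, size=(3, 4)),
    dims=("latitude", "longitude"),
)

# "CD[0][wind_direction=wind_inds]": positional index first, then isel.
two_step = CD[0].isel(wind_direction=wind_inds)
# "CD[scales=0, wind_direction=wind_inds]": both indexers in one call.
one_step = CD.isel(scales=0, wind_direction=wind_inds)
```

Both results have dimensions (latitude, longitude) and identical values, so any difference between them in a file-backed case would come from how the combined key is handed to the backend.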

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366363419 https://github.com/pydata/xarray/pull/1899#issuecomment-366363419 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM2MzQxOQ== WeatherGod 291576 2018-02-16T21:28:09Z 2018-02-16T21:28:09Z CONTRIBUTOR

correction... the problem isn't with pynio... it is in the netcdf4 backend

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366360382 https://github.com/pydata/xarray/pull/1899#issuecomment-366360382 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjM2MDM4Mg== WeatherGod 291576 2018-02-16T21:15:17Z 2018-02-16T21:15:17Z CONTRIBUTOR

Something changed. Now the indexing for pynio is forcing a full loading of the data.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
366059694 https://github.com/pydata/xarray/pull/1899#issuecomment-366059694 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NjA1OTY5NA== WeatherGod 291576 2018-02-15T20:59:20Z 2018-02-15T20:59:20Z CONTRIBUTOR

I can confirm that with the latest changes, the pynio tests now pass locally for me. Whether the tests in there actually exercise anything useful is a different question.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365729433 https://github.com/pydata/xarray/pull/1899#issuecomment-365729433 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcyOTQzMw== WeatherGod 291576 2018-02-14T20:07:55Z 2018-02-14T20:07:55Z CONTRIBUTOR

I am working on re-activating those tests. I think PyNio is now available for python3, too.

On Wed, Feb 14, 2018 at 2:59 PM, Joe Hamman notifications@github.com wrote:

@WeatherGod https://github.com/weathergod - you are right, all the pynio tests are being skipped on travis. I'll open a separate issue for that. Yikes!


{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365722413 https://github.com/pydata/xarray/pull/1899#issuecomment-365722413 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcyMjQxMw== WeatherGod 291576 2018-02-14T19:43:07Z 2018-02-14T19:43:07Z CONTRIBUTOR

It looks like the pynio backend isn't regularly tested, as several of its tests currently fail when I run them locally. Some of them fail because they assert a NotImplementedError for features that are now implemented.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365708385 https://github.com/pydata/xarray/pull/1899#issuecomment-365708385 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTcwODM4NQ== WeatherGod 291576 2018-02-14T18:55:43Z 2018-02-14T18:55:43Z CONTRIBUTOR

Just did some more debugging, putting in some debug statements within NioArrayWrapper.__getitem__():
```
diff --git a/xarray/backends/pynio_.py b/xarray/backends/pynio_.py
index c7e0ddf..b9f7151 100644
--- a/xarray/backends/pynio_.py
+++ b/xarray/backends/pynio_.py
@@ -27,16 +27,24 @@ class NioArrayWrapper(BackendArray):
         return self.datastore.ds.variables[self.variable_name]

     def __getitem__(self, key):
+        import logging
+        logger = logging.getLogger(__name__)
+        logger.addHandler(logging.NullHandler())
+        logger.debug("initial key: %s", key)
         key, np_inds = indexing.decompose_indexer(key, self.shape, mode='outer')
+        logger.debug("Decomposed indexers:\n%s\n%s", key, np_inds)

         with self.datastore.ensure_open(autoclose=True):
             array = self.get_array()
+            logger.debug("initial array: %r", array)
             if key == () and self.ndim == 0:
                 return array.get_value()

             for ind in np_inds:
+                logger.debug("indexer: %s", ind)
                 array = indexing.NumpyIndexingAdapter(array)[ind]
+                logger.debug("intermediate array: %r", array)

             return array
```

And here is the test script (data not included):

    import logging
    import xarray as xr
    logging.basicConfig(level=logging.DEBUG)
    fname1 = '../hrrr.t12z.wrfnatf02.grib2'
    ds = xr.open_dataset(fname1, engine='pynio')
    subset_isel = ds.isel(lv_HYBL0=7)
    sp = subset_isel['UGRD_P0_L105_GLC0'].values.shape

And here is the relevant output:

    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((slice(None, None, None),))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339210>
    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((slice(None, None, None),))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339b90>
    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((slice(None, None, None),))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339d50>
    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((slice(None, None, None),))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((slice(None, None, None),))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339d90>
    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339190>
    DEBUG:xarray.backends.pynio_:initial key: BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
    DEBUG:xarray.backends.pynio_:Decomposed indexers:
    BasicIndexer((7, slice(None, None, None), slice(None, None, None)))
    ()
    DEBUG:xarray.backends.pynio_:initial array: <Nio.NioVariable object at 0x7f0f3c339190>
    (50, 1059, 1799)

So, the BasicIndexer((7, slice(None, None, None), slice(None, None, None))) isn't getting decomposed correctly, it looks like?
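One possible reading of the log, offered as an assumption rather than a confirmed diagnosis: the second decomposed part is empty (`()`), so the `for ind in np_inds` loop never runs, and the snippet as pasted does not show `key` itself being applied to `array`. The sketch below uses plain tuples and a small NumPy array in place of xarray's indexer classes and the Nio variable; `apply_decomposed` is a hypothetical helper, and the array sizes are shrunk stand-ins for (50, 1059, 1799).

```python
import numpy as np

def apply_decomposed(array, backend_key, np_inds):
    """Apply the backend key first, then each follow-up in-memory indexer."""
    # Assumption: the first decomposed part is meant to be applied here.
    array = array[backend_key]
    for ind in np_inds:
        array = np.asarray(array)[ind]
    return array

full = np.zeros((50, 6, 9), dtype="float32")  # stand-in for the 3D variable
backend_key = (7, slice(None), slice(None))   # mirrors the logged key
np_inds = ()                                  # empty, as in the log output

skipped = full                                # backend_key never applied
applied = apply_decomposed(full, backend_key, np_inds)
```

With the key skipped, the shape stays 3D, matching the `(50, 1059, 1799)` in the output; with it applied, the leading dimension is dropped.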

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365692868 https://github.com/pydata/xarray/pull/1899#issuecomment-365692868 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY5Mjg2OA== WeatherGod 291576 2018-02-14T18:02:17Z 2018-02-14T18:06:24Z CONTRIBUTOR

Ah, interesting... so, this dataset was created by doing an isel() on the original:
```
ds['UGRD_P0_L105_GLC0']
<xarray.DataArray 'UGRD_P0_L105_GLC0' (lv_HYBL0: 50, ygrid_0: 1059, xgrid_0: 1799)>
[95257050 values with dtype=float32]
Coordinates:
  * lv_HYBL0   (lv_HYBL0) float32 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
    gridlat_0  (ygrid_0, xgrid_0) float32 ...
    gridlon_0  (ygrid_0, xgrid_0) float32 ...
Dimensions without coordinates: ygrid_0, xgrid_0
```
So, the original data has a 50x1059x1799 grid, and the new indexer isn't properly composing the indexer so that it fetches [7, slice(None), slice(None)] when I grab its `.values`.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365689883 https://github.com/pydata/xarray/pull/1899#issuecomment-365689883 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY4OTg4Mw== WeatherGod 291576 2018-02-14T17:52:24Z 2018-02-14T17:52:24Z CONTRIBUTOR

I can also confirm that the shape comes out correctly using master, so this is definitely isolated to this PR.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365689003 https://github.com/pydata/xarray/pull/1899#issuecomment-365689003 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY4OTAwMw== WeatherGod 291576 2018-02-14T17:49:20Z 2018-02-14T17:49:20Z CONTRIBUTOR

Hmm, came across a bug with the pynio backend. Working on making a reproducible example, but just for your own inspection, here is some logging output:

    <xarray.Dataset>
    Dimensions:    (xgrid_0: 1799, ygrid_0: 1059)
    Coordinates:
        lv_HYBL0   float32 8.0
        longitude  (ygrid_0, xgrid_0) float32 ...
        latitude   (ygrid_0, xgrid_0) float32 ...
    Dimensions without coordinates: xgrid_0, ygrid_0
    Data variables:
        UGRD       (ygrid_0, xgrid_0) float32 ...
        VGRD       (ygrid_0, xgrid_0) float32 ...
    DEBUG:hiresWind.downscale:shape of a data: (50, 1059, 1799)

The first bit is the repr of my Dataset. The last line is the output of ds['UGRD'].values.shape. It is supposed to be 2D, not 3D.

If I revert back to v0.10.0, then the shape is (1059, 1799), just as expected.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143
365657502 https://github.com/pydata/xarray/pull/1899#issuecomment-365657502 https://api.github.com/repos/pydata/xarray/issues/1899 MDEyOklzc3VlQ29tbWVudDM2NTY1NzUwMg== WeatherGod 291576 2018-02-14T16:13:16Z 2018-02-14T16:13:16Z CONTRIBUTOR

Oh, wow... this worked like a charm for the netcdf4 backend! I have a ~13GB (uncompressed) 4-D netcdf4 variable that I was having trouble slicing a 2D surface out of. Here is a snippet where I am grabbing data at random indices in the last dimension, first for a specific latitude, then for the entire domain.
```
CD_subset = rough['CD'][0]
wind_inds_decorated
<xarray.DataArray (latitude: 3501, longitude: 7001)>
array([[33, 15, 25, ..., 52, 66, 35],
       [ 6,  8, 55, ..., 59,  6, 50],
       [54,  2, 40, ..., 32, 19,  9],
       ...,
       [53, 18, 23, ..., 19,  3, 43],
       [ 9, 11, 66, ..., 51, 39, 58],
       [21, 54, 37, ...,  3,  0, 65]])
Dimensions without coordinates: latitude, longitude
foo = CD_subset.isel(latitude=0, wind_direction=wind_inds_decorated[0])
foo
<xarray.DataArray 'CD' (longitude: 7001)>
array([ 0.004052,  0.005915,  0.002771, ...,  0.005604,  0.004715,  0.002756], dtype=float32)
Coordinates:
    scales          int16 60
    latitude        float64 54.99
  * longitude       (longitude) float64 -130.0 -130.0 -130.0 -130.0 -130.0 ...
    wind_direction  (longitude) int16 165 75 125 5 235 345 315 175 85 35 290 ...
foo = CD_subset.isel(wind_direction=wind_inds_decorated)
foo
<xarray.DataArray 'CD' (latitude: 3501, longitude: 7001)>
[24510501 values with dtype=float32]
Coordinates:
    scales          int16 60
  * latitude        (latitude) float64 54.99 54.98 54.97 54.96 54.95 54.95 ...
  * longitude       (longitude) float64 -130.0 -130.0 -130.0 -130.0 -130.0 ...
    wind_direction  (latitude, longitude) int64 165 75 125 5 235 345 315 175 ...
```
All previous attempts at this would result in having to load the entire 13GB array into memory just to get 93.5 MB out. Or, I would try to fetch each individual point, which took way too long. This worked faster than loading the entire thing into memory, and it used less memory, too (I think I maxed out at about 1.2GB of total usage, which is totally acceptable for my use case).
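The 93.5 MB figure follows from the shape and dtype shown in the repr above: a 3501 x 7001 grid of float32 values at 4 bytes each. A quick check of that arithmetic (the variable names are just for this sketch):

```python
n_lat, n_lon = 3501, 7001   # grid size from the DataArray repr
bytes_per_value = 4         # float32

n_values = n_lat * n_lon
size_mib = n_values * bytes_per_value / 2**20

print(n_values)             # matches the "24510501 values" in the repr
print(round(size_mib, 1))   # size of the 2D result in MiB
```

So the quoted size is in binary megabytes (MiB); in decimal units the same result is about 98 MB.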

I will try out similar things with the pynio and rasterio backends, and get back to you. Thanks for this work!

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Vectorized lazy indexing 295838143


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);