issue_comments
22 rows where issue = 291332965 sorted by updated_at descending
issue: Drop coordinates on loading large dataset. (291332965) · 22 comments
id | html_url | user | created_at | updated_at ▲ | author_association | body

575163859 | https://github.com/pydata/xarray/issues/1854#issuecomment-575163859 | stale[bot] 26384082 | 2020-01-16T14:00:19Z | 2020-01-16T14:00:19Z | NONE

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the […]

365925282 | https://github.com/pydata/xarray/issues/1854#issuecomment-365925282 | jamesstidard 1797906 | 2018-02-15T13:21:33Z | 2018-02-15T13:24:46Z | NONE

@rabernat Still seem to get a SIGKILL 9 (exit code 137) when trying to run with that pre-processor as well. Maybe my expectations of how it lazy loads files are too high. The machine I'm running on has 8 GB of RAM and the files in total are just under 1 TB.

365896646 | https://github.com/pydata/xarray/issues/1854#issuecomment-365896646 | jamesstidard 1797906 | 2018-02-15T11:12:48Z | 2018-02-15T11:12:48Z | NONE

@jhamman Here's the ncdump output:

```bash
netcdf \34.128_1900_01_05_05 {
dimensions:
    longitude = 720 ;
    latitude = 361 ;
    time = UNLIMITED ; // (124 currently)
variables:
    float longitude(longitude) ;
        longitude:units = "degrees_east" ;
        longitude:long_name = "longitude" ;
    float latitude(latitude) ;
        latitude:units = "degrees_north" ;
        latitude:long_name = "latitude" ;
    int time(time) ;
        time:units = "hours since 1900-01-01 00:00:0.0" ;
        time:long_name = "time" ;
        time:calendar = "gregorian" ;
    short sst(time, latitude, longitude) ;
        sst:scale_factor = 0.000552094668668839 ;
        sst:add_offset = 285.983000319853 ;
        sst:_FillValue = -32767s ;
        sst:missing_value = -32767s ;
        sst:units = "K" ;
        sst:long_name = "Sea surface temperature" ;

// global attributes:
        :Conventions = "CF-1.6" ;
        :history = "2017-08-04 06:17:58 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls09-95e2cf679cd58ee9b4db4dd119a05a8d-gF5gxN.grib -o /data/data04/scratch/_grib2netcdf-atls01-a562cefde8a29a7288fa0b8b7f9413f7-VvH7PP.nc -utime" ;
        :_Format = "64-bit offset" ;
}
```

Unfortunately removing the chunks didn't seem to help. I'm running with the pre-process workaround this morning to see if that completes. Sorry for the late response on this - been pretty busy.

364498649 | https://github.com/pydata/xarray/issues/1854#issuecomment-364498649 | jhamman 2443309 | 2018-02-09T17:18:53Z | 2018-02-09T17:18:53Z | MEMBER

@rabernat - good points.

@jamesstidard - perhaps you can post a single file's ncdump using the […]
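
The same on-disk storage details that ncdump reports can also be read from Python. A minimal sketch, assuming the netCDF4 library; the filename is taken from the ncdump header posted earlier in this thread:

```python
import netCDF4

# Filename from the ncdump header above; adjust the path as needed.
nc = netCDF4.Dataset("34.128_1900_01_05_05.nc")

# 'NETCDF3_64BIT_OFFSET' here would confirm the classic format reported
# by ncdump (":_Format = 64-bit offset"), i.e. no HDF5 chunking layer.
print(nc.data_model)

# For netCDF-4/HDF5 files this returns per-dimension chunk sizes;
# for contiguous storage it returns 'contiguous'.
print(nc.variables["sst"].chunking())
```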

364494085 | https://github.com/pydata/xarray/issues/1854#issuecomment-364494085 | rabernat 1197350 | 2018-02-09T17:03:06Z | 2018-02-09T17:03:06Z | MEMBER

@jhamman, chunking in lat and lon should not be necessary here. My understanding is that dask/dask#2364 made sure that the indexing operation happens before the concat.

One possibility is that the files have HDF-level chunking / compression, as discussed in #1440. That could be screwing this up.

364492783 | https://github.com/pydata/xarray/issues/1854#issuecomment-364492783 | jamesstidard 1797906 | 2018-02-09T16:58:42Z | 2018-02-09T16:58:42Z | NONE

I'll give both of those a shot. For hosting, the files are currently on a local drive and they sum to about 1 TB. I can probably host a couple examples though. Thanks again for the support.

364491330 | https://github.com/pydata/xarray/issues/1854#issuecomment-364491330 | jhamman 2443309 | 2018-02-09T16:53:57Z | 2018-02-09T16:53:57Z | MEMBER

@jamesstidard - let's see how the distributed scheduler plays:

```python
import xarray as xr
from distributed import Client

client = Client()

ds = xr.open_mfdataset('path/to/ncs/*.nc', chunks={'latitude': 50, 'longitude': 50})
recs = ds.sel(latitude=10, longitude=10).to_dataframe().to_records()
```

Also, it would be worth updating distributed before you use its scheduler.

364490209 | https://github.com/pydata/xarray/issues/1854#issuecomment-364490209 | rabernat 1197350 | 2018-02-09T16:50:13Z | 2018-02-09T16:50:13Z | MEMBER

Also, maybe you can post this dataset somewhere online for us to play around with?

364489976 | https://github.com/pydata/xarray/issues/1854#issuecomment-364489976 | rabernat 1197350 | 2018-02-09T16:49:30Z | 2018-02-09T16:49:30Z | MEMBER

Did you try my workaround?

364488847 | https://github.com/pydata/xarray/issues/1854#issuecomment-364488847 | jamesstidard 1797906 | 2018-02-09T16:45:51Z | 2018-02-09T16:45:51Z | NONE

That run was killed with the output:

```
Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
```

I wasn't watching the machine at the time but I assume that's it falling over due to memory pressure.

Hi @jhamman, I'm using […]

I'm just using whatever the default scheduler is, as that's pretty much all the code I've got written above. I'm unsure how to do a performance check as the dataset can't even be fully loaded currently. I've tried different chunk sizes in the past hoping to stumble on a magic size, but have been unsuccessful with that.

364478761 | https://github.com/pydata/xarray/issues/1854#issuecomment-364478761 | jhamman 2443309 | 2018-02-09T16:12:22Z | 2018-02-09T16:12:22Z | MEMBER

@jamesstidard - it would be good to know a few more details here: […]

364465016 | https://github.com/pydata/xarray/issues/1854#issuecomment-364465016 | rabernat 1197350 | 2018-02-09T15:26:40Z | 2018-02-09T15:26:40Z | MEMBER

The way this should work is that the selection of a single point should happen before the data is concatenated. It is up to dask to properly "fuse" these two operations. It seems like that is failing for some reason.

As a temporary workaround, you could preprocess the data to only select the specific point before concatenating.
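
The workaround's concrete code isn't preserved in this export, but the idea maps directly onto open_mfdataset's preprocess argument. A minimal sketch, assuming the point and file pattern used elsewhere in the thread; the select_point helper name is hypothetical, not the code as originally posted:

```python
import xarray as xr

def select_point(ds):
    # Reduce each file to the single point of interest as it is opened,
    # so only one time series per file reaches the concat step.
    return ds.sel(latitude=10, longitude=10)

ds = xr.open_mfdataset("path/to/ncs/*.nc", preprocess=select_point)
recs = ds.to_dataframe().to_records()
```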
But you shouldn't have to do this to get good performance here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Drop coordinates on loading large dataset. 291332965 | |

364463855 | https://github.com/pydata/xarray/issues/1854#issuecomment-364463855 | jamesstidard 1797906 | 2018-02-09T15:22:38Z | 2018-02-09T15:22:38Z | NONE

Sure, I'm running that now. I'll reply once/if it finishes. Though watching memory usage in my system monitor, it does not appear to be growing. I seem to remember the open function continually allocating itself more RAM until it was killed. I'll take a read through that issue while I wait.

364462150 | https://github.com/pydata/xarray/issues/1854#issuecomment-364462150 | rabernat 1197350 | 2018-02-09T15:16:54Z | 2018-02-09T15:16:54Z | MEMBER

This sounds similar to #1396, which I thought was resolved (but is still marked as open).

364461729 | https://github.com/pydata/xarray/issues/1854#issuecomment-364461729 | rabernat 1197350 | 2018-02-09T15:15:28Z | 2018-02-09T15:15:28Z | MEMBER

Can you just try your full example without the `chunks` argument?

364459162 | https://github.com/pydata/xarray/issues/1854#issuecomment-364459162 | jamesstidard 1797906 | 2018-02-09T15:06:37Z | 2018-02-09T15:09:02Z | NONE

That's true, maybe I misread last time or it's month dependent. Hopefully this is what you're after - let me know if not. I used 3 […]

364456174 | https://github.com/pydata/xarray/issues/1854#issuecomment-364456174 | rabernat 1197350 | 2018-02-09T14:56:36Z | 2018-02-09T14:56:36Z | MEMBER

No, I meant this: […]

Also, your comment says that "127 is normally the size of the time dimension in each file", but the info you posted indicates that it's 248. Can you also try […]

364451782 | https://github.com/pydata/xarray/issues/1854#issuecomment-364451782 | jamesstidard 1797906 | 2018-02-09T14:40:20Z | 2018-02-09T14:40:20Z | NONE

Sure, this is the repr of a single file: […]

Thanks

364447957 | https://github.com/pydata/xarray/issues/1854#issuecomment-364447957 | rabernat 1197350 | 2018-02-09T14:26:46Z | 2018-02-09T14:26:46Z | MEMBER

I am puzzled by this. Selecting a single point should not require loading the whole dataset into memory. Can you post the output of […]
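
For context, a sketch of the behaviour being described, assuming a dask-backed dataset; the file pattern is hypothetical, and the mwd variable name comes from the info dump further down the thread:

```python
import xarray as xr

ds = xr.open_mfdataset("path/to/ncs/*.nc")  # lazy: variables are dask arrays
point = ds.sel(latitude=10, longitude=10)   # still lazy: selection narrows the graph

print(type(point["mwd"].data))              # a dask array, not a numpy array
df = point.to_dataframe()                   # values are only read from disk here
```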

364399084 | https://github.com/pydata/xarray/issues/1854#issuecomment-364399084 | jamesstidard 1797906 | 2018-02-09T10:41:28Z | 2018-02-09T10:41:28Z | NONE

Sorry to bump this. Still looking for a solution to this problem if anyone has had a similar experience. Thanks.

361576685 | https://github.com/pydata/xarray/issues/1854#issuecomment-361576685 | jamesstidard 1797906 | 2018-01-30T12:19:12Z | 2018-01-30T12:19:12Z | NONE

Hi @rabernat, thanks for the response. Sorry it's taken me a few days to get back to you. Here's the info dump of one of the files:

```
xarray.Dataset {
dimensions:
    latitude = 361 ;
    longitude = 720 ;
    time = 248 ;

variables:
    float32 longitude(longitude) ;
        longitude:units = degrees_east ;
        longitude:long_name = longitude ;
    float32 latitude(latitude) ;
        latitude:units = degrees_north ;
        latitude:long_name = latitude ;
    datetime64[ns] time(time) ;
        time:long_name = time ;
    float64 mwd(time, latitude, longitude) ;
        mwd:units = Degree true ;
        mwd:long_name = Mean wave direction ;

// global attributes:
    :Conventions = CF-1.6 ;
    :history = 2017-08-09 18:15:34 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls02-70e05f9f8ba4e9d19932f1c45a7be8d8-Pwy6jZ.grib -o /data/data01/scratch/_grib2netcdf-atls02-95e2cf679cd58ee9b4db4dd119a05a8d-v4TKah.nc -utime ;
}
```

360779298 | https://github.com/pydata/xarray/issues/1854#issuecomment-360779298 | rabernat 1197350 | 2018-01-26T13:04:31Z | 2018-01-26T13:04:31Z | MEMBER

Can you provide a bit more info about the structure of the individual files? Open a single file and call `ds.info()`, then paste the contents here.
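
A minimal sketch of what is being asked for here; the filename is hypothetical:

```python
import xarray as xr

ds = xr.open_dataset("one_file.nc")  # any single file from the collection
ds.info()                            # prints dimensions, variables, and attributes
```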

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
```
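
The page header ("22 rows where issue = 291332965 sorted by updated_at descending") corresponds to a straightforward query against this schema. A sketch using Python's sqlite3, with an assumed database filename:

```python
import sqlite3

# "github.db" is an assumption; any SQLite file with this schema works.
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "SELECT id, user, created_at, body FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (291332965,),
).fetchall()
print(len(rows))  # 22 for this issue, per the header above
```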