
issue_comments


22 rows where issue = 291332965, sorted by updated_at descending


stale[bot] (NONE) · 2020-01-16T14:00:19Z · https://github.com/pydata/xarray/issues/1854#issuecomment-575163859

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

jamesstidard (NONE) · 2018-02-15T13:21:33Z (edited 2018-02-15T13:24:46Z) · https://github.com/pydata/xarray/issues/1854#issuecomment-365925282

@rabernat I still seem to get a SIGKILL 9 (exit code 137) when trying to run with that pre-processor as well.

Maybe my expectations of how it lazy-loads files are too high. The machine I'm running on has 8 GB of RAM and the files in total are just under 1 TB.

jamesstidard (NONE) · 2018-02-15T11:12:48Z · https://github.com/pydata/xarray/issues/1854#issuecomment-365896646

@jhamman Here's the ncdump of one of the resource files:

```bash
netcdf \34.128_1900_01_05_05 {
dimensions:
	longitude = 720 ;
	latitude = 361 ;
	time = UNLIMITED ; // (124 currently)
variables:
	float longitude(longitude) ;
		longitude:units = "degrees_east" ;
		longitude:long_name = "longitude" ;
	float latitude(latitude) ;
		latitude:units = "degrees_north" ;
		latitude:long_name = "latitude" ;
	int time(time) ;
		time:units = "hours since 1900-01-01 00:00:0.0" ;
		time:long_name = "time" ;
		time:calendar = "gregorian" ;
	short sst(time, latitude, longitude) ;
		sst:scale_factor = 0.000552094668668839 ;
		sst:add_offset = 285.983000319853 ;
		sst:_FillValue = -32767s ;
		sst:missing_value = -32767s ;
		sst:units = "K" ;
		sst:long_name = "Sea surface temperature" ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:history = "2017-08-04 06:17:58 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls09-95e2cf679cd58ee9b4db4dd119a05a8d-gF5gxN.grib -o /data/data04/scratch/_grib2netcdf-atls01-a562cefde8a29a7288fa0b8b7f9413f7-VvH7PP.nc -utime" ;
		:_Format = "64-bit offset" ;
}
```

Unfortunately removing the chunks didn't seem to help. I'm running with the pre-process workaround this morning to see if that completes. Sorry for the late response on this - been pretty busy.

jhamman (MEMBER) · 2018-02-09T17:18:53Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364498649

@rabernat - good points.

@jamesstidard - perhaps you can post a single file's ncdump using the `ncdump -h -s filename.nc` syntax. That should tell us how the file is chunked on disk.

rabernat (MEMBER) · 2018-02-09T17:03:06Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364494085

@jhamman, chunking in lat and lon should not be necessary here. My understanding is that dask/dask#2364 made sure that the indexing operation happens before the concat.

One possibility is that the files have HDF-level chunking / compression, as discussed in #1440. That could be screwing this up.
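An editorial aside, not part of the original comment: if the files are netCDF4/HDF5-based, one way to check for HDF-level chunking and compression is to inspect a variable directly with h5py. This is a minimal sketch; the filename is a placeholder, and the variable name `sst` is taken from the ncdump posted later in the thread.

```python
import h5py

# Only works for netCDF4/HDF5 files; classic ("64-bit offset") netCDF
# files, like the one ncdump'd later in this thread, will raise an
# OSError here, which itself rules out HDF-level chunking.
with h5py.File("path/to/one_file.nc", "r") as f:
    var = f["sst"]
    print("shape:      ", var.shape)
    print("chunks:     ", var.chunks)       # None means contiguous storage
    print("compression:", var.compression)  # e.g. 'gzip' or None
```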

jamesstidard (NONE) · 2018-02-09T16:58:42Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364492783

I'll give both of those a shot.

For hosting, the files are currently on a local drive and they sum to about 1 TB. I can probably host a couple of examples, though.

Thanks again for the support.

jhamman (MEMBER) · 2018-02-09T16:53:57Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364491330

@jamesstidard - let's see how the distributed scheduler handles this:

```python
from distributed import Client
client = Client()

ds = xr.open_mfdataset('path/to/ncs/*.nc', chunks={'latitude': 50, 'longitude': 50})
recs = ds.sel(latitude=10, longitude=10).to_dataframe().to_records()
```

Also, it would be worth updating distributed before you use its scheduler.

rabernat (MEMBER) · 2018-02-09T16:50:13Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364490209

Also, maybe you can post this dataset somewhere online for us to play around with?

rabernat (MEMBER) · 2018-02-09T16:49:30Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364489976

Did you try my workaround?

jamesstidard (NONE) · 2018-02-09T16:45:51Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364488847

That run was killed with the output:

```bash
~/.pyenv/versions/3.4.6/lib/python3.4/site-packages/xarray/core/dtypes.py:23: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  if np.issubdtype(dtype, float):

Process finished with exit code 137 (interrupted by signal 9: SIGKILL)
```

I wasn't watching the machine at the time, but I assume that's it falling over due to memory pressure.

Hi @jhamman, I'm using xarray 0.10.0 with dask 0.16.1 and distributed 1.18.0. I realise that last one is out of date; I will update and retry.

I'm just using whatever the default scheduler is as that's pretty much all the code I've got written above.

I'm unsure how to do a performance check, as the dataset can't even be fully loaded currently. I've tried different chunk sizes in the past, hoping to stumble on a magic size, but have been unsuccessful with that.

jhamman (MEMBER) · 2018-02-09T16:12:22Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364478761

@jamesstidard - it would be good to know a few more details here:

  • what dask scheduler you're using (you might also try with the distributed scheduler)
  • what versions of dask/distributed/etc you're using (a quick version-reporting sketch follows this list)
  • how using a smaller chunk size in space (latitude and longitude) impacts performance
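An editorial aside, not part of the original comment: a minimal sketch of one way to gather the version details requested above.

```python
# Print the versions of the libraries involved so they can be
# pasted into the issue thread.
import dask
import distributed
import xarray as xr

print("xarray:     ", xr.__version__)
print("dask:       ", dask.__version__)
print("distributed:", distributed.__version__)
```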
rabernat (MEMBER) · 2018-02-09T15:26:40Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364465016

The way this should work is that the selection of a single point should happen before the data is concatenated. It is up to dask to properly "fuse" these two operations. It seems like that is failing for some reason.

As a temporary workaround, you could preprocess the data to select only the specific point before concatenating:

```python
def select_point(ds):
    return ds.sel(latitude=10, longitude=10)

ds = xr.open_mfdataset('*.nc', preprocess=select_point)
```

But you shouldn't have to do this to get good performance here.
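An editorial sketch, not part of the original comment: the workaround above applied end to end, combining the preprocess function with the to_records() step used elsewhere in this thread. The glob path is a placeholder.

```python
import xarray as xr

def select_point(ds):
    # Reduce each file to the single point of interest before the
    # datasets are concatenated, so the full grid is never assembled.
    return ds.sel(latitude=10, longitude=10)

ds = xr.open_mfdataset("path/to/ncs/*.nc", preprocess=select_point)
recs = ds.to_dataframe().to_records()
```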

jamesstidard (NONE) · 2018-02-09T15:22:38Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364463855

Sure, I'm running that now. I'll reply once/if it finishes. Though watching memory usage in my system monitor, it does not appear to be growing. I seem to remember the open function continually allocating itself more RAM until it was killed.

I'll take a read through that issue while I wait.

rabernat (MEMBER) · 2018-02-09T15:16:54Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364462150

This sounds similar to #1396, which I thought was resolved (but is still marked as open).

rabernat (MEMBER) · 2018-02-09T15:15:28Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364461729

Can you just try your full example without the `chunks` argument and see if it works any better?

jamesstidard (NONE) · 2018-02-09T15:06:37Z (edited 2018-02-09T15:09:02Z) · https://github.com/pydata/xarray/issues/1854#issuecomment-364459162

That's true; maybe I misread last time, or it's month-dependent.

Hopefully this is what you're after - let me know if not. I used 3 *.nc files to make this, with the snippet you posted above.

```
<xarray.Dataset>
Dimensions:    (time: 728)
Coordinates:
    longitude  float32 10.0
    latitude   float32 10.0
  * time       (time) datetime64[ns] 1992-01-01 1992-01-01T03:00:00 ...
Data variables:
    mwp        (time) float64 dask.array<shape=(728,), chunksize=(127,)>
Attributes:
    Conventions:  CF-1.6
    history:      2017-08-10 04:58:48 GMT by grib_to_netcdf-2.4.0: grib_to_ne...
```

If you're after the entire dataset, I should be able to get that but may take some time.

rabernat (MEMBER) · 2018-02-09T14:56:36Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364456174

No, I meant this:

```python
ds = xr.open_mfdataset('path/to/ncs/*.nc', chunks={'time': 127})
ds_point = ds.sel(latitude=10, longitude=10)
repr(ds_point)
```

Also, your comment says that "127 is normally the size of the time dimension in each file", but the info you posted indicates that it's 248. Can you also try `open_mfdataset` without the `chunks` argument?
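An editorial sketch, not part of the original comment, of the comparison being requested; the glob path is the placeholder used earlier in the thread. Without an explicit `chunks` argument, each input file becomes a single dask chunk per variable.

```python
import xarray as xr

# Open without chunks= so dask's default per-file chunking applies,
# then select the same single point for comparison.
ds = xr.open_mfdataset("path/to/ncs/*.nc")
ds_point = ds.sel(latitude=10, longitude=10)
print(repr(ds_point))
```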

jamesstidard (NONE) · 2018-02-09T14:40:20Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364451782

Sure, this is the repr of a single file:

```
<xarray.Dataset>
Dimensions:    (time: 248)
Coordinates:
    longitude  float32 10.0
    latitude   float32 10.0
  * time       (time) datetime64[ns] 2004-12-01 2004-12-01T03:00:00 ...
Data variables:
    mwd        (time) float64 dask.array<shape=(248,), chunksize=(248,)>
Attributes:
    Conventions:  CF-1.6
    history:      2017-08-09 16:22:56 GMT by grib_to_netcdf-2.4.0: grib_to_ne...
```

Thanks

rabernat (MEMBER) · 2018-02-09T14:26:46Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364447957

I am puzzled by this. Selecting a single point should not require loading into memory the whole dataset.

Can you post the output of `repr(ds.sel(latitude=10, longitude=10))`?

jamesstidard (NONE) · 2018-02-09T10:41:28Z · https://github.com/pydata/xarray/issues/1854#issuecomment-364399084

Sorry to bump this. I'm still looking for a solution to this problem, if anyone has had a similar experience. Thanks.

jamesstidard (NONE) · 2018-01-30T12:19:12Z · https://github.com/pydata/xarray/issues/1854#issuecomment-361576685

Hi @rabernat, thanks for the response. Sorry it's taken me a few days to get back to you.

Here's the info dump of one of the files:

```
xarray.Dataset {
dimensions:
	latitude = 361 ;
	longitude = 720 ;
	time = 248 ;

variables:
	float32 longitude(longitude) ;
		longitude:units = degrees_east ;
		longitude:long_name = longitude ;
	float32 latitude(latitude) ;
		latitude:units = degrees_north ;
		latitude:long_name = latitude ;
	datetime64[ns] time(time) ;
		time:long_name = time ;
	float64 mwd(time, latitude, longitude) ;
		mwd:units = Degree true ;
		mwd:long_name = Mean wave direction ;

// global attributes:
	:Conventions = CF-1.6 ;
	:history = 2017-08-09 18:15:34 GMT by grib_to_netcdf-2.4.0: grib_to_netcdf /data/data05/scratch/_mars-atls02-70e05f9f8ba4e9d19932f1c45a7be8d8-Pwy6jZ.grib -o /data/data01/scratch/_grib2netcdf-atls02-95e2cf679cd58ee9b4db4dd119a05a8d-v4TKah.nc -utime ;
}
```

rabernat (MEMBER) · 2018-01-26T13:04:31Z · https://github.com/pydata/xarray/issues/1854#issuecomment-360779298

Can you provide a bit more info about the structure of the individual files?

Open a single file and call `ds.info()`, then paste the contents here.
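An editorial sketch, not part of the original comment, of the request above; the filename is a placeholder.

```python
import xarray as xr

ds = xr.open_dataset("path/to/one_file.nc")
ds.info()  # prints a netCDF-style summary of dimensions, variables, and attributes
```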


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);