html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2139#issuecomment-389620638,https://api.github.com/repos/pydata/xarray/issues/2139,389620638,MDEyOklzc3VlQ29tbWVudDM4OTYyMDYzOA==,1217238,2018-05-16T18:31:35Z,2018-05-16T18:31:35Z,MEMBER,"MetaCSV looks interesting but I haven't used it myself. My guess would be that it just wraps pandas/xarray for processing data, so I think it's unlikely to give a performance boost. It's more about a declarative way to specify how to load a CSV into pandas/xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323703742
https://github.com/pydata/xarray/issues/2139#issuecomment-389598338,https://api.github.com/repos/pydata/xarray/issues/2139,389598338,MDEyOklzc3VlQ29tbWVudDM4OTU5ODMzOA==,1217238,2018-05-16T17:20:03Z,2018-05-16T17:20:03Z,MEMBER,"If you don't want the full Cartesian product, you need to ensure that the index only contains the variables you want to expand into a grid, e.g., time, lat and lon.
If the problem is only running out of memory (which is indeed likely with 1e9 rows), then you'll need to think about a more clever way to convert the data. One good option might be to group over subsets of the data (using dask or another parallel processing library like Spark or Beam), and write a bunch of smaller netCDF files which you then open with xarray's `open_mfdataset()`. It's probably most convenient to split over time, e.g., into files for each day or month.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,323703742
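
A minimal sketch of the split-then-combine approach described in the last comment, assuming a CSV named `big.csv` with `time`, `lat`, `lon` and one data column; the file name, column names, chunk size, and per-month split are illustrative placeholders, not part of the original discussion.

```python
# Hypothetical sketch: split a huge CSV into per-month netCDF files,
# then lazily recombine them with xarray's open_mfdataset().
import pandas as pd
import xarray as xr

paths = []
# Stream the CSV in chunks so the full table never sits in memory at once.
for i, chunk in enumerate(
    pd.read_csv("big.csv", parse_dates=["time"], chunksize=5_000_000)
):
    # Within each chunk, split by calendar month and write a small netCDF file.
    for month, group in chunk.groupby(chunk["time"].dt.to_period("M")):
        # set_index + to_xarray expands only time/lat/lon into a (small) grid.
        ds = group.set_index(["time", "lat", "lon"]).to_xarray()
        path = f"part-{month}-{i:04d}.nc"
        ds.to_netcdf(path)
        paths.append(path)

# Open the small files as one lazy dataset indexed by time/lat/lon.
combined = xr.open_mfdataset(paths, combine="by_coords")
```

If the CSV is not sorted by time, the same month can land in several chunks and the resulting files may overlap; in that case a dask, Spark, or Beam pipeline that groups over the whole dataset, as the comment suggests, is the more robust route.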