issue_comments
6 rows where issue = 201617371 sorted by updated_at descending
issue 1
- Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption · 6 ✖
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
457093294 | https://github.com/pydata/xarray/issues/1217#issuecomment-457093294 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDQ1NzA5MzI5NA== | stale[bot] 26384082 | 2019-01-24T07:20:42Z | 2019-01-24T07:20:42Z | NONE | In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 | |
273687248 | https://github.com/pydata/xarray/issues/1217#issuecomment-273687248 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDI3MzY4NzI0OA== | shoyer 1217238 | 2017-01-19T05:42:25Z | 2017-01-19T05:43:22Z | MEMBER | For reference, it may be helpful to try your example on a smaller dataset:
I suspect this probably isn't really doing what you want, unless you really want two-dimensional versions of
Broadcasting producing gigantic arrays without any warning is really a NumPy issue, e.g., try |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 | |
273529203 | https://github.com/pydata/xarray/issues/1217#issuecomment-273529203 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDI3MzUyOTIwMw== | jacklovell 4849151 | 2017-01-18T16:43:03Z | 2017-01-19T05:15:52Z | NONE | The problem isn't as bad with a smaller example (though the runtime is doubled). I've attached a minimum working example, which seems to suggest that maybe there was a problem with xarray creating a MultiIndex and duplicating all the data? (I've left in input() to allow checking memory usage before the program exits, but there isn't much difference in this example). xrmin.py.txt Edit by @shoyer: added code from attachment inline:
```python
#!/usr/bin/env python3
import time
import sys
import numpy as np
import xarray as xr

# Two pairs of variables on unrelated dimensions (t1 and t2).
ds = xr.Dataset()
ds['data1'] = xr.DataArray(np.arange(1000), coords={'t1': np.linspace(0, 1, 1000)}, dims='t1')
ds['data1b'] = xr.DataArray(np.arange(1000, 2000), coords={'t1': np.linspace(0, 1, 1000)}, dims='t1')
ds['data2'] = xr.DataArray(np.arange(2000, 5000), coords={'t2': np.linspace(0, 1, 3000)}, dims='t2')
ds['data2b'] = xr.DataArray(np.arange(6000, 9000), coords={'t2': np.linspace(0, 1, 3000)}, dims='t2')

if sys.argv[1] == "nodrop":
    # where() on the full dataset, including the t2 variables
    now = time.time()
    print(ds.where(ds.data1 < 50, drop=True))
    print("Took {} seconds".format(time.time() - now))
elif sys.argv[1] == "drop":
    # drop t2 first, then apply where()
    ds1 = ds.drop('t2')
    now = time.time()
    print(ds1.where(ds1.data1 < 50, drop=True))
    print("Took {} seconds".format(time.time() - now))
input("Press return to exit")
``` |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 | |
273544152 | https://github.com/pydata/xarray/issues/1217#issuecomment-273544152 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDI3MzU0NDE1Mg== | fmaussion 10050469 | 2017-01-18T17:34:13Z | 2017-01-18T17:34:13Z | MEMBER |
I'll let @shoyer give a definitive answer here, but I don't think that
```python
import xarray as xr
import numpy as np

d1 = xr.DataArray(np.arange(3), coords={'t1': np.linspace(0, 1, 3)}, dims='t1')
d2 = xr.DataArray(np.arange(4), coords={'t2': np.linspace(0, 1, 4)}, dims='t2')

d2 * d1
<xarray.DataArray (t2: 4, t1: 3)>
array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4],
       [0, 3, 6]])
Coordinates:
  * t2       (t2) float64 0.0 0.3333 0.6667 1.0
  * t1       (t1) float64 0.0 0.5 1.0

d2.where(d1 == 1)
<xarray.DataArray (t2: 4, t1: 3)>
array([[ nan,   0.,  nan],
       [ nan,   1.,  nan],
       [ nan,   2.,  nan],
       [ nan,   3.,  nan]])
Coordinates:
  * t2       (t2) float64 0.0 0.3333 0.6667 1.0
  * t1       (t1) float64 0.0 0.5 1.0
```
which "makes sense", but is going to have a huge memory consumption if your arrays are large. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 | |
273523770 | https://github.com/pydata/xarray/issues/1217#issuecomment-273523770 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDI3MzUyMzc3MA== | jacklovell 4849151 | 2017-01-18T16:25:19Z | 2017-01-18T16:25:19Z | NONE | data1 and data2 represent two stages of data acquisition within one "shot" of our experiment. I'd like to be able to group each shot's data into a single dataset. I want to extract from the dataset only the values for which my `where()` condition is true, and I'll only be using DataArrays which share the same dimension as the one in the condition. For example, if I do: `ds_low = ds.where(ds.data1 < 0.1, drop=True)` I'll only use stuff in `ds_low` with the same dimension as `ds.data1`. So in my case extracting the data with the shared dimension using `ds.drop(<unused dim>)` is appropriate. It would be nice to have xarray throw a warning or error to prevent me chomping up all the RAM in my system if I do try to do this sort of thing though. Or it could simply mask off with NaN everything in the DataArrays which have a different dimension. Give me a second to provide a minimal working example. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 | |
273520435 | https://github.com/pydata/xarray/issues/1217#issuecomment-273520435 | https://api.github.com/repos/pydata/xarray/issues/1217 | MDEyOklzc3VlQ29tbWVudDI3MzUyMDQzNQ== | fmaussion 10050469 | 2017-01-18T16:14:19Z | 2017-01-18T16:14:19Z | MEMBER | Thanks for the report! It would be great if you could be a bit more specific:
- if data1 and data2 are unrelated, why do you want to apply |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Using where() in datasets with dataarrays with different dimensions results in huge RAM consumption 201617371 |
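The thread describes two things: `where()` broadcasting variables on an unrelated dimension against the condition, and the workaround of restricting the dataset to variables that share the condition's dimension before masking. A minimal sketch of both follows, on a deliberately tiny dataset. The helper name `where_shared_dims` is hypothetical and not part of xarray; it only selects variables by name and then calls the regular `Dataset.where`.

```python
import numpy as np
import xarray as xr

# Small stand-in for the dataset in the thread: two variables on t1, two on t2.
ds = xr.Dataset(
    {
        "data1": ("t1", np.arange(10)),
        "data1b": ("t1", np.arange(10, 20)),
        "data2": ("t2", np.arange(20, 50)),
        "data2b": ("t2", np.arange(60, 90)),
    },
    coords={"t1": np.linspace(0, 1, 10), "t2": np.linspace(0, 1, 30)},
)

cond = ds["data1"] < 5

# Applied to the whole dataset, the t2 variables are broadcast against t1,
# so each of them grows from 30 values to 10 * 30 values.
broadcast = ds.where(cond)
print(broadcast["data2"].dims, broadcast["data2"].shape)


def where_shared_dims(dataset, condition, drop=True):
    """Hypothetical helper (not an xarray API): mask only the variables
    whose dimensions are all present in the condition."""
    names = [
        name
        for name, var in dataset.data_vars.items()
        if set(var.dims) <= set(condition.dims)
    ]
    # Selecting by a list of names returns a Dataset without the t2 variables.
    return dataset[names].where(condition, drop=drop)


subset = where_shared_dims(ds, cond)
print(subset)
```

With the sizes used in the thread (1000 and 3000 points), the same broadcast turns each 3000-element variable into a 1000 x 3000 array, which is where the memory goes.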
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT, [issue] INTEGER REFERENCES [issues]([id]) ); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
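For readers who want to reproduce the row selection described at the top of this page (6 rows where issue = 201617371, sorted by updated_at descending), the following is a rough sketch using Python's built-in `sqlite3` module. It assumes an in-memory database and loads only the `id`, `updated_at`, and `issue` values listed above; the exact SQL that the hosting site runs is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema copied from the page; the referenced users/issues tables are not
# created here, which SQLite permits while foreign key enforcement is off.
conn.execute(
    """
    CREATE TABLE [issue_comments] (
        [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY,
        [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]),
        [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT,
        [body] TEXT, [reactions] TEXT, [performed_via_github_app] TEXT,
        [issue] INTEGER REFERENCES [issues]([id])
    )
    """
)
conn.execute("CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue])")

# (id, updated_at, issue) for the six comments listed on this page.
rows = [
    (273520435, "2017-01-18T16:14:19Z", 201617371),
    (273523770, "2017-01-18T16:25:19Z", 201617371),
    (273544152, "2017-01-18T17:34:13Z", 201617371),
    (273529203, "2017-01-19T05:15:52Z", 201617371),
    (273687248, "2017-01-19T05:43:22Z", 201617371),
    (457093294, "2019-01-24T07:20:42Z", 201617371),
]
conn.executemany(
    "INSERT INTO issue_comments (id, updated_at, issue) VALUES (?, ?, ?)", rows
)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings.
result = conn.execute(
    "SELECT id, updated_at FROM issue_comments "
    "WHERE issue = ? ORDER BY updated_at DESC",
    (201617371,),
).fetchall()

for comment_id, updated_at in result:
    print(comment_id, updated_at)
```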