issue_comments

9 rows where issue = 1611288905 sorted by updated_at descending

user 3

  • OttavioM 4
  • slevang 3
  • keewis 2

author_association 3

  • NONE 4
  • CONTRIBUTOR 3
  • MEMBER 2

issue 1

  • xr.where increase the bytes of the dataset · 9
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1457723902 https://github.com/pydata/xarray/issues/7587#issuecomment-1457723902 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W4xn- OttavioM 54963611 2023-03-07T08:05:01Z 2023-03-07T08:05:01Z NONE

Thank you,

Next time I will triple-check and exclude those variables from being expanded along the extra dimensions.
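Here is a minimal sketch of that approach. It assumes the condition is a DataArray such as tp != 0. The idea is to apply .where only to the data variables that already have every dimension of the condition, and to pass the others through untouched so that nothing gets broadcast. The helper where_matching_dims and the small synthetic dataset are hypothetical illustrations, not an xarray API and not the real data.

```python
import numpy as np
import xarray as xr


def where_matching_dims(ds: xr.Dataset, cond: xr.DataArray) -> xr.Dataset:
    """Hypothetical helper: apply .where(cond) only to data variables that
    already have all of cond's dimensions, so no variable is broadcast."""
    out = ds.copy()
    for name, var in ds.data_vars.items():
        if set(cond.dims).issubset(var.dims):
            out[name] = var.where(cond)
    return out


# Small synthetic dataset with the thread's dimension names (sizes are arbitrary)
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {
        "tp": (("fami", "time", "site"), rng.random((3, 4, 5))),
        "m0tot": (("time", "site"), rng.random((4, 5))),
        "wshedOut": (("freq", "dir", "site"), rng.random((6, 2, 5))),
    }
)
masked = where_matching_dims(ds, ds["tp"] > 0.5)
print({name: masked[name].dims for name in masked.data_vars})
print("nbytes:", ds.nbytes, "->", masked.nbytes)
```

In this sketch, m0tot and wshedOut come back unmasked with their original dimensions. Whether leaving them unmasked is acceptable depends on the analysis.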

Thank you for your time.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457345587 https://github.com/pydata/xarray/issues/7587#issuecomment-1457345587 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W3VQz slevang 39069044 2023-03-07T01:34:12Z 2023-03-07T01:34:12Z CONTRIBUTOR

Your m0tot variable is also being broadcast along the fami dimension, which adds 10 extra copies of the float64 (time, site) array: 10 x 384 x 1233 x 8 bytes / 1e6 ≈ 37.9 MB.
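The estimate can be checked with plain arithmetic, using the sizes from the da_fam_bulk repr posted in this thread. Broadcasting the float64 (time, site) array m0tot to (fami, time, site) adds 10 extra copies of it. The result matches the 37.87776 MB difference that OttavioM measured.

```python
# Dimension sizes taken from the repr of da_fam_bulk in this thread
n_fami, n_time, n_site = 11, 384, 1233
itemsize = 8  # bytes per float64 element

m0tot_before = n_time * n_site * itemsize          # (time, site)
m0tot_after = n_fami * n_time * n_site * itemsize  # broadcast to (fami, time, site)
extra_mb = (m0tot_after - m0tot_before) / 1e6

print(f"{extra_mb:.5f} MB")  # prints 37.87776 MB
```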

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457131609 https://github.com/pydata/xarray/issues/7587#issuecomment-1457131609 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2hBZ OttavioM 54963611 2023-03-06T22:35:50Z 2023-03-06T22:38:01Z NONE

Dear Slevang,

Thank you very much for your reply. I was indeed trying the same thing without the wshedOut variable. After deleting it, the growth of the dataset has much less impact, and I can now use xr.where on larger datasets. However:

(a.nbytes - da_fam_bulk_noWshed.nbytes) / 1000000 gives 37.87776 MB

Without this variable, the two datasets (a is the result of xr.where applied to da_fam_bulk_noWshed) differ by about 37 MB, with a being the larger one. This small increase is important to me because I have more than 1000 files.

Is there a solution?

Thanks a lot,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457080267 https://github.com/pydata/xarray/issues/7587#issuecomment-1457080267 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2UfL slevang 39069044 2023-03-06T22:06:11Z 2023-03-06T22:06:11Z CONTRIBUTOR

Same issue as #1234. This has tripped me up before as well. A kwarg to control this behavior would be a nice enhancement to .where().

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1457061064 https://github.com/pydata/xarray/issues/7587#issuecomment-1457061064 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85W2PzI slevang 39069044 2023-03-06T21:55:14Z 2023-03-06T21:55:14Z CONTRIBUTOR

Since you're using tp (dims fami, time, site) as the condition, these dimensions are broadcast across all other variables in the dataset. The problem appears to be your wshedOut variable, which is now broadcast across all 5 dimensions of the dataset, hence the greatly increased memory usage.
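Below is a small reproduction of this behavior. It uses a synthetic dataset with the same dimension names as da_fam_bulk but arbitrary, tiny sizes. Dataset.where broadcasts each data variable against the condition, so variables that lack some of the condition's dimensions gain them. The helper make_var is only a hypothetical convenience for building the example.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for da_fam_bulk; the dimension sizes are arbitrary
sizes = {"fami": 3, "time": 4, "site": 5, "freq": 6, "dir": 2}
rng = np.random.default_rng(0)


def make_var(*dims):
    """Build a (dims, data) tuple of random float64 values for xr.Dataset."""
    return (dims, rng.random(tuple(sizes[d] for d in dims)))


ds = xr.Dataset(
    {
        "tp": make_var("fami", "time", "site"),
        "m0tot": make_var("time", "site"),
        "wshedOut": make_var("freq", "dir", "site"),
    }
)

# Dataset.where broadcasts every data variable against the condition's dims
masked = ds.where(ds["tp"] != 0)

for name in ds.data_vars:
    print(f"{name}: {ds[name].dims} -> {masked[name].dims}")
print("nbytes:", ds.nbytes, "->", masked.nbytes)
```

With the real sizes from the repr (fami 11, time 384, site 1233, freq 32, dir 24), the same broadcasting would grow wshedOut from 32 x 24 x 1233 x 8 bytes ≈ 7.6 MB to 11 x 384 x 1233 x 32 x 24 x 8 bytes ≈ 32 GB.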

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1456417836 https://github.com/pydata/xarray/issues/7587#issuecomment-1456417836 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85Wzyws OttavioM 54963611 2023-03-06T16:07:00Z 2023-03-06T16:07:00Z NONE

Thank you so much for your very quick reply,

The files are netCDF (.nc) files generated with xarray. Here is the Panoply screenshot:

This is the output of display(a):

I double-checked the data, and the variables seem to be float64.

As you said, they do not change dtype. Using only one variable, this is the result:

da_fam_bulk['tp'].nbytes: 41665536
xr.where(da_fam_bulk['tp'] != 0, da_fam_bulk['tp'], np.nan).nbytes: 41665536

So with only one variable, the problem disappears. The same check for other variables (nbytes before / after xr.where):

dm          41665536 / 41665536
tp          41665536 / 41665536
gamma_best  41665536 / 41665536
m0          41665536 / 41665536

I checked all the variables individually; the problem occurs only when applying xr.where to the whole dataset.
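A per-variable comparison of nbytes is enough to find which variables grow when xr.where is applied to the whole dataset. This is a sketch on a small synthetic dataset with arbitrary sizes. grown_variables is a hypothetical helper name.

```python
import numpy as np
import xarray as xr


def grown_variables(before: xr.Dataset, after: xr.Dataset) -> dict:
    """Return {name: (bytes_before, bytes_after)} for data variables whose
    memory footprint increased between two versions of a dataset."""
    return {
        name: (before[name].nbytes, after[name].nbytes)
        for name in before.data_vars
        if name in after.data_vars and after[name].nbytes > before[name].nbytes
    }


# Synthetic example: only variables lacking some of tp's dimensions grow
rng = np.random.default_rng(1)
ds = xr.Dataset(
    {
        "tp": (("fami", "time", "site"), rng.random((3, 4, 5))),
        "dm": (("fami", "time", "site"), rng.random((3, 4, 5))),
        "m0tot": (("time", "site"), rng.random((4, 5))),
    }
)
result = xr.where(ds["tp"] != 0, ds, np.nan)
print(grown_variables(ds, result))
```

On the real files, a call such as grown_variables(da_fam_bulk, a) would be expected to list m0tot, plus wshedOut if it is still in the dataset.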

Do you have any suggestions?

Thank you,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1456139058 https://github.com/pydata/xarray/issues/7587#issuecomment-1456139058 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85Wyusy keewis 14808389 2023-03-06T13:29:14Z 2023-03-06T13:29:14Z MEMBER

Thanks, that helps. However, it does not confirm my suspicion: all data variables are already float64, so they shouldn't change dtype. Could you also post the repr of a (either the text or the HTML repr should be sufficient), and maybe also the file type of the file you're loading the dataset from (ifileFamBulk)?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1456097975 https://github.com/pydata/xarray/issues/7587#issuecomment-1456097975 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85Wykq3 OttavioM 54963611 2023-03-06T13:05:34Z 2023-03-06T13:12:20Z NONE

Thank you very much for your fast reply,

repr(da_fam_bulk)

<xarray.Dataset>
Dimensions:     (fami: 11, site: 1233, freq: 32, dir: 24, time: 384)
Coordinates:
  * fami        (fami) int64 1 2 3 4 5 6 7 8 9 10 11
  * site        (site) int64 51 54 72 75 90 93 ... 7004 7006 7049 7052 7094 7128
    lat         (site) float32 ...
    lon         (site) float64 ...
  * freq        (freq) float64 0.0373 0.04103 0.04513 ... 0.5917 0.6509 0.7159
  * dir         (dir) float64 0.0 15.0 30.0 45.0 ... 300.0 315.0 330.0 345.0
  * time        (time) datetime64[ns] 1989-01-01 1989-02-01 ... 2020-12-01
Data variables:
    dm          (fami, time, site) float64 ...
    tp          (fami, time, site) float64 ...
    gamma_best  (fami, time, site) float64 ...
    m0          (fami, time, site) float64 0.04069 0.0 0.04612 ... 0.0 0.0 0.0
    tm02        (fami, time, site) float64 ...
    hs          (fami, time, site) float64 0.8068 0.0 0.8591 0.0 ... 0.0 0.0 0.0
    SI          (fami, time, site) float64 ...
    dp          (fami, time, site) float64 ...
    m0tot       (time, site) float64 0.04069 0.004237 0.04612 ... 0.1219 0.08013
    m0_m0tot    (fami, time, site) float64 1.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0
    wshedOut    (freq, dir, site) float64 ...

I am not using a notebook; however, here is a screenshot of the notebook:

Thank you,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905
1456063239 https://github.com/pydata/xarray/issues/7587#issuecomment-1456063239 https://api.github.com/repos/pydata/xarray/issues/7587 IC_kwDOAMm_X85WycMH keewis 14808389 2023-03-06T12:41:24Z 2023-03-06T12:41:49Z MEMBER

I can't really tell from the information you posted so far. Could you post the repr of da_fam_bulk (print(da_fam_bulk), or display(da_fam_bulk) in IPython / Jupyter, plus maybe a screenshot of the HTML repr if you're in a notebook)?

I do suspect, however, that da_fam_bulk has a dtype other than float64. where uses a float64 NaN as the fill value, which would cast the entire array to a dtype with much higher memory usage.
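Here is a NumPy sketch of the memory effect described above. It assumes a hypothetical int16 variable with the (fami, time, site) shape from this thread. Casting that variable to float64 so it can hold NaN quadruples its size, and the resulting float64 size equals the 41665536 bytes reported for tp. Only NumPy's promotion rule is shown; the dtype that xarray itself picks for a given integer input may differ.

```python
import numpy as np

# Shape of the (fami, time, site) variables in this thread
shape = (11, 384, 1233)
n = int(np.prod(shape))

# Hypothetical int16 storage versus float64 (needed to represent NaN)
int16_bytes = n * np.dtype(np.int16).itemsize
float64_bytes = n * np.dtype(np.float64).itemsize
print(int16_bytes, float64_bytes, float64_bytes // int16_bytes)

# Small demonstration: combining an int16 array with a float64 NaN array
# through np.where yields a float64 result
small = np.array([0, 3, 7], dtype=np.int16)
filled = np.where(small != 0, small, np.full(small.shape, np.nan))
print(filled.dtype, filled)
```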

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.where increase the bytes of the dataset  1611288905

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · Queries took 14.429ms · About: xarray-datasette