html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/7587#issuecomment-1457723902,https://api.github.com/repos/pydata/xarray/issues/7587,1457723902,IC_kwDOAMm_X85W4xn-,54963611,2023-03-07T08:05:01Z,2023-03-07T08:05:01Z,NONE,"Thank you, 

Next time I will **triple** check and exclude those variables from being broadcast along extra dimensions.
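
For anyone landing here later, a minimal sketch of what excluding those variables could look like. It only works on variable names and dimension tuples, so it needs no libraries; the helper name `split_by_condition_dims` and the `var_dims` mapping are hypothetical, with the dimension names copied from the `repr` posted in this thread.

```python
def split_by_condition_dims(var_dims, cond_dims):
    '''Split variable names by whether masking with the condition adds dimensions.

    A variable is safe when it already carries every dimension of the condition.
    '''
    cond = set(cond_dims)
    safe, would_expand = [], []
    for name, dims in var_dims.items():
        if cond <= set(dims):
            safe.append(name)
        else:
            would_expand.append(name)
    return safe, would_expand


# Dimension tuples taken from the dataset repr in this thread (subset of variables).
var_dims = {
    'tp': ('fami', 'time', 'site'),
    'm0': ('fami', 'time', 'site'),
    'm0tot': ('time', 'site'),
    'wshedOut': ('freq', 'dir', 'site'),
}

safe, would_expand = split_by_condition_dims(var_dims, ('fami', 'time', 'site'))
print('mask these:', safe)
print('leave these alone:', would_expand)
```

With an xarray `Dataset` named `ds` (a hypothetical name), `var_dims` could probably be built as `{name: var.dims for name, var in ds.data_vars.items()}`, and the mask then applied only to `ds[safe]`.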

Thank you for your time.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457345587,https://api.github.com/repos/pydata/xarray/issues/7587,1457345587,IC_kwDOAMm_X85W3VQz,39069044,2023-03-07T01:34:12Z,2023-03-07T01:34:12Z,CONTRIBUTOR,"Your `m0tot` variable is also being broadcast along the `fami` dimension. That adds 10 extra copies of a 384x1233 `float64` array: 10x384x1233x8/1e6 ≈ 37.9 MB.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457131609,https://api.github.com/repos/pydata/xarray/issues/7587,1457131609,IC_kwDOAMm_X85W2hBZ,54963611,2023-03-06T22:35:50Z,2023-03-06T22:38:01Z,NONE,"Dear Slevang,

Thank you very much for your reply. I was indeed trying the same without the `wshedOut` variable.
With it removed, the size increase is much smaller, and I can use `xr.where` on larger datasets. However:

`(a.nbytes - da_fam_bulk_noWshed.nbytes)/1000000`
**37.87776 MB**

Without this variable, the two datasets (`a` is the result of `xr.where` on the `da_fam_bulk_noWshed` dataset) still differ by about 37 MB, with `a` being larger than the original.
This small increase matters to me because I have more than 1000 files.
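
To put that in numbers, here is a back-of-the-envelope sketch. It assumes every file has the same shape as this one and takes exactly 1000 files as a stand-in for the real count; both are assumptions, not measurements.

```python
# Extra bytes per file, i.e. the 37.87776 MB difference reported above.
extra_bytes_per_file = 37_877_760

# Assumption: 1000 files, all with the same dimensions as this dataset.
n_files = 1000

total_extra_gb = extra_bytes_per_file * n_files / 1e9
print(f'extra memory across all files: about {total_extra_gb:.1f} GB')
```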

Is there a solution?

Thanks a lot,
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457080267,https://api.github.com/repos/pydata/xarray/issues/7587,1457080267,IC_kwDOAMm_X85W2UfL,39069044,2023-03-06T22:06:11Z,2023-03-06T22:06:11Z,CONTRIBUTOR,Same issue as #1234. This has tripped me up before as well. A kwarg to control this behavior would be a nice enhancement to `.where()`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1457061064,https://api.github.com/repos/pydata/xarray/issues/7587,1457061064,IC_kwDOAMm_X85W2PzI,39069044,2023-03-06T21:55:14Z,2023-03-06T21:55:14Z,CONTRIBUTOR,"Since you're using `tp` (dims `fami, time, site`) as the condition, these dimensions are broadcast across all other variables in the dataset. The problem looks to be your variable `wshedOut`, which is now broadcast across all 5 dimensions in the dataset, hence the greatly increased memory usage.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1456417836,https://api.github.com/repos/pydata/xarray/issues/7587,1456417836,IC_kwDOAMm_X85Wzyws,54963611,2023-03-06T16:07:00Z,2023-03-06T16:07:00Z,NONE,"Thank you so much for your very quick reply, 

The files are `.nc` files (netCDF), generated with xarray.
Here is the Panoply screenshot:
![image](https://user-images.githubusercontent.com/54963611/223158205-c4c9c559-af21-407c-bba3-413709c78711.png)

This is the output of `display(a)`:
![image](https://user-images.githubusercontent.com/54963611/223161326-aaa4e2e6-e4c3-4015-916d-a9aec7ef75ce.png)

I double-checked the data and they seem to be float64.

As you said, the dtype does not change. Using only a single variable, this is the result:
`da_fam_bulk['tp'].nbytes`
`41665536`
`xr.where(da_fam_bulk['tp'] != 0,da_fam_bulk['tp'],np.nan).nbytes`
`41665536`

So using only one variable the problem disappears.

| variable | `nbytes` | `nbytes` after `xr.where` |
| --- | --- | --- |
| `dm` | `41665536` | `41665536` |
| `tp` | `41665536` | `41665536` |
| `gamma_best` | `41665536` | `41665536` |
| `m0` | `41665536` | `41665536` |

I checked all the variables; the problem only appears when using the whole dataset.
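
A rough way to see where the growth could come from is to estimate each variable's size before and after it is broadcast against the dimensions of the condition. This is only a sketch: the dimension sizes are copied from the `repr` posted in this thread, the helper names `nbytes` and `broadcast_dims` are made up for illustration, and it assumes every variable is `float64` and ends up with the union of its own dimensions and those of the condition.

```python
from math import prod

# Dimension sizes from the dataset repr in this thread.
SIZES = {'fami': 11, 'site': 1233, 'freq': 32, 'dir': 24, 'time': 384}
ITEMSIZE = 8  # bytes per float64 value


def nbytes(dims):
    '''Size in bytes of a float64 array with the given dimensions.'''
    return prod(SIZES[d] for d in dims) * ITEMSIZE


def broadcast_dims(dims, cond_dims):
    '''Union of the condition dims and the variable dims, without duplicates.'''
    return tuple(dict.fromkeys(tuple(cond_dims) + tuple(dims)))


cond = ('fami', 'time', 'site')
variables = {
    'tp': ('fami', 'time', 'site'),
    'm0tot': ('time', 'site'),
    'wshedOut': ('freq', 'dir', 'site'),
}

for name, dims in variables.items():
    before = nbytes(dims)
    after = nbytes(broadcast_dims(dims, cond))
    print(f'{name}: {before} -> {after} bytes')
```

Under these assumptions a variable that already has all three condition dimensions keeps its size, while one that lacks some of them grows by the product of the missing dimension sizes.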

Do you have any suggestions?

Thank you,

","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1456139058,https://api.github.com/repos/pydata/xarray/issues/7587,1456139058,IC_kwDOAMm_X85Wyusy,14808389,2023-03-06T13:29:14Z,2023-03-06T13:29:14Z,MEMBER,"thanks, that helps. However, it does not confirm my suspicion since all data variables are already in `float64`, and thus they shouldn't change dtypes. Could you also post the `repr` (either the text or the html repr should be sufficient) of `a`, and maybe also the file type of the file you're loading the dataset from (`ifileFamBulk`)?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1456097975,https://api.github.com/repos/pydata/xarray/issues/7587,1456097975,IC_kwDOAMm_X85Wykq3,54963611,2023-03-06T13:05:34Z,2023-03-06T13:12:20Z,NONE,"Thank you very much for your fast reply,

`repr(da_fam_bulk)`

```
<xarray.Dataset>
Dimensions:     (fami: 11, site: 1233, freq: 32, dir: 24, time: 384)
Coordinates:
  * fami        (fami) int64 1 2 3 4 5 6 7 8 9 10 11
  * site        (site) int64 51 54 72 75 90 93 ... 7004 7006 7049 7052 7094 7128
    lat         (site) float32 ...
    lon         (site) float64 ...
  * freq        (freq) float64 0.0373 0.04103 0.04513 ... 0.5917 0.6509 0.7159
  * dir         (dir) float64 0.0 15.0 30.0 45.0 ... 300.0 315.0 330.0 345.0
  * time        (time) datetime64[ns] 1989-01-01 1989-02-01 ... 2020-12-01
Data variables:
    dm          (fami, time, site) float64 ...
    tp          (fami, time, site) float64 ...
    gamma_best  (fami, time, site) float64 ...
    m0          (fami, time, site) float64 0.04069 0.0 0.04612 ... 0.0 0.0 0.0
    tm02        (fami, time, site) float64 ...
    hs          (fami, time, site) float64 0.8068 0.0 0.8591 0.0 ... 0.0 0.0 0.0
    SI          (fami, time, site) float64 ...
    dp          (fami, time, site) float64 ...
    m0tot       (time, site) float64 0.04069 0.004237 0.04612 ... 0.1219 0.08013
    m0_m0tot    (fami, time, site) float64 1.0 0.0 1.0 0.0 ... 0.0 0.0 0.0 0.0
    wshedOut    (freq, dir, site) float64 ...
```
I am not using a notebook; however, here is a screenshot from a notebook:
![image](https://user-images.githubusercontent.com/54963611/223118172-c39f6e34-85c9-4031-9cfc-24b2b7ac56ed.png)

Thank you,
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905
https://github.com/pydata/xarray/issues/7587#issuecomment-1456063239,https://api.github.com/repos/pydata/xarray/issues/7587,1456063239,IC_kwDOAMm_X85WycMH,14808389,2023-03-06T12:41:24Z,2023-03-06T12:41:49Z,MEMBER,"I can't really tell from the information you posted so far. Could you post the `repr` of `da_fam_bulk` (`print(da_fam_bulk)` or `display(da_fam_bulk)` using `ipython` / `jupyter`, plus maybe a screenshot of the HTML repr if you're in a notebook)?

I do suspect, however, that `da_fam_bulk` has a dtype other than `float64`, and that `where` uses `float64(nan)` as the fill value, casting the entire array to a dtype with much higher memory usage.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1611288905