Issues from the pydata/xarray tracker.

**[#1483] Loss of coordinate information from groupby.apply() on a stacked object** (open · 5 comments · opened 2017-07-19 · updated 2020-10-04)

I use this stack / groupby / unstack pattern quite frequently, e.g. [here](https://gist.github.com/rabernat/bc4c6990eb20942246ce967e6c9c3dbe). The issue I have is that after `groupby('allpoints').apply()`, the coordinate names are not carried through: the MultiIndex levels come back as `allpoints_level_0` and `allpoints_level_1`, so after `unstack` I rename them back to lat/lon etc. Do you ever encounter this? Is there a way to carry the names through, and is this an issue for others?

```
import xarray as xr
import numpy as np

ds = xr.DataArray(np.random.randn(180, 360, 2000),
                  dims=['lat', 'lon', 'time'],
                  coords={'lat': np.arange(90, -90, -1),
                          'lon': np.arange(-180, 180),
                          'time': range(2000)})
ds
```
```
array([[[ 0.623891, -0.044304, ...,  1.015785,  0.009088],
        [-0.7375  ,  0.380369, ...,  0.788351, -0.69295 ],
        ...,
        [ 0.171894,  0.517164, ..., -0.946908, -0.597802],
        [ 0.353743,  0.005539, ..., -1.436965, -0.190099]],
       ...
Coordinates:
  * lat      (lat) int32 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 ...
  * lon      (lon) int32 -180 -179 -178 -177 -176 -175 -174 -173 -172 -171 ...
  * time     (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
```

Now we stack the data by `allpoints`. **Note that the info about the original coordinates (lat / lon) is still there:**

```
dst = ds.stack(allpoints=['lat', 'lon'])
```
```
array([[ 0.623891, -0.7375  ,  0.053525, ...,  0.379701,  0.130618,  0.11094 ],
       [-0.044304,  0.380369, -0.410632, ..., -0.739881,  0.203219, -0.506303],
       [-1.762024, -1.019424,  2.580218, ...,  1.491677,  1.189149, -0.072223],
       ...,
       [-0.896298,  0.333163, -1.751641, ...,  1.90315 ,  2.642813, -0.913787],
       [ 1.015785,  0.788351,  0.379997, ...,  0.864934,  0.889001, -1.363458],
       [ 0.009088, -0.69295 , -1.276184, ...,  1.220656,  0.895599,  0.848757]])
Coordinates:
  * time       (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
  * allpoints  (allpoints) MultiIndex
  - lat        (allpoints) int64 90 90 90 90 90 90 90 90 90 90 90 90 90 90 ...
  - lon        (allpoints) int64 -180 -179 -178 -177 -176 -175 -174 -173 ...
```

Now apply `groupby().apply()` (here `my_custom_function` reduces each point's time series to a scalar):

```
dsg = dst.groupby('allpoints').apply(my_custom_function)
```
```
array([ 0.013697,  0.006272,  0.009744, ..., -0.016265, -0.002108, -0.014733])
Coordinates:
  * allpoints          (allpoints) MultiIndex
  - allpoints_level_0  (allpoints) int64 -89 -89 -89 -89 -89 -89 -89 -89 -89 ...
  - allpoints_level_1  (allpoints) int64 -180 -179 -178 -177 -176 -175 -174 ...
```

So now we have lost `'lat'` and `'lon'`. However, **if we skip the groupby part** and go straight to `unstack`, the names are carried through:

```
dst.unstack('allpoints')
```
```
array([[[ 0.623891, -0.7375  , ...,  0.171894,  0.353743],
        [ 1.780691, -0.747431, ...,  0.038754,  0.615228],
        ...,
Coordinates:
  * time     (time) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
  * lat      (lat) int64 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 ...
  * lon      (lon) int64 -180 -179 -178 -177 -176 -175 -174 -173 -172 -171 ...
```
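A minimal sketch of the rename workaround described above, using a toy array and a simple mean over `time` standing in for `my_custom_function`; the guard on the level names is there because newer xarray releases may already preserve them:

```python
import numpy as np
import xarray as xr

# Toy version of the array from the report.
ds = xr.DataArray(np.random.randn(4, 6, 10),
                  dims=['lat', 'lon', 'time'],
                  coords={'lat': np.arange(4), 'lon': np.arange(6),
                          'time': np.arange(10)})

dst = ds.stack(allpoints=['lat', 'lon'])

# Stand-in for my_custom_function: reduce each point's time series to a scalar.
# (.map is the current name for .apply on groupby objects.)
dsg = dst.groupby('allpoints').map(lambda x: x.mean('time'))

result = dsg.unstack('allpoints')
# If the MultiIndex level names were replaced by the generic
# allpoints_level_* names, rename them back to the originals.
if 'allpoints_level_0' in result.dims:
    result = result.rename({'allpoints_level_0': 'lat',
                            'allpoints_level_1': 'lon'})
print(result.dims)
```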
---

**[#1019] groupby_bins: exclude bin or assign bin with nan when bin has no values** (closed 2016-10-03 as completed · 10 comments · opened 2016-09-29)

When using `groupby_bins` there are cases where no values fall in some of the specified bins. Currently the bin appears to be skipped entirely: neither a value nor a bin entry is added to the output DataArray. Is there a way to identify which bins were skipped? Or, preferably, could there be an option to include those bins with NaN values? This would make it easier to compare two DataArrays: at present, even with identical bin intervals as input, the outputs can end up with variables and coordinates of different lengths.

```
import xarray as xr

var = xr.open_dataset('c:\\users\\saveMWE.nc')
pop = xr.open_dataset('c:\\users\\savePOP.nc')

# binns includes a very small bin to demonstrate the problem
binns = [-100, -50, 0, 50, 50.00001, 100]
binned = pop.p2010T.groupby_bins(var.EnsembleMean, binns).sum()
print(binned)
print(binned.EnsembleMean_bins)
```

In this case, no data falls in the fourth bin, between 50 and 50.00001:

```
array([  2.64352214e+09,   3.46869168e+09,   3.08998110e+08,   1.48247440e+07])
Coordinates:
  * EnsembleMean_bins  (EnsembleMean_bins) object '(0, 50]' '(-50, 0]' ...

array(['(0, 50]', '(-50, 0]', '(51, 100]', '(-100, -50]'], dtype=object)
```

Obviously one can count the lengths, but that doesn't indicate *which* bin was skipped. An option to include the empty bin with a NaN value would be useful (a reindex-based workaround is sketched after the last issue below)! Thanks.

[bins_example.zip](https://github.com/pydata/xarray/files/499952/bins_example.zip)

---

**[#851] xr.concat and xr.to_netcdf new filesize** (closed 2016-05-19 as completed · 4 comments · opened 2016-05-19)

I am having an issue where I read in two very similar netCDF files, concatenate them along one dimension (time), and write the result back to a new netCDF file. However, the new file size is enormous, and I can't work out why. More details in this Stack Overflow question: http://stackoverflow.com/questions/37324106/python-xarray-concat-new-file-size

Thanks.
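---

For the empty-bin problem in #1019, one workaround is to reindex the binned result onto the full set of bin intervals so that skipped bins come back as NaN. A minimal sketch with synthetic data standing in for the attached files; it assumes a recent xarray where `groupby_bins` labels the bins with pandas `Interval` objects (older releases used plain strings, for which the same reindex idea applies to the string labels):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-ins for the EnsembleMean / p2010T variables in the issue.
rng = np.random.default_rng(0)
var = xr.DataArray(rng.uniform(-100, 100, 500), dims='points', name='EnsembleMean')
pop = xr.DataArray(rng.uniform(0, 1e6, 500), dims='points', name='p2010T')

binns = [-100, -50, 0, 50, 50.00001, 100]
binned = pop.groupby_bins(var, binns).sum()

# groupby_bins drops empty bins; reindexing onto the full IntervalIndex
# restores them as NaN and also puts the bins in order.
full_bins = pd.IntervalIndex.from_breaks(binns)
binned_full = binned.reindex(EnsembleMean_bins=full_bins)
print(binned_full)
```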
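For the file-size surprise in #851, a common cause (a guess consistent with this kind of report, not a diagnosis confirmed in the issue) is that compression and packed dtypes live in each variable's `.encoding` and are not automatically re-applied when the concatenated result is written, while `concat` can also promote integer data to float64. A sketch of how one might check and re-enable compression; the file names are hypothetical:

```python
import xarray as xr

ds1 = xr.open_dataset('file1.nc')  # hypothetical input files
ds2 = xr.open_dataset('file2.nc')
combined = xr.concat([ds1, ds2], dim='time')

# Diagnose: how were the source variables stored on disk?
for name, v in ds1.variables.items():
    print(name, v.dtype, v.encoding)  # look for zlib, dtype, scale_factor

# Fix: explicitly request compression for the output file.
encoding = {name: {'zlib': True, 'complevel': 4} for name in combined.data_vars}
combined.to_netcdf('combined.nc', encoding=encoding)
```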