id,node_id,number,title,user,state,locked,assignee,milestone,comments,created_at,updated_at,closed_at,author_association,active_lock_reason,draft,pull_request,body,reactions,performed_via_github_app,state_reason,repo,type 196541604,MDU6SXNzdWUxOTY1NDE2MDQ=,1173,Some queries,7300413,closed,0,,,11,2016-12-19T22:53:32Z,2019-01-13T06:27:38Z,2019-01-13T06:00:22Z,NONE,,,,"Hello @shoyer @pwolfram @mrocklin @rabernat , I was trying to write a design/requirements doc with ref. to the Columbia meetup, and I had a few queries, on which I wanted your inputs (basically to ask whether they make sense or not!) 1. If you serialize a labeled n-d data array using netCDF or HDF5, it gets written into a single file, which is not really a good option if you want to eventually do distributed processing of the data. Things like HDFS/lustreFS can split files, but that is not really what we want. How do you think this issue could be solved within the xarray+dask framework? * Is it a matter of adding some code to the dataset.to_netcdf() method or adding a new method that would split the DataArray (based on some user guidelines) into multiple files? * Or does it make more sense to add a new serialization format like Zarr? 2. Continuing along similar lines, how does xarray+dask currently decide on how to distribute the workload between dask workers? Are there any heuristics to handle data locality? Or does experience say that network I/O is fast enough that this is not an issue? 
I'm asking this question because of this article by Matt: http://blaze.pydata.org/blog/2015/10/28/distributed-hdfs/ * If this is desirable, how would one go about implementing it?","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1173/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 274233261,MDU6SXNzdWUyNzQyMzMyNjE=,1717,colorbars in facet grids,7300413,closed,0,,,6,2017-11-15T17:06:15Z,2018-10-25T16:06:53Z,2018-10-25T16:06:53Z,NONE,,,,"Hello, In the 0.9.6 version, it does not appear to be possible to pass any arguments to the colorbar plotting routine. https://github.com/pydata/xarray/blob/8267fdb1093bba3934a172cf71128470698279cd/xarray/plot/facetgrid.py#L239 explicitly sets set_colorbar = False, which makes sense. However, if we want horizontal colorbars, or any way of adjusting the colorbar plotted (it is huge and unwieldy), it would be good if the plotting routine checks for and passes suitable arguments to https://github.com/pydata/xarray/blob/8267fdb1093bba3934a172cf71128470698279cd/xarray/plot/facetgrid.py#L256 I tried hacking something together, I can do something like the following now: ```python import xarray import matplotlib.pyplot as plt data = xarray.open_dataset('/data/ERSST/sst.mnmean.old.nc').sst data = data.loc[dict(time=slice('1999-1', '1999-4'))] data.plot.contourf(col='time', col_wrap=2, levels=12, cbar_kwargs=dict(orientation='horizontal', pad=0.1, aspect=30, shrink=0.6, ticks=[0, 10, 20 ,30])) ``` which produces: ![figure_1](https://user-images.githubusercontent.com/7300413/32849428-673c0aec-ca2f-11e7-8330-62872d96148a.png) Is something like this available in the development version? If not, and it seems like a useful feature, I can create a PR. 
Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1717/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 219184224,MDU6SXNzdWUyMTkxODQyMjQ=,1351,Creating a 2D DataArray,7300413,closed,0,,,5,2017-04-04T09:04:37Z,2017-04-04T16:19:12Z,2017-04-04T15:52:31Z,NONE,,,,"Hello, I think I'm missing something simple here. I tried looking at the documentation, but no luck. I'm trying to create DataArrays whose coordinates are two dimensional as follows ```python from xarray import DataArray import numpy as np x_physical = DataArray(np.ones((2,2)), dims = ['x_logical', 'y_logical']) y_physical = DataArray(np.ones((2,2)), dims = ['x_logical', 'y_logical']) new_array = DataArray(np.zeros((2,2)), dims=['x_logical','y_logical'], coords=[x_physical, y_physical]) ``` trying to follow the multidimensional example in the docs. This gives me a ValueError: 'x_logical' has more than 1-dimension and the same name as one of its dimensions ('x_logical', 'y_logical'). xarray disallows such variables because they conflict with the coordinates used to label dimensions. I have tried multiple variants of this: * replace ``x/y_logical`` in the arguments to creating ``new_array`` with something else, which gives me a ValueError: dimensions ('x',) must have the same length as the number of data dimensions, ndim=2 and some other variants which in hindsight make no sense. Is there something I'm missing, and is creating multidimensional DataArrays documented somewhere? 
TIA, Joy","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1351/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 214088387,MDU6SXNzdWUyMTQwODgzODc=,1308,Using groupby with custom index,7300413,closed,0,,,8,2017-03-14T14:24:11Z,2017-03-15T15:32:34Z,2017-03-15T15:32:34Z,NONE,,,,"Hello, I have 6 hourly data (ERA Interim) for around 10 years. I want to calculate the annual 6 hourly climatology, i.e, 366*4 values, with each value corresponding to a 6 hourly interval. I am chunking the data along longitude. I'm using xarray 0.9.1 with Python 3.6 (Anaconda). For a daily climatology on this data, I do the usual: ```python mean = data.groupby('time.dayofyear').mean(dim='time').compute() ``` For the 6 hourly version, I am trying the following: ```python test = (data['time.hour']/24 + data['time.dayofyear']) test.name = 'dayHourly' new_test = data.groupby(test).mean(dim='time').compute() ``` The first one (daily climatology) takes around 15 minutes for my data, whereas the second one ran for almost 30 minutes after which I gave up and killed the process. Is there some obvious reason why the first is much faster than the second? ```data``` in both cases is the 6 hourly dataset. And is there an alternative way of expressing this computation which would make it faster? TIA, Joy","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/1308/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 158212793,MDU6SXNzdWUxNTgyMTI3OTM=,866,Drawing only one contour,7300413,closed,0,,,8,2016-06-02T18:50:36Z,2016-07-20T17:16:27Z,2016-07-20T17:16:27Z,NONE,,,,"Hello, I was trying to draw only a single contour by passing levels=[0], and nothing gets plotted. 
I checked utils.py, and the logic used to calculate **n_colors** in **_build_discrete_cmap** gives **n_colors**=0, since it will first set **extend** to 'neither', and so **ext_n** = 0, and n_colors = len(levels) + ext_n - 1 I'm not sure, but this might be the issue. Another issue, which might be unrelated, is when I'm trying to draw two contours. It plots only one contour, and it will plot two contours only if I set **norm**=None. Any suggestions? TIA, Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/866/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 143422096,MDU6SXNzdWUxNDM0MjIwOTY=,803,Unable to reference variable,7300413,closed,0,,,2,2016-03-25T04:40:35Z,2016-03-26T03:59:24Z,2016-03-26T03:59:24Z,NONE,,,,"Hello, I was trying to use xarray to access the MERRA-2 monthly dataset. This dataset provides nc4 files, one for each month, with multiple analyzed variables (U,V,T, etc.,) in each file. if I open one file and try to access the zonal wind (U) as ``` python data = xarray.open_dataset('/data/MERRA2/MERRA2_100.instM_3d_ana_Np.198001.nc4') data = data.U ``` This works just fine. However, if I use ``` python data = data.T ``` I get back the entire dataset, i.e, I then have data.T.T, and data.T.T.T etc., I can open a single file using netCDF4, and the values seem ok. This seems to happen only with the temperature (T) field. all other fields seem to behave ok. Any ideas as to what is happening? 
TIA, Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/803/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 120681918,MDU6SXNzdWUxMjA2ODE5MTg=,672,Making xray use multiple cores,7300413,closed,0,,,5,2015-12-07T01:41:17Z,2015-12-07T09:33:18Z,2015-12-07T09:33:17Z,NONE,,,,"Hello, I was trying out the 'chunks' argument to open dataset so that I could use the out-of-core functionality. It works very well, but when I run top I see only one core being utilised. Is there some argument I need to pass to make it use more cores? TIA, Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/672/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 69141510,MDU6SXNzdWU2OTE0MTUxMA==,393,JJAS?,7300413,closed,0,,,2,2015-04-17T13:41:35Z,2015-04-19T00:00:56Z,2015-04-18T06:02:04Z,NONE,,,,"Hello, I noticed that you have added the 'time.season' attribute in the latest version of xray. Thanks! Those of us who study monsoons, especially the South Asian one, define the monsoon season as JJAS, which is not a valid value for time.season. How can one implement this sort of selection? TIA, Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/393/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue 59467251,MDU6SXNzdWU1OTQ2NzI1MQ==,349,Query about concat,7300413,closed,0,,,11,2015-03-02T11:07:58Z,2015-04-10T06:16:02Z,2015-03-03T05:45:50Z,NONE,,,,"Hello, I have multiple nc files, and I want to pick one variable from all of them to write to a separate file, and if possible pick one vertical level. The issue is that it has no aggregation dimension, so MFDataset does not work. 
The idea is to get all data about one variable from one vertical level into a single file. When I use the example in the netCDF4-python website, concat merges all variables along all dimensions, making the in-memory size really large. I'm new to xray, and I was hoping something of this sort can be done. In fact, I don't really need to write it to a new file. Even if I can get one ""descriptor"" (instead of an array of Dataset objects) to access my data, I will be quite happy! TIA, Joy ","{""url"": ""https://api.github.com/repos/pydata/xarray/issues/349/reactions"", ""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,completed,13221727,issue