html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/2254#issuecomment-400825442,https://api.github.com/repos/pydata/xarray/issues/2254,400825442,MDEyOklzc3VlQ29tbWVudDQwMDgyNTQ0Mg==,1554921,2018-06-27T20:53:27Z,2018-06-27T20:53:27Z,CONTRIBUTOR,">So yes, it looks like we could fix this by checking chunks on each array independently like you suggest. There's no reason why all dask arrays need to have the same chunking for storing with to_netcdf().
I could throw together a pull request if that's all that's involved.
> This is because you need to indicate chunks for variables separately, via encoding: http://xarray.pydata.org/en/stable/io.html#writing-encoded-data
Thanks! I was able to write chunked output to the netCDF file by adding `chunksizes` to the `encoding` attribute of the variables. I found I also had to specify `original_shape` as a workaround for #2198.
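For anyone who finds this later, a minimal sketch of that pattern (the dataset, shapes, and chunk sizes here are illustrative, not from my actual code):
```python
import numpy as np
import xarray as xr

# an illustrative dask-backed dataset
ds = xr.Dataset({'foo': (['x', 'y'], np.zeros((100, 100)))}).chunk({'x': 50})

# request on-disk chunking via the variable's encoding, and set
# original_shape as a workaround for #2198
ds.foo.encoding['chunksizes'] = (50, 50)
ds.foo.encoding['original_shape'] = ds.foo.shape
ds.to_netcdf('test.nc')
```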
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,336273865
https://github.com/pydata/xarray/issues/2242#issuecomment-399495668,https://api.github.com/repos/pydata/xarray/issues/2242,399495668,MDEyOklzc3VlQ29tbWVudDM5OTQ5NTY2OA==,1554921,2018-06-22T16:10:45Z,2018-06-22T16:10:45Z,CONTRIBUTOR,"True, I would expect _some_ performance hit from writing chunk-by-chunk; however, that same performance hit is present in both of the test cases.
In addition to the snippet @shoyer mentioned, I found that xarray also intentionally uses `autoclose=True` when writing chunks to netCDF:
https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/netCDF4_.py#L45-L48
However, `ensure_open` only uses `autoclose` if the file isn't already open:
https://github.com/pydata/xarray/blob/73b476e4db6631b2203954dd5b138cb650e4fb8c/xarray/backends/common.py#L496-L503
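In other words (a simplified paraphrase of that logic, with `_Store` as a stand-in for `NetCDF4DataStore`; not the actual implementation):
```python
import contextlib

class _Store:
    # minimal stand-in for NetCDF4DataStore, for illustration only
    _isopen = False

    def open(self):
        self._isopen = True

    def close(self):
        self._isopen = False

@contextlib.contextmanager
def ensure_open(store, autoclose):
    # autoclose only takes effect if the file was *not* already open
    if not store._isopen:
        store.open()
        try:
            yield store
        finally:
            if autoclose:
                store.close()  # re-closed after every chunk write
    else:
        yield store  # already open: stays open, autoclose is ignored
```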
So if the file is already open before getting to `BaseNetCDF4Array.__setitem__`, it will remain open. If the file isn't yet open, it will be opened, but then immediately closed again after writing the chunk. I suspect this is what's happening in the delayed version: the starting state of `NetCDF4DataStore._isopen` is `False` for some reason, so the store is doomed to re-close itself for each chunk it processes.
If I remove the `autoclose=True` from `BaseNetCDF4Array.__setitem__`, the file remains open and performance is comparable between the two tests.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,334633212
https://github.com/pydata/xarray/issues/1763#issuecomment-350292555,https://api.github.com/repos/pydata/xarray/issues/1763,350292555,MDEyOklzc3VlQ29tbWVudDM1MDI5MjU1NQ==,1554921,2017-12-08T15:34:01Z,2017-12-08T15:34:01Z,CONTRIBUTOR,"I think I've duplicated the logic from `_construct_dataarray` into `_encode_coordinates`. Test cases are passing, and my actual files are writing out properly. Hopefully nothing else got broken along the way.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,279832457
https://github.com/pydata/xarray/pull/1768#issuecomment-350090601,https://api.github.com/repos/pydata/xarray/issues/1768,350090601,MDEyOklzc3VlQ29tbWVudDM1MDA5MDYwMQ==,1554921,2017-12-07T20:51:27Z,2017-12-07T20:51:27Z,CONTRIBUTOR,"No fix yet, just added a test case.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,280274296
https://github.com/pydata/xarray/issues/1763#issuecomment-350015214,https://api.github.com/repos/pydata/xarray/issues/1763,350015214,MDEyOklzc3VlQ29tbWVudDM1MDAxNTIxNA==,1554921,2017-12-07T16:11:55Z,2017-12-07T16:11:55Z,CONTRIBUTOR,"I can try putting together a pull request, hopefully without breaking any existing use cases. I just tested switching the _any_ condition to _all_ in the above code, and it does fix my one test case...
..._However_, it breaks other cases, such as when there's another axis in the data (e.g. a time axis). I think the _all_ condition would require ""time"" to be one of the dimensions of the coordinates.
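To make that concrete, here's a toy version of the dimension check (inferred from the behaviour I'm seeing, not the exact source):
```python
# dims of foo1, checked against two candidate coordinate variables
v_dims = ('time', 'x1', 'y1')
for name, coord_dims in [('lat1', ('x1', 'y1')),
                         ('lat2', ('x2', 'y1'))]:
    print(name,
          any(d in coord_dims for d in v_dims),  # True for both
          all(d in coord_dims for d in v_dims))  # False for both
```
Under _any_, both lat1 and lat2 get attached to foo1 (too many coordinates); under _all_, neither does, because ""time"" is never among the coordinate dimensions.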
Here's an updated test case:
```python
import xarray as xr
import numpy as np
zeros1 = np.zeros((1, 5, 3))
zeros2 = np.zeros((1, 6, 3))
zeros3 = np.zeros((1, 5, 4))
d = xr.Dataset({
    'lon1': (['x1', 'y1'], zeros1.squeeze(0), {}),
    'lon2': (['x2', 'y1'], zeros2.squeeze(0), {}),
    'lon3': (['x1', 'y2'], zeros3.squeeze(0), {}),
    'lat1': (['x1', 'y1'], zeros1.squeeze(0), {}),
    'lat2': (['x2', 'y1'], zeros2.squeeze(0), {}),
    'lat3': (['x1', 'y2'], zeros3.squeeze(0), {}),
    'foo1': (['time', 'x1', 'y1'], zeros1, {'coordinates': 'lon1 lat1'}),
    'foo2': (['time', 'x2', 'y1'], zeros2, {'coordinates': 'lon2 lat2'}),
    'foo3': (['time', 'x1', 'y2'], zeros3, {'coordinates': 'lon3 lat3'}),
    'time': ('time', [0.], {'units': 'hours since 2017-01-01'}),
})
d = xr.conventions.decode_cf(d)
```
The resulting Dataset:
```
Dimensions: (time: 1, x1: 5, x2: 6, y1: 3, y2: 4)
Coordinates:
    lat1     (x1, y1) float64 ...
  * time     (time) datetime64[ns] 2017-01-01
    lat3     (x1, y2) float64 ...
    lat2     (x2, y1) float64 ...
    lon1     (x1, y1) float64 ...
    lon3     (x1, y2) float64 ...
    lon2     (x2, y1) float64 ...
Dimensions without coordinates: x1, x2, y1, y2
Data variables:
    foo1     (time, x1, y1) float64 ...
    foo2     (time, x2, y1) float64 ...
    foo3     (time, x1, y2) float64 ...
```
Saved to netCDF using:
```python
d.to_netcdf(""test.nc"")
```
With the _any_ condition, I have too many coordinates:
```
~$ ncdump -h test.nc
netcdf test {
dimensions:
    x1 = 5 ;
    y1 = 3 ;
    time = 1 ;
    y2 = 4 ;
    x2 = 6 ;
variables:
    ...
    double foo1(time, x1, y1) ;
        foo1:_FillValue = NaN ;
        foo1:coordinates = ""lat1 lat3 lat2 lon1 lon3 lon2"" ;
    double foo2(time, x2, y1) ;
        foo2:_FillValue = NaN ;
        foo2:coordinates = ""lon1 lon2 lat1 lat2"" ;
    double foo3(time, x1, y2) ;
        foo3:_FillValue = NaN ;
        foo3:coordinates = ""lon1 lon3 lat1 lat3"" ;
...
}
```
With the _all_ condition, I don't get any variable coordinates (they're dumped into the global attributes):
```
~$ ncdump -h test.nc
netcdf test {
dimensions:
    x1 = 5 ;
    y1 = 3 ;
    time = 1 ;
    y2 = 4 ;
    x2 = 6 ;
variables:
    ...
    double foo1(time, x1, y1) ;
        foo1:_FillValue = NaN ;
    double foo2(time, x2, y1) ;
        foo2:_FillValue = NaN ;
    double foo3(time, x1, y2) ;
        foo3:_FillValue = NaN ;

// global attributes:
        :_NCProperties = ""version=1|netcdflibversion=4.4.1.1|hdf5libversion=1.8.18"" ;
        :coordinates = ""lat1 lat3 lat2 lon1 lon3 lon2"" ;
}
```
So the update may be a bit trickier to get right. I know the DataArray objects (foo1, foo2, foo3) already have the right coordinates associated with them before writing to netCDF, so maybe the logic in `_encode_coordinates` could be changed to use `v.coords` somehow? I'll see if I can get something working for my test cases...
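As a quick sanity check (a hypothetical interactive check; expected output shown as comments), the decoded DataArrays do each carry only their own coordinates:
```python
for name in ['foo1', 'foo2', 'foo3']:
    print(name, sorted(d[name].coords))
# foo1 ['lat1', 'lon1', 'time']
# foo2 ['lat2', 'lon2', 'time']
# foo3 ['lat3', 'lon3', 'time']
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,279832457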