html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,performed_via_github_app,issue
https://github.com/pydata/xarray/issues/6688#issuecomment-1153302528,https://api.github.com/repos/pydata/xarray/issues/6688,1153302528,IC_kwDOAMm_X85EvgAA,7799184,2022-06-12T21:56:10Z,2022-06-12T21:56:10Z,CONTRIBUTOR,"That works, thanks. I just checked the [example in the docs](https://docs.xarray.dev/en/stable/user-guide/interpolation.html#interpolation-methods) now and that uses `kwargs={""fill_value"": None}` in the 2D example, with the result evaluating to NaNs. That one also works and returns actual values when using `""extrapolate""` instead, so it looks like something might have changed in xarray or scipy.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1268630439
https://github.com/pydata/xarray/issues/6036#issuecomment-1010549000,https://api.github.com/repos/pydata/xarray/issues/6036,1010549000,IC_kwDOAMm_X848O8EI,7799184,2022-01-12T01:49:52Z,2022-01-12T01:49:52Z,CONTRIBUTOR,Related issue in dask: https://github.com/dask/dask/issues/6363,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,1068225524
https://github.com/pydata/xarray/pull/4461#issuecomment-748554375,https://api.github.com/repos/pydata/xarray/issues/4461,748554375,MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ==,7799184,2020-12-20T02:35:40Z,2020-12-20T09:10:27Z,CONTRIBUTOR,"> @rabernat , awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?
@rsignell-usgs one other thing that can greatly speed up loading of metadata / coordinates is ensuring coordinate variables are stored in one single chunk. For this particular dataset, the chunk size for the `time` coordinate is 672, yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite the `time` coordinate setting `dset.time.encoding[""chunks""] = (227904,)`, you should see a very large performance increase. One thing we have been doing for zarr archives that are appended in time is defining the time coordinate with a very large chunk size (e.g., `dset.time.encoding[""chunks""] = (10000000,)`) when we first write the store. This ensures the time coordinate still fits in one single chunk after appending over the time dimension, and it does not affect the chunking of the actual data variables.
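For example, this is roughly what we do when first creating such a store (a minimal sketch; the source files, bucket URL and chunk sizes are just placeholders):
```python
import xarray as xr

dset = xr.open_mfdataset(""./source_*.nc"")  # hypothetical source files
# One very large chunk for the time coordinate so appends never fragment it;
# the data variables keep whatever chunking they already have.
dset.time.encoding[""chunks""] = (10000000,)
dset.to_zarr(""s3://bucket/archive.zarr"", consolidated=True)
```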
One thing we have been having performance issues with is loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are stored in one single chunk. There is an [open issue](https://github.com/dask/dask/issues/6363) in dask about this.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,709187212
https://github.com/pydata/xarray/pull/4035#issuecomment-721504192,https://api.github.com/repos/pydata/xarray/issues/4035,721504192,MDEyOklzc3VlQ29tbWVudDcyMTUwNDE5Mg==,7799184,2020-11-04T04:23:58Z,2020-11-04T04:23:58Z,CONTRIBUTOR,"@shoyer thanks for implementing this; it is going to be very useful. I am trying to write the dataset below:
dsregion:
```
<xarray.Dataset>
Dimensions: (latitude: 2041, longitude: 4320, time: 31)
Coordinates:
* latitude (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
* time (time) datetime64[ns] 2008-10-01T12:00:00 ... 2008-10-31T12:00:00
* longitude (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
Data variables:
vo (time, latitude, longitude) float32 dask.array
uo (time, latitude, longitude) float32 dask.array
sst (time, latitude, longitude) float32 dask.array
ssh (time, latitude, longitude) float32 dask.array
```
As a region of this other dataset:
dset:
```
<xarray.Dataset>
Dimensions: (latitude: 2041, longitude: 4320, time: 9490)
Coordinates:
* latitude (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
* longitude (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
* time (time) datetime64[ns] 1993-01-01T12:00:00 ... 2018-12-25T12:00:00
Data variables:
ssh (time, latitude, longitude) float64 dask.array
sst (time, latitude, longitude) float64 dask.array
uo (time, latitude, longitude) float64 dask.array
vo (time, latitude, longitude) float64 dask.array
```
Using the following call:
```
dsregion.to_zarr(dset_url, region={""time"": slice(5752, 5783)})
```
But I got stuck on the conditional below within `xarray/backends/api.py`:
```
1347 non_matching_vars = [
1348 k
1349 for k, v in ds_to_append.variables.items()
1350 if not set(region).intersection(v.dims)
1351 ]
1352 import ipdb; ipdb.set_trace()
-> 1353 if non_matching_vars:
1354 raise ValueError(
1355 f""when setting `region` explicitly in to_zarr(), all ""
1356 f""variables in the dataset to write must have at least ""
1357 f""one dimension in common with the region's dimensions ""
1358 f""{list(region.keys())}, but that is not ""
1359 f""the case for some variables here. To drop these variables ""
1360 f""from this dataset before exporting to zarr, write: ""
1361 f"".drop({non_matching_vars!r})""
1362 )
```
Apparently this is because `time` is not a dimension of the coordinate variables [""longitude"", ""latitude""]:
```
ipdb> p non_matching_vars
['latitude', 'longitude']
ipdb> p set(region)
{'time'}
```
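As the error message suggests, dropping the non-matching coordinate variables before writing does work around it:
```python
dsregion.drop([""latitude"", ""longitude""]).to_zarr(
    dset_url, region={""time"": slice(5752, 5783)}
)
```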
Should this check be performed for all variables, or only for data variables?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,613012939
https://github.com/pydata/xarray/issues/3942#issuecomment-610615621,https://api.github.com/repos/pydata/xarray/issues/3942,610615621,MDEyOklzc3VlQ29tbWVudDYxMDYxNTYyMQ==,7799184,2020-04-07T20:55:29Z,2020-04-07T21:07:31Z,CONTRIBUTOR,"Yep, I managed to overcome this by manually setting encoding parameters. I am just wondering if there would be any downside to preferring `float64` over `int64` when automatically defining these, since that seems to fix the issue. I guess it could result in some other precision losses due to floating-point errors, but these should be small.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,595492608
https://github.com/pydata/xarray/issues/2656#issuecomment-572293244,https://api.github.com/repos/pydata/xarray/issues/2656,572293244,MDEyOklzc3VlQ29tbWVudDU3MjI5MzI0NA==,7799184,2020-01-08T22:42:01Z,2020-01-08T22:43:25Z,CONTRIBUTOR,Pandas has a `date_format` option in [to_json](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html) to serialize datetimes either as ISO 8601 or epoch. An `encode_times` option to `to_dict` could also be useful...,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,396285440
https://github.com/pydata/xarray/issues/2656#issuecomment-572054942,https://api.github.com/repos/pydata/xarray/issues/2656,572054942,MDEyOklzc3VlQ29tbWVudDU3MjA1NDk0Mg==,7799184,2020-01-08T13:36:41Z,2020-01-08T13:36:41Z,CONTRIBUTOR,Would it make sense to have `to_json` / `from_json` methods that take care of datetime serialisation?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,396285440
https://github.com/pydata/xarray/issues/2511#issuecomment-563330352,https://api.github.com/repos/pydata/xarray/issues/2511,563330352,MDEyOklzc3VlQ29tbWVudDU2MzMzMDM1Mg==,7799184,2019-12-09T16:53:38Z,2019-12-09T16:53:38Z,CONTRIBUTOR,"I'm having a similar issue; here is an example:
```
import numpy as np
import dask.array as da
import xarray as xr
darr = xr.DataArray(data=[0.2, 0.4, 0.6], coords={""z"": range(3)}, dims=(""z"",))
good_indexer = xr.DataArray(
data=np.random.randint(0, 3, 8).reshape(4, 2).astype(int),
coords={""y"": range(4), ""x"": range(2)},
dims=(""y"", ""x"")
)
bad_indexer = xr.DataArray(
data=da.random.randint(0, 3, 8).reshape(4, 2).astype(int),
coords={""y"": range(4), ""x"": range(2)},
dims=(""y"", ""x"")
)
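# Note (assumption, not verified here): computing the indexer first sidesteps the
# error, e.g. darr[bad_indexer.compute()] then takes the same path as darr[good_indexer].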
In [5]: darr
Out[5]:
<xarray.DataArray (z: 3)>
array([0.2, 0.4, 0.6])
Coordinates:
* z (z) int64 0 1 2
In [6]: good_indexer
Out[6]:
<xarray.DataArray (y: 4, x: 2)>
array([[0, 1],
[2, 2],
[1, 2],
[1, 0]])
Coordinates:
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [7]: bad_indexer
Out[7]:
<xarray.DataArray (y: 4, x: 2)>
dask.array
Coordinates:
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [8]: darr[good_indexer]
Out[8]:
<xarray.DataArray (y: 4, x: 2)>
array([[0.2, 0.4],
[0.6, 0.6],
[0.4, 0.6],
[0.4, 0.2]])
Coordinates:
z (y, x) int64 0 1 2 2 1 2 1 0
* y (y) int64 0 1 2 3
* x (x) int64 0 1
In [9]: darr[bad_indexer]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 darr[bad_indexer]
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
638 else:
639 # xarray-style array indexing
--> 640 return self.isel(indexers=self._item_key_to_dict(key))
641
642 def __setitem__(self, key: Any, value: Any) -> None:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
1012 """"""
1013 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, ""isel"")
-> 1014 ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
1015 return self._from_temp_dataset(ds)
1016
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
1920 if name in self.indexes:
1921 new_var, new_index = isel_variable_and_index(
-> 1922 name, var, self.indexes[name], var_indexers
1923 )
1924 if new_index is not None:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexes.py in isel_variable_and_index(name, variable, index, indexers)
79 )
80
---> 81 new_variable = variable.isel(indexers)
82
83 if new_variable.dims != (name,):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in isel(self, indexers, **indexers_kwargs)
1052
1053 key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1054 return self[key]
1055
1056 def squeeze(self, dim=None):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in __getitem__(self, key)
700 array `x.values` directly.
701 """"""
--> 702 dims, indexer, new_order = self._broadcast_indexes(key)
703 data = as_indexable(self._data)[indexer]
704 if new_order:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
557 if isinstance(k, Variable):
558 if len(k.dims) > 1:
--> 559 return self._broadcast_indexes_vectorized(key)
560 dims.append(k.dims[0])
561 elif not isinstance(k, integer_types):
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
685 new_order = None
686
--> 687 return out_dims, VectorizedIndexer(tuple(out_key)), new_order
688
689 def __getitem__(self: VariableType, key) -> VariableType:
~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexing.py in __init__(self, key)
447 else:
448 raise TypeError(
--> 449 f""unexpected indexer type for {type(self).__name__}: {k!r}""
450 )
451 new_key.append(k)
TypeError: unexpected indexer type for VectorizedIndexer: dask.array
In [10]: xr.__version__
Out[10]: '0.14.1'
In [11]: import dask; dask.__version__
Out[11]: '2.9.0'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,374025325
https://github.com/pydata/xarray/issues/3490#issuecomment-551963613,https://api.github.com/repos/pydata/xarray/issues/3490,551963613,MDEyOklzc3VlQ29tbWVudDU1MTk2MzYxMw==,7799184,2019-11-08T19:40:23Z,2019-11-08T19:40:23Z,CONTRIBUTOR,"Perhaps reflected operators (i.e., `__rmul__`) could be defined differently somewhere? I cannot see anything obvious within xarray.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,518966560
https://github.com/pydata/xarray/issues/1524#issuecomment-513996346,https://api.github.com/repos/pydata/xarray/issues/1524,513996346,MDEyOklzc3VlQ29tbWVudDUxMzk5NjM0Ng==,7799184,2019-07-22T23:47:13Z,2019-07-22T23:47:13Z,CONTRIBUTOR,@shoyer does https://github.com/dask/dask/pull/4677 solve those accuracy concerns?,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,252548859
https://github.com/pydata/xarray/issues/2501#issuecomment-512663861,https://api.github.com/repos/pydata/xarray/issues/2501,512663861,MDEyOklzc3VlQ29tbWVudDUxMjY2Mzg2MQ==,7799184,2019-07-18T04:51:06Z,2019-07-18T04:52:17Z,CONTRIBUTOR,"Hi guys, I'm having an issue that looks similar to @rsignell-usgs's. I'm trying to open 413 netcdf files using `open_mfdataset` with `parallel=True`. The dataset (successfully opened with `parallel=False`) is ~300 GB on disk and looks like:
```ipython
In [1] import xarray as xr
In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=False)
In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions: (latitude: 190, longitude: 289, time: 302092)
Coordinates:
* longitude (longitude) float32 70.0 70.4 70.8 71.2 ... 184.4 184.8 185.2
* latitude (latitude) float32 -55.6 -55.2 -54.8 -54.4 ... 19.2 19.6 20.0
* time (time) datetime64[ns] 1979-01-01 ... 2013-05-31T23:00:00.000013440
Data variables:
hs (time, latitude, longitude) float32 dask.array
fp (time, latitude, longitude) float32 dask.array
dp (time, latitude, longitude) float32 dask.array
wl (time, latitude, longitude) float32 dask.array
U10 (time, latitude, longitude) float32 dask.array
V10 (time, latitude, longitude) float32 dask.array
hs1 (time, latitude, longitude) float32 dask.array
hs2 (time, latitude, longitude) float32 dask.array
tp1 (time, latitude, longitude) float32 dask.array
tp2 (time, latitude, longitude) float32 dask.array
lp0 (time, latitude, longitude) float32 dask.array
lp1 (time, latitude, longitude) float32 dask.array
lp2 (time, latitude, longitude) float32 dask.array
th0 (time, latitude, longitude) float32 dask.array
th1 (time, latitude, longitude) float32 dask.array
th2 (time, latitude, longitude) float32 dask.array
hs0 (time, latitude, longitude) float32 dask.array
tp0 (time, latitude, longitude) float32 dask.array
```
Trying to read it in a standard Python session gives me a core dump:
```ipython
In [1]: import xarray as xr
In [2]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)
Bus error (core dumped)
```
Trying to read it on a dask cluster, I get:
```ipython
In [1]: from dask.distributed import Client
In [2]: import xarray as xr
In [3]: client = Client()
In [4]: dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitud
...: e': 100}, parallel=True)
free(): double free detected in tcache 2free(): double free detected in tcache 2
free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 18744 was killed by signal 11
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18740 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18742 was killed by signal 7
distributed.nanny - WARNING - Worker process 18738 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
free(): double free detected in tcache 2munmap_chunk(): invalid pointer
free(): double free detected in tcache 2
free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 19082 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 19073 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
---------------------------------------------------------------------------
KilledWorker Traceback (most recent call last)
in ()
----> 1 dset = xr.open_mfdataset(""./bom-ww3/bom-ww3_*.nc"", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)
/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, **kwargs)
772 # calling compute here will return the datasets/file_objs lists,
773 # the underlying datasets will still be stored as dask arrays
--> 774 datasets, file_objs = dask.compute(datasets, file_objs)
775
776 # Combine all datasets, closing them in case of a ValueError
/usr/local/lib/python3.7/dist-packages/dask/base.py in compute(*args, **kwargs)
444 keys = [x.__dask_keys__() for x in collections]
445 postcomputes = [x.__dask_postcompute__() for x in collections]
--> 446 results = schedule(dsk, keys, **kwargs)
447 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
448
/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
2525 should_rejoin = False
2526 try:
-> 2527 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
2528 finally:
2529 for f in futures.values():
/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
1821 direct=direct,
1822 local_worker=local_worker,
-> 1823 asynchronous=asynchronous,
1824 )
1825
/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
761 else:
762 return sync(
--> 763 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
764 )
765
/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
330 e.wait(10)
331 if error[0]:
--> 332 six.reraise(*error[0])
333 else:
334 return result[0]
/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in f()
315 if callback_timeout is not None:
316 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317 result[0] = yield future
318 except Exception as exc:
319 error[0] = sys.exc_info()
/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
740 if exc_info is not None:
741 try:
--> 742 yielded = self.gen.throw(*exc_info) # type: ignore
743 finally:
744 # Break up a reference to itself
/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1678 exc = CancelledError(key)
1679 else:
-> 1680 six.reraise(type(exception), exception, traceback)
1681 raise exc
1682 if errors == ""skip"":
/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
KilledWorker: ('open_dataset-e7916acb-6d9f-4532-ab76-5b9c1b1a39c2', )
```
Is there anything obviously wrong with what I'm trying here, please?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,372848074
https://github.com/pydata/xarray/issues/1081#issuecomment-323880231,https://api.github.com/repos/pydata/xarray/issues/1081,323880231,MDEyOklzc3VlQ29tbWVudDMyMzg4MDIzMQ==,7799184,2017-08-21T23:44:30Z,2017-08-21T23:56:54Z,CONTRIBUTOR,"I have also hit this issue, and this method could be useful. I'm putting my workaround below in case it is helpful:
```python
def reorder_dims(darray, dim1, dim2):
""""""
    Interchange two dimensions of a DataArray, in a similar way to numpy's swapaxes.
""""""
dims = list(darray.dims)
    assert {dim1, dim2}.issubset(dims), 'dim1 and dim2 must be existing dimensions in darray'
ind1, ind2 = dims.index(dim1), dims.index(dim2)
dims[ind2], dims[ind1] = dims[ind1], dims[ind2]
return darray.transpose(*dims)
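
# Usage (hypothetical dimension names):
# swapped = reorder_dims(darray, 'time', 'lat')
# 'swapped' holds the same data with the 'time' and 'lat' axes interchanged.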
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,187393785
https://github.com/pydata/xarray/issues/1379#issuecomment-295993132,https://api.github.com/repos/pydata/xarray/issues/1379,295993132,MDEyOklzc3VlQ29tbWVudDI5NTk5MzEzMg==,7799184,2017-04-21T00:54:28Z,2017-04-21T10:05:27Z,CONTRIBUTOR,"I realised that some of the Datasets I was trying to concatenate had different coordinate values (for coordinates that I was assuming to be the same), so I guess xr.concat was trying to align these coordinates before concatenating, and the resulting Dataset ended up being much larger than it should have been. When I ensure I only concatenate Datasets with consistent coordinates, it works.
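A sketch of the kind of consistency check that avoids this (`dsets` and the coordinate name are placeholders):
```python
import numpy as np
import xarray as xr

# Only concatenate when the non-concat coordinates are truly identical;
# otherwise xr.concat aligns them and the result blows up in size.
assert all(np.array_equal(ds[""lon""], dsets[0][""lon""]) for ds in dsets)
dset_concat = xr.concat(dsets, dim=""time"")
```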
However, resource consumption is still quite high compared to doing the same thing with numpy arrays: memory increased by 42% using xr.concat (against 6% using np.concatenate), and the whole process took about 4 times longer.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,223231729
https://github.com/pydata/xarray/issues/1379#issuecomment-295970641,https://api.github.com/repos/pydata/xarray/issues/1379,295970641,MDEyOklzc3VlQ29tbWVudDI5NTk3MDY0MQ==,7799184,2017-04-20T23:41:38Z,2017-04-20T23:41:38Z,CONTRIBUTOR,"Also, reading all Datasets into a list and then trying to concatenate the whole list at once blows memory up.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,223231729
https://github.com/pydata/xarray/issues/1366#issuecomment-292853553,https://api.github.com/repos/pydata/xarray/issues/1366,292853553,MDEyOklzc3VlQ29tbWVudDI5Mjg1MzU1Mw==,7799184,2017-04-10T05:32:29Z,2017-04-10T05:32:29Z,CONTRIBUTOR,That makes sense; thanks for explaining @shoyer,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,220533356
https://github.com/pydata/xarray/issues/1324#issuecomment-289321422,https://api.github.com/repos/pydata/xarray/issues/1324,289321422,MDEyOklzc3VlQ29tbWVudDI4OTMyMTQyMg==,7799184,2017-03-26T22:25:25Z,2017-03-26T22:25:25Z,CONTRIBUTOR,Thanks!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,216626776
https://github.com/pydata/xarray/pull/806#issuecomment-202631361,https://api.github.com/repos/pydata/xarray/issues/806,202631361,MDEyOklzc3VlQ29tbWVudDIwMjYzMTM2MQ==,7799184,2016-03-28T23:52:52Z,2016-03-28T23:52:52Z,CONTRIBUTOR,":+1: nice one
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,143877458
https://github.com/pydata/xarray/issues/733#issuecomment-177056825,https://api.github.com/repos/pydata/xarray/issues/733,177056825,MDEyOklzc3VlQ29tbWVudDE3NzA1NjgyNQ==,7799184,2016-01-30T03:25:03Z,2016-01-30T03:25:03Z,CONTRIBUTOR,"I personally find it useful. Maybe it is not too intuitive, though, that the behaviour changes depending on whether there are attrs defined for that coordinate variable or not. I agree some documentation on this would definitely be helpful!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,129630652
https://github.com/pydata/xarray/issues/728#issuecomment-176542303,https://api.github.com/repos/pydata/xarray/issues/728,176542303,MDEyOklzc3VlQ29tbWVudDE3NjU0MjMwMw==,7799184,2016-01-29T02:48:17Z,2016-01-29T02:48:17Z,CONTRIBUTOR,"Thanks @shoyer that works (:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,128980804
https://github.com/pydata/xarray/issues/728#issuecomment-176485011,https://api.github.com/repos/pydata/xarray/issues/728,176485011,MDEyOklzc3VlQ29tbWVudDE3NjQ4NTAxMQ==,7799184,2016-01-28T23:44:58Z,2016-01-28T23:44:58Z,CONTRIBUTOR,"Thanks @shoyer,
What do you mean by preserving the signature of `DataArray.__init__`, please?
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,128980804
https://github.com/pydata/xarray/pull/726#issuecomment-175528287,https://api.github.com/repos/pydata/xarray/issues/726,175528287,MDEyOklzc3VlQ29tbWVudDE3NTUyODI4Nw==,7799184,2016-01-27T10:16:40Z,2016-01-27T10:16:40Z,CONTRIBUTOR,"Good point, done.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,128749355
https://github.com/pydata/xarray/issues/706#issuecomment-170173475,https://api.github.com/repos/pydata/xarray/issues/706,170173475,MDEyOklzc3VlQ29tbWVudDE3MDE3MzQ3NQ==,7799184,2016-01-09T00:59:14Z,2016-01-09T00:59:14Z,CONTRIBUTOR,"Cool, thanks @shoyer. Yes @rabernat, I totally agree with you and I would be very keen to collaborate on a library like that; I think it would be useful for many people.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,124915222
https://github.com/pydata/xarray/issues/682#issuecomment-169860884,https://api.github.com/repos/pydata/xarray/issues/682,169860884,MDEyOklzc3VlQ29tbWVudDE2OTg2MDg4NA==,7799184,2016-01-08T01:27:52Z,2016-01-08T01:27:52Z,CONTRIBUTOR,"See [#709](https://github.com/pydata/xarray/issues/709)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,123384529
https://github.com/pydata/xarray/issues/681#issuecomment-165520642,https://api.github.com/repos/pydata/xarray/issues/681,165520642,MDEyOklzc3VlQ29tbWVudDE2NTUyMDY0Mg==,7799184,2015-12-17T17:24:11Z,2015-12-17T17:24:11Z,CONTRIBUTOR,"I had that happening with Python 2 as well, though just for netCDF4 files, because of the new string type I guess. When writing as netCDF4-classic, that string output was not shown.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,122776511
https://github.com/pydata/xarray/issues/660#issuecomment-157576363,https://api.github.com/repos/pydata/xarray/issues/660,157576363,MDEyOklzc3VlQ29tbWVudDE1NzU3NjM2Mw==,7799184,2015-11-18T02:24:52Z,2015-11-18T02:24:52Z,CONTRIBUTOR,"Yes it is, @shoyer!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117262604
https://github.com/pydata/xarray/issues/662#issuecomment-157572531,https://api.github.com/repos/pydata/xarray/issues/662,157572531,MDEyOklzc3VlQ29tbWVudDE1NzU3MjUzMQ==,7799184,2015-11-18T02:00:07Z,2015-11-18T02:00:07Z,CONTRIBUTOR,"Awesome, works here too with netCDF4==1.2.1
Thanks!
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117478779
https://github.com/pydata/xarray/issues/662#issuecomment-157570185,https://api.github.com/repos/pydata/xarray/issues/662,157570185,MDEyOklzc3VlQ29tbWVudDE1NzU3MDE4NQ==,7799184,2015-11-18T01:43:01Z,2015-11-18T01:43:01Z,CONTRIBUTOR,"Hmm... OK, I will try that on another machine too. The versions are:
pandas==0.17.0
netCDF4==1.1.1
scipy==0.15.1
numpy==1.10.1
xray==0.6.1-15-g5109f4f
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117478779
https://github.com/pydata/xarray/issues/662#issuecomment-157567446,https://api.github.com/repos/pydata/xarray/issues/662,157567446,MDEyOklzc3VlQ29tbWVudDE1NzU2NzQ0Ng==,7799184,2015-11-18T01:26:01Z,2015-11-18T01:26:01Z,CONTRIBUTOR,"@shoyer I'm sending you by email (I was not able to attach it here) a stripped-down version of one of the files I was using. The code below should reproduce the issue:
```python
import xray
dset = xray.open_dataset('hycom_example.nc', decode_times=False)
ncvar = 'water_u'
dset_sliced = xray.Dataset()
slice_dict = {u'lat': [-30], u'lon': [0]}
dset_sliced[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict)
dset_sliced.to_netcdf()
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117478779
https://github.com/pydata/xarray/issues/662#issuecomment-157564974,https://api.github.com/repos/pydata/xarray/issues/662,157564974,MDEyOklzc3VlQ29tbWVudDE1NzU2NDk3NA==,7799184,2015-11-18T01:09:00Z,2015-11-18T01:09:00Z,CONTRIBUTOR,"@maximilianr I have managed to reproduce this with a different file with a different set of dimensions (time, latitude, longitude). So I believe the example below should give the same problem if you run it on some other file and change the variable / dimension names accordingly:
```
ncvar = 'hs'
dset_sliced1 = xray.Dataset()
dset_sliced2 = xray.Dataset()
dset = xray.open_dataset(filename, decode_times=False)
slice_dict1 = {u'latitude': [-30], u'longitude': [0], u'time': [2.83996800e+08, 2.84007600e+08]}
dset_sliced1[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict1)
slice_dict2 = {u'latitude': [-30], u'longitude': [0], u'time': [2.84018400e+08, 2.84029200e+08]}
dset_sliced2[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict2)
dset_sliced1.to_netcdf('test.nc') # This fails
xray.concat([dset_sliced1, dset_sliced2], dim='time') # This also fails, same error
Traceback:
----> 1 xray.concat([dset_sliced1, dset_sliced2], dim='time') # This also fails
/source/xray/xray/core/combine.pyc in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
113 raise TypeError('can only concatenate xray Dataset and DataArray '
114 'objects')
--> 115 return f(objs, dim, data_vars, coords, compat, positions)
116
117
/source/xray/xray/core/combine.pyc in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
265 for k in concat_over:
266 vars = ensure_common_dims([ds.variables[k] for ds in datasets])
--> 267 combined = Variable.concat(vars, dim, positions)
268 insert_result_variable(k, combined)
269
/source/xray/xray/core/variable.pyc in concat(cls, variables, dim, positions, shortcut)
711 utils.remove_incompatible_items(attrs, var.attrs)
712
--> 713 return cls(dims, data, attrs)
714
715 def _data_equals(self, other):
/source/xray/xray/core/variable.pyc in __init__(self, dims, data, attrs, encoding, fastpath)
194 """"""
195 self._data = _as_compatible_data(data, fastpath=fastpath)
--> 196 self._dims = self._parse_dimensions(dims)
197 self._attrs = None
198 self._encoding = None
/source/xray/xray/core/variable.pyc in _parse_dimensions(self, dims)
302 raise ValueError('dimensions %s must have the same length as the '
303 'number of data dimensions, ndim=%s'
--> 304 % (dims, self.ndim))
305 return dims
306
ValueError: dimensions (u'time', u'latitude', u'longitude') must have the same length as the number of data dimensions, ndim=2
```
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117478779
https://github.com/pydata/xarray/issues/662#issuecomment-157559691,https://api.github.com/repos/pydata/xarray/issues/662,157559691,MDEyOklzc3VlQ29tbWVudDE1NzU1OTY5MQ==,7799184,2015-11-18T00:43:32Z,2015-11-18T00:43:32Z,CONTRIBUTOR,"I was concatenating them as:
```
dset_concat = xray.concat([ds1, ds2], dim='time')
```
Trying to dump any of them as netcdf:
```
ds1.to_netcdf('test.nc')
```
would also yield the same problem.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",,117478779