issue_comments

30 rows where author_association = "CONTRIBUTOR" and user = 7799184, sorted by updated_at descending

issue 23

  • Problem with checking in Variable._parse_dimensions() (xray.core.variable) 5
  • Cannot inherit DataArray anymore in 0.7 release 2
  • xr.concat consuming too much resources 2
  • dataset info in .json format 2
  • time slice cannot be list 1
  • to_netcdf on Python 3: "string" qualifier on attributes 1
  • to_netcdf: not able to set dtype encoding with netCDF4 backend 1
  • Subclassing Dataset and DataArray 1
  • Make import error of tokenize more explicit 1
  • coordinate variable not written in netcdf file in some cases 1
  • Decorators for registering custom accessors in xarray 1
  • Transpose some but not all dimensions 1
  • Choose time units in output netcdf 1
  • Setting attributes to multi-index coordinate 1
  • (trivial) xarray.quantile silently resolves dask arrays 1
  • open_mfdataset usage and limitations. 1
  • Array indexing with dask arrays 1
  • Dataset global attributes dropped when performing operations against numpy data type 1
  • Time dtype encoding defaulting to `int64` when writing netcdf or zarr 1
  • Support parallel writes to regions of zarr stores 1
  • Allow fsspec/zarr/mfdataset 1
  • `xarray.open_zarr()` takes too long to lazy load when the data arrays contain a large number of Dask chunks. 1
  • 2D extrapolation not working 1

user 1

  • rafa-guedes · 30

author_association 1

  • CONTRIBUTOR · 30
id html_url issue_url node_id user created_at updated_at author_association body reactions performed_via_github_app issue
1153302528 https://github.com/pydata/xarray/issues/6688#issuecomment-1153302528 https://api.github.com/repos/pydata/xarray/issues/6688 IC_kwDOAMm_X85EvgAA rafa-guedes 7799184 2022-06-12T21:56:10Z 2022-06-12T21:56:10Z CONTRIBUTOR

That works, thanks. I just checked the example in the docs and it uses `kwargs={"fill_value": None}` in the 2D example, with the result evaluating to NaNs. That one also works and returns actual values when using `"extrapolate"` instead, so it looks like something might have changed in xarray or scipy.
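
A hedged sketch of the two options being compared (the array and target points are hypothetical; `kwargs` is forwarded to the underlying scipy interpolator):

```python
import numpy as np
import xarray as xr

# Hypothetical 2D array standing in for the docs example discussed above.
da = xr.DataArray(
    np.arange(12.0).reshape(4, 3),
    coords={"x": np.arange(4.0), "y": np.arange(3.0)},
    dims=("x", "y"),
)

# Target points outside the original coordinate range.
nan_result = da.interp(x=[4.5], y=[3.5], kwargs={"fill_value": None})
ext_result = da.interp(x=[4.5], y=[3.5], kwargs={"fill_value": "extrapolate"})

# Which of these extrapolates, returns NaN, or even errors has varied across
# xarray/scipy versions -- that discrepancy is what this comment reports.
print(nan_result.values, ext_result.values)
```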

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  2D extrapolation not working 1268630439
1010549000 https://github.com/pydata/xarray/issues/6036#issuecomment-1010549000 https://api.github.com/repos/pydata/xarray/issues/6036 IC_kwDOAMm_X848O8EI rafa-guedes 7799184 2022-01-12T01:49:52Z 2022-01-12T01:49:52Z CONTRIBUTOR

Related issue in dask: https://github.com/dask/dask/issues/6363

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  `xarray.open_zarr()` takes too long to lazy load when the data arrays contain a large number of Dask chunks. 1068225524
748554375 https://github.com/pydata/xarray/pull/4461#issuecomment-748554375 https://api.github.com/repos/pydata/xarray/issues/4461 MDEyOklzc3VlQ29tbWVudDc0ODU1NDM3NQ== rafa-guedes 7799184 2020-12-20T02:35:40Z 2020-12-20T09:10:27Z CONTRIBUTOR

@rabernat, awesome! I was stunned by the difference -- I guess the async loading of coordinate data is the big win, right?

@rsignell-usgs one other thing that can greatly speed up loading of metadata / coordinates is ensuring coordinate variables are stored in one single chunk. For this particular dataset, the chunk size for the time coordinate is 672, yielding 339 chunks, which can take a while to load from remote bucket stores. If you rewrite the time coordinate setting `dset.time.encoding["chunks"] = (227904,)`, you should see a very large performance increase. One thing we have been doing for zarr archives that are appended in time is defining the time coordinate with a very large chunk size (e.g., `dset.time.encoding["chunks"] = (10000000,)`) when we first write the store. This ensures the time coordinate will still fit in one single chunk after appending over the time dimension, and it does not affect the chunking of the actual data variables.

One thing we are still having performance issues with is loading coordinates / metadata from zarr archives that have too many chunks (millions), even when metadata is consolidated and coordinates are in one single chunk. There is an open issue in dask about this.
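
A hedged sketch of the single-chunk-coordinate trick described above (paths and sizes are hypothetical):

```python
import xarray as xr

dset = xr.open_dataset("input.nc")  # hypothetical input

# Store the time coordinate as one single zarr chunk, big enough to remain
# a single chunk even after many appends along the "time" dimension.
dset.time.encoding["chunks"] = (10_000_000,)

# Only the coordinate's chunking is affected; data variables keep theirs.
dset.to_zarr("store.zarr", mode="w", consolidated=True)
```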

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Allow fsspec/zarr/mfdataset 709187212
721504192 https://github.com/pydata/xarray/pull/4035#issuecomment-721504192 https://api.github.com/repos/pydata/xarray/issues/4035 MDEyOklzc3VlQ29tbWVudDcyMTUwNDE5Mg== rafa-guedes 7799184 2020-11-04T04:23:58Z 2020-11-04T04:23:58Z CONTRIBUTOR

@shoyer thanks for implementing this, it is going to be very useful. I am trying to write the dataset below:

dsregion:

```
<xarray.Dataset>
Dimensions:    (latitude: 2041, longitude: 4320, time: 31)
Coordinates:
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * time       (time) datetime64[ns] 2008-10-01T12:00:00 ... 2008-10-31T12:00:00
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
Data variables:
    vo         (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    uo         (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    sst        (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    ssh        (time, latitude, longitude) float32 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
```

As a region of this other dataset:

dset:

```
<xarray.Dataset>
Dimensions:    (latitude: 2041, longitude: 4320, time: 9490)
Coordinates:
  * latitude   (latitude) float32 -80.0 -79.916664 -79.833336 ... 89.916664 90.0
  * longitude  (longitude) float32 -180.0 -179.91667 ... 179.83333 179.91667
  * time       (time) datetime64[ns] 1993-01-01T12:00:00 ... 2018-12-25T12:00:00
Data variables:
    ssh        (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    sst        (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    uo         (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
    vo         (time, latitude, longitude) float64 dask.array<chunksize=(30, 510, 1080), meta=np.ndarray>
```

Using the following call:

```python
dsregion.to_zarr(dset_url, region={"time": slice(5752, 5783)})
```

But I got stuck on the conditional below within xarray/backends/api.py:

```python
   1347     non_matching_vars = [
   1348         k
   1349         for k, v in ds_to_append.variables.items()
   1350         if not set(region).intersection(v.dims)
   1351     ]
   1352     import ipdb; ipdb.set_trace()
-> 1353     if non_matching_vars:
   1354         raise ValueError(
   1355             f"when setting `region` explicitly in to_zarr(), all "
   1356             f"variables in the dataset to write must have at least "
   1357             f"one dimension in common with the region's dimensions "
   1358             f"{list(region.keys())}, but that is not "
   1359             f"the case for some variables here. To drop these variables "
   1360             f"from this dataset before exporting to zarr, write: "
   1361             f".drop({non_matching_vars!r})"
   1362         )
```

Apparently because `time` is not a dimension of the coordinate variables `["longitude", "latitude"]`:

```
ipdb> p non_matching_vars
['latitude', 'longitude']
ipdb> p set(region)
{'time'}
```

Should this check be performed on all variables, or only on data variables?
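
For what it's worth, a sketch of the workaround the error message itself suggests, reusing the names from this comment (`drop_vars` is the newer spelling of `.drop` for variables):

```python
# Drop the coordinate variables that share no dimension with the region,
# then write just the "time" slab (indices as in the comment above).
dsregion.drop_vars(["latitude", "longitude"]).to_zarr(
    dset_url, region={"time": slice(5752, 5783)}
)
```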

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Support parallel writes to regions of zarr stores 613012939
610615621 https://github.com/pydata/xarray/issues/3942#issuecomment-610615621 https://api.github.com/repos/pydata/xarray/issues/3942 MDEyOklzc3VlQ29tbWVudDYxMDYxNTYyMQ== rafa-guedes 7799184 2020-04-07T20:55:29Z 2020-04-07T21:07:31Z CONTRIBUTOR

Yep, I managed to overcome this by manually setting encoding parameters. I am just wondering if there would be any downside to preferring float64 over int64 when automatically defining these? That seems to fix the issue. I guess it could result in some precision loss due to floating-point errors, but that should be small.
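
A minimal sketch of the manual-encoding workaround mentioned above (file names and the units string are illustrative):

```python
import xarray as xr

ds = xr.open_dataset("input.nc")  # hypothetical file

# Pick the on-disk time representation explicitly instead of relying on the
# automatic (integer) choice.
ds.time.encoding.update({"dtype": "float64", "units": "seconds since 1970-01-01"})
ds.to_netcdf("output.nc")
```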

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Time dtype encoding defaulting to `int64` when writing netcdf or zarr 595492608
572293244 https://github.com/pydata/xarray/issues/2656#issuecomment-572293244 https://api.github.com/repos/pydata/xarray/issues/2656 MDEyOklzc3VlQ29tbWVudDU3MjI5MzI0NA== rafa-guedes 7799184 2020-01-08T22:42:01Z 2020-01-08T22:43:25Z CONTRIBUTOR

Pandas has a `date_format` option in `to_json` to serialize dates as either iso8601 or epoch. An `encode_times` option to `to_dict` could also be useful...
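
A hedged sketch of what such a helper could look like, mirroring pandas' iso8601 option on top of `Dataset.to_dict()` (the helper name `dataset_to_json` is hypothetical):

```python
import json

import pandas as pd
import xarray as xr


def dataset_to_json(ds: xr.Dataset) -> str:
    """Hypothetical helper: JSON-encode a Dataset dict with ISO-8601 dates."""

    def encode(obj):
        # Mirror pandas' to_json(date_format="iso") for datetime-like values.
        try:
            return pd.Timestamp(obj).isoformat()
        except (TypeError, ValueError):
            return str(obj)

    # json.dumps calls `encode` only for objects it cannot serialise itself,
    # which is where the datetimes from ds.to_dict() end up.
    return json.dumps(ds.to_dict(), default=encode)
```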

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dataset info in .json format 396285440
572054942 https://github.com/pydata/xarray/issues/2656#issuecomment-572054942 https://api.github.com/repos/pydata/xarray/issues/2656 MDEyOklzc3VlQ29tbWVudDU3MjA1NDk0Mg== rafa-guedes 7799184 2020-01-08T13:36:41Z 2020-01-08T13:36:41Z CONTRIBUTOR

Would it make sense to have `to_json` / `from_json` methods that take care of datetime serialisation?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  dataset info in .json format 396285440
563330352 https://github.com/pydata/xarray/issues/2511#issuecomment-563330352 https://api.github.com/repos/pydata/xarray/issues/2511 MDEyOklzc3VlQ29tbWVudDU2MzMzMDM1Mg== rafa-guedes 7799184 2019-12-09T16:53:38Z 2019-12-09T16:53:38Z CONTRIBUTOR

I'm having a similar issue; here is an example:

```python
import numpy as np
import dask.array as da
import xarray as xr

darr = xr.DataArray(data=[0.2, 0.4, 0.6], coords={"z": range(3)}, dims=("z",))
good_indexer = xr.DataArray(
    data=np.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)
bad_indexer = xr.DataArray(
    data=da.random.randint(0, 3, 8).reshape(4, 2).astype(int),
    coords={"y": range(4), "x": range(2)},
    dims=("y", "x"),
)
```

```ipython
In [5]: darr
Out[5]:
<xarray.DataArray (z: 3)>
array([0.2, 0.4, 0.6])
Coordinates:
  * z        (z) int64 0 1 2

In [6]: good_indexer
Out[6]:
<xarray.DataArray (y: 4, x: 2)>
array([[0, 1],
       [2, 2],
       [1, 2],
       [1, 0]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [7]: bad_indexer
Out[7]:
<xarray.DataArray 'reshape-417766b2035dcb1227ddde8505297039' (y: 4, x: 2)>
dask.array<reshape, shape=(4, 2), dtype=int64, chunksize=(4, 2), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [8]: darr[good_indexer]
Out[8]:
<xarray.DataArray (y: 4, x: 2)>
array([[0.2, 0.4],
       [0.6, 0.6],
       [0.4, 0.6],
       [0.4, 0.2]])
Coordinates:
    z        (y, x) int64 0 1 2 2 1 2 1 0
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1

In [9]: darr[bad_indexer]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2a57c1a2eade> in <module>
----> 1 darr[bad_indexer]

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    638         else:
    639             # xarray-style array indexing
--> 640             return self.isel(indexers=self._item_key_to_dict(key))
    641
    642     def __setitem__(self, key: Any, value: Any) -> None:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataarray.py in isel(self, indexers, drop, **indexers_kwargs)
   1012         """
   1013         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "isel")
-> 1014         ds = self._to_temp_dataset().isel(drop=drop, indexers=indexers)
   1015         return self._from_temp_dataset(ds)
   1016

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/dataset.py in isel(self, indexers, drop, **indexers_kwargs)
   1920             if name in self.indexes:
   1921                 new_var, new_index = isel_variable_and_index(
-> 1922                     name, var, self.indexes[name], var_indexers
   1923                 )
   1924                 if new_index is not None:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexes.py in isel_variable_and_index(name, variable, index, indexers)
     79         )
     80
---> 81     new_variable = variable.isel(indexers)
     82
     83     if new_variable.dims != (name,):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in isel(self, indexers, **indexers_kwargs)
   1052
   1053         key = tuple(indexers.get(dim, slice(None)) for dim in self.dims)
-> 1054         return self[key]
   1055
   1056     def squeeze(self, dim=None):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in __getitem__(self, key)
    700         array x.values directly.
    701         """
--> 702         dims, indexer, new_order = self._broadcast_indexes(key)
    703         data = as_indexable(self._data)[indexer]
    704         if new_order:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes(self, key)
    557             if isinstance(k, Variable):
    558                 if len(k.dims) > 1:
--> 559                     return self._broadcast_indexes_vectorized(key)
    560                 dims.append(k.dims[0])
    561             elif not isinstance(k, integer_types):

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/variable.py in _broadcast_indexes_vectorized(self, key)
    685             new_order = None
    686
--> 687         return out_dims, VectorizedIndexer(tuple(out_key)), new_order
    688
    689     def __getitem__(self: VariableType, key) -> VariableType:

~/.virtualenvs/py3/local/lib/python3.7/site-packages/xarray/core/indexing.py in __init__(self, key)
    447             else:
    448                 raise TypeError(
--> 449                     f"unexpected indexer type for {type(self).__name__}: {k!r}"
    450                 )
    451             new_key.append(k)

TypeError: unexpected indexer type for VectorizedIndexer: dask.array<reshape, shape=(4, 2), dtype=int64, chunksize=(4, 2), chunktype=numpy.ndarray>

In [10]: xr.__version__
Out[10]: '0.14.1'

In [11]: import dask; dask.__version__
Out[11]: '2.9.0'
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Array indexing with dask arrays 374025325
551963613 https://github.com/pydata/xarray/issues/3490#issuecomment-551963613 https://api.github.com/repos/pydata/xarray/issues/3490 MDEyOklzc3VlQ29tbWVudDU1MTk2MzYxMw== rafa-guedes 7799184 2019-11-08T19:40:23Z 2019-11-08T19:40:23Z CONTRIBUTOR

Perhaps reflected operators (e.g., `__rmul__`) could be defined differently somewhere? I cannot see anything obvious within xarray.
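
A minimal repro sketch of the attrs-dropping behaviour under discussion; the `keep_attrs` option shown is the knob newer xarray versions expose and is not part of the original report:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0], dims="x", attrs={"units": "m"})

# Binary ops against numpy scalars drop attrs by default...
print((np.float32(2) * da).attrs)  # {}

# ...while newer xarray versions let you keep them globally:
with xr.set_options(keep_attrs=True):
    print((np.float32(2) * da).attrs)  # {'units': 'm'}
```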

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Dataset global attributes dropped when performing operations against numpy data type 518966560
513996346 https://github.com/pydata/xarray/issues/1524#issuecomment-513996346 https://api.github.com/repos/pydata/xarray/issues/1524 MDEyOklzc3VlQ29tbWVudDUxMzk5NjM0Ng== rafa-guedes 7799184 2019-07-22T23:47:13Z 2019-07-22T23:47:13Z CONTRIBUTOR

@shoyer does https://github.com/dask/dask/pull/4677 solve those accuracy concerns?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  (trivial) xarray.quantile silently resolves dask arrays 252548859
512663861 https://github.com/pydata/xarray/issues/2501#issuecomment-512663861 https://api.github.com/repos/pydata/xarray/issues/2501 MDEyOklzc3VlQ29tbWVudDUxMjY2Mzg2MQ== rafa-guedes 7799184 2019-07-18T04:51:06Z 2019-07-18T04:52:17Z CONTRIBUTOR

Hi guys, I'm having an issue that looks similar to @rsignell-usgs's. I am trying to open 413 netCDF files using `open_mfdataset` with `parallel=True`. The dataset (successfully opened with `parallel=False`) is ~300G on disk and looks like:

```ipython
In [1]: import xarray as xr

In [2]: dset = xr.open_mfdataset("./bom-ww3/bom-ww3_*.nc", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=False)

In [3]: dset
Out[3]:
<xarray.Dataset>
Dimensions:    (latitude: 190, longitude: 289, time: 302092)
Coordinates:
  * longitude  (longitude) float32 70.0 70.4 70.8 71.2 ... 184.4 184.8 185.2
  * latitude   (latitude) float32 -55.6 -55.2 -54.8 -54.4 ... 19.2 19.6 20.0
  * time       (time) datetime64[ns] 1979-01-01 ... 2013-05-31T23:00:00.000013440
Data variables:
    hs         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    fp         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    dp         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    wl         (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    U10        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    V10        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    lp2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th1        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    th2        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    hs0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
    tp0        (time, latitude, longitude) float32 dask.array<shape=(302092, 190, 289), chunksize=(745, 100, 100)>
```

Trying to read it in a standard Python session gives me a core dump:

```ipython
In [1]: import xarray as xr

In [2]: dset = xr.open_mfdataset("./bom-ww3/bom-ww3_*.nc", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)
Bus error (core dumped)
```

Trying to read it on a dask cluster, I get:

```ipython
In [1]: from dask.distributed import Client

In [2]: import xarray as xr

In [3]: client = Client()

In [4]: dset = xr.open_mfdataset("./bom-ww3/bom-ww3_*.nc", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)
free(): double free detected in tcache 2
free(): double free detected in tcache 2
free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 18744 was killed by signal 11
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18740 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 18742 was killed by signal 7
distributed.nanny - WARNING - Worker process 18738 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Restarting worker
free(): double free detected in tcache 2
munmap_chunk(): invalid pointer
free(): double free detected in tcache 2
free(): double free detected in tcache 2
distributed.nanny - WARNING - Worker process 19082 was killed by signal 6
distributed.nanny - WARNING - Restarting worker
distributed.nanny - WARNING - Worker process 19073 was killed by signal 6
distributed.nanny - WARNING - Restarting worker

---------------------------------------------------------------------------
KilledWorker                              Traceback (most recent call last)
<ipython-input-4-740561b80fec> in <module>()
----> 1 dset = xr.open_mfdataset("./bom-ww3/bom-ww3_*.nc", chunks={'time': 744, 'latitude': 100, 'longitude': 100}, parallel=True)

/usr/local/lib/python3.7/dist-packages/xarray/backends/api.py in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, lock, data_vars, coords, combine, autoclose, parallel, **kwargs)
    772         # calling compute here will return the datasets/file_objs lists,
    773         # the underlying datasets will still be stored as dask arrays
--> 774         datasets, file_objs = dask.compute(datasets, file_objs)
    775
    776     # Combine all datasets, closing them in case of a ValueError

/usr/local/lib/python3.7/dist-packages/dask/base.py in compute(*args, **kwargs)
    444     keys = [x.__dask_keys__() for x in collections]
    445     postcomputes = [x.__dask_postcompute__() for x in collections]
--> 446     results = schedule(dsk, keys, **kwargs)
    447     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    448

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in get(self, dsk, keys, restrictions, loose_restrictions, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2525             should_rejoin = False
   2526         try:
-> 2527             results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2528         finally:
   2529             for f in futures.values():

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   1821                 direct=direct,
   1822                 local_worker=local_worker,
-> 1823                 asynchronous=asynchronous,
   1824             )
   1825

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    761         else:
    762             return sync(
--> 763                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    764             )
    765

/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    330             e.wait(10)
    331     if error[0]:
--> 332         six.reraise(*error[0])
    333     else:
    334         return result[0]

/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

/home/oceanum/.local/lib/python3.7/site-packages/distributed/utils.py in f()
    315             if callback_timeout is not None:
    316                 future = gen.with_timeout(timedelta(seconds=callback_timeout), future)
--> 317             result[0] = yield future
    318         except Exception as exc:
    319             error[0] = sys.exc_info()

/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/home/oceanum/.local/lib/python3.7/site-packages/tornado/gen.py in run(self)
    740                     if exc_info is not None:
    741                         try:
--> 742                             yielded = self.gen.throw(*exc_info)  # type: ignore
    743                         finally:
    744                             # Break up a reference to itself

/home/oceanum/.local/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1678                             exc = CancelledError(key)
   1679                         else:
-> 1680                             six.reraise(type(exception), exception, traceback)
   1681                         raise exc
   1682                     if errors == "skip":

/usr/lib/python3/dist-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

KilledWorker: ('open_dataset-e7916acb-6d9f-4532-ab76-5b9c1b1a39c2', <Worker 'tcp://10.240.0.5:36019', memory: 0, processing: 63>)
```

Is there anything obviously wrong with what I am trying here, please?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  open_mfdataset usage and limitations. 372848074
323880231 https://github.com/pydata/xarray/issues/1081#issuecomment-323880231 https://api.github.com/repos/pydata/xarray/issues/1081 MDEyOklzc3VlQ29tbWVudDMyMzg4MDIzMQ== rafa-guedes 7799184 2017-08-21T23:44:30Z 2017-08-21T23:56:54Z CONTRIBUTOR

I have also hit this issue; this method could be useful. I'm putting my workaround below in case it is helpful:

```python
def reorder_dims(darray, dim1, dim2):
    """Interchange two dimensions of a DataArray in a similar way as numpy's swapaxes."""
    dims = list(darray.dims)
    assert set([dim1, dim2]).issubset(dims), 'dim1 and dim2 must be existing dimensions in darray'
    ind1, ind2 = dims.index(dim1), dims.index(dim2)
    dims[ind2], dims[ind1] = dims[ind1], dims[ind2]
    return darray.transpose(*dims)
```
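
For example, applied to a hypothetical three-dimensional array:

```python
import numpy as np
import xarray as xr

darray = xr.DataArray(np.zeros((2, 3, 4)), dims=("time", "latitude", "longitude"))
swapped = reorder_dims(darray, "latitude", "longitude")
print(swapped.dims)  # ('time', 'longitude', 'latitude')
```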

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Transpose some but not all dimensions 187393785
295993132 https://github.com/pydata/xarray/issues/1379#issuecomment-295993132 https://api.github.com/repos/pydata/xarray/issues/1379 MDEyOklzc3VlQ29tbWVudDI5NTk5MzEzMg== rafa-guedes 7799184 2017-04-21T00:54:28Z 2017-04-21T10:05:27Z CONTRIBUTOR

I realised that some of the Datasets I was trying to concatenate had different coordinate values (for coordinates that I was assuming to be the same), so I guess `xr.concat` was trying to align these coordinates before concatenating, and the resulting Dataset ended up much larger than it should have been. When I ensure I only concatenate Datasets with consistent coordinates, it works.

However, resource consumption is still quite high compared to doing the same thing with numpy arrays: memory increased by 42% using `xr.concat` (against 6% using `np.concatenate`), and the whole processing took about four times longer.
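
A hedged note for later readers: newer xarray releases expose `concat` options that skip the alignment work when the coordinates are already known to agree (`datasets` below is a hypothetical list of such Datasets):

```python
import xarray as xr

# "override" reuses the first dataset's coordinate values instead of
# checking/aligning them, and "minimal" avoids concatenating coordinates
# that do not carry the "time" dimension.
dset_concat = xr.concat(
    datasets, dim="time", coords="minimal", join="override", compat="override"
)
```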

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.concat consuming too much resources 223231729
295970641 https://github.com/pydata/xarray/issues/1379#issuecomment-295970641 https://api.github.com/repos/pydata/xarray/issues/1379 MDEyOklzc3VlQ29tbWVudDI5NTk3MDY0MQ== rafa-guedes 7799184 2017-04-20T23:41:38Z 2017-04-20T23:41:38Z CONTRIBUTOR

Also, reading all Datasets into a list and then concatenating the whole list at once blows up memory as well.

{
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  xr.concat consuming too much resources 223231729
292853553 https://github.com/pydata/xarray/issues/1366#issuecomment-292853553 https://api.github.com/repos/pydata/xarray/issues/1366 MDEyOklzc3VlQ29tbWVudDI5Mjg1MzU1Mw== rafa-guedes 7799184 2017-04-10T05:32:29Z 2017-04-10T05:32:29Z CONTRIBUTOR

That makes sense, thanks for explaining @shoyer.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Setting attributes to multi-index coordinate 220533356
289321422 https://github.com/pydata/xarray/issues/1324#issuecomment-289321422 https://api.github.com/repos/pydata/xarray/issues/1324 MDEyOklzc3VlQ29tbWVudDI4OTMyMTQyMg== rafa-guedes 7799184 2017-03-26T22:25:25Z 2017-03-26T22:25:25Z CONTRIBUTOR

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Choose time units in output netcdf 216626776
202631361 https://github.com/pydata/xarray/pull/806#issuecomment-202631361 https://api.github.com/repos/pydata/xarray/issues/806 MDEyOklzc3VlQ29tbWVudDIwMjYzMTM2MQ== rafa-guedes 7799184 2016-03-28T23:52:52Z 2016-03-28T23:52:52Z CONTRIBUTOR

:+1: nice one

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Decorators for registering custom accessors in xarray 143877458
177056825 https://github.com/pydata/xarray/issues/733#issuecomment-177056825 https://api.github.com/repos/pydata/xarray/issues/733 MDEyOklzc3VlQ29tbWVudDE3NzA1NjgyNQ== rafa-guedes 7799184 2016-01-30T03:25:03Z 2016-01-30T03:25:03Z CONTRIBUTOR

I personally find it useful, though it is perhaps not very intuitive that the behaviour changes depending on whether attrs are defined for that coordinate variable. I agree some documentation on this would definitely be helpful!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  coordinate variable not written in netcdf file in some cases 129630652
176542303 https://github.com/pydata/xarray/issues/728#issuecomment-176542303 https://api.github.com/repos/pydata/xarray/issues/728 MDEyOklzc3VlQ29tbWVudDE3NjU0MjMwMw== rafa-guedes 7799184 2016-01-29T02:48:17Z 2016-01-29T02:48:17Z CONTRIBUTOR

Thanks @shoyer, that works (:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cannot inherit DataArray anymore in 0.7 release 128980804
176485011 https://github.com/pydata/xarray/issues/728#issuecomment-176485011 https://api.github.com/repos/pydata/xarray/issues/728 MDEyOklzc3VlQ29tbWVudDE3NjQ4NTAxMQ== rafa-guedes 7799184 2016-01-28T23:44:58Z 2016-01-28T23:44:58Z CONTRIBUTOR

Thanks @shoyer, what do you mean by preserving the signature of `DataArray.__init__`, please?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Cannot inherit DataArray anymore in 0.7 release 128980804
175528287 https://github.com/pydata/xarray/pull/726#issuecomment-175528287 https://api.github.com/repos/pydata/xarray/issues/726 MDEyOklzc3VlQ29tbWVudDE3NTUyODI4Nw== rafa-guedes 7799184 2016-01-27T10:16:40Z 2016-01-27T10:16:40Z CONTRIBUTOR

Good point, done.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Make import error of tokenize more explicit 128749355
170173475 https://github.com/pydata/xarray/issues/706#issuecomment-170173475 https://api.github.com/repos/pydata/xarray/issues/706 MDEyOklzc3VlQ29tbWVudDE3MDE3MzQ3NQ== rafa-guedes 7799184 2016-01-09T00:59:14Z 2016-01-09T00:59:14Z CONTRIBUTOR

Cool, thanks @shoyer. Yes @rabernat, I totally agree with you and would be very keen to collaborate on a library like that; I think it would be useful for many people.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Subclassing Dataset and DataArray 124915222
169860884 https://github.com/pydata/xarray/issues/682#issuecomment-169860884 https://api.github.com/repos/pydata/xarray/issues/682 MDEyOklzc3VlQ29tbWVudDE2OTg2MDg4NA== rafa-guedes 7799184 2016-01-08T01:27:52Z 2016-01-08T01:27:52Z CONTRIBUTOR

See #709

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf: not able to set dtype encoding with netCDF4 backend 123384529
165520642 https://github.com/pydata/xarray/issues/681#issuecomment-165520642 https://api.github.com/repos/pydata/xarray/issues/681 MDEyOklzc3VlQ29tbWVudDE2NTUyMDY0Mg== rafa-guedes 7799184 2015-12-17T17:24:11Z 2015-12-17T17:24:11Z CONTRIBUTOR

I had that happening with Python 2 as well, though just for netCDF4 files, because of the new string type I guess. When writing as netCDF4-classic, that string output was not shown.
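
For reference, a hedged sketch of forcing the classic data model when writing (file names are hypothetical):

```python
import xarray as xr

ds = xr.open_dataset("input.nc")  # hypothetical file

# NETCDF4_CLASSIC keeps the classic data model (no variable-length string
# type) inside an HDF5 container, which avoids the "string" qualifier above.
ds.to_netcdf("output.nc", format="NETCDF4_CLASSIC")
```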

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  to_netcdf on Python 3: "string" qualifier on attributes  122776511
157576363 https://github.com/pydata/xarray/issues/660#issuecomment-157576363 https://api.github.com/repos/pydata/xarray/issues/660 MDEyOklzc3VlQ29tbWVudDE1NzU3NjM2Mw== rafa-guedes 7799184 2015-11-18T02:24:52Z 2015-11-18T02:24:52Z CONTRIBUTOR

Yes it is, @shoyer!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  time slice cannot be list  117262604
157572531 https://github.com/pydata/xarray/issues/662#issuecomment-157572531 https://api.github.com/repos/pydata/xarray/issues/662 MDEyOklzc3VlQ29tbWVudDE1NzU3MjUzMQ== rafa-guedes 7799184 2015-11-18T02:00:07Z 2015-11-18T02:00:07Z CONTRIBUTOR

Awesome, works here too with netCDF4==1.2.1

Thanks!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem with checking in Variable._parse_dimensions() (xray.core.variable) 117478779
157570185 https://github.com/pydata/xarray/issues/662#issuecomment-157570185 https://api.github.com/repos/pydata/xarray/issues/662 MDEyOklzc3VlQ29tbWVudDE1NzU3MDE4NQ== rafa-guedes 7799184 2015-11-18T01:43:01Z 2015-11-18T01:43:01Z CONTRIBUTOR

Hum... OK, I will try that on another machine too. The versions are:

```
pandas==0.17.0
netCDF4==1.1.1
scipy==0.15.1
numpy==1.10.1
xray==0.6.1-15-g5109f4f
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem with checking in Variable._parse_dimensions() (xray.core.variable) 117478779
157567446 https://github.com/pydata/xarray/issues/662#issuecomment-157567446 https://api.github.com/repos/pydata/xarray/issues/662 MDEyOklzc3VlQ29tbWVudDE1NzU2NzQ0Ng== rafa-guedes 7799184 2015-11-18T01:26:01Z 2015-11-18T01:26:01Z CONTRIBUTOR

@shoyer I'm sending you by email (I was not able to attach it here) a stripped-down version of one of the files I was using. The code below should reproduce the issue:

```python
import xray

dset = xray.open_dataset('hycom_example.nc', decode_times=False)
ncvar = 'water_u'
dset_sliced = xray.Dataset()
slice_dict = {u'lat': [-30], u'lon': [0]}
dset_sliced[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict)
dset_sliced.to_netcdf()
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem with checking in Variable._parse_dimensions() (xray.core.variable) 117478779
157564974 https://github.com/pydata/xarray/issues/662#issuecomment-157564974 https://api.github.com/repos/pydata/xarray/issues/662 MDEyOklzc3VlQ29tbWVudDE1NzU2NDk3NA== rafa-guedes 7799184 2015-11-18T01:09:00Z 2015-11-18T01:09:00Z CONTRIBUTOR

@maximilianr I have managed to reproduce this with a different file with a different number of dimensions (time, latitude, longitude). So I believe the example below should give the same problem if you run it on some other file and change the variable / dimension names accordingly:

```python
ncvar = 'hs'
dset_sliced1 = xray.Dataset()
dset_sliced2 = xray.Dataset()
dset = xray.open_dataset(filename, decode_times=False)
slice_dict1 = {u'latitude': [-30], u'longitude': [0], u'time': [2.83996800e+08, 2.84007600e+08]}
dset_sliced1[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict1)
slice_dict2 = {u'latitude': [-30], u'longitude': [0], u'time': [2.84018400e+08, 2.84029200e+08]}
dset_sliced2[ncvar] = dset[ncvar].sel(method='nearest', **slice_dict2)

dset_sliced1.to_netcdf('test.nc')  # This fails
xray.concat([dset_sliced1, dset_sliced2], dim='time')  # This also fails, same error
```

Traceback:

```
----> 1 xray.concat([dset_sliced1, dset_sliced2], dim='time')  # This also fails

/source/xray/xray/core/combine.pyc in concat(objs, dim, data_vars, coords, compat, positions, indexers, mode, concat_over)
    113         raise TypeError('can only concatenate xray Dataset and DataArray '
    114                         'objects')
--> 115     return f(objs, dim, data_vars, coords, compat, positions)
    116
    117

/source/xray/xray/core/combine.pyc in _dataset_concat(datasets, dim, data_vars, coords, compat, positions)
    265     for k in concat_over:
    266         vars = ensure_common_dims([ds.variables[k] for ds in datasets])
--> 267         combined = Variable.concat(vars, dim, positions)
    268         insert_result_variable(k, combined)
    269

/source/xray/xray/core/variable.pyc in concat(cls, variables, dim, positions, shortcut)
    711             utils.remove_incompatible_items(attrs, var.attrs)
    712
--> 713         return cls(dims, data, attrs)
    714
    715     def _data_equals(self, other):

/source/xray/xray/core/variable.pyc in __init__(self, dims, data, attrs, encoding, fastpath)
    194         """
    195         self._data = _as_compatible_data(data, fastpath=fastpath)
--> 196         self._dims = self._parse_dimensions(dims)
    197         self._attrs = None
    198         self._encoding = None

/source/xray/xray/core/variable.pyc in _parse_dimensions(self, dims)
    302             raise ValueError('dimensions %s must have the same length as the '
    303                              'number of data dimensions, ndim=%s'
--> 304                              % (dims, self.ndim))
    305         return dims
    306

ValueError: dimensions (u'time', u'latitude', u'longitude') must have the same length as the number of data dimensions, ndim=2
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem with checking in Variable._parse_dimensions() (xray.core.variable) 117478779
157559691 https://github.com/pydata/xarray/issues/662#issuecomment-157559691 https://api.github.com/repos/pydata/xarray/issues/662 MDEyOklzc3VlQ29tbWVudDE1NzU1OTY5MQ== rafa-guedes 7799184 2015-11-18T00:43:32Z 2015-11-18T00:43:32Z CONTRIBUTOR

I was concatenating them as:

```python
dset_concat = xray.concat([ds1, ds2], dim='time')
```

Trying to dump any of them to netcdf:

```python
ds1.to_netcdf('test.nc')
```

would also yield the same problem.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  Problem with checking in Variable._parse_dimensions() (xray.core.variable) 117478779

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue]
    ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
    ON [issue_comments] ([user]);
Powered by Datasette · About: xarray-datasette