
issue_comments


10 rows where user = 10554254 (friedrichknuth), sorted by updated_at descending

Comment 549627590 · friedrichknuth · created 2019-11-05T01:50:29Z · updated 2020-02-12T02:51:51Z · NONE
https://github.com/pydata/xarray/issues/3484#issuecomment-549627590

After reading through the issue tracker and PRs, it looks like sparse arrays can safely be wrapped with xarray, thanks to the work done in PR#3117, but built-in functions are still under development (e.g. PR#3542). As a user, here is what I am seeing when test-driving sparse:

Sparse gives me a smaller in-memory array

```python
In [1]: import xarray as xr, sparse, sys, numpy as np, dask.array as da

In [2]: x = np.random.random((100, 100, 100))

In [3]: x[x < 0.9] = np.nan

In [4]: s = sparse.COO.from_numpy(x, fill_value=np.nan)

In [5]: sys.getsizeof(s)
Out[5]: 3189592

In [6]: sys.getsizeof(x)
Out[6]: 8000128
```

Which I can wrap with dask and xarray:

```python
In [7]: x = da.from_array(x)

In [8]: s = da.from_array(s)

In [9]: ds_dense = xr.DataArray(x).to_dataset(name='data_variable')

In [10]: ds_sparse = xr.DataArray(s).to_dataset(name='data_variable')

In [11]: ds_dense
Out[11]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=np.ndarray>

In [12]: ds_sparse
Out[12]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=sparse.COO>
```

However, computation on a sparse array takes longer than running compute on a dense array (which I think is expected...?)

```python
In [13]: %%time
    ...: ds_sparse.mean().compute()
CPU times: user 487 ms, sys: 22.9 ms, total: 510 ms
Wall time: 518 ms
Out[13]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501

In [14]: %%time
    ...: ds_dense.mean().compute()
CPU times: user 10.9 ms, sys: 3.91 ms, total: 14.8 ms
Wall time: 13.8 ms
Out[14]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501
```
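A plausible explanation for the gap (my own reasoning, not from the original comment): with `fill_value=np.nan`, roughly 10% of the 10^6 elements survive the masking in `In [3]` and are stored explicitly, and COO reductions carry per-element coordinate bookkeeping that a single vectorized pass over a contiguous dense array avoids. The density is easy to check on the COO object from `In [4]`, before `In [8]` rebinds `s` to a dask array:

```python
# .nnz and .density are sparse.COO properties; expect roughly 1e5 explicit
# values and density ≈ 0.1 for the array constructed above.
print(s.nnz, s.density)
```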

And writing to netcdf, to take advantage of the smaller data size, doesn't work out of the box (yet)

```python
In [15]: ds_sparse.to_netcdf('ds_sparse.nc')
...
RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method.
```

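One possible workaround until then (my sketch, not from the comment): densify each dask block before writing, which gives up the on-disk saving but restores `to_netcdf`. This assumes the variable's dask chunks hold `sparse.COO` blocks, as constructed above:

```python
# Densify block-wise; todense() is the method named in the error message above.
dense = ds_sparse['data_variable'].data.map_blocks(
    lambda block: block.todense(), dtype=float
)
ds_out = ds_sparse.copy()
ds_out['data_variable'] = (ds_sparse['data_variable'].dims, dense)
ds_out.to_netcdf('ds_sparse_densified.nc')  # hypothetical output filename
```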
Additional discussion happening at #3213

@dcherian @shoyer Am I missing any built-in methods that are working and ready for public release? Happy to send in a PR if any of what is provided here should go into a basic example for the docs.

At this stage, I am not using sparse arrays for my own research just yet, but when I do I can dig in more on this and hopefully send in some useful PRs for improved documentation and fixes/features.

Issue: Need documentation on sparse / cupy integration (517338735)
Comment 584975960 · friedrichknuth · created 2020-02-12T01:46:00Z · NONE
https://github.com/pydata/xarray/issues/3315#issuecomment-584975960

A few observations after looking at the default flags for concat:

```python
xr.concat(
    objs,
    dim,
    data_vars='all',
    coords='different',
    compat='equals',
    positions=None,
    fill_value=<NA>,
    join='outer',
)
```

The description of `compat='equals'` indicates that combining DataArrays with different names should fail: "'equals': all values and dimensions must be the same" (though I am not entirely sure what is meant by values... I assume this perhaps generically means keys?).

Another option is `compat='identical'`, which is described as: "'identical': all values, dimensions and attributes must be the same." Using this flag will cause the operation to fail, as one would expect from the description...

```python
objs = [xr.DataArray([0], dims='x', name='a'),
        xr.DataArray([1], dims='x', name='b')]

xr.concat(objs, dim='x', compat='identical')
```

```python
ValueError: array names not identical
```

...and this is also the case for concat on Datasets, as previously shown by @TomNicholas:

```python
objs = [xr.Dataset({'a': ('x', [0])}),
        xr.Dataset({'b': ('x', [0])})]

xr.concat(objs, dim='x')
```

```python
ValueError: 'a' is not present in all datasets.
```

However, "'identical': all values, dimensions and **attributes** must be the same" doesn't quite seem to hold for DataArrays, as

```python
objs = [xr.DataArray([0], dims='x', name='a', attrs={'foo': 1}),
        xr.DataArray([1], dims='x', name='a', attrs={'bar': 2})]

xr.concat(objs, dim='x', compat='identical')
```

succeeds with

```python
<xarray.DataArray 'a' (x: 2)>
array([0, 1])
Dimensions without coordinates: x
Attributes:
    foo:     1
```

but again fails on Datasets, as one would expect from the description.

```python
ds1 = xr.Dataset({'a': ('x', [0])})
ds1.attrs['foo'] = 'example attribute'

ds2 = xr.Dataset({'a': ('x', [1])})
ds2.attrs['bar'] = 'example attribute'

objs = [ds1, ds2]
xr.concat(objs, dim='x', compat='identical')
```

```python
ValueError: Dataset global attributes not equal.
```

Also had a look at `compat='override'`, which will override an attrs inconsistency but not a naming one when applied to Datasets; it works as expected on DataArrays. It is described as: "'override': skip comparing and pick variable from first dataset."
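A minimal sketch of that Dataset behavior (my reconstruction from the description above, not code from the thread):

```python
# attrs differ but variable names agree: 'override' keeps the first Dataset's
# attrs and succeeds.
ds1 = xr.Dataset({'a': ('x', [0])}, attrs={'foo': 1})
ds2 = xr.Dataset({'a': ('x', [1])}, attrs={'bar': 2})
xr.concat([ds1, ds2], dim='x', compat='override')

# variable names differ: still raises, matching the naming behavior described
# above (uncomment to see the error).
# xr.concat([ds1, xr.Dataset({'b': ('x', [1])})], dim='x', compat='override')
```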

Potential resolutions:

  1. 'identical' should raise an error when attributes are not the same for DataArrays

  2. 'equals' should raise an error when DataArray names are not identical (unless one is None, which works with Datasets and seems fine to be replaced)

  3. 'override' should override naming inconsistencies when combining Datasets.

Final thought: perhaps promoting a DataArray to a Dataset whenever it meets all the requirements to be considered one might simplify keeping operations and checks consistent?
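For illustration, the by-hand version of that promotion (my sketch, assuming the DataArrays are named):

```python
# Promote named DataArrays to Datasets so concat applies the same checks to both.
objs = [xr.DataArray([0], dims='x', name='a'),
        xr.DataArray([1], dims='x', name='a')]
datasets = [obj.to_dataset() for obj in objs]
xr.concat(datasets, dim='x', compat='identical')
```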

Issue: xr.combine_nested() fails when passed nested DataSets (494906646)
Comment 551359502 · friedrichknuth · created 2019-11-08T02:41:13Z · NONE
https://github.com/pydata/xarray/issues/3445#issuecomment-551359502

@El-minadero, from the sparse API page I'm seeing two methods for combining data:

```python
import sparse
import numpy as np

A = sparse.COO.from_numpy(np.array([[1, 2], [3, 4]]))
B = sparse.COO.from_numpy(np.array([[5, 9], [6, 8]]))

sparse.stack([A, B]).todense()
Out[1]:
array([[[1, 2],
        [3, 4]],

       [[5, 9],
        [6, 8]]])

sparse.concatenate([A, B]).todense()
Out[2]:
array([[1, 2],
       [3, 4],
       [5, 9],
       [6, 8]])
```

Since this is an issue with `sparse`, and merging data doesn't seem to be supported there at this time, you might consider closing this issue out here and raising it over at sparse.
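For completeness (my addition, relying on sparse's numpy-compatible signature): `sparse.concatenate` also accepts an `axis` argument, so the arrays can be joined along columns instead of rows:

```python
# Continuing with A and B from the snippet above; axis=1 concatenates along
# the second dimension, mirroring np.concatenate.
sparse.concatenate([A, B], axis=1).todense()
# array([[1, 2, 5, 9],
#        [3, 4, 6, 8]])
```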

Issue: Merge fails when sparse Dataset has overlapping dimension values (512205079)
Comment 550516745 · friedrichknuth · created 2019-11-06T21:51:31Z · NONE
https://github.com/pydata/xarray/issues/3445#issuecomment-550516745

Note that `dataset1 = xr.concat([data_array1, data_array2], dim='source')` (or `dim='receiver'`) seems to work; however, concat also fails if time is specified as the dimension.

Issue: Merge fails when sparse Dataset has overlapping dimension values (512205079)
Comment 532754800 · friedrichknuth · created 2019-09-18T16:08:09Z · NONE
https://github.com/pydata/xarray/pull/3312#issuecomment-532754800

Opened https://github.com/pydata/xarray/issues/3315 regarding combine_nested() failing when passed nested DataSets.

Issue: convert DataArray to DataSet before combine (494210818)
Comment 532419859 · friedrichknuth · created 2019-09-17T22:03:23Z · updated 2019-09-17T23:51:13Z · NONE
https://github.com/pydata/xarray/pull/3312#issuecomment-532419859

`pytest -q xarray/tests/test_combine.py` is telling me that

```python
def test_concat_name_symmetry(self):
    """Inspired by the discussion on GH issue #2777"""

    da1 = DataArray(name="a", data=[[0]], dims=["x", "y"])
    da2 = DataArray(name="b", data=[[1]], dims=["x", "y"])
    da3 = DataArray(name="a", data=[[2]], dims=["x", "y"])
    da4 = DataArray(name="b", data=[[3]], dims=["x", "y"])

    x_first = combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])
```

fails with:

```python
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-bc4a941bd0c3> in <module>
      3 da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
      4 da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
----> 5 xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])

~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(objects, concat_dim, compat, data_vars, coords, fill_value, join)
    468         ids=False,
    469         fill_value=fill_value,
--> 470         join=join,
    471     )
    472

~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    305         coords=coords,
    306         fill_value=fill_value,
--> 307         join=join,
    308     )
    309     return combined

~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    196             compat=compat,
    197             fill_value=fill_value,
--> 198             join=join,
    199         )
    200     (combined_ds,) = combined_ids.values()

~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    218         datasets = combined_ids.values()
    219         new_combined_ids[new_id] = _combine_1d(
--> 220             datasets, dim, compat, data_vars, coords, fill_value, join
    221         )
    222     return new_combined_ids

~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    246             compat=compat,
    247             fill_value=fill_value,
--> 248             join=join,
    249         )
    250     except ValueError as err:

~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134
    135

~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/concat.py in <listcomp>(.0)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key)
    383
    384     def __getitem__(self, key: K) -> V:
--> 385         return self.mapping[key]
    386
    387     def __iter__(self) -> Iterator[K]:

KeyError: 'a'
```

It looks like the existing combine_nested() routine actually wants a DataArray and fails if passed a Dataset.

The following should work with current master.

```python
da1 = xr.DataArray(name="a", data=[[0]], dims=["x", "y"])
da2 = xr.DataArray(name="b", data=[[1]], dims=["x", "y"])
da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])
```

Converting to Datasets, however, causes the same error expressed by the test:

```python
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
ds3 = da3.to_dataset()
ds4 = da4.to_dataset()
xr.combine_nested([[ds1, ds2], [ds3, ds4]], concat_dim=["x", "y"])
```

Issue: convert DataArray to DataSet before combine (494210818)
Comment 531511177 · friedrichknuth · created 2019-09-14T20:31:22Z · NONE
https://github.com/pydata/xarray/issues/3248#issuecomment-531511177

Some additional information on the topic:

Combining named 1D data arrays works.

```python
da1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [1, 2, 3])])
da2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [5, 6, 7])])
xr.combine_by_coords([da1, da2])

<xarray.Dataset>
Dimensions:  (x: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
Data variables:
    foo      (x) float64 1.443 0.4889 0.9233 0.1946 -1.639 -1.455
```

However, when combining 2D gridded data...

```python
da1 = xr.DataArray(name='foo', data=np.random.rand(3, 3),
                   coords=[('x', [1, 2, 3]), ('y', [1, 2, 3])])

da2 = xr.DataArray(name='foo', data=np.random.rand(3, 3),
                   coords=[('x', [5, 6, 7]), ('y', [5, 6, 7])])

xr.combine_by_coords([da1, da2])
```

...the method fails, despite passing a data variable name.

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-145-77ae89136c1f> in <module>
      9                                        ('y', [5, 6, 7])])
     10
---> 11 xr.combine_by_coords([da1, da2])

~/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    580
    581     # Group by data vars
--> 582     sorted_datasets = sorted(datasets, key=vars_as_keys)
    583     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    584

~/xarray/xarray/core/combine.py in vars_as_keys(ds)
    465
    466 def vars_as_keys(ds):
--> 467     return tuple(sorted(ds))
    468
    469

~/xarray/xarray/core/common.py in __bool__(self)
    119
    120     def __bool__(self: Any) -> bool:
--> 121         return bool(self.values)
    122
    123     def __float__(self: Any) -> float:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Again, converting to a dataset bypasses the issue.

```python
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
xr.combine_by_coords([ds1, ds2])

<xarray.Dataset>
Dimensions:  (x: 6, y: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
  * y        (y) int64 1 2 3 5 6 7
Data variables:
    foo      (x, y) float64 0.5078 0.8981 0.8707 nan ... 0.4172 0.7259 0.8431
```
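A hedged reading of the traceback (my inference, not stated in the thread): `vars_as_keys` calls `tuple(sorted(ds))`, which for a Dataset sorts variable-name strings, but iterating a 2D DataArray yields 1D slices, and sorting those compares arrays element-wise, which is what trips the ambiguous truth-value error:

```python
# Iterating a Dataset yields data-variable names; iterating a DataArray yields
# sub-arrays along the first dimension, which sorted() then tries to compare.
print(list(iter(ds1)))        # ['foo']
print(type(next(iter(da1))))  # <class 'xarray.core.dataarray.DataArray'>
```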

Issue: combine_by_coords fails with DataArrays (484270833)
Comment 344949160 · friedrichknuth · created 2017-11-16T15:01:59Z · updated 2017-11-16T15:02:48Z · NONE
https://github.com/pydata/xarray/issues/1301#issuecomment-344949160

Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above (https://marine.rutgers.edu/~michaesm/netcdf/data/).

```python
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

xarray==0.10.0rc2-1-g8267fdb, dask==0.15.4

```
         194381 function calls (188429 primitive calls) in 0.869 seconds

   Ordered by: internal time
   List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(_index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4_.py:185(_open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}
```

xarray==0.9.1, dask==0.13.0

```
         241253 function calls (229881 primitive calls) in 98.123 seconds

   Ordered by: internal time
   List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(_checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```
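For anyone re-running these profiles on a current dask (my note, not part of the original comment): the `da.async` scheduler interface no longer exists, and the synchronous scheduler is now selected via `dask.config`:

```python
import dask

# Replaces the older da.set_options(get=da.async.get_sync) used above.
dask.config.set(scheduler='synchronous')
```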

Reactions: total 3 (+1: 1, hooray: 2)
Issue: open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 (212561278)
Comment 293619896 · friedrichknuth · created 2017-04-12T15:42:18Z · NONE
https://github.com/pydata/xarray/issues/1301#issuecomment-293619896

`decode_times=False` significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same.
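A sketch of that workaround (my addition; `xr.decode_cf` is the standard way to apply the decoding afterwards):

```python
import xarray as xr

# Open without decoding time variables, then decode explicitly once loaded.
ds = xr.open_mfdataset('./*.nc', decode_times=False)
ds = xr.decode_cf(ds)
```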

Issue: open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 (212561278)
Comment 286220522 · friedrichknuth · created 2017-03-13T19:41:25Z · NONE
https://github.com/pydata/xarray/issues/1301#issuecomment-286220522

Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

```python
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

```
         167305 function calls (160352 primitive calls) in 59.688 seconds

   Ordered by: internal time
   List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       18   57.057    3.170   57.057    3.170 {pandas.tslib.array_to_timedelta64}
       39    0.860    0.022    0.863    0.022 {operator.getitem}
       48    0.402    0.008    0.402    0.008 {numpy.core.multiarray.arange}
4341/2463    0.257    0.000    0.273    0.000 utils.py:412(shape)
       88    0.245    0.003    0.245    0.003 {pandas.algos.ensure_object}
       48    0.158    0.003    0.561    0.012 indexing.py:318(_index_indexer_1d)
    36/18    0.135    0.004   57.509    3.195 timedeltas.py:150(_convert_listlike)
       18    0.126    0.007    0.130    0.007 nanops.py:815(_checked_add_with_arr)
       51    0.070    0.001    0.070    0.001 {method 'astype' of 'numpy.ndarray' objects}
  676/475    0.047    0.000   58.853    0.124 {numpy.core.multiarray.array}
```

`pandas.tslib.array_to_timedelta64` appears to be the most expensive item on the list, and isn't being run when using xarray 0.8.2.

xarray==0.8.2, dask==0.13.0

```python
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
```

```
         140668 function calls (136769 primitive calls) in 0.766 seconds

   Ordered by: internal time
   List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(_index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)
```

The version of dask is held constant in each test.

Issue: open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 (212561278)
