issue_comments
10 rows where user = 10554254 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | performed_via_github_app | issue |
---|---|---|---|---|---|---|---|---|---|---|---|
549627590 | https://github.com/pydata/xarray/issues/3484#issuecomment-549627590 | https://api.github.com/repos/pydata/xarray/issues/3484 | MDEyOklzc3VlQ29tbWVudDU0OTYyNzU5MA== | friedrichknuth 10554254 | 2019-11-05T01:50:29Z | 2020-02-12T02:51:51Z | NONE | After reading through the issue tracker and PRs, it looks like sparse arrays can safely be wrapped with xarray, thanks to the work done in PR#3117, but built-in functions are still under development (e.g. PR#3542). As a user, here is what I am seeing when test driving sparse:

Sparse gives me a smaller in-memory array

```python
In [1]: import xarray as xr, sparse, sys, numpy as np, dask.array as da

In [2]: x = np.random.random((100, 100, 100))

In [3]: x[x < 0.9] = np.nan

In [4]: s = sparse.COO.from_numpy(x, fill_value=np.nan)

In [5]: sys.getsizeof(s)
Out[5]: 3189592

In [6]: sys.getsizeof(x)
Out[6]: 8000128
```

Which I can wrap with dask and xarray

```python
In [7]: x = da.from_array(x)

In [8]: s = da.from_array(s)

In [9]: ds_dense = xr.DataArray(x).to_dataset(name='data_variable')

In [10]: ds_sparse = xr.DataArray(s).to_dataset(name='data_variable')

In [11]: ds_dense
Out[11]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=np.ndarray>

In [12]: ds_sparse
Out[12]:
<xarray.Dataset>
Dimensions:        (dim_0: 100, dim_1: 100, dim_2: 100)
Dimensions without coordinates: dim_0, dim_1, dim_2
Data variables:
    data_variable  (dim_0, dim_1, dim_2) float64 dask.array<chunksize=(100, 100, 100), meta=sparse.COO>
```

However, computation on a sparse array takes longer than running compute on a dense array (which I think is expected...?)

```python
In [13]: %%time
    ...: ds_sparse.mean().compute()
CPU times: user 487 ms, sys: 22.9 ms, total: 510 ms
Wall time: 518 ms
Out[13]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501

In [14]: %%time
    ...: ds_dense.mean().compute()
CPU times: user 10.9 ms, sys: 3.91 ms, total: 14.8 ms
Wall time: 13.8 ms
Out[14]:
<xarray.Dataset>
Dimensions:        ()
Data variables:
    data_variable  float64 0.9501
```

And writing to netcdf, to take advantage of the smaller data size, doesn't work out of the box (yet)
Additional discussion happening at #3213 @dcherian @shoyer Am I missing any built-in methods that are working and ready for public release? Happy to send in a PR, if any of what is provided here should go into a basic example for the docs. At this stage, I am not using sparse arrays for my own research just yet, but when I get to that anticipated phase I can dig in more on this and hopefully send in some useful PRs for improved documentation and fixes/features. |
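A minimal sketch of the densify-before-write workaround hinted at above (added here for illustration, not part of the original comment; the file name and array sizes are arbitrary):

```python
import numpy as np
import sparse
import xarray as xr

x = np.random.random((10, 10))
x[x < 0.9] = np.nan
s = sparse.COO.from_numpy(x, fill_value=np.nan)

da_sparse = xr.DataArray(s, dims=("x", "y"), name="data_variable")

# Writing da_sparse directly with to_netcdf is what the comment reports as not
# working; converting the sparse payload back to a dense numpy array first
# avoids the issue, at the cost of the memory savings.
da_dense = xr.DataArray(np.asarray(da_sparse.data.todense()),
                        dims=da_sparse.dims, name=da_sparse.name)
da_dense.to_dataset().to_netcdf("example_dense.nc")
```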
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Need documentation on sparse / cupy integration 517338735 | |
584975960 | https://github.com/pydata/xarray/issues/3315#issuecomment-584975960 | https://api.github.com/repos/pydata/xarray/issues/3315 | MDEyOklzc3VlQ29tbWVudDU4NDk3NTk2MA== | friedrichknuth 10554254 | 2020-02-12T01:46:00Z | 2020-02-12T01:46:00Z | NONE | Few observations after looking at the default flags for
The description of Another option is

```python
objs = [xr.DataArray([0], dims='x', name='a'),
        xr.DataArray([1], dims='x', name='b')]

xr.concat(objs, dim='x', compat='identical')
```

... and is the case for

```
objs = [xr.Dataset({'a': ('x', [0])}),
        xr.Dataset({'b': ('x', [0])})]

xr.concat(objs, dim='x')
```

However,

```python
objs = [xr.DataArray([0], dims='x', name='a', attrs={'foo':1}),
        xr.DataArray([1], dims='x', name='a', attrs={'bar':2})]

xr.concat(objs, dim='x', compat='identical')
```

succeeds with

but again fails on Datasets, as one would expect from the description.

```python
ds1 = xr.Dataset({'a': ('x', [0])})
ds1.attrs['foo'] = 'example attribute'

ds2 = xr.Dataset({'a': ('x', [1])})
ds2.attrs['bar'] = 'example attribute'

objs = [ds1, ds2]

xr.concat(objs, dim='x', compat='identical')
```
Also had a look at Potential resolutions:
Final thought: perhaps promoting a DataArray to a Dataset, whenever it meets all the requirements to be considered as such, might simplify keeping operations and checks consistent? |
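A minimal sketch of that promotion idea (illustrative only, not from the original comment): named DataArrays can be promoted with to_dataset() so that concat runs through the Dataset code path and its compatibility checks.

```python
import xarray as xr

objs = [xr.DataArray([0], dims='x', name='a'),
        xr.DataArray([1], dims='x', name='a')]

# Hypothetical promotion step: each named DataArray becomes a single-variable
# Dataset, so xr.concat applies the Dataset-level compat checks uniformly.
datasets = [obj.to_dataset() for obj in objs]

xr.concat(datasets, dim='x', compat='identical')
```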
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
xr.combine_nested() fails when passed nested DataSets 494906646 | |
551359502 | https://github.com/pydata/xarray/issues/3445#issuecomment-551359502 | https://api.github.com/repos/pydata/xarray/issues/3445 | MDEyOklzc3VlQ29tbWVudDU1MTM1OTUwMg== | friedrichknuth 10554254 | 2019-11-08T02:41:13Z | 2019-11-08T02:41:13Z | NONE | @El-minadero from the sparse API page I'm seeing two methods for combining data:

```python
import sparse
import numpy as np

A = sparse.COO.from_numpy(np.array([[1, 2], [3, 4]]))
B = sparse.COO.from_numpy(np.array([[5, 9], [6, 8]]))

sparse.stack([A,B]).todense()
Out[1]:
array([[[1, 2],
        [3, 4]],

       [[5, 9],
        [6, 8]]])

sparse.concatenate([A,B]).todense()
Out[2]:
array([[1, 2],
       [3, 4],
       [5, 9],
       [6, 8]])
```
|
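As a follow-on sketch (illustrative, not part of the original comment), the combined sparse result can be wrapped back into a DataArray for use with xarray:

```python
import numpy as np
import sparse
import xarray as xr

A = sparse.COO.from_numpy(np.array([[1, 2], [3, 4]]))
B = sparse.COO.from_numpy(np.array([[5, 9], [6, 8]]))

# Combine at the sparse level, then wrap the result so xarray can label it.
combined = xr.DataArray(sparse.concatenate([A, B]), dims=("x", "y"))
print(combined.shape)  # (4, 2)
```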
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Merge fails when sparse Dataset has overlapping dimension values 512205079 | |
550516745 | https://github.com/pydata/xarray/issues/3445#issuecomment-550516745 | https://api.github.com/repos/pydata/xarray/issues/3445 | MDEyOklzc3VlQ29tbWVudDU1MDUxNjc0NQ== | friedrichknuth 10554254 | 2019-11-06T21:51:31Z | 2019-11-06T21:51:31Z | NONE | Note that |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Merge fails when sparse Dataset has overlapping dimension values 512205079 | |
532754800 | https://github.com/pydata/xarray/pull/3312#issuecomment-532754800 | https://api.github.com/repos/pydata/xarray/issues/3312 | MDEyOklzc3VlQ29tbWVudDUzMjc1NDgwMA== | friedrichknuth 10554254 | 2019-09-18T16:08:09Z | 2019-09-18T16:08:09Z | NONE | Opened https://github.com/pydata/xarray/issues/3315 regarding combine_nested() failing when being passed nested DataSets. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
convert DataArray to DataSet before combine 494210818 | |
532419859 | https://github.com/pydata/xarray/pull/3312#issuecomment-532419859 | https://api.github.com/repos/pydata/xarray/issues/3312 | MDEyOklzc3VlQ29tbWVudDUzMjQxOTg1OQ== | friedrichknuth 10554254 | 2019-09-17T22:03:23Z | 2019-09-17T23:51:13Z | NONE |
```
def test_concat_name_symmetry(self):
    """Inspired by the discussion on GH issue #2777"""
```

fails with:

```
KeyError                                  Traceback (most recent call last)
<ipython-input-13-bc4a941bd0c3> in <module>
      3 da3 = xr.DataArray(name="a", data=[[2]], dims=["x", "y"])
      4 da4 = xr.DataArray(name="b", data=[[3]], dims=["x", "y"])
----> 5 xr.combine_nested([[da1, da2], [da3, da4]], concat_dim=["x", "y"])

~/repos/contribute/xarray/xarray/core/combine.py in combine_nested(objects, concat_dim, compat, data_vars, coords, fill_value, join)
    468         ids=False,
    469         fill_value=fill_value,
--> 470         join=join,
    471     )
    472

~/repos/contribute/xarray/xarray/core/combine.py in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join)
    305         coords=coords,
    306         fill_value=fill_value,
--> 307         join=join,
    308     )
    309     return combined

~/repos/contribute/xarray/xarray/core/combine.py in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join)
    196             compat=compat,
    197             fill_value=fill_value,
--> 198             join=join,
    199         )
    200     (combined_ds,) = combined_ids.values()

~/repos/contribute/xarray/xarray/core/combine.py in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join)
    218     datasets = combined_ids.values()
    219     new_combined_ids[new_id] = _combine_1d(
--> 220         datasets, dim, compat, data_vars, coords, fill_value, join
    221     )
    222     return new_combined_ids

~/repos/contribute/xarray/xarray/core/combine.py in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join)
    246             compat=compat,
    247             fill_value=fill_value,
--> 248             join=join,
    249         )
    250     except ValueError as err:

~/repos/contribute/xarray/xarray/core/concat.py in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    131             "objects, got %s" % type(first_obj)
    132         )
--> 133     return f(objs, dim, data_vars, coords, compat, positions, fill_value, join)
    134
    135

~/repos/contribute/xarray/xarray/core/concat.py in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/concat.py in <listcomp>(.0)
    363     for k in datasets[0].variables:
    364         if k in concat_over:
--> 365             vars = ensure_common_dims([ds.variables[k] for ds in datasets])
    366             combined = concat_vars(vars, dim, positions)
    367             assert isinstance(combined, Variable)

~/repos/contribute/xarray/xarray/core/utils.py in __getitem__(self, key)
    383
    384     def __getitem__(self, key: K) -> V:
--> 385         return self.mapping[key]
    386
    387     def __iter__(self) -> Iterator[K]:

KeyError: 'a'
```

It looks like the existing combine_nested() routine actually wants a DataArray and fails if passed a DataSet. The following should work with current master.
While converting to DataSet will cause the same error expressed by the test.
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
convert DataArray to DataSet before combine 494210818 | |
531511177 | https://github.com/pydata/xarray/issues/3248#issuecomment-531511177 | https://api.github.com/repos/pydata/xarray/issues/3248 | MDEyOklzc3VlQ29tbWVudDUzMTUxMTE3Nw== | friedrichknuth 10554254 | 2019-09-14T20:31:22Z | 2019-09-14T20:31:22Z | NONE | Some additional information on the topic:

Combining named 1D data arrays works.

```
da1 = xr.DataArray(name='foo',
                   data=np.random.randn(3),
                   coords=[('x', [1, 2, 3])])
da2 = xr.DataArray(name='foo',
                   data=np.random.randn(3),
                   coords=[('x', [5, 6, 7])])

xr.combine_by_coords([da1, da2])

<xarray.Dataset>
Dimensions:  (x: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
Data variables:
    foo      (x) float64 1.443 0.4889 0.9233 0.1946 -1.639 -1.455
```

However, when combining 2D gridded data...

```
da1 = xr.DataArray(name='foo',
                   data=np.random.rand(3,3),
                   coords=[('x', [1, 2, 3]),
                           ('y', [1, 2, 3])])
da2 = xr.DataArray(name='foo',
                   data=np.random.rand(3,3),
                   coords=[('x', [5, 6, 7]),
                           ('y', [5, 6, 7])])

xr.combine_by_coords([da1, da2])

ValueError                                Traceback (most recent call last)
<ipython-input-145-77ae89136c1f> in <module>
      9                                             ('y', [5, 6, 7])])
     10
---> 11 xr.combine_by_coords([da1, da2])

~/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    580
    581     # Group by data vars
--> 582     sorted_datasets = sorted(datasets, key=vars_as_keys)
    583     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    584

~/xarray/xarray/core/combine.py in vars_as_keys(ds)
    465
    466 def vars_as_keys(ds):
--> 467     return tuple(sorted(ds))
    468
    469

~/xarray/xarray/core/common.py in __bool__(self)
    119
    120     def __bool__(self: Any) -> bool:
--> 121         return bool(self.values)
    122
    123     def __float__(self: Any) -> float:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

Again, converting to a dataset bypasses the issue.

```
ds1 = da1.to_dataset()
ds2 = da2.to_dataset()

xr.combine_by_coords([ds1, ds2])

<xarray.Dataset>
Dimensions:  (x: 6, y: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
  * y        (y) int64 1 2 3 5 6 7
Data variables:
    foo      (x, y) float64 0.5078 0.8981 0.8707 nan ... 0.4172 0.7259 0.8431
```
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
combine_by_coords fails with DataArrays 484270833 | |
344949160 | https://github.com/pydata/xarray/issues/1301#issuecomment-344949160 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDM0NDk0OTE2MA== | friedrichknuth 10554254 | 2017-11-16T15:01:59Z | 2017-11-16T15:02:48Z | NONE | Looks like it has been resolved! Tested with the latest pre-release v0.10.0rc2 on the dataset linked by najascutellatus above. https://marine.rutgers.edu/~michaesm/netcdf/data/
xarray==0.10.0rc2-1-g8267fdb dask==0.15.4

```
         194381 function calls (188429 primitive calls) in 0.869 seconds

   Ordered by: internal time
   List reduced from 469 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       50    0.393    0.008    0.393    0.008 {numpy.core.multiarray.arange}
       50    0.164    0.003    0.557    0.011 indexing.py:266(index_indexer_1d)
        5    0.083    0.017    0.085    0.017 netCDF4_.py:185(open_netcdf4_group)
      190    0.024    0.000    0.066    0.000 netCDF4_.py:256(open_store_variable)
      190    0.022    0.000    0.022    0.000 netCDF4_.py:29(__init__)
       50    0.018    0.000    0.021    0.000 {operator.getitem}
5145/3605    0.012    0.000    0.019    0.000 indexing.py:493(shape)
2317/1291    0.009    0.000    0.094    0.000 _abcoll.py:548(update)
    26137    0.006    0.000    0.013    0.000 {isinstance}
      720    0.005    0.000    0.006    0.000 {method 'getncattr' of 'netCDF4._netCDF4.Variable' objects}

   Ordered by: internal time
   List reduced from 659 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       30   87.527    2.918   87.527    2.918 {pandas._libs.tslib.array_to_timedelta64}
       65    7.055    0.109    7.059    0.109 {operator.getitem}
       80    0.799    0.010    0.799    0.010 {numpy.core.multiarray.arange}
7895/4420    0.502    0.000    0.524    0.000 utils.py:412(shape)
       68    0.442    0.007    0.442    0.007 {pandas._libs.algos.ensure_object}
       80    0.350    0.004    1.150    0.014 indexing.py:318(_index_indexer_1d)
    60/30    0.296    0.005   88.407    2.947 timedeltas.py:158(_convert_listlike)
       30    0.284    0.009    0.298    0.010 algorithms.py:719(checked_add_with_arr)
      123    0.140    0.001    0.140    0.001 {method 'astype' of 'numpy.ndarray' objects}
 1049/719    0.096    0.000   96.513    0.134 {numpy.core.multiarray.array}
```
|
{ "total_count": 3, "+1": 1, "-1": 0, "laugh": 0, "hooray": 2, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
293619896 | https://github.com/pydata/xarray/issues/1301#issuecomment-293619896 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI5MzYxOTg5Ng== | friedrichknuth 10554254 | 2017-04-12T15:42:18Z | 2017-04-12T15:42:18Z | NONE | decode_times=False significantly reduces read time, but the proportional performance discrepancy between xarray 0.8.2 and 0.9.1 remains the same. |
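For reference, a minimal sketch of how that flag is passed (assuming a directory of .nc files; this snippet is not part of the original comment):

```python
import xarray as xr

# Skip CF time decoding on load; the keyword is forwarded to open_dataset.
ds = xr.open_mfdataset('./*.nc', decode_times=False)
```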
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 | |
286220522 | https://github.com/pydata/xarray/issues/1301#issuecomment-286220522 | https://api.github.com/repos/pydata/xarray/issues/1301 | MDEyOklzc3VlQ29tbWVudDI4NjIyMDUyMg== | friedrichknuth 10554254 | 2017-03-13T19:41:25Z | 2017-03-13T19:41:25Z | NONE | Looks like the issue might be that xarray 0.9.1 is decoding all timestamps on load.

xarray==0.9.1, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')
   Ordered by: internal time
   List reduced from 625 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
18 57.057 3.170 57.057 3.170 {pandas.tslib.array_to_timedelta64}
39 0.860 0.022 0.863 0.022 {operator.getitem}
48 0.402 0.008 0.402 0.008 {numpy.core.multiarray.arange}
4341/2463 0.257 0.000 0.273 0.000 utils.py:412(shape)
88 0.245 0.003 0.245 0.003 {pandas.algos.ensure_object}
48 0.158 0.003 0.561 0.012 indexing.py:318(_index_indexer_1d)
36/18 0.135 0.004 57.509 3.195 timedeltas.py:150(_convert_listlike)
18 0.126 0.007 0.130 0.007 nanops.py:815(_checked_add_with_arr)
51 0.070 0.001 0.070 0.001 {method 'astype' of 'numpy.ndarray' objects}
676/475 0.047 0.000 58.853 0.124 {numpy.core.multiarray.array}
```

xarray==0.8.2, dask==0.13.0

```
da.set_options(get=da.async.get_sync)
%prun -l 10 ds = xr.open_mfdataset('./*.nc')

   Ordered by: internal time
   List reduced from 621 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
2571/1800    0.178    0.000    0.184    0.000 utils.py:387(shape)
       18    0.174    0.010    0.174    0.010 {numpy.core.multiarray.arange}
       16    0.079    0.005    0.079    0.005 {numpy.core.multiarray.concatenate}
  483/420    0.077    0.000    0.125    0.000 {numpy.core.multiarray.array}
       15    0.054    0.004    0.197    0.013 indexing.py:259(index_indexer_1d)
        3    0.041    0.014    0.043    0.014 netCDF4_.py:181(__init__)
      105    0.013    0.000    0.057    0.001 netCDF4_.py:196(open_store_variable)
       15    0.012    0.001    0.013    0.001 {operator.getitem}
2715/1665    0.007    0.000    0.178    0.000 indexing.py:343(shape)
     5971    0.006    0.000    0.006    0.000 collections.py:71(__setitem__)
```

The version of dask is held constant in each test. |
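The profiling calls above use IPython's %prun and the pre-0.18 dask scheduler API; a rough present-day equivalent (a sketch, not part of the original comment, assuming .nc files in the working directory) would be:

```python
import cProfile

import dask
import xarray as xr

# Force single-threaded execution so the profile reflects the work itself,
# then profile the open_mfdataset call, sorted by internal time.
with dask.config.set(scheduler="synchronous"):
    cProfile.run("xr.open_mfdataset('./*.nc')", sort="tottime")
```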
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
open_mfdataset() significantly slower on 0.9.1 vs. 0.8.2 212561278 |
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [performed_via_github_app] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
);
CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);